[LUM#22] No AI without humans

Computers capable of learning are no longer the stuff of science fiction; thanks to machine learning, they are now a reality. To learn, however, machines need humans to supply them with colossal amounts of data. To ensure the quality of this data, statistician Joseph Salmon is banking on cooperation.

Pl@ntNet has 25 million users who provide photo data and return the names of plants. © IRD – PlantNet

How do autonomous cars read road signs? How can a smartphone recognize the song of a nightingale or the leaf of an oak tree? All of this is now possible, even commonplace, thanks to machine learning. "It's the very basis of artificial intelligence," explains Joseph Salmon, a statistics researcher at the Alexander Grothendieck Institute in Montpellier¹.

To tell a nightingale from a chickadee, or an oak from a poplar, the computer has to "look" at many leaves or "listen" to many birdsongs. These images and sounds are data, the lifeblood of AI. "Machine learning needs large amounts of data to work properly, especially for classification tasks," he notes.

The quality of this information, known as input data, is crucial to the success of the tasks to be performed. "Garbage in, garbage out," sums up the specialist. In other words, feeding the computer faulty or nonsensical data will produce equally nonsensical answers. To illustrate this computer scientists' maxim, Joseph Salmon cites a well-known example: Barack Obama was reportedly identified by an AI as a monkey. The reason: the image database used by the recognition algorithms consisted mainly of faces of white people.

No AI without cooperation

To secure large quantities of high-quality data, Joseph Salmon relies on cooperation. In 2019, he was awarded a research and teaching chair in artificial intelligence funded by the ANR (French National Research Agency). Through this chair, named Camelot, the researcher and his colleagues aim to tackle the challenges of identifying biodiversity through participatory science and crowdsourcing, that is, by calling on the general public to provide data.

This strategy has already been used to develop the Pl@ntNet recognition app, a citizen science project designed to identify plants automatically from photos, in which Joseph Salmon took part. "Pl@ntNet has 25 million users who provide photo data and return the names of plants. Crowdsourcing makes it possible to synthesize and use all this knowledge," says the statistician, for whom "there is no AI without cooperation." Indeed, for the researcher, artificial intelligence is ultimately not artificial at all. "We take all the energy of human beings that we have synchronized, and AI simply links it together as a collective effort."
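Crowdsourced identifications have to be merged into a single answer per photo. The article does not say how Pl@ntNet does this; the sketch below assumes the simplest possible rule, a plain majority vote over the names proposed by users. The photo IDs and votes are hypothetical, and real systems may instead weight each contributor by how reliable they have proven to be.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most frequent label among contributors' votes.

    Ties go to the label encountered first, because Counter.most_common
    orders equal counts by first appearance.
    """
    if not votes:
        raise ValueError("need at least one vote")
    label, _count = Counter(votes).most_common(1)[0]
    return label


# Hypothetical identifications submitted by several users for each photo
observations = {
    "photo_1": ["Quercus robur", "Quercus robur", "Populus nigra"],
    "photo_2": ["Populus nigra", "Populus nigra", "Populus nigra", "Quercus robur"],
}

# One consensus name per photo
consensus = {photo: majority_vote(votes) for photo, votes in observations.items()}
print(consensus)
```

A single careless or mistaken user is outvoted here, which is one simple way cooperation among many contributors can improve data quality.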

Neural network

Learning, after all, is exactly what the human brain does. To give machines this ability, the mathematicians and statisticians working on these questions draw inspiration from the way the brain works and try to mimic its learning processes. "The mathematical tool used is called a neural network. Dozens or even hundreds of layers of neurons can be stacked, each receiving and interpreting information from the previous layer. This is known as deep learning," explains Joseph Salmon. Mathematics is used here to build an algorithm designed to minimize error. "A neural network is a mathematical function that takes an image as input and must give a name as output, which requires programming gigantic functions," he adds.
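To make the idea of a layered function concrete, here is a minimal sketch of a forward pass through a tiny two-layer network: each layer computes weighted sums of the previous layer's outputs, and the last layer turns its scores into one probability per name. The four-value "image", the layer sizes, the weights and the names `oak`/`poplar` are all hypothetical and hand-picked rather than learned; real image networks are vastly larger.

```python
import math


def dense(inputs, weights, biases):
    # Each neuron: weighted sum of every output of the previous layer, plus a bias
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def relu(values):
    # Keep positive signals, silence negative ones
    return [max(0.0, v) for v in values]


def softmax(scores):
    # Turn raw scores into probabilities that sum to 1 (shifted by max for stability)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(pixels, layers):
    """Pass the input through each layer in turn."""
    activations = pixels
    for i, (weights, biases) in enumerate(layers):
        activations = dense(activations, weights, biases)
        if i < len(layers) - 1:  # hidden layers use ReLU
            activations = relu(activations)
    return softmax(activations)  # output layer: one probability per name


# Hypothetical toy network: 4 input values -> 3 hidden neurons -> 2 names
LAYERS = [
    ([[0.5, -0.2, 0.1, 0.0],
      [0.3, 0.8, -0.5, 0.2],
      [-0.6, 0.1, 0.4, 0.9]], [0.0, 0.1, -0.1]),
    ([[1.0, -1.0, 0.5],
      [-0.5, 1.0, 0.2]], [0.0, 0.0]),
]
NAMES = ["oak", "poplar"]

probs = forward([0.9, 0.1, 0.3, 0.7], LAYERS)
print(NAMES[probs.index(max(probs))], probs)
```

Stacking more `dense` layers between input and output is, in miniature, what "deep" refers to in deep learning.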

Character recognition

In the late 1990s, the first major application of deep learning was the automated reading of bank checks. The principle is simple: the computer must automatically recognize the amount handwritten on each check, in figures and in words. The task itself is hard, however, because no two handwritten digits are ever exactly alike. "To enable a computer to recognize a handwritten digit, it had to be given a large set of annotated images. In this learning game, the computer must give the correct answer while minimizing errors," explains the statistician.

During learning, the algorithm works to reduce the gap between the results it obtains and the expected results, gradually refining its recognition. This process resembles what the brain does: it analyzes a multitude of pieces of information, each imprecise on its own, and interprets their combination to recognize a 2 as a 2 with certainty. Or an oak leaf as an oak leaf.
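The article does not detail how this gap is reduced. A common approach is gradient descent: measure the gap with a loss function, then nudge each weight in the direction that shrinks it. The sketch below assumes a single artificial neuron, a log loss and stochastic gradient descent; the two features per example and the labels are hypothetical stand-ins for annotated images, not the method used for check reading.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function: squashes a score into (0, 1)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)


def mean_log_loss(examples, w, b):
    """Average gap between predictions and expected labels, measured as log loss."""
    total = 0.0
    for x, target in examples:
        p = min(max(predict(w, b, x), 1e-12), 1 - 1e-12)  # avoid log(0)
        total -= target * math.log(p) + (1 - target) * math.log(1 - p)
    return total / len(examples)


def train(examples, epochs=500, lr=0.5):
    """Fit one neuron by stochastic gradient descent on the log loss."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in examples:
            gap = predict(w, b, x) - target  # obtained minus expected
            # For the log loss, d(loss)/d(w_i) = gap * x_i and d(loss)/d(b) = gap
            w[0] -= lr * gap * x[0]
            w[1] -= lr * gap * x[1]
            b -= lr * gap
    return w, b


# Hypothetical annotated examples: (two made-up features), expected label 1 or 0
DATA = [((0.9, 0.2), 1), ((0.8, 0.3), 1), ((0.7, 0.1), 1),
        ((0.2, 0.8), 0), ((0.1, 0.9), 0), ((0.3, 0.7), 0)]

before = mean_log_loss(DATA, [0.0, 0.0], 0.0)
w, b = train(DATA)
after = mean_log_loss(DATA, w, b)
print(f"loss before: {before:.3f}, after: {after:.3f}")
```

Before training, the neuron answers 0.5 for every example, so the loss starts at ln 2 (about 0.693); each update then lowers it by shrinking the gap between obtained and expected answers.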


1. IMAG (UM, CNRS, Inria)