[LUM#22] No AI Without Humans

Computers capable of learning are no longer the stuff of science fiction: this is what machine learning is all about. And to learn, machines need humans to provide them with massive amounts of data. To ensure the quality of that data, statistician Joseph Salmon is banking on cooperation.

Pl@ntNet has 25 million users who provide photo data and identify plant species. © IRD – PlantNet

How do self-driving cars know how to read road signs? How can a smartphone recognize a nightingale’s song or an oak leaf? If all of this is now possible, and even commonplace, it’s thanks to machine learning. “It’s the very foundation of artificial intelligence,” explains Joseph Salmon, a statistics researcher at the Alexander Grothendieck Institute in Montpellier¹.

To be able to tell a nightingale from a chickadee or an oak from a poplar, the computer had to “look at” a lot of tree leaves or “listen to” a lot of bird songs. These images and sounds are data—the lifeblood of AI. “Machine learning requires large amounts of data to function properly, especially for classification tasks.”

And the quality of this information, known as input data, is crucial to the success of the tasks at hand. “Garbage in, garbage out,” the expert sums up. In other words, feeding the computer flawed or nonsensical data will result in equally nonsensical responses. To illustrate this computer science adage, Joseph Salmon cites a well-known example: Barack Obama was reportedly identified by an AI as a monkey. The reason: the database used by the recognition algorithms contained mostly faces of white people.

No AI without cooperation

So, to ensure he has access to large amounts of high-quality data, Joseph Salmon relies on collaboration. In 2019, he was awarded a research and teaching chair in artificial intelligence funded by the ANR. Through Camelot, as the project is called, the researcher and his colleagues aim to tackle the challenges of identifying biodiversity through citizen science and crowdsourcing—in other words, by engaging the general public to collect data.

This strategy has already been used to develop the Pl@ntNet plant identification app, a citizen science project designed to automatically identify plants from photos, in which Joseph Salmon took part. “Pl@ntNet has 25 million users who provide photo data and submit plant names. Crowdsourcing allows us to synthesize and utilize all this knowledge,” emphasizes the statistician, for whom “there is no AI without cooperation.” Moreover, for the researcher, there is ultimately nothing artificial about artificial intelligence. “We take all the energy of the humans we’ve synchronized; AI simply serves as the link in a collective effort.”
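To make the idea of pooling contributors' knowledge concrete, here is a minimal sketch of one very simple way to combine several people's answers for the same photo: a majority vote. This is an illustrative assumption, not a description of how Pl@ntNet actually aggregates its users' identifications; the function name `aggregate_labels` and the sample votes are hypothetical.

```python
from collections import Counter

def aggregate_labels(votes):
    """Return the most frequent label and the share of votes it received."""
    counts = Counter(votes)
    label, n = counts.most_common(1)[0]
    return label, n / len(votes)

# Hypothetical answers from five contributors looking at the same photo.
votes = [
    "Quercus robur",
    "Quercus robur",
    "Populus nigra",
    "Quercus robur",
    "Quercus petraea",
]

label, agreement = aggregate_labels(votes)
print(label, agreement)
```

The agreement share gives a rough sense of how much to trust the pooled answer: a photo on which contributors disagree strongly is a candidate for review rather than for training.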

Neural network

In fact, learning how to learn is exactly what the human brain does. To give machines this ability, mathematicians and statisticians working on these issues draw inspiration from how the brain functions in order to mimic its learning processes. “The mathematical tool used is called a neural network. We can combine dozens or even hundreds of layers of neurons, each receiving and interpreting information from the previous layer. This is known as deep learning,” explains Joseph Salmon. Mathematics allows us to create an algorithm designed to minimize error. “The neural network is a mathematical function that takes an image as input and must produce a name as output, which requires programming massive functions,” he adds.
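The idea of a function that takes an image in and gives a name out, built from stacked layers, can be sketched in a few lines. The example below is a toy under stated assumptions: the "image" is just three numbers, the weights are hand-picked rather than learned, and the names `layer`, `softmax` and `predict` are illustrative, not taken from any real system. Real networks have millions of weights and many more layers.

```python
import math

def layer(inputs, weights, biases):
    """One layer: each neuron computes a weighted sum of the previous
    layer's outputs, adds a bias, and keeps only the positive part (ReLU)."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(pixels, layers, names):
    """Pass the input through every layer, then return the likeliest name."""
    hidden = pixels
    for weights, biases in layers[:-1]:
        hidden = layer(hidden, weights, biases)
    # The last layer produces one raw score per possible name (no ReLU).
    weights, biases = layers[-1]
    scores = [
        sum(w * x for w, x in zip(row, hidden)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(scores)
    best = max(range(len(names)), key=lambda i: probs[i])
    return names[best], probs

# Hand-picked (not learned) weights for a two-layer toy network.
layers = [
    ([[1.0, -1.0, 0.5], [-1.0, 1.0, 0.5]], [0.0, 0.0]),  # hidden layer
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),            # output layer
]
names = ["oak", "poplar"]

name, probs = predict([0.9, 0.1, 0.4], layers, names)
print(name)
```

Each layer only sees the numbers produced by the layer before it, which is the stacking the quote describes; "deep" simply means there are many such layers.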

Character recognition

In the late 1990s, the first major application of deep learning was the automated recognition of bank checks. The principle is simple: the computer must automatically recognize the amount, written by hand in both numerals and words, on each check. The task itself is complex, however, because handwritten numbers are never exactly identical. “To enable a computer to recognize a handwritten digit, we had to provide it with a large set of annotated images. In this learning process, the computer must give the correct answer while minimizing errors,” explains the statistician.

During training, the algorithm aims to reduce the gap between the results obtained and the expected results in order to refine its recognition. This process is similar to that of the brain, which is capable of analyzing a multitude of pieces of information that are imprecise on their own and interpreting their combination to ultimately recognize a 2 as a 2 with certainty. Or an oak leaf as an oak leaf.
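Reducing the gap between obtained and expected results can be illustrated with a deliberately tiny example. The sketch below assumes a model with only two adjustable numbers (`w` and `b`) and four made-up annotated examples; it uses gradient descent on the mean squared error, one common way to minimize error, chosen here for illustration rather than drawn from the article. Real digit or leaf recognizers adjust millions of numbers the same way.

```python
def mean_squared_error(w, b, data):
    """Average squared gap between the model's answers and the expected ones."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train(data, steps=1000, lr=0.05):
    """Repeatedly nudge w and b in the direction that shrinks the error."""
    w, b = 0.0, 0.0
    history = []
    for _ in range(steps):
        history.append(mean_squared_error(w, b, data))
        # Gradients: how the error changes when w or b changes slightly.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, history

# Made-up annotated examples: (input, expected answer), following y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

w, b, history = train(data)
print(f"error at start: {history[0]:.3f}, error at end: {history[-1]:.6f}")
```

The error recorded in `history` shrinks step by step, which is the whole of "learning" in this setting: the annotated examples supply the expected results, and the algorithm adjusts itself to match them.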


  1. Imag (UM, CNRS, Inria)