[LUM#13] In the coronavirus family, I'm looking for...
How do we know that SARS-CoV-2 descends from the same coronavirus "ancestor" as the bat coronavirus? Thanks to phylogenetics. This science can trace, and sometimes predict, the different stages in a virus's evolution. Researcher Stéphane Guindon, creator of the PhyML software, explains.

With the emergence of Covid-19, the PhyML bioinformatics software, used to track more traditional viral epidemics such as seasonal flu, is running at full capacity. "The software is constantly evolving. We are implementing algorithms to speed up processing, handle more and more data sets, and develop new features," explains Stéphane Guindon, a researcher at the Montpellier Computer Science, Robotics, and Microelectronics Laboratory* and the software's designer.
Tracing the origin
PhyML can trace the genealogy, or phylogeny, of any organism from its DNA or RNA sequences. "A phylogenetic tree is somewhat equivalent to a family tree, which allows us to trace the family ties between individuals from the same family. This 'genealogy' is reconstructed here by comparing the genomes of different species." The SARS-CoV-2 genome consists of some 30,000 "chemical building blocks," or nucleotides, which must be analyzed in order to reconstruct the virus's route and trace what is known as a "chain of transmission."
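To give a feel for how "comparing genomes" turns into a tree, here is a deliberately simplified sketch. PhyML itself uses maximum-likelihood methods; this example swaps in a much simpler distance-based technique, UPGMA clustering on p-distances (the share of aligned positions at which two sequences differ), which repeatedly joins the two most similar groups. The sequences and labels are hypothetical toy data, not real viral genomes, and UPGMA's implicit assumption of a constant rate of evolution rarely holds exactly in practice.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs: dict[str, str]) -> str:
    """Build a rooted tree (Newick string, no branch lengths) by UPGMA clustering."""
    # Each cluster is keyed by its Newick label; the value is its number of leaves.
    clusters = {name: 1 for name in seqs}
    dist = {
        frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
        for pair in combinations(seqs, 2)
    }
    while len(clusters) > 1:
        # Join the closest pair of clusters.
        pair = min(dist, key=dist.get)
        a, b = sorted(pair)
        merged = f"({a},{b})"
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        # Size-weighted average distance from the new cluster to each remaining one.
        for c in clusters:
            d = (
                dist.pop(frozenset((a, c))) * size_a
                + dist.pop(frozenset((b, c))) * size_b
            ) / (size_a + size_b)
            dist[frozenset((merged, c))] = d
        del dist[pair]
        clusters[merged] = size_a + size_b
    return next(iter(clusters)) + ";"


# Hypothetical 10-nucleotide toy alignment (real genomes are ~30,000 nucleotides).
toy = {
    "human_1": "ACGTACGTAC",
    "human_2": "ACGTACGTAT",
    "bat": "ACGTTCGAAC",
    "pangolin": "TCCTTGGAAC",
}
print(upgma(toy))  # (((human_1,human_2),bat),pangolin);
```

In this toy case the two "human" sequences are grouped first, then joined by the closest animal sequence, mirroring the family-tree idea in the quote above.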
The software compared SARS-CoV-2 RNA sequences (RNA is a close chemical relative of DNA) with all those contained in existing databases, establishing a strong link with viruses found in bats and pangolins. Analysis of these sequences also showed that the current epidemic originated from a single zoonotic event, in other words, a single case of animal-to-human transmission. "If there had been several events, the virus tree would not have the same shape; it would have several sub-trees, each corresponding to a transfer of a viral strain from animal to human."
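The "several sub-trees" argument can be made concrete with a toy check. This sketch assumes a rooted tree written as nested Python tuples, with sequence labels as leaves; it asks whether all human-derived sequences fall into exactly one subtree (a single clade), the pattern expected from a single zoonotic event. The trees and labels are hypothetical, and real analyses must also deal with rooting and with statistical uncertainty in the tree.

```python
def leaves(tree) -> set:
    """Return the set of leaf labels under a node (leaves are str, internal nodes tuples)."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def clades(tree):
    """Yield the leaf set of every subtree, starting with the whole tree."""
    yield leaves(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_single_clade(tree, group) -> bool:
    """True if the labels in `group` make up exactly one subtree of `tree`."""
    target = set(group)
    return any(clade == target for clade in clades(tree))


humans = {"human_1", "human_2", "human_3"}

# Hypothetical tree for one spillover: all human sequences share one subtree.
one_event = ((("human_1", "human_2"), "human_3"), ("bat", "pangolin"))
# Hypothetical tree for two spillovers: human sequences sit in separate subtrees.
two_events = (("human_1", "bat"), (("human_2", "human_3"), "pangolin"))

print(is_single_clade(one_event, humans))   # True
print(is_single_clade(two_events, humans))  # False
```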
Monitoring and predicting the evolution of the virus
Phylogenetics has helped answer other important questions about the epidemic, notably by analyzing the link between viral sequence diversity and the number of viruses in circulation. "From the beginning of March, phylogenetic analyses measuring genetic diversity showed us that the virus population size was doubling every 5 to 7 days," says the researcher. Phylogenetic analyses using PhyML could also help to better predict the long-term evolution of SARS-CoV-2. For example, every year phylogenetics helps to determine which strains of the influenza virus are most likely to circulate the following winter, thereby contributing to the design of effective vaccines.
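As a worked illustration of what a 5-to-7-day doubling time means, here is standard exponential-growth arithmetic (not the phylogenetic method used to estimate it). With population size N(t), initial size N_0 and doubling time T_d, the equivalent continuous growth rate r and the growth over a 30-day month are:

```latex
N(t) = N_0 \, 2^{t/T_d} = N_0 \, e^{r t}, \qquad r = \frac{\ln 2}{T_d}

T_d = 5\ \text{days} \;\Rightarrow\; r \approx 0.139\ \text{day}^{-1},
\qquad
T_d = 7\ \text{days} \;\Rightarrow\; r \approx 0.099\ \text{day}^{-1}

\frac{N(30)}{N_0} = 2^{30/T_d} \approx 64 \quad (T_d = 5),
\qquad \approx 19.5 \quad (T_d = 7)
```

In other words, even the slower end of that range implies a roughly twentyfold increase within a month.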
350,000 computing hours per year
Developed in 2003 by Stéphane Guindon, PhyML was one of the first free software programs made available through the bioinformatics platform hosted by Lirmm. "We make our servers available to research laboratories because these phylogenetic analyses are very computationally intensive. Some can take hours, days, or even weeks." Each year, the platform provides approximately 350,000 hours of computing time to laboratories in France but also, and above all, across Europe, China, and the United States.
*Lirmm (UM – CNRS)