[LUM#13] In the coronavirus family, I’d like to ask…
How do we know that SARS-CoV-2 descends from the same “ancestral” coronavirus as the coronaviruses found in bats? Thanks to phylogenetics, a science capable of tracing, and sometimes predicting, the different stages of a virus’s evolution. Researcher Stéphane Guindon, creator of the PhyML software, explains.

With the emergence of COVID-19, the bioinformatics software PhyML—used to track more traditional viral epidemics such as seasonal flu—is running at full capacity. “The software is constantly evolving; we’re implementing algorithms to speed things up, process more and more datasets, and develop new features…” explains Stéphane Guindon, a researcher at the Montpellier Laboratory of Computer Science, Robotics, and Microelectronics* and the software’s designer.
Tracing the origin
The software can trace the genealogy, or phylogeny, of any organism with DNA. “A phylogenetic tree is somewhat equivalent to a family tree, which allows us to trace the kinship ties between individuals from the same family. This ‘genealogy’ is reconstructed here by comparing the genomes of different species.” The SARS-CoV-2 genome consists of roughly 30,000 “chemical building blocks,” known as nucleotides, which must be analyzed to reconstruct the virus’s path and trace what is known as a “chain of transmission.”
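To give a feel for what “comparing genomes” means in practice, here is a minimal Python sketch. It is not PhyML's method (PhyML infers trees by maximum likelihood); it only computes the proportion of differing positions between aligned sequences, one of the simplest ways to measure how closely related they are. The sequence names and the ten-letter sequences are invented for illustration and are far shorter than a real genome.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def closest_relative(name: str, alignment: dict[str, str]) -> str:
    """Name of the sequence with the smallest p-distance to `name`."""
    others = [other for other in alignment if other != name]
    return min(others, key=lambda o: p_distance(alignment[name], alignment[o]))


# Hypothetical toy alignment: invented sequences, not real viral data.
toy_alignment = {
    "human_virus": "ATGCCGTAAC",
    "bat_virus": "ATGCCGTTAC",
    "other_virus": "TTGACGGTAA",
}

if __name__ == "__main__":
    for other in ("bat_virus", "other_virus"):
        d = p_distance(toy_alignment["human_virus"], toy_alignment[other])
        print(f"human_virus vs {other}: {d:.2f}")
    print("closest:", closest_relative("human_virus", toy_alignment))
```

In this toy example the “human” sequence differs from the “bat” sequence at a single position, so the bat sequence comes out as its closest relative. Real analyses work on full genomes and use statistical models of how sequences change over time.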
RNA sequences from SARS-CoV-2 (RNA is a molecule closely related to DNA) were compared by the software to all those contained in existing databases, revealing a strong link to a virus found in bats and pangolins. Analysis of these sequences also determined that the current outbreak originated from a single zoonotic event, in other words a single instance of animal-to-human transmission. “If there had been multiple events, the virus tree would not have the same shape; it would have multiple sub-trees, each corresponding to a transmission of a viral strain from animal to human.”
Monitoring and predicting the spread of the virus
Phylogenetics has helped answer other important questions about the epidemic, particularly by analyzing the link between viral sequence diversity and the number of viruses in circulation. “As early as the beginning of March, phylogenetic analyses measuring genetic diversity indicated that the virus’s population size was doubling every 5 to 7 days,” the researcher explains. Phylogenetic analyses using PhyML could also help better predict the long-term evolution of SARS-CoV-2. For example, every year, phylogenetics helps determine which strains of the influenza virus are most likely to circulate the following winter, thereby contributing to the development of effective vaccines.
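The doubling time quoted above translates into a growth rate with a little arithmetic. The Python sketch below shows that conversion, assuming steady exponential growth over the whole period (a simplification introduced here for illustration, not a claim from the article). The function names are hypothetical.

```python
import math


def daily_growth_rate(doubling_days: float) -> float:
    """Exponential growth rate r (per day) such that exp(r * T) == 2."""
    if doubling_days <= 0:
        raise ValueError("doubling time must be positive")
    return math.log(2) / doubling_days


def growth_factor(doubling_days: float, elapsed_days: float) -> float:
    """Factor by which the population grows after `elapsed_days`."""
    if doubling_days <= 0:
        raise ValueError("doubling time must be positive")
    return 2 ** (elapsed_days / doubling_days)


if __name__ == "__main__":
    # Doubling times of 5 and 7 days, the bounds quoted in the article.
    for doubling in (5, 7):
        rate = daily_growth_rate(doubling)
        factor = growth_factor(doubling, 30)
        print(f"doubling every {doubling} days: "
              f"r = {rate:.3f} per day, x{factor:.1f} after 30 days")
```

Under that assumption, a 5-day doubling time means a 64-fold increase in 30 days, while a 7-day doubling time gives roughly a 20-fold increase, which shows how much a difference of two days in the estimate matters.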
350,000 computing hours per year
Developed in 2003 by Stéphane Guindon, PhyML was among the first free software programs available via the bioinformatics platform hosted by Lirmm. “We make our servers available to research laboratories because these phylogenetic analyses are very computationally intensive. Some can take hours, days, or even weeks.” Each year, the platform performs approximately 350,000 hours of computation for laboratories located in France and, above all, throughout Europe, China, and the United States…
*Lirmm (UM – CNRS)