In the coronavirus family, I ask...

How do we know that SARS-CoV-2 descends from the same coronavirus "ancestor" as the bat virus? Thanks to phylogeny, a science capable of tracing, and sometimes predicting, the different stages in a virus's evolution. Researcher Stéphane Guindon, creator of the PhyML software, explains.

With the emergence of Covid-19, the PhyML bioinformatics software, already used to monitor more familiar viral epidemics such as seasonal flu, is running at full speed. "The software is constantly evolving, and we are implementing algorithms to run faster, process more and more data sets and add new features..." explains Stéphane Guindon, researcher at the Montpellier Laboratory of Computer Science, Robotics and Microelectronics* and designer of the software.

Tracing the origin

The software can trace the genealogy, or phylogeny, of any organism from its genetic material. "A phylogenetic tree is the equivalent of a family tree, which allows us to trace the relationships between individuals from the same family. This 'genealogy' is reconstructed here by comparing the genomes of different species." The genome of SARS-CoV-2 is made up of roughly 30,000 "chemical building blocks", called nucleotides, which need to be analyzed in order to reconstruct the itinerary of the virus and trace what is known as a "chain of transmission".
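To make the idea of "comparing genomes" concrete, here is a minimal Python sketch. It is not PhyML's method: PhyML estimates trees by maximum likelihood, whereas this sketch only counts the fraction of aligned positions at which two sequences differ (the "p-distance"), a much simpler measure of similarity. The sequences and the names `virus_A`, `virus_B` and `virus_C` are invented for illustration and are not real viral data.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Return the proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def closest_pair(alignment: dict) -> tuple:
    """Return the pair of sequence names with the smallest p-distance."""
    return min(
        combinations(sorted(alignment), 2),
        key=lambda pair: p_distance(alignment[pair[0]], alignment[pair[1]]),
    )


# Hypothetical toy alignment (10 sites each), invented for this example.
toy_alignment = {
    "virus_A": "ACGTACGTAC",
    "virus_B": "ACGTACGTAT",
    "virus_C": "TCGAACGGAT",
}

if __name__ == "__main__":
    for name_1, name_2 in combinations(sorted(toy_alignment), 2):
        d = p_distance(toy_alignment[name_1], toy_alignment[name_2])
        print(f"{name_1} vs {name_2}: {d:.2f}")
    print("closest pair:", closest_pair(toy_alignment))
```

In this toy data, the two sequences that differ at the fewest sites are reported as the closest pair. A real analysis works on full alignments of about 30,000 sites and uses an explicit model of how sequences change over time.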

The RNA sequences of SARS-CoV-2 (RNA is a molecule chemically close to DNA) were compared by the software with all those contained in existing databases, revealing a close relationship with viruses found in bats and pangolins. The analysis of these sequences also showed that the current epidemic began with a single zoonotic event, i.e. a single case of animal-to-human transfer. "If there had been several events, the virus tree would not have the same form, but would present several sub-trees, each corresponding to a transfer of a viral strain from animal to human."
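The reasoning about the shape of the tree can be illustrated with a small Python sketch. It assumes a rooted tree written as nested tuples and checks whether all human samples sit together in one sub-tree that contains nothing else. The two trees and the tip names (`bat_1`, `human_1`, ...) are hypothetical, and this is a simplified reading of the argument, not the analysis actually carried out.

```python
def tips(tree) -> set:
    """Return the set of tip names under a node (a tip is a plain string)."""
    if isinstance(tree, str):
        return {tree}
    names = set()
    for child in tree:
        names |= tips(child)
    return names


def is_single_clade(tree, group) -> bool:
    """True if some sub-tree contains exactly the tips in `group`."""
    group = set(group)
    if tips(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_single_clade(child, group) for child in tree)


humans = {"human_1", "human_2", "human_3"}

# Hypothetical tree 1: all human samples share one sub-tree,
# as expected after a single animal-to-human transfer.
single_event = (("bat_1", ("human_1", ("human_2", "human_3"))), "bat_2")

# Hypothetical tree 2: human samples are split across two sub-trees,
# as expected after several independent transfers.
several_events = (("bat_1", "human_1"), ("bat_2", ("human_2", "human_3")))

if __name__ == "__main__":
    print(is_single_clade(single_event, humans))    # True
    print(is_single_clade(several_events, humans))  # False
```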

Tracking and forecasting virus evolution

Phylogeny has answered other important questions about the epidemic, notably by analyzing the link between the diversity of viral sequences and the number of viruses in circulation. "As early as the beginning of March, phylogenetic analyses measuring genetic diversity told us that the virus population size was doubling every 5 to 7 days," explains the researcher. Phylogenetic analyses with PhyML could also help to better predict the evolution of SARS-CoV-2 over the long term. For example, each year, phylogeny helps determine which strains of influenza virus are most likely to circulate the following winter, thus contributing to the design of effective vaccines.
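To give a sense of what a doubling time of 5 to 7 days means, the short Python sketch below converts a doubling time into a daily exponential growth rate and into a fold increase over a given period. It assumes steady exponential growth, which is a simplification; the 30-day horizon is an arbitrary choice for illustration and does not come from the analyses described here.

```python
import math


def growth_rate_per_day(doubling_time_days: float) -> float:
    """Exponential growth rate r such that size(t) = size(0) * exp(r * t)."""
    return math.log(2) / doubling_time_days


def fold_increase(doubling_time_days: float, elapsed_days: float) -> float:
    """Factor by which the population grows after `elapsed_days`."""
    return 2 ** (elapsed_days / doubling_time_days)


if __name__ == "__main__":
    for doubling_time in (5, 7):
        rate = growth_rate_per_day(doubling_time)
        factor = fold_increase(doubling_time, 30)
        print(
            f"doubling every {doubling_time} days: "
            f"r = {rate:.3f} per day, x{factor:.1f} after 30 days"
        )
```

With a doubling time of 5 days, 30 days correspond to six doublings, i.e. a 64-fold increase; with 7 days, the increase over the same period is markedly smaller, which shows how sensitive projections are to this single estimate.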

350,000 computing hours per year

Developed in 2003 by Stéphane Guindon, PhyML was one of the first free software packages to be made available via the bioinformatics platform hosted by Lirmm. "We make our servers available to research laboratories because these phylogenetic analyses are very time-consuming. Some can take hours, days or even weeks to complete. Every year, the platform carries out some 350,000 hours of computation for laboratories in France, but also and above all throughout Europe, China and the United States..."

*Lirmm (UM - CNRS)