PhyML, a software package from Montpellier to trace the Covid-19 epidemic

The PhyML bioinformatics software, developed over the last fifteen years at the Montpellier Laboratory of Computer Science, Robotics and Microelectronics (Lirmm) and made available to the international scientific community, is today an important tool for tracing the origins of the Covid-19 pandemic and tracking the evolution of the virus.

Recent phylogeny of the SARS-CoV-2 virus

Stéphane Guindon, CNRS researcher at Lirmm

Since the arrival of SARS-CoV-2, the virus responsible for Covid-19, PhyML, already used to track more familiar viral epidemics such as seasonal flu, has been running at full speed. "We're working hard. The software is constantly evolving: we're implementing algorithms to run faster, process more and more data sets, add new features..." explains Stéphane Guindon, CNRS researcher at the Montpellier Laboratory of Computer Science, Robotics and Microelectronics (Lirmm).

Developed in 2003 by the bioinformatician during his doctoral thesis, together with his thesis supervisor, Olivier Gascuel, PhyML was one of the first free software packages to be made available via ATGC, the bioinformatics platform hosted by Lirmm. "We make our servers available to research laboratories, because these phylogenetic analyses require a lot of computing time. Some can take hours, days or even weeks to complete. Every year, the ATGC platform carries out some 350,000 hours of computation for laboratories in France, but also and above all throughout Europe, China, the United States..."

Tracing a transmission chain

But what is PhyML used for, and why is it so useful during an epidemic? The software can reconstruct the genealogy, or phylogeny, of any set of organisms from their genome sequences. "A phylogenetic tree is the equivalent of a family tree, which enables us to trace the relationships between individuals in the same family. This 'genealogy' is reconstructed here by comparing the genomes of different species. In the case of a virus, we analyze the ~30,000 nucleotides that make up its genome in order to trace a chain of transmission."
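To make the idea of "comparing genomes" concrete, here is a minimal Python sketch that counts the fraction of sites at which aligned sequences differ. It is only an illustration: PhyML itself infers trees by maximum likelihood, which is considerably more involved, and the sequences, sample names and function names below are hypothetical, not real viral data or part of PhyML.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)


# Hypothetical toy alignment: made-up 10-nucleotide sequences,
# standing in for the ~30,000 nucleotides of a real viral genome.
toy_alignment = {
    "sample_A": "ACGTACGTAC",
    "sample_B": "ACGTACGTAT",
    "sample_C": "ACGAACGTAT",
    "sample_D": "TCGAACGGAT",
}


def pairwise_distances(alignment: dict) -> dict:
    """Distance for every unordered pair of samples, keyed by name pair."""
    return {
        (n1, n2): p_distance(alignment[n1], alignment[n2])
        for n1, n2 in combinations(sorted(alignment), 2)
    }


if __name__ == "__main__":
    for pair, dist in pairwise_distances(toy_alignment).items():
        print(pair, dist)
```

In this toy data, sample_A and sample_B differ at a single site while sample_A and sample_D differ at four, so a tree built from these distances would place A and B closer together. Real analyses add a model of how mutations accumulate over time, which is what a likelihood-based tool provides.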

The software compared genome sequences of SARS-CoV-2 with all those contained in existing databases, establishing that it is closely related to viruses found in bats and pangolins. "It's a virus that circulates in bats but doesn't cause mortality in this species, unlike what we're seeing in humans at the moment," notes Stéphane Guindon. Analysis of these sequences has also shown that the current epidemic began with a single zoonotic event, in other words, a single case of animal-to-human transfer. "If there had been several events, the virus tree would not have the same shape, but would present several sub-trees, each corresponding to a transfer of a viral strain from animal to human."

Tracking and forecasting virus evolution

Phylogeny has answered other important questions about the epidemic. "As early as the beginning of March, the first phylogenetic analyses indicated that the population size of the virus was doubling every 5 to 7 days," explains the researcher. Phylogenetic analyses with PhyML could also help to better predict the evolution of SARS-CoV-2 over the long term. Each year, for example, phylogeny helps determine which strains of influenza virus are most likely to circulate the following winter, thus contributing to the design of effective vaccines.
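To give a sense of what a doubling time of 5 to 7 days implies, the short Python sketch below converts it into a daily growth rate and a fold increase. It assumes simple, uninterrupted exponential growth, which is an idealization and not a description of how the phylogenetic estimate itself was obtained; the function names are illustrative.

```python
import math


def growth_rate_per_day(doubling_time_days: float) -> float:
    """Exponential growth rate r such that size(t) = size(0) * exp(r * t)."""
    return math.log(2) / doubling_time_days


def fold_increase(doubling_time_days: float, elapsed_days: float) -> float:
    """Factor by which the population grows over elapsed_days."""
    return 2 ** (elapsed_days / doubling_time_days)


if __name__ == "__main__":
    # The two ends of the range quoted above, over an assumed 35-day window.
    for doubling_time in (5, 7):
        rate = growth_rate_per_day(doubling_time)
        factor = fold_increase(doubling_time, 35)
        print(f"T = {doubling_time} days: r = {rate:.3f}/day, "
              f"x{factor:.0f} after 35 days")
```

Under this assumption, a 5-day doubling time corresponds to a growth rate of about 0.14 per day and a 128-fold increase over 35 days, against about 0.10 per day and a 32-fold increase for a 7-day doubling time.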