PhyML, a Montpellier-developed software program for tracking the COVID-19 pandemic

The bioinformatics software PhyML, developed over the past fifteen years at the Montpellier Laboratory of Computer Science, Robotics, and Microelectronics (Lirmm) and made available to the international scientific community, is now an important tool for tracing the origins of the COVID-19 pandemic and monitoring the virus’s evolution.

Recent phylogeny of the SARS-CoV-2 virus

Stéphane Guindon, CNRS researcher at Lirmm

With the emergence of SARS-CoV-2, the virus responsible for COVID-19, the PhyML software, which was already being used to track more common viral outbreaks such as seasonal flu, is now running at full capacity. “We’re working hard; the software is constantly evolving. We’re implementing algorithms to speed things up and process more and more datasets, as well as developing new features…” explains Stéphane Guindon, a CNRS researcher at the Montpellier Laboratory of Computer Science, Robotics, and Microelectronics (Lirmm).

Developed in 2003 by this bioinformatician, then a doctoral student, and his thesis advisor, Olivier Gascuel, PhyML was among the first free software programs offered through ATGC, the bioinformatics platform hosted by Lirmm. “We make our servers available to research laboratories because these phylogenetic analyses are very computationally intensive. Some can take hours, days, or even weeks.” Each year, the ATGC platform performs approximately 350,000 hours of computation for laboratories in France, but also, and above all, throughout Europe, in China, and in the United States…

Tracing a transmission chain

But what is PhyML used for, and why is it so useful during an epidemic? This bioinformatics software can reconstruct the genealogy, or phylogeny, of any group of organisms from their genetic sequences. “A phylogenetic tree is somewhat like a family tree, which lets us trace the kinship ties among members of the same family. Here, this ‘genealogy’ is reconstructed by comparing the genomes of different species or strains. In the case of a virus, we analyze the roughly 30,000 nucleotides that make up its genome in order to trace a chain of transmission.”

The software compared the genetic sequences of SARS-CoV-2 with all those contained in existing databases, revealing a close relationship with viruses found in bats and pangolins. “It is a virus that circulates among bats but does not cause deaths in that species, unlike what we are currently seeing in humans,” notes Stéphane Guindon. “Analysis of these sequences also showed that the current epidemic originated from a single zoonotic event, in other words, a single instance of animal-to-human transmission. If there had been multiple events, the viral tree would have a different shape: it would contain several subtrees, each corresponding to a separate transmission of a viral strain from animal to human.”

Monitoring and predicting the spread of the virus

Phylogenetics has helped answer other important questions about the epidemic. “As early as the beginning of March, the first phylogenetic analyses indicated that the virus’s population size was doubling every 5 to 7 days,” the researcher explains. Phylogenetic analyses using PhyML could also help predict the long-term evolution of SARS-CoV-2 more accurately. For example, every year phylogenetics helps determine which strains of the influenza virus are most likely to circulate the following winter, thereby contributing to the development of effective vaccines.