PhyloInfer - AI-Driven Phylogenetic Tree Reconstruction from Raw Virus Sequencing Data

ZKI-PH_PhD2025_05 (ZKI-PH2 & MF1)

Date: 06/03/2025

Background:

Current phylogenetic tree reconstruction methods typically rely on multiple sequence alignment (MSA) as a crucial pre-processing step. While effective, MSA can be time-consuming and may introduce biases, especially for diverse viral sequences. Recent advances in machine learning, particularly in the processing of sequence data, have opened new possibilities for inferring phylogenies directly from raw sequences. However, these approaches often lack integration with established phylogenetic methods and do not fully leverage the wealth of knowledge in the scientific literature. Many also still rely on alignment-based steps, which can be suboptimal for highly diverse sequences.

Rapid and accurate phylogenetic analysis is crucial for tracking viral evolution, understanding transmission patterns, and informing public health interventions during outbreaks. A tool that can quickly generate reliable phylogenetic trees from raw sequencing data would significantly enhance our ability to respond to emerging viral threats, guide vaccine development, and support evidence-based public health policies.

Aim/s:

The aim of the project is to develop an AI-driven tool that accurately infers phylogenetic trees directly from raw sequencing data of clinical or metagenomic virus samples, bypassing MSA, while making informed decisions about tree reconstruction methods based on the state-of-the-art scientific literature.

AI methods:

To develop an efficient, AI-driven, alignment-free phylogenetic tree reconstruction tool, robust algorithms for quality control and feature extraction from raw sequencing data will be deployed, ensuring that the input to the AI model is of high quality. A deep learning model, likely based on a transformer architecture or graph neural networks, will be designed and trained to process raw sequence data and infer phylogenetic relationships directly.
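
As an illustration of what alignment-free feature extraction can look like, the following minimal Python sketch turns each sequence into a normalised k-mer frequency vector and compares two sequences by cosine distance. It is not the project's method: the function names (kmer_profile, cosine_distance), the choice of k, and the rule of skipping k-mers that contain ambiguous bases are assumptions made only for this example. Such vectors are one possible input representation for a learned model or a distance-based tree method.

```python
from collections import Counter
from itertools import product
import math

ALPHABET = "ACGT"


def kmer_profile(seq, k=3):
    """Return normalised k-mer frequencies in a fixed lexicographic order.

    Hypothetical helper: k-mers containing anything other than A/C/G/T
    (for example N) are skipped as a very simple quality filter.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set(ALPHABET)
    )
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    total = sum(counts.values())
    if total == 0:
        # No usable k-mers: return an all-zero profile.
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)
```

For example, cosine_distance(kmer_profile(a), kmer_profile(b)) can be computed for every pair of sequences without aligning them. The resulting pairwise distances are only a crude proxy for evolutionary distance and would need validation against alignment-based trees.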

Keywords:

Phylogenetic analysis, public health, virus transmission, deep learning