Large-Scale Online Bayesian Phylogenetics
SNIC Small Compute
Jakub Truszkowski <firstname.lastname@example.org>
2022-11-26 – 2023-12-01
Bayesian approaches to phylogenetic inference have gained considerable popularity in recent years because they are flexible and can quantify the uncertainty of inferences. Unfortunately, commonly used MCMC algorithms are often slow to converge, which means that current methods cannot be reliably applied to data sets of more than a few thousand taxa. Moreover, current methods assume that all sequences of interest are known in advance, forcing practitioners to re-run computationally costly analyses every time new sequences become available.
To address these problems, we will develop scalable _online_ methods that continuously update the posterior distribution on phylogenies as new sequences become available. Our methodology will rely on Sequential Monte Carlo (SMC) algorithms, which maintain a weighted sample from the posterior distribution and use reweighting and resampling to account for new data as they arrive. We will leverage recent developments in scalable phylogenetic inference and the inherently parallelizable structure of SMC methods to develop algorithms capable of processing data sets consisting of tens of thousands of taxa and hundreds of genes. To facilitate rapid development and efficient code synthesis, we will implement our methods in a Probabilistic Programming (PP) framework that is currently being developed by one of the co-PIs (DB). The algorithms developed in this project will form the basis of a system that will build and maintain comprehensive phylogenies of all plant species. The proposed project will have considerable impact on plant systematics and may also benefit other fields where Bayesian phylogenetics is used, such as infectious disease epidemiology and phylogeography.
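The reweight-and-resample idea behind SMC can be illustrated with a deliberately small sketch. It is not the project's phylogenetic algorithm and does not use the PP framework mentioned above: as an assumption for illustration, each particle is a single real number (the unknown mean of a Gaussian with unit variance) rather than a tree, and the names `log_likelihood`, `smc_update`, and `run_demo` are hypothetical. The point is only that a new observation is absorbed by updating weights, without restarting the analysis.

```python
import math
import random


def log_likelihood(theta, x, sigma=1.0):
    """Gaussian log-density of one observation x given mean theta (toy model)."""
    return -0.5 * ((x - theta) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def normalise(log_w):
    """Shift log-weights so that the corresponding weights sum to one."""
    m = max(log_w)
    lse = m + math.log(sum(math.exp(lw - m) for lw in log_w))
    return [lw - lse for lw in log_w]


def smc_update(particles, log_w, x, rng, ess_fraction=0.5):
    """Absorb one new observation x: reweight, then resample if needed."""
    # Reweight: multiply each particle's weight by the likelihood of the new datum.
    log_w = normalise([lw + log_likelihood(p, x) for p, lw in zip(particles, log_w)])
    weights = [math.exp(lw) for lw in log_w]
    n = len(particles)
    # Effective sample size: low values mean a few particles carry most of the weight.
    ess = 1.0 / sum(w * w for w in weights)
    if ess < ess_fraction * n:
        # Multinomial resampling: duplicate heavy particles, drop light ones.
        particles = rng.choices(particles, weights=weights, k=n)
        log_w = [-math.log(n)] * n
    return particles, log_w


def posterior_mean(particles, log_w):
    """Weighted average of the particles, an estimate of the posterior mean."""
    return sum(math.exp(lw) * p for p, lw in zip(particles, log_w))


def run_demo(seed=0, n_particles=5000, n_obs=30, true_mean=2.0):
    """Stream observations one at a time and return (estimate, sample mean)."""
    rng = random.Random(seed)
    # Initial particles are draws from an assumed N(0, 5^2) prior, equally weighted.
    particles = [rng.gauss(0.0, 5.0) for _ in range(n_particles)]
    log_w = [-math.log(n_particles)] * n_particles
    data = [rng.gauss(true_mean, 1.0) for _ in range(n_obs)]
    for x in data:  # data arrive online, one observation per update
        particles, log_w = smc_update(particles, log_w, x, rng)
    return posterior_mean(particles, log_w), sum(data) / len(data)
```

Calling `run_demo()` returns the particle estimate of the posterior mean together with the sample mean of the streamed data; under this toy model the two should be close. Each particle's weight update is independent of the others, which is the source of the parallelism noted above. The sketch omits any rejuvenation (MCMC move) step after resampling, so particle diversity shrinks over time; practical SMC samplers for static parameters typically include one.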