Insect tree of life from metabarcoding data: exploring methods for building very large phylogenies
Title: Insect tree of life from metabarcoding data: exploring methods for building very large phylogenies
DNr: NAISS 2024/5-624
Project Type: NAISS Medium Compute
Principal Investigator: Nicolas Chazot <nicolas.chazot@slu.se>
Affiliation: Sveriges lantbruksuniversitet
Duration: 2024-11-28 – 2025-12-01
Classification: 10611 10612
Keywords:

Abstract

DNA metabarcoding has enabled large-scale biodiversity studies of groups containing considerable numbers of unknown taxa, such as insects. However, a key challenge still lies in taxonomically characterizing samples from exceptionally diverse and poorly known regions, such as the tropics. Our previous results suggest that phylogenetic placement methods, which classify query sequences by their placement in a reference tree, outperform other classification methods when the reference database has poor taxonomic coverage. Resolving the evolutionary relationships between taxa will also allow fine-grained ecological analyses without the need for species-level taxonomic classification, thus greatly enhancing the possibilities of understanding community assembly processes in highly diverse regions. We are working with insect samples from Malaise traps deployed across the world and over multiple years, generating an unusually large and taxonomically complex dataset. To characterise these samples with phylogenetic placement methods, we need to build a fine-grained reference tree spanning all Insecta, with up to around one million sequences. Beyond phylogenetic placement, which only resolves the evolutionary relationship between each individual query sequence and the reference taxa, we want to extend the method so that it also resolves the relationships among query sequences. In theory, a large supertree can be generated by re-estimating parts of the reference tree, including all query sequences that were placed in that region of the tree. A range of methods could be suitable for inferring the subtrees; in this project, we will test both Maximum Likelihood and Bayesian methods, comparing the topology and time calibration of the resulting trees.
Finally, we aim to develop a user-friendly pipeline for iteratively growing the phylogenetic tree by repeating phylogenetic placement of new query sequences, re-inferring the parts of the tree where the query sequences were placed, “glueing” the subtrees into the main tree, and re-estimating branch lengths to time-calibrate the tree. Such a pipeline will allow us to build phylogenetic trees beyond the limits of any single existing tree inference method.