Machine learning for metagenomics annotations
Title: Machine learning for metagenomics annotations
SNIC Project: SNIC 2021/22-446
Project Type: SNIC Small Compute
Principal Investigator: Dag Ahren
Affiliation: Lunds universitet
Duration: 2021-06-07 – 2022-01-01
Classification: 10203


Metagenomic analyses commonly start with DNA extraction from samples, followed by sequencing. The raw metagenomic reads are then filtered by quality and assembled into contigs, from which genes are predicted and annotated. Commonly applied approaches to gene annotation rely on sequence similarity to entries in reference databases (Tamames et al., 2019). In recent years, machine learning approaches such as deep learning have seen a large increase in use across bioinformatic analysis pipelines (Li et al., 2019). As deep learning establishes itself as a promising approach to bioinformatic tasks, its suitability for gene annotation in metagenomics, as an alternative to sequence similarity-based approaches, is of great interest. This study therefore focuses on developing a deep learning approach to identify and classify genes associated with methanogenic pathways in soil microbes, based on metagenomic gene prediction data. Genes will be classified according to the type of methanogenic pathway they belong to, using labels from the KEGG database.
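
As a rough illustration of the classification setup described above, the following minimal sketch shows one way predicted gene sequences could be turned into fixed-length numeric inputs and paired with pathway labels before training a model. It is not the project's actual method: the k-mer count encoding, the function names (`kmer_counts`, `encode_label`, `build_dataset`), and the label set in `PATHWAY_LABELS` are all assumptions made for illustration, and real labels would be derived from KEGG annotations.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_counts(sequence, k=3):
    """Return a fixed-length vector of k-mer counts over the A/C/G/T alphabet."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    vector = [0] * len(kmers)
    seq = sequence.upper()
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        # Windows containing ambiguous bases (e.g. N) are skipped.
        if window in index:
            vector[index[window]] += 1
    return vector


# Hypothetical label set (assumption): placeholder pathway types, not taken
# from the project. Real labels would come from KEGG annotations.
PATHWAY_LABELS = ["hydrogenotrophic", "acetoclastic", "methylotrophic", "other"]


def encode_label(label):
    """One-hot encode a pathway label against PATHWAY_LABELS."""
    one_hot = [0] * len(PATHWAY_LABELS)
    one_hot[PATHWAY_LABELS.index(label)] = 1
    return one_hot


def build_dataset(records, k=3):
    """Convert (sequence, label) pairs into feature and target lists."""
    features = [kmer_counts(seq, k) for seq, _ in records]
    targets = [encode_label(label) for _, label in records]
    return features, targets
```

The resulting feature and target lists could be passed to any classifier. A deep learning model might instead consume the raw sequence (for example as a one-hot matrix), so the k-mer encoding here is only one possible design choice.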