SNIC Small Compute
Joel Gustafsson <email@example.com>
2021-01-25 – 2022-02-01
The analysis of full-length genomes is a daunting task. Most methods for comparing genomes rely on aligning specific regions or genes. This, however, only provides information about those regions, not an overview of the entire genome. Our goal is to create a method that gives a more complete picture of the similarity between genomes. Specifically, we train statistical models on the genomes; these models are easier to compare than the raw sequences. Using the models, we compute pairwise distances between genomes, which together form a distance matrix to which various machine-learning techniques can be applied. Our preliminary results indicate that this approach can detect evolutionary relationships between genomes. Moreover, the models are sensitive enough to recognise both the phylogeny of the organisms and genomic similarities arising from horizontal gene transfer and symbiosis. These results suggest that our method can be used to classify novel organisms as well as metagenomic data. Thus, our approach could be applied to medical and environmental monitoring, helping to ensure that appropriate responses are taken to handle both novel and known pathogens.
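The abstract does not specify which statistical model is trained or how two models are compared. The sketch below only illustrates the general pipeline (model per genome, then pairwise model distances, then a distance matrix, then a machine-learning step) under stated assumptions. The model is assumed to be an order-k Markov chain over nucleotides with add-one smoothing. The distance is assumed to be the Jensen-Shannon divergence between the two models' transition distributions, averaged over contexts. The downstream step is assumed to be 1-nearest-neighbour assignment. All function names (`fit_markov`, `model_distance`, `distance_matrix`, `nearest_reference`) and the toy sequences are hypothetical and do not represent the project's actual method or data.

```python
import math
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"


def _transitions(seq, k):
    """Yield (context, next_base) pairs, skipping any window with non-ACGT symbols."""
    seq = seq.upper()
    for i in range(k, len(seq)):
        ctx, nxt = seq[i - k:i], seq[i]
        if nxt in ALPHABET and all(c in ALPHABET for c in ctx):
            yield ctx, nxt


def fit_markov(seq, k=2, pseudo=1.0):
    """Fit an order-k Markov chain P(next base | previous k bases) with add-`pseudo` smoothing."""
    counts = defaultdict(lambda: defaultdict(float))
    for ctx, nxt in _transitions(seq, k):
        counts[ctx][nxt] += 1.0
    model = {}
    # Every possible context gets a distribution; unseen contexts become uniform.
    for ctx in ("".join(t) for t in product(ALPHABET, repeat=k)):
        total = sum(counts[ctx].values()) + pseudo * len(ALPHABET)
        model[ctx] = {b: (counts[ctx][b] + pseudo) / total for b in ALPHABET}
    return model


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two distributions over ALPHABET."""
    m = {b: 0.5 * (p[b] + q[b]) for b in ALPHABET}

    def kl(x):
        return sum(x[b] * math.log(x[b] / m[b]) for b in ALPHABET if x[b] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def model_distance(m1, m2):
    """Average JS divergence between two models' transition distributions over all contexts."""
    return sum(js_divergence(m1[ctx], m2[ctx]) for ctx in m1) / len(m1)


def distance_matrix(genomes, k=2):
    """Fit one model per genome and return (names, symmetric pairwise distance matrix)."""
    names = list(genomes)
    models = [fit_markov(genomes[n], k) for n in names]
    return names, [[model_distance(a, b) for b in models] for a in models]


def nearest_reference(query, references, k=2):
    """1-nearest-neighbour assignment of a query genome to the closest reference model."""
    qm = fit_markov(query, k)
    return min(references, key=lambda n: model_distance(qm, fit_markov(references[n], k)))


if __name__ == "__main__":
    # Synthetic toy sequences for illustration only, not project data.
    refs = {
        "gc_rich": "GCGCGGCCGCGCGGCGCCGC" * 5,
        "at_rich": "ATATAATTATATAATATTAT" * 5,
        "mixed": "ACGTACGTTGCAACGT" * 5,
    }
    names, dist = distance_matrix(refs)
    for name, row in zip(names, dist):
        print(name, [round(d, 3) for d in row])
    print(nearest_reference("GGCGCCGCGCGGCCGC" * 5, refs))
```

Comparing the fitted models rather than the sequences means each genome is reduced once to a fixed-size table (4^k contexts), so every pairwise distance costs the same regardless of genome length. Averaging over contexts weights rare and common contexts equally; weighting by context frequency, or feeding the matrix to hierarchical clustering instead of nearest-neighbour assignment, would be equally plausible variants.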