Parallel Construction of Variable-Length Markov Chains
Title: Parallel Construction of Variable-Length Markov Chains
DNr: NAISS 2024/22-664
Project Type: NAISS Small Compute
Principal Investigator: Alexander Schliep <alexander.schliep@cse.gu.se>
Affiliation: Göteborgs universitet
Duration: 2024-05-06 – 2025-06-01
Classification: 10203
Keywords:

Abstract

The variable-length Markov chain (VLMC) is an extension of the Markov chain in which the memory length of the model can vary. VLMCs have applications in, e.g., bioinformatics, where they are used to model genome sequences. However, existing methods for learning them are either slow or highly memory-intensive. Moreover, a faster implementation developed in 2005 has since been lost. We have completed development of the current state-of-the-art method for learning VLMCs, even from very large data sets. We plan to explore further applications of VLMCs in alignment-free sequence comparison and to analyze large biological data sets, for example in viral genomics, with the software we have developed.