Machine learning for protein structure prediction
Title: Machine learning for protein structure prediction
DNr: Berzelius-2025-6
Project Type: LiU Berzelius
Principal Investigator: Arne Elofsson <arne@bioinfo.se>
Affiliation: Stockholms universitet
Duration: 2025-02-01 – 2025-08-01
Classification: 10203
Homepage: https://bioinfo.se/
Keywords:

Abstract

Our research team is at the forefront of computational protein science, focusing on protein structure prediction, protein-protein interaction (PPI) detection, and protein design. These areas are critical for understanding fundamental biological processes and advancing applications in biotechnology, medicine, and synthetic biology. We leverage cutting-edge computational tools such as AlphaFold and OpenFold, which have revolutionised the field of structural biology. By combining these tools with novel algorithms and extensive model development, we aim to enhance prediction accuracy, explore novel protein architectures, and generate biologically relevant insights.

A major innovation in our research has been the development of a Mamba-based architecture, in which a structured state-space model replaces the transformer-based attention mechanism in AlphaFold. Mamba addresses key bottlenecks in long-sequence protein modelling by scaling linearly with sequence length, significantly reducing inference time and memory usage. However, implementing such innovations requires rigorous architectural modifications, retraining, and validation. These computationally intensive processes demand extensive resources, often involving 32 GPUs for over a week per retraining cycle.

Beyond structural predictions, our work extends to improving PPI detection and prediction. Collaborating with experimental partners, we validate computational predictions using native mass spectrometry (nMS) and cryo-electron tomography (cryo-ET). Recent studies have focused on benchmarking methods to improve homomeric and heteromeric interaction predictions and on extending these capabilities to RNA and other macromolecules. These efforts are crucial for building a holistic understanding of cellular machinery.

Additionally, our contributions to computational methods include Hessian-Informed Flow Matching (HI-FM), which improves the representation of molecular energy landscapes in stochastic systems.
This approach has shown success in modelling equilibrium distributions and holds promise for applications in molecular dynamics and small-molecule binding predictions.

Collaboration plays a key role in our success. Partnering with NBIS, we have optimised pipelines for AlphaFold on the Berzelius supercomputer, including a GPU-accelerated MMseqs2 implementation. These innovations, along with several high-impact publications in 2024, underscore our commitment to advancing the field of protein science. An enhanced resource allocation on Berzelius would further enable us to overcome computational bottlenecks, increase the pace of discovery, and maintain our competitive edge.