Machine learning for protein structure prediction
Title: Machine learning for protein structure prediction
DNr: Berzelius-2024-220
Project Type: LiU Berzelius
Principal Investigator: Arne Elofsson <arne@bioinfo.se>
Affiliation: Stockholms universitet
Duration: 2024-06-01 – 2025-02-01
Classification: 10203
Homepage: https://bioinfo.se/
Keywords:

Abstract

We work on advancing the prediction of protein-protein interactions. Our approach combines state-of-the-art methods with practical applications, which lets us gain biological insight into specific systems while contributing to the broader understanding of protein interactions. The field is moving quickly, and over the past year we have made significant progress, largely thanks to the computational resources provided by SNIC/KAW. We began by developing the Fold and Dock pipeline (Bryant et al., 2022), which enabled us to predict structures for an extensive set of the human proteome (Burke et al., 2023). Building on this foundation, we introduced the MolPC method, which predicts large protein complexes (Bryant et al., 2023). We are currently refining and extending these methods through the following projects:

Optimizing the Evoformer in AlphaFold2.3 for Detecting Interacting Proteins

Our KAW project, "Learning the Language of Life," aims to predict interactions among all human proteins. Testing every pair would require approximately 200 million comparisons, which is too slow for AlphaFold but feasible with faster methods such as D-SCRIPT. However, the accuracy of these faster tools is insufficient.

Retraining the Scoring Function in AlphaFold2.3

While AlphaFold2.3's scoring function generally performs well, it often fails on difficult targets. Failures include assigning inconsistent scores to very similar models and failing to identify excellent models within a pool of poor ones. Our approach uses a graph neural network (GNN) that incorporates information from AlphaFold2.3 (AF2.3) and external data, and that is trained specifically on these challenging cases to predict model quality. We have found that a Siamese-twin architecture shows promise.
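The figure of approximately 200 million comparisons quoted for the all-against-all human screen can be checked with a short calculation. The sketch below assumes roughly 20,000 human proteins, a round number that is not stated in the text, and counts unordered pairs without self-pairs.

```python
from math import comb

# Assumption: roughly 20,000 human proteins (a round figure, not given in the text).
N_HUMAN_PROTEINS = 20_000


def unordered_pairs(n: int) -> int:
    """Number of distinct unordered pairs among n proteins, excluding self-pairs."""
    return comb(n, 2)


n_pairs = unordered_pairs(N_HUMAN_PROTEINS)
print(f"{n_pairs:,} pairs")  # 199,990,000 pairs
```

Under this assumption the count is just below 200 million, consistent with the estimate above. Including homodimers (self-pairs) would add only 20,000 more.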
Implementing a Flow-Matching Model into Structure Prediction Pipelines

AlphaFold3 (code not released) replaced the structure module of AlphaFold2 with a diffusion-based model. We aim to integrate a flow-matching model into our structure prediction pipelines to improve performance. Here, our work on stable autonomous flows, in collaboration with Hossein Azizpour, should be useful.

Improved Prediction of Host-Pathogen Interactions Using Novel Species Pairing

Predicting protein-protein interactions with AlphaFold relies heavily on co-evolutionary signals between protein pairs. However, these signals are absent for host-pathogen pairs because they do not co-evolve. To address this, we pair species in a novel way to improve prediction accuracy for host-pathogen interactions.

In addition to these method-development projects, we continue our collaborative projects with a more biological focus. We mention just one here: predictions of sperm-egg interactions yielded a well-predicted complex that helps us understand how the two membranes fuse (Elofsson, eLife 2023).
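For readers unfamiliar with the flow-matching idea mentioned for the structure prediction pipelines, the following is a minimal sketch of the generic straight-path formulation. It is not the method planned in this project and not the stable autonomous flows work; all function names are hypothetical, and short lists of floats stand in for atomic coordinates. A model is trained to regress the velocity that carries a noise sample x0 to a data sample x1 along a straight line, and samples are generated by integrating that velocity from t = 0 to t = 1.

```python
def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the straight path (constant in t); the regression target."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(model, x0, x1, t):
    """Mean squared error between the model's velocity at (x_t, t) and the target."""
    xt = interpolate(x0, x1, t)
    pred = model(xt, t)
    target = target_velocity(x0, x1)
    return sum((p - v) ** 2 for p, v in zip(pred, target)) / len(target)


def euler_sample(model, x0, n_steps=10):
    """Generate a sample by Euler integration of dx/dt = model(x, t) on [0, 1]."""
    x = list(x0)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        v = model(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy check with an "oracle" velocity field that already knows the single
# data point; a trained network would take its place in practice.
data_point = [1.0, -2.0, 0.5]
noise_point = [0.0, 0.0, 0.0]


def oracle(x, t):
    # Exact velocity toward data_point; valid for t < 1, which Euler never exceeds.
    return [(b - a) / (1.0 - t) for a, b in zip(x, data_point)]


sample = euler_sample(oracle, noise_point, n_steps=10)
loss = flow_matching_loss(oracle, noise_point, data_point, t=0.3)
```

With the oracle field the integration lands on the data point and the loss is numerically zero. In a real pipeline the model would be a neural network conditioned on sequence and pair features, and the loss would be averaged over random draws of t, noise, and training structures.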