Deep learning for protein prediction
Title: Deep learning for protein prediction
DNr: NAISS 2024/5-311
Project Type: NAISS Medium Compute
Principal Investigator: Arne Elofsson <arne@bioinfo.se>
Affiliation: Stockholms universitet
Duration: 2024-07-01 – 2025-07-01
Classification: 10203 10610 10601
Homepage: http://bioinfo.se/
Keywords:
Abstract
We work to advance the prediction of protein-protein interactions. Our approach combines state-of-the-art methods with practical applications, which lets us gain biological insight into specific systems while contributing to the broader understanding of protein interactions.
The field is advancing rapidly, and over the past year we have made significant progress, largely owing to the computational resources provided by SNIC/KAW. We began by developing the Fold and Dock pipeline (Bryant et al., 2022), which enabled us to predict structures for an extensive part of the human proteome (Burke et al., 2023). Building on this foundation, we introduced the MoLPC method, which predicts the structure of large protein complexes (Bryant et al., 2023).
Currently, we are focused on refining and enhancing these methods through the following projects:
Optimizing the Evoformer in AlphaFold2.3 for Detecting Interacting Proteins
In our KAW project, "Learning the Language of Life," we aim to predict interactions among all human proteins. Testing every pair would require approximately 200 million comparisons, which is too slow for AlphaFold but feasible with faster methods such as D-SCRIPT. However, the accuracy of these faster tools is insufficient. This project therefore optimizes the Evoformer in AlphaFold2.3 so that it can detect interacting proteins at a cost that makes such a screen practical.
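The pair count above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the protein count of 20,000 is an assumption (the text gives only the number of comparisons), and the per-pair costs and worker count are hypothetical placeholders, not measurements.

```python
from math import comb

# Assumption: roughly 20,000 human proteins; the text states only the pair count.
N_PROTEINS = 20_000


def count_pairs(n: int) -> int:
    """Number of unordered pairs of distinct proteins, n * (n - 1) / 2."""
    return comb(n, 2)


def screen_days(n_pairs: int, seconds_per_pair: float, n_workers: int) -> float:
    """Wall-clock days needed to score n_pairs with n_workers running in parallel."""
    return n_pairs * seconds_per_pair / n_workers / 86_400


n_pairs = count_pairs(N_PROTEINS)
print(f"{n_pairs:,} pairs")  # 199,990,000 pairs

# Hypothetical per-pair costs, chosen only to show how the total scales.
for label, seconds in [("slow structure-based method", 600.0), ("fast sequence-based method", 0.1)]:
    print(f"{label}: {screen_days(n_pairs, seconds, n_workers=100):.1f} days on 100 workers")
```

Because the total grows quadratically with the number of proteins, the per-pair cost dominates the feasibility of an all-against-all screen.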
Retraining the Scoring Function in AlphaFold2.3
While the scoring function of AlphaFold2.3 generally performs well, it often fails on difficult targets. Problems include inconsistent scores for very similar models and failure to identify excellent models within a pool of poor ones. Our approach uses a graph neural network (GNN) that incorporates information from AlphaFold2.3 and external data and is trained specifically on these challenging cases to predict model quality. We have found that a Siamese-twin architecture shows promise.
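To make the Siamese idea concrete, the sketch below compares two candidate models by passing both through the same small graph encoder with shared weights and turning the score difference into a probability. It is a toy illustration, not the project's network: the node features, the single round of mean-neighbour message passing, the random untrained weights, and all function names are assumptions, and training (for example with a pairwise ranking loss) is omitted.

```python
import math
import random


def encode(graph, w_self, w_nbr):
    """One round of mean-neighbour message passing, then mean pooling to a scalar score.

    graph is (node_features, neighbour_lists); it must contain at least one node.
    """
    feats, nbrs = graph
    node_scores = []
    for i, x in enumerate(feats):
        # Average the neighbours' features (zeros if the node has no neighbours).
        if nbrs[i]:
            mean = [sum(feats[j][k] for j in nbrs[i]) / len(nbrs[i]) for k in range(len(x))]
        else:
            mean = [0.0] * len(x)
        h = sum(w * v for w, v in zip(w_self, x)) + sum(w * v for w, v in zip(w_nbr, mean))
        node_scores.append(math.tanh(h))
    return sum(node_scores) / len(node_scores)


def prob_first_better(graph_a, graph_b, w_self, w_nbr):
    """Siamese comparison: both graphs use the same encoder and the same weights."""
    diff = encode(graph_a, w_self, w_nbr) - encode(graph_b, w_self, w_nbr)
    return 1.0 / (1.0 + math.exp(-diff))


random.seed(0)
n_features = 2  # hypothetical per-residue features, e.g. a confidence value and a contact count
w_self = [random.uniform(-1, 1) for _ in range(n_features)]
w_nbr = [random.uniform(-1, 1) for _ in range(n_features)]

# Two toy three-residue models with the same contact graph but different features.
model_a = ([[0.9, 0.2], [0.8, 0.1], [0.7, 0.3]], [[1], [0, 2], [1]])
model_b = ([[0.4, 0.6], [0.5, 0.5], [0.3, 0.9]], [[1], [0, 2], [1]])

print(prob_first_better(model_a, model_b, w_self, w_nbr))
```

Sharing the encoder guarantees that swapping the two inputs flips the prediction and that identical models receive identical scores, which addresses the inconsistency between very similar models at the architectural level.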
Implementing a Flow-Matching Model into Structure Prediction Pipelines
AlphaFold3 (code not released) replaced the structure module of AlphaFold2 with a diffusion-based model. We aim to integrate a flow-matching model into our structure prediction pipelines to enhance performance. Our work on stable autonomous flows, in collaboration with Hossein Azizpour, should be useful here.
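As a reminder of what flow matching involves, here is a minimal sketch of its simplest variant, with a straight-line path between a noise sample and a data sample. It is not the project's model: the three-number "coordinates", the function names, and the oracle velocity function standing in for a trained network are all assumptions made for illustration.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t = 0) to data x1 (t = 1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target along that path: d x_t / dt = x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(velocity_fn, x0, x1, t):
    """Squared error between the predicted velocity at (x_t, t) and the target."""
    xt = interpolate(x0, x1, t)
    pred = velocity_fn(xt, t)
    return sum((p - u) ** 2 for p, u in zip(pred, target_velocity(x0, x1)))


def euler_sample(velocity_fn, x0, n_steps=10):
    """Generate a sample by integrating dx/dt = v(x, t) from t = 0 to t = 1."""
    x = list(x0)
    dt = 1.0 / n_steps
    for step in range(n_steps):
        v = velocity_fn(x, step * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


random.seed(0)
x1 = [1.0, 2.0, 3.0]                       # toy "data" coordinates
x0 = [random.gauss(0.0, 1.0) for _ in x1]  # noise sample


def oracle(x, t):
    """Stand-in for a trained network: returns the exact target velocity."""
    return target_velocity(x0, x1)


print(flow_matching_loss(oracle, x0, x1, 0.3))  # zero for the oracle
print(euler_sample(oracle, x0))                 # lands on x1 up to rounding
```

In a real pipeline the oracle would be replaced by a network trained to minimize this loss over many noise/data pairs and random times, and sampling would integrate the learned field in the same way.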
Improved Prediction of Host-Pathogen Interactions Using Novel Species Pairing
Predicting protein-protein interactions with AlphaFold relies heavily on co-evolutionary signals between protein pairs. However, these signals are absent for host-pathogen pairs because the two proteins do not co-evolve within a single species. To address this, we pair species in a novel way that improves prediction accuracy for host-pathogen interactions.
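The text does not describe the pairing scheme itself, so the sketch below only illustrates the general mechanism such a scheme would plug into: building a paired alignment by concatenating rows from two multiple sequence alignments (MSAs) whose species labels match. The optional `species_map`, which relabels pathogen species before matching, is a hypothetical stand-in for a cross-species pairing rule; the function name, species labels, and sequences are likewise invented for illustration.

```python
def pair_by_species(msa_a, msa_b, species_map=None):
    """Concatenate rows of two MSAs whose species labels match.

    msa_a, msa_b: lists of (species, aligned_sequence), best hit first.
    species_map: optional dict that relabels species in msa_b before matching
                 (hypothetical, e.g. a pathogen mapped to one of its hosts).
    """
    species_map = species_map or {}
    first_hit_b = {}
    for species, seq in msa_b:
        key = species_map.get(species, species)
        first_hit_b.setdefault(key, seq)  # keep only the top-ranked hit per species

    paired, used = [], set()
    for species, seq in msa_a:
        if species in first_hit_b and species not in used:
            paired.append((species, seq + first_hit_b[species]))
            used.add(species)
    return paired


host_msa = [("human", "MKT-A"), ("mouse", "MKTLA"), ("human", "MRT-A")]
pathogen_msa = [("virus_x", "GG-S"), ("virus_y", "GGTS")]

# Plain species matching finds nothing, since host and pathogen share no species.
print(pair_by_species(host_msa, pathogen_msa))  # []

# A hypothetical cross-species map produces paired rows.
cross_map = {"virus_x": "human", "virus_y": "mouse"}
print(pair_by_species(host_msa, pathogen_msa, cross_map))
# [('human', 'MKT-AGG-S'), ('mouse', 'MKTLAGGTS')]
```

The first call shows why conventional pairing yields an empty paired alignment for host-pathogen complexes; any alternative pairing rule has to supply the cross-species correspondence that the map stands in for here.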
In addition to these method-development projects, we continue our collaborative projects with a more biological focus. To mention just one: our predictions of sperm-egg interactions produced a well-predicted complex that helps us understand how the two membranes fuse (Elofsson, eLife 2023).