Optimized End-to-end learning for protein docking
Title: Optimized End-to-end learning for protein docking
DNr: Berzelius-2023-328
Project Type: LiU Berzelius
Principal Investigator: Arne Elofsson <arne@bioinfo.se>
Affiliation: Stockholms universitet
Duration: 2023-12-01 – 2024-06-01
Classification: 10203
Homepage: https://bioinfo.se/


We work on predicting protein-protein interactions, combining new methods with practical applications to advance our understanding of how proteins interact. The field is progressing rapidly, and we have made significant contributions in the past year, thanks in large part to the computational resources provided by SNIC/KAW. Our work began with the development of the Fold and Dock pipeline (Bryant et al., 2022), which allowed us to predict structures for an extensive part of the human proteome (Burke et al., 2023). Building on this, we introduced the MPC method, which lets us predict large protein complexes (Bryant et al., 2023). Currently, we are refining and extending these methods.

Optimization of pDockQ: pDockQ, a central component of our Fold and Dock pipeline, has become a widely used standard for estimating the quality of predicted protein complexes. We recently introduced pDockQ2 (Zhu et al., 2023), which outperforms its predecessor and, notably, can assess the quality of individual chains in multimeric protein complexes.

Enhanced prediction of antibody-antigen complexes: Building on the AFsample strategy (Wallner, 2023), we are developing pDockQ3, a scoring function based on a more detailed analysis of the quality predictions in AlphaFold. AlphaFold's current reliance on predicted TM scores works well in most cases, but it fails when only a tiny portion of one chain is accurately predicted, which hinders the ranking of models by quality. To address this, we are exploring a recurrent neural network model that estimates all possible superpositions.
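The length-normalization problem described above can be illustrated with a small numerical sketch. It uses the standard TM-score definition for a single fixed superposition, not AlphaFold's predicted TM score and not pDockQ3. The chain length, the number of well-placed residues and the deviations are hypothetical toy values chosen only to show the effect.

```python
def tm_d0(length: int) -> float:
    """Standard TM-score distance scale d0 (in Angstrom) for `length` residues."""
    if length <= 15:
        return 0.5  # the cube-root formula is not usable here; fall back to a small floor
    return max(1.24 * (length - 15) ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score(distances, norm_length, d0):
    """Sum of 1 / (1 + (d / d0)^2) over residues, divided by norm_length."""
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / norm_length


# Hypothetical toy numbers, not taken from any real complex.
chain_length = 500
n_correct = 25                 # a small, well-placed region (assumed)
good_dev, bad_dev = 1.0, 30.0  # deviations in Angstrom after one fixed superposition

distances = [good_dev] * n_correct + [bad_dev] * (chain_length - n_correct)
d0 = tm_d0(chain_length)

# Score over the whole chain: the 25 good residues are diluted by the other 475.
global_tm = tm_score(distances, chain_length, d0)
# Reference: the same chain with every residue misplaced.
all_wrong_tm = tm_score([bad_dev] * chain_length, chain_length, d0)
# Same per-residue terms, but summed and normalized over the good region only.
local_tm = tm_score(distances[:n_correct], n_correct, d0)

print(f"whole chain, small region correct: {global_tm:.3f}")
print(f"whole chain, nothing correct:      {all_wrong_tm:.3f}")
print(f"correct region scored on its own:  {local_tm:.3f}")
```

In this toy setting the whole-chain score stays close to that of a completely wrong model, while the same region scored on its own is close to 1. That gap is the kind of signal a global TM-based ranking can miss.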
Our objective is to provide better insight into antibody-antigen interactions and other difficult cases of protein-protein interaction prediction.

Predicting the pairing of homologous protein pairs: One limitation of AlphaFold is that it cannot distinguish between interacting and non-interacting homologs, as we described in our modeling of the proteasome. Here we are combining statistical potentials with AlphaFold to enable such predictions. Preliminary data look promising.

Development of a fast PPI method: AlphaFold is too slow to predict the interactions between all pairs of human proteins. Faster methods exist, but they are less reliable. We are developing a pipeline that optimizes a combination of tools to enable much larger-scale predictions. The method is based on the Evoformer in AlphaFold but skips the structure module and recycling, which speeds up predictions substantially.

Use of a diffusion model in AlphaFold: The structure module in AlphaFold is not trained as a diffusion model, but in principle it could be. It contains eight layers that turn the pairwise information into a structural model. Replacing the structure module with a diffusion model would enable improved prediction of structural variety.

In addition to these method-development projects, we continue our collaborative projects with a more biological focus.