DETERMINING PROTEIN CONFORMATIONAL ENSEMBLES BY COMBINING MACHINE LEARNING AND SAXS/SANS
Title: DETERMINING PROTEIN CONFORMATIONAL ENSEMBLES BY COMBINING MACHINE LEARNING AND SAXS/SANS
DNr: Berzelius-2023-244
Project Type: LiU Berzelius
Principal Investigator: Erik Lindahl <erik.lindahl@scilifelab.se>
Affiliation: Kungliga Tekniska högskolan
Duration: 2023-09-28 – 2024-04-01
Classification: 10603
Homepage: http://www.biophysics.se
Keywords:

Abstract

Protein flexibility, motion, and conformational transitions form the bedrock of biological processes. Characterizing the conformations, their functional annotation, and the interconversion rates linking states is therefore an essential prerequisite for understanding the molecular basis of protein function, and is invaluable to drug-development efforts. However, relatively few biological processes have been thoroughly simulated or mapped. Recently, artificial intelligence (AI) in combination with bioinformatics has made revolutionary progress in this field [Perrakis et al. EMBO reports, 2021]. For example, AlphaFold2 (AF2) [Jumper et al. Nature, 2021] first uses a multiple sequence alignment (MSA) to extract the information contained in residue pairs that have co-evolved over the evolutionary history of a protein family. AF2 then feeds this information into a deep neural network to predict the structure of the protein from its amino acid sequence. While AF2 was primarily designed to determine a single structural model of a protein, a few studies have recently demonstrated that it can generate an ensemble of states covering the conformational landscape relevant to protein function [del Alamo et al. eLife, 2022]: reducing the depth of the input MSA by stochastic subsampling can yield accurate models in multiple conformations. Despite this success, because the machine learning algorithm contains no explicit physics, AF2 ensembles are often not Boltzmann distributed and sometimes contain physically unrealistic models, so they require external validation. We are building methodologies to tackle this issue by coupling AF2 with experimental data such as cryo-EM and small-angle X-ray scattering (SAXS) or small-angle neutron scattering (SANS) to determine protein conformational ensembles, as described in the reports of our previous allocations.
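The stochastic MSA subsampling mentioned above can be illustrated with a minimal sketch. It assumes the alignment is held as a plain list of aligned strings with the query in the first row; the function name `subsample_msa`, the `max_depth` parameter, and the toy sequences are hypothetical and are not part of AF2 or of any published pipeline, which may subsample differently (for example after clustering).

```python
import random


def subsample_msa(msa, max_depth, seed=None):
    """Return the query row plus a random subset of the other rows.

    msa       -- list of aligned sequences; msa[0] is assumed to be the query
    max_depth -- maximum number of rows in the returned alignment
    seed      -- optional seed so that a given subsample can be reproduced
    """
    if not msa:
        raise ValueError("MSA must contain at least the query sequence")
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    rng = random.Random(seed)
    query, others = msa[0], msa[1:]
    # Always keep the query; draw the remaining rows without replacement.
    n_extra = min(max_depth - 1, len(others))
    return [query] + rng.sample(others, n_extra)


# Toy alignment with made-up sequences, for illustration only.
toy_msa = ["MKTAYIAK", "MKTAYLAK", "MRTAY-AK", "MKSAYIAR", "MKTGYIAK"]

# Each seed gives a different shallow alignment; in the approach described
# above, each one would be passed to a separate structure prediction run.
shallow_msas = [subsample_msa(toy_msa, max_depth=3, seed=s) for s in range(4)]
```

Repeating the prediction over many such shallow alignments is what would produce a set of candidate conformations rather than a single model.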
However, our current implementation is limited to protein monomers. We request another round of Berzelius allocation to extend our methodology to multimers using the latest developments based on AF2-multimer. For example, a recent study has shown that denoising the MSA profile can help generate diverse and accurate predictions for multimeric proteins [Bryant and Noé, bioRxiv, 2023]. We will couple this approach with SAXS/SANS data to determine experimentally verified structural ensembles of multimeric proteins. In total, we propose to test this on 3 different protein complexes, which we estimate will require 23,200 GPU node-hours over 6 months.
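As an illustration of how predicted models could be compared with scattering data, the following minimal sketch scores a weighted ensemble against an experimental profile using a reduced chi-square with a fitted scale factor. It is not the project's implementation: the function names, the toy intensities, and the equal weights are assumptions made for the example, and the per-model profiles are taken as given (in practice they would be computed from the atomic models by a separate forward-model program).

```python
def ensemble_profile(profiles, weights):
    """Weighted average of per-model scattering profiles on a shared q grid."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    n_q = len(profiles[0])
    return [
        sum(w * p[i] for w, p in zip(weights, profiles)) / total
        for i in range(n_q)
    ]


def chi_square(i_exp, sigma, i_calc):
    """Return (chi2, scale) for a calculated profile against experiment.

    The scale factor is the least-squares optimum, since experimental and
    calculated intensities are generally on different absolute scales.
    """
    num = sum(e * c / s**2 for e, c, s in zip(i_exp, i_calc, sigma))
    den = sum(c * c / s**2 for c, s in zip(i_calc, sigma))
    scale = num / den
    chi2 = sum(
        ((e - scale * c) / s) ** 2 for e, c, s in zip(i_exp, i_calc, sigma)
    ) / len(i_exp)
    return chi2, scale


# Toy data (made up): two model profiles and an "experimental" curve that is
# exactly twice their equal-weight average.
profile_a = [10.0, 8.0, 5.0, 2.0]
profile_b = [10.0, 6.0, 3.0, 1.0]
i_exp = [20.0, 14.0, 8.0, 3.0]
sigma = [0.5, 0.5, 0.4, 0.3]

mixed = ensemble_profile([profile_a, profile_b], [0.5, 0.5])
chi2_mixed, scale_mixed = chi_square(i_exp, sigma, mixed)
chi2_single, _ = chi_square(i_exp, sigma, profile_a)
```

In this toy case the two-state mixture fits the data better than either model alone, which is the kind of evidence that would support an ensemble over a single structure.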