Characterising macromolecular interactions with artificial intelligence and cryo-EM
Abstract
Project 1:
Proteins, from channels, transporters, and receptors to enzymes, achieve their biological functions by undergoing specific sequences of conformational transitions, where each state has unique properties, e.g. the ability to bind drugs. To characterize these functions, we need to understand the structural details of these functional states and the transition pathways between them. Artificial intelligence models such as AlphaFold2 [Jumper et al. Nature, 2021] and AlphaFold3 [Abramson et al. Nature, 2024] have made huge progress in this field. We, in work led by PhD student Samuel Eriksson Lidbrink [Lidbrink et al., bioRxiv, 2024], and others in the field [del Alamo et al. eLife, 2022] have shown how subsampling the multiple sequence alignment (MSA) can guide AlphaFold2 to sample alternative states. Although AlphaFold3 improves on AlphaFold2 by using a new diffusion module and can handle much more diverse input, it is still primarily designed to predict a single or a few structures per protein. However, the diffusion module in AlphaFold3 may have additional capabilities. We are exploring ways to repurpose it for sampling conformational transitions between protein states, e.g. by removing the noise from the sampling process, which turns the stochastic differential equation guiding sampling into an ordinary differential equation that can be solved backward [Karras et al. arXiv, 2022]. We also seek to explore whether integrating the stochastic masking of MSA columns used in AFSample2 [Kalakoti et al. bioRxiv 2024] can improve our prediction quality.
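As a toy illustration of the deterministic sampling idea described above, the following Python sketch integrates the probability-flow ODE of Karras et al. in the sigma(t) = t parameterisation, dx/dsigma = (x - D(x; sigma)) / sigma, for one-dimensional Gaussian data. The closed-form denoiser is a stand-in for a trained network and is unrelated to the actual AlphaFold3 diffusion module; the function names, the noise schedule, and the step count are all assumptions made for illustration.

```python
import numpy as np

def denoiser(x, sigma, mu=0.0, s=1.0):
    """Ideal denoiser for toy data drawn from N(mu, s^2).

    Hypothetical stand-in for a trained network: it returns the
    posterior mean of the clean sample given the noisy sample x.
    """
    return mu + (s**2 / (s**2 + sigma**2)) * (x - mu)

def ode_drift(x, sigma):
    # Probability-flow ODE with sigma(t) = t:
    # dx/dsigma = (x - D(x; sigma)) / sigma
    return (x - denoiser(x, sigma)) / sigma

def integrate(x, sigmas):
    """Heun (second-order) integration along a noise schedule.

    The schedule may be decreasing (noise -> sample) or
    increasing (sample -> noise); no random noise is injected.
    """
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        h = s_next - s_cur
        d1 = ode_drift(x, s_cur)
        x_euler = x + h * d1                 # Euler predictor
        d2 = ode_drift(x_euler, s_next)
        x = x + 0.5 * h * (d1 + d2)          # trapezoidal corrector
    return x

# Assumed schedule: 200 geometrically spaced noise levels.
sigmas_down = np.geomspace(80.0, 0.01, 200)

rng = np.random.default_rng(0)
latent = 80.0 * rng.standard_normal(5)             # pure-noise starting points
sample = integrate(latent, sigmas_down)            # noise -> data
recovered = integrate(sample, sigmas_down[::-1])   # data -> noise (backward solve)

print("round-trip relative error:",
      np.max(np.abs(recovered - latent) / np.abs(latent)))
```

Because no noise is injected, each latent maps to exactly one sample, and integrating the same ODE along the reversed schedule returns approximately the original latent. This invertibility is the property that would make a path between two states traceable in latent space; whether it carries over usefully to a protein structure model is the open question of the project.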
Project 2:
In addition to predicting protein conformational landscapes, this proposal addresses another crucial element of protein function: the binding of small molecules, which has huge potential in drug discovery. Although AlphaFold3 has demonstrated its capability in predicting protein-ligand complexes, this was only possible thanks to the rich experimental structural data in the RCSB database. While more such data could make these models more powerful, determining experimental structures of such complexes is often difficult with the traditional, time-consuming approach of hunting for suitable crystals for X-ray analysis. Recent breakthroughs in single-particle cryo-EM have overcome this limitation and enabled us to obtain atomic-resolution structures of complex biomolecular systems. Although cryo-EM can now provide very high-resolution data for the overall system (better than 2 Å in many cases), the local resolution of ligands is often too low for accurate modeling. In parallel with developments in cryo-EM, computational methods for modeling and refining structures into EM maps have been developed, but their main focus has been on building accurate protein structures. Therefore, with the support of a Marie Skłodowska-Curie Actions postdoctoral fellowship to Dr. Nandan Haloi, we are exploiting the increased power of data-driven research and building a machine-learning (ML) model to improve low-resolution maps and refine structural models.