Sampling Protein Sequences and Structures
Title: Sampling Protein Sequences and Structures
DNr: Berzelius-2024-201
Project Type: LiU Berzelius
Principal Investigator: Benjamin Murrell <benjamin.murrell@ki.se>
Affiliation: Karolinska Institutet
Duration: 2024-05-14 – 2024-12-01
Classification: 10203
Keywords:

Abstract

Our current research aims to build probabilistic models that capture the relationships between protein sequences and structures, using deep learning techniques. We intend to use these models to improve our understanding of protein function and evolution, to contribute to protein design, and to speed up protein sequence analysis tasks that are slow with conventional techniques. We implement our models in the Julia language, which provides exceptional flexibility during development. We are using a variety of approaches: standard transformer architectures for sequence modeling; SE(3)-equivariant transformer models inspired by AlphaFold (our Julia implementation is at https://github.com/MurrellGroup/InvariantPointAttention.jl); and a novel approach to diffusion models that makes it very easy to construct diffusions over complex and constrained geometries (https://github.com/MurrellGroup/Diffusions.jl). Our lab has some local GPU infrastructure, which we use for development and small-scale experimentation, but the purpose of this proposal is to investigate scaling our models and analyses using Berzelius resources. We are therefore applying for the default allocation now and, once our initial investigations are complete, will apply for a larger allocation if necessary. The first preprint, stemming in part from the previous project, is available at: https://www.biorxiv.org/content/10.1101/2024.05.11.593685v1.full