Learning heterogeneity representations in molecular image data
| Title: | Learning heterogeneity representations in molecular image data |
| DNr: | Berzelius-2025-257 |
| Project Type: | LiU Berzelius |
| Principal Investigator: | Björn Forsberg <bjorn.forsberg@liu.se> |
| Affiliation: | Linköpings universitet |
| Duration: | 2025-10-27 – 2026-05-01 |
| Classification: | 10203 |
| Keywords: | |
Abstract
Cryo-electron microscopy (cryo-EM) is pivotal in modern structural biology because it can reveal the molecular structure of a wide range of macromolecular complexes at near-atomic resolution. One of its main remaining challenges is the reliable interpretation of heterogeneous and noisy data. This project focuses on developing and testing new AI and machine learning methods that better capture and interpret structural variability in cryo-EM datasets.
Our goal is to design models that can infer biologically meaningful information from variation in conformation and binding, and that can separate this meaningful variation from experimental noise. To do this, we will develop and benchmark a range of 3D deep learning approaches, including convolutional neural networks (CNNs), variational autoencoders (VAEs), diffusion-based models, and discriminators that assess the consistency and quality of 3D maps, all applied to novel data representations.
The work will include:
• Learning continuous and discrete modes of heterogeneity from experimental and synthetic 3D data, guided by statistical first principles.
• Exploring noising and denoising strategies, based on novel feature extraction, to regularize 3D reconstructions and improve their interpretability.
• Building discriminative models that can quantify conformational diversity and reconstruction fidelity.
• Integrating these methods into existing cryo-EM workflows (RELION, cryoSPARC) to evaluate practical benefits.
Overall, the project aims to fundamentally re-examine how data representations and correlated features can be leveraged to detect natural biological covariates more automatically, and thereby to increase the fidelity of automated processing of biomolecular structure data.