Multimodal representations of proteins (SubCellVS)
DNr: Berzelius-2026-66
Project Type: LiU Berzelius
Principal Investigator: Emma Lundberg <emma.lundberg@scilifelab.se>
Affiliation: Kungliga Tekniska högskolan
Duration: 2026-03-31 – 2026-10-01
Classification: 10610
Keywords:

Abstract

Multimodal representations of proteins have driven significant advances in our understanding of protein biology. Current models such as ESM3 [1] and ProtT5 [2] primarily focus on protein sequence, structure, and function, but perform poorly at predicting subcellular localization [3]. Subcellular protein localization is vital for understanding the function of different cellular systems and is essential for disease characterization and drug discovery. Disease-causing mutations often lead to mislocalizations that can only be captured with microscopy. Hence, there is a need for a more comprehensive multimodal representation of proteins that includes their localization in the cell. In previous work [4; currently in review in Nature], we built a vision-only protein representation model and showed that it robustly learned protein localization patterns and outperformed state-of-the-art models across various datasets for cell-cycle and drug perturbation prediction. In this work, we aim to deepen our understanding of proteins by building a multimodal protein representation model and an independent cell representation model, both agnostic to input channel combinations. We have collected over 2.2 million single-cell images capturing the localization patterns of over 15k proteins across more than 50 cell lines, drawn from 5 different datasets, together with the sequence information for those proteins. Altogether, the model will be trained on the largest multimodal dataset for protein localization and representation to date. The resulting model will better characterize proteins by placing them in the context of cells and advance our understanding of disease. It will be extensively evaluated across a diverse set of vision and sequence tasks, and the benchmark will be released to the public to encourage further development in the field.
Sub-Project: Macrophages are infiltrating immune cells that play a central role in innate immunity, as well as in normal tissue development, maintenance of homeostasis, and tissue repair. This project aims to provide a currently under-studied spatial context for macrophage research by examining protein localization and morphological features. We leveraged a high-throughput, multiplexed imaging technique targeting morphological markers combined with ~500 macrophage-specific markers. To analyze this complex imaging dataset, we will use our in-house machine learning model, SubCell [4], to extract and interpret subcellular morphological features and predict protein localization patterns across macrophage states, providing a novel lens for studying macrophage plasticity and for future integrative approaches in cell biology.

References
1. Hayes, Thomas, et al. "Simulating 500 million years of evolution with a language model." Science 387.6736 (2025): 850-858.
2. Elnaggar, Ahmed, et al. "ProtTrans: Towards cracking the language of life's code through self-supervised learning." IEEE Transactions on Pattern Analysis and Machine Intelligence 44 (2021): 7112-7127.
3. Wefers, Zoe, et al. "A comprehensive benchmark of sequence-based subcellular localization predictors for human proteins." Submitted to Nature, 2025.
4. Gupta, Ankit, et al. "SubCell: Proteome-aware vision foundation models for microscopy capture single-cell biology." bioRxiv (2025): 2024-12.