Self-supervised learning for histopathology applications
Title: Self-supervised learning for histopathology applications
SNIC Project: Berzelius-2021-61
Project Type: LiU Berzelius
Principal Investigator: Karin Stacke
Affiliation: Linköpings universitet
Duration: 2021-10-22 – 2022-05-01
Classification: 20603


The lack of labeled data in the domain of medical images is a major hurdle when applying deep learning to tasks such as cancer diagnostics. One approach to lessen the need for labeled data is to train a deep learning model in a self-supervised way, i.e., the model is trained on a proxy task for which the labels can be created automatically. The resulting model can then serve as an initialization for the target task, where much less labeled data is now needed. This project aims to evaluate one such method, SimCLR, on histopathology data. The goal is both to evaluate how different datasets and settings during pre-training affect the resulting representations, and to evaluate how the performance on the target task depends on the amount of labeled data (i.e., how little labeled data do we need if we have a good pre-training?). These types of methods are computationally expensive, since self-supervised models require large datasets, large batch sizes (and therefore multiple GPUs), and long training times. Access to the Berzelius resources is therefore of great value, as many of these experiments could not be conducted in single-GPU environments. The outcome of this project is a journal article in the Journal of Machine Learning for Biomedical Imaging (MELBA). Many of the results that will be presented in this journal paper have already been generated using the Berzelius system during the pilot phase. The experiments are run in a Python virtual environment, using PyTorch with multi-GPU and mixed precision support.
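
As a rough illustration of the proxy task described above, the contrastive objective used by SimCLR (the NT-Xent loss) can be sketched in a few lines. Two augmented views of the same image form a positive pair whose embeddings are pulled together, while every other embedding in the batch acts as a negative. The sketch below uses only the Python standard library so that it stays self-contained. The function names, the toy embeddings, and the temperature of 0.5 are assumptions made for illustration; they are not the project's actual PyTorch code or settings.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(views_a, views_b, temperature=0.5):
    """NT-Xent loss for a batch of positive pairs.

    views_a[i] and views_b[i] are embeddings of two augmentations of image i.
    The temperature value is an illustrative assumption.
    """
    n = len(views_a)
    z = list(views_a) + list(views_b)  # 2n embeddings in total
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same image
        # Similarities to every other embedding (the sample itself is excluded).
        logits = [
            cosine_similarity(z[i], z[k]) / temperature
            for k in range(2 * n)
            if k != i
        ]
        positive_logit = cosine_similarity(z[i], z[positive]) / temperature
        # Numerically stable log-sum-exp for the denominator.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(value - max_logit) for value in logits)
        )
        total += log_denominator - positive_logit
    return total / (2 * n)


# Toy example: two "images", each with two views embedded in 2-D.
aligned = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < mismatched)
```

The number of negatives per sample grows with the batch size, which is one reason this family of methods tends to rely on large batches and multiple GPUs. A practical implementation would compute the same quantity with batched tensor operations rather than Python loops.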