Title: Benchmarking Self-Supervised Representation Learning Approaches for Generalization
DNr: Berzelius-2024-269
Project Type: LiU Berzelius
Principal Investigator: Karl Henrik Johansson <kallej@kth.se>
Affiliation: Kungliga Tekniska högskolan
Duration: 2024-08-14 – 2025-03-01
Classification: 10201
Homepage: https://www.kth.se/profile/amkm
Keywords:

Abstract

Unsupervised methods learn representations from unlabeled data by identifying intrinsic structures within the data. Techniques such as autoencoders and self-supervised learning have gained popularity. Autoencoders learn to compress and reconstruct data, capturing essential features in the process, while self-supervised methods use auxiliary tasks to drive representation learning, making them effective even without labeled data. As a variant of unsupervised learning, self-supervised learning creates supervisory signals from the data itself: masked language modeling in NLP (e.g., BERT) and contrastive learning in computer vision generate labels through tasks such as predicting missing parts of the input or distinguishing between different transformations of the data. This approach has become highly effective for pre-training models on large datasets before fine-tuning on specific tasks.

In another line of work, recent studies, including our own work on object-centric representations carried out in the previous project period, have shown that inductive biases such as an object-centric bias can still be relevant when combined with large pre-trained vision models, depending on the downstream task. However, what these approaches have in common is that individual studies often rely on overly narrow comparisons and seldom control all factors of variation, which prevents precise and interesting conclusions. For example, controlling for model size and training scale is essential in order to distinguish the representation learning capabilities of different binding mechanisms.

In the previous project, we focused mainly on the trade-offs and advantages of inductive biases in the setting of object-centric representations versus large pre-trained models, which deepened our understanding of existing models as a whole. In the next project, we will build on this work and continue in the same direction to better understand the essential requirements for learning generalizable and universal representations. We aim to dig deeper and answer the following question: "What drives representation learning capabilities?" To answer it, we will analyze how each component of a model affects the quality of its representations. By investigating the impact of individual design choices systematically, we aim to enable steady progress towards better compositional representation learning approaches.
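
To make the reconstruction-based objective mentioned above concrete, the following is a minimal sketch of an autoencoder in PyTorch. It is purely illustrative and not part of the project's code base; the layer sizes, the dummy batch, and the TinyAutoencoder name are assumptions made for the example.

```python
# Minimal autoencoder sketch (illustrative only): compress inputs to a
# low-dimensional code and reconstruct them, so that no labels are needed.
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU(),
                                     nn.Linear(128, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 128), nn.ReLU(),
                                     nn.Linear(128, input_dim))

    def forward(self, x):
        code = self.encoder(x)       # learned representation
        recon = self.decoder(code)   # reconstruction of the input
        return code, recon

# The reconstruction error alone drives representation learning.
x = torch.rand(64, 784)              # dummy batch of flattened inputs
model = TinyAutoencoder()
code, recon = model(x)
loss = nn.functional.mse_loss(recon, x)
loss.backward()
```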
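Likewise, the contrastive objective referred to in the abstract can be sketched as an InfoNCE-style loss that pulls together two augmented views of the same sample and pushes apart all other samples in the batch. This is again an illustrative assumption rather than the project's actual training setup; the function name, embedding sizes, and temperature value are placeholders.

```python
# InfoNCE-style contrastive loss sketch (illustrative only).
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.1):
    """z1, z2: (batch, dim) embeddings of two augmented views of the same batch.
    Positives lie on the diagonal of the similarity matrix; all other pairs
    act as negatives."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature     # pairwise cosine similarities
    targets = torch.arange(z1.size(0))     # index of the positive for each row
    return F.cross_entropy(logits, targets)

# Usage with dummy embeddings standing in for an encoder's outputs.
z1, z2 = torch.randn(64, 128), torch.randn(64, 128)
loss = info_nce_loss(z1, z2)
```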