Modeling Part-whole Hierarchies with Object-Centric Learning
DNr: Berzelius-2025-125
Project Type: LiU Berzelius
Principal Investigator: Karl Henrik Johansson <kallej@kth.se>
Affiliation: Kungliga Tekniska högskolan
Duration: 2025-03-31 – 2025-10-01
Classification: 10201
Homepage: https://www.kth.se/profile/amkm
Keywords:

Abstract

Deep neural networks have achieved outstanding success in tasks ranging from computer vision (Krizhevsky et al., 2012) and natural language processing (Vaswani et al., 2017) to game playing (Silver et al., 2016) and, more recently, protein folding (Jumper et al., 2021). As ever more applications are driven by deep learning, it becomes increasingly important to understand their inner workings and the key components driving their success. Such understanding facilitates model interpretability, guides effective architecture design, and enables troubleshooting, leading to improved performance, robustness, and informed decision-making in the rapidly evolving field of artificial intelligence. In this project we will continue to focus on understanding the key components of object-centric learning, a recent subfield of deep learning that has already attracted widespread attention (Locatello et al., 2020; Singh et al., 2021; Löwe et al., 2023; Jiang et al., 2023; Didolkar et al., 2024). Object-centric learning emphasizes the explicit representation and understanding of individual entities within a scene, promoting structured information processing and enhancing model interpretability. Through this investigation, we aim to provide a comprehensive understanding of modeling part-whole hierarchies with object-centric models. By leveraging the capabilities of object-centric representation learning, our project seeks to optimize, adapt, and efficiently apply these models across a wide range of domains. Our approach explicitly captures relationships between objects and their constituent parts, potentially enhancing interpretability, improving robustness to distributional shifts, and promoting better generalization to unseen compositions, thereby advancing the state of the art in this field.

Continuation of Previous Project

In parallel with the new project, we aim to first continue our previous project investigating compositional generalization.
In that project, we introduced a benchmark designed to systematically evaluate the ability of object-centric (OC) models to generalize to novel object compositions in visually complex settings. Using a controlled dataset (Compositional CLEVRTex (Karazija et al., 2021)), we compared standard vision encoders (e.g., DINOv2 (Oquab et al., 2023)) with object-centric models (e.g., DINOSAURv2 (Seitzer et al., 2022)). Our results showed that OC models achieve superior generalization with significantly lower computational requirements, reinforcing the advantages of structured inductive biases for learning efficient representations. Looking forward, we aim to deepen our investigation into the compositional generalization capabilities of OC models. Through further empirical experiments on a broader range of datasets, we intend to refine our understanding of structured inductive biases and their role in systematic generalization. The previous project culminated in a submission to the ICLR 2025 Workshop on Spurious Correlation and Shortcut Learning: Foundations and Solutions, and we aim to extend it into a full conference paper.