Title: Self-supervised 4D scene understanding for autonomous driving
DNr: Berzelius-2025-177
Project Type: LiU Berzelius
Principal Investigator: Adam Lilja <adamlil@chalmers.se>
Affiliation: Chalmers tekniska högskola
Duration: 2025-05-23 – 2025-12-01
Classification: 10207
Keywords:

Abstract

This project aims to develop scalable self-supervised learning methods for 3D and 4D (spatiotemporal) occupancy estimation in the context of autonomous driving. Occupancy representations provide a unified framework for modeling both observed and occluded regions of the driving scene and are critical for robust downstream tasks such as planning, tracking, and prediction. Our current research focuses on learning dense occupancy fields from multi-modal sensor data (e.g., LiDAR, radar, and cameras) without relying on dense human annotations. A key goal is to enable scene representations that are both geometrically consistent over time and capable of expressing multiple plausible hypotheses in uncertain or occluded regions. Building on our prior work, we are investigating how to leverage temporal consistency, multi-view geometry, and auxiliary self-supervised signals to improve the spatial and temporal coherence of learned occupancy maps. In particular, we explore transformer-based architectures and contrastive or masked modeling objectives that scale to driving datasets spanning entire cities. These models require substantial GPU resources because they produce high-resolution 3D outputs, operate over long time horizons, and fuse inputs from multiple sensor modalities. The requested compute resources will support model training, validation, and large-scale experimentation. This work aims to advance foundational scene understanding capabilities for autonomous vehicles and to contribute to the development of safe and generalizable driving systems.
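
To make the masked-modeling objective mentioned above concrete, the sketch below shows one possible formulation: hide a random subset of voxels in a coarse occupancy grid and train a small network to reconstruct them, supervising only the hidden voxels. This is a minimal, hypothetical PyTorch example; the names (TinyOccupancyEncoder, masked_occupancy_loss), the toy 3D-convolutional encoder, the grid resolution, and the masking ratio are illustrative assumptions and not the project's actual architecture or training code.

# Minimal sketch (illustrative only): masked-modeling self-supervision
# over a coarse binary voxel occupancy grid. All names and shapes are
# assumptions made for this example.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyOccupancyEncoder(nn.Module):
    """Toy 3D conv encoder: masked occupancy grid in, per-voxel logits out."""

    def __init__(self, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(2, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv3d(hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv3d(hidden, 1, kernel_size=1),
        )

    def forward(self, occ_masked: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # Concatenate the masked grid with the mask itself so the model
        # knows which voxels were hidden; output is one logit per voxel.
        x = torch.cat([occ_masked, mask], dim=1)  # (B, 2, D, H, W)
        return self.net(x)                        # (B, 1, D, H, W)


def masked_occupancy_loss(model: nn.Module, occ: torch.Tensor, mask_ratio: float = 0.5) -> torch.Tensor:
    """Hide a random subset of voxels and reconstruct them (masked modeling)."""
    mask = (torch.rand_like(occ) < mask_ratio).float()  # 1 = hidden voxel
    occ_masked = occ * (1.0 - mask)                      # zero out hidden voxels
    logits = model(occ_masked, mask)
    # Supervise only the hidden voxels, the usual masked-modeling recipe.
    loss = F.binary_cross_entropy_with_logits(logits, occ, reduction="none")
    return (loss * mask).sum() / mask.sum().clamp(min=1.0)


if __name__ == "__main__":
    model = TinyOccupancyEncoder()
    occ = (torch.rand(2, 1, 16, 16, 16) > 0.8).float()   # fake occupancy grid
    loss = masked_occupancy_loss(model, occ)
    loss.backward()
    print(f"masked occupancy loss: {loss.item():.4f}")

In practice the same recipe would be applied to much larger grids, longer temporal windows, and fused multi-sensor features, which is what drives the GPU requirements described above.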