Advanced 3D Perception for Environment Understanding and Representation
| Title: | Advanced 3D Perception for Environment Understanding and Representation |
| DNr: | Berzelius-2026-200 |
| Project Type: | LiU Berzelius |
| Principal Investigator: | Patric Jensfelt <patric@kth.se> |
| Affiliation: | Kungliga Tekniska högskolan |
| Duration: | 2026-07-01 – 2027-01-01 |
| Classification: | 20208 |
| Homepage: | https://www.kth.se/profile/patric |
| Keywords: | |
Abstract
This research project involves seven PhD students (WASP PhD students or WASP-affiliated) and three postdocs, together with research engineers, working on perception, 3D environment understanding, and representation.
3D perception is a multidisciplinary research field dedicated to extracting spatial, semantic, and dynamic information from images, video, LiDAR, and point clouds. It combines computer vision, geometric reasoning, machine learning, and robotic perception to build representations that can support localization, scene reconstruction, motion understanding, semantic retrieval, and autonomous decision making.
Several projects in this proposal focus on the geometric side of environment understanding: estimating camera pose, learning image distances for direct localization, synthesizing novel views, reconstructing scenes from continuous streams, and learning robust representations from partial observations. These methods are intended to reduce reliance on brittle preprocessing pipelines and to make localization and mapping more adaptable across datasets, sensors, and platforms.
Other projects address semantic, dynamic, and decision-level understanding. This includes self-supervised scene flow for moving 3D point clouds, scalable 3D scene graph representations for large environments, open-vocabulary segmentation that adapts to changing robotic domains, road-rule-aware reasoning for autonomous driving, and LiDAR pre-training that transfers visual foundation model knowledge to 3D backbones. Across these directions, a common goal is to learn reliable representations from large-scale, heterogeneous data with limited manual annotation, while supporting practical deployment in robotics and autonomous driving.