Semantic Scene Understanding for Mobile Robots
Title: Semantic Scene Understanding for Mobile Robots
DNr: Berzelius-2025-203
Project Type: LiU Berzelius
Principal Investigator: Timon Homberger <timonh@kth.se>
Affiliation: Kungliga Tekniska högskolan
Duration: 2025-06-09 – 2026-01-01
Classification: 10201
Keywords:

Abstract

Successful navigation and manipulation with robotic systems in indoor and outdoor environments benefit greatly from a semantic understanding of the environment. This understanding can take the form of dense semantic reconstructions, or of instance-level detection and reconstruction, e.g., as a scene graph. Many current state-of-the-art methods rely on an accurate pose estimate to support consistent geometric reconstruction of the environment. This project aims to explore combining 3D geometric information with language-aligned and non-language-aligned semantic information to create locally consistent representations that are useful for scene understanding, object retrieval, and manipulation, without strictly requiring global consistency. This may lead to more flexible methods for exploration and scene understanding across a variety of environment types. In the project we plan to use combinations of existing vision foundation models and large language models, and we also aim to fine-tune foundation models for domain-specific experiments. Furthermore, we plan to explore learning-based reconstruction from spatial-semantic data. Finally, the project aims to explore using monocular image streams only, without the commonly used depth sensor measurements. By exploiting foundation models for monocular depth estimation, and possibly inter-frame consistency constraints, the goal is to build a system that provides a high level of semantic scene understanding with limited reliance on rich sensor data.
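
To make the notion of a locally consistent, pose-light representation concrete, the following is a minimal sketch in Python. It assumes object instances are anchored to the frame in which they were first observed and that only relative transforms between nearby frames are stored, so no globally optimized pose graph is required. All class and field names are illustrative assumptions, not the project's actual design.

```python
# Minimal sketch of a locally consistent scene representation:
# object nodes live in the frame they were observed in; edges hold
# relative SE(3) transforms between nearby frames only.
from dataclasses import dataclass, field

import numpy as np

@dataclass
class ObjectNode:
    instance_id: int
    centroid: np.ndarray       # 3D position expressed in the anchor frame
    semantic_feat: np.ndarray  # e.g. a language-aligned embedding (see below)
    anchor_frame: int          # index of the frame the geometry is anchored to

@dataclass
class LocalSceneGraph:
    nodes: dict[int, ObjectNode] = field(default_factory=dict)
    # (frame_i, frame_j) -> 4x4 relative transform; only locally estimated,
    # never globally optimized, which is the point of this representation.
    relative_poses: dict[tuple[int, int], np.ndarray] = field(default_factory=dict)

    def add_node(self, node: ObjectNode) -> None:
        self.nodes[node.instance_id] = node

    def connect(self, frame_i: int, frame_j: int, T_ij: np.ndarray) -> None:
        self.relative_poses[(frame_i, frame_j)] = T_ij
```

Because queries such as object retrieval only need the geometry around the matched instance, drift between distant frames does not invalidate the representation, which is what relaxing global consistency buys.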
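A language-aligned embedding is what makes open-vocabulary object retrieval possible in such a representation: instance crops and free-text queries are mapped into the same space and matched by cosine similarity. Below is a minimal sketch using CLIP through the Hugging Face transformers API; the checkpoint choice and the helper names are assumptions for illustration, and the instance crops are expected to come from any upstream detector or segmenter.

```python
# Minimal sketch: language-aligned retrieval over object instances.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def embed_crops(crops: list[Image.Image]) -> torch.Tensor:
    """Embed per-instance image crops; rows are L2-normalized."""
    inputs = processor(images=crops, return_tensors="pt")
    feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

@torch.no_grad()
def retrieve(query: str, instance_feats: torch.Tensor) -> int:
    """Return the index of the instance best matching a free-text query."""
    inputs = processor(text=[query], return_tensors="pt", padding=True)
    text_feat = model.get_text_features(**inputs)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    scores = instance_feats @ text_feat.T  # cosine similarity
    return int(scores.argmax())

# Usage: feats = embed_crops(crops); best = retrieve("a red coffee mug", feats)
```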
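For the monocular setting, one plausible pipeline is to predict depth with an off-the-shelf foundation model and back-project it through a pinhole camera model to obtain a local point cloud. The sketch below assumes a Hugging Face depth-estimation pipeline; the checkpoint and the camera intrinsics are illustrative, and since monocular predictions are relative (disparity-like), the resulting cloud is locally consistent but scale-ambiguous.

```python
# Minimal sketch: depth sensor replaced by a monocular depth foundation
# model, back-projected through an assumed pinhole camera model.
import numpy as np
from PIL import Image
from transformers import pipeline

# Any depth-estimation checkpoint works here; this one is an assumption.
depth_estimator = pipeline(task="depth-estimation", model="Intel/dpt-large")

def local_point_cloud(image: Image.Image, fx: float, fy: float,
                      cx: float, cy: float) -> np.ndarray:
    """Back-project predicted depth; returns (H*W, 3) points in the camera frame."""
    # The pipeline returns a PIL depth map resized to the input image.
    # The values are a normalized relative prediction, so the cloud is
    # only defined up to scale.
    depth = np.asarray(depth_estimator(image)["depth"], dtype=np.float32)
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Usage with made-up intrinsics for a 640x480 stream:
# cloud = local_point_cloud(Image.open("frame.png"), 525.0, 525.0, 319.5, 239.5)
```

Enforcing inter-frame consistency, as mentioned in the abstract, would then amount to constraining such per-frame clouds against each other over short windows rather than against a global map.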