Advanced 3D Perception for Environment Understanding and Representation
Title: Advanced 3D Perception for Environment Understanding and Representation
DNr: Berzelius-2023-364
Project Type: LiU Berzelius
Principal Investigator: Patric Jensfelt <patric@kth.se>
Affiliation: Kungliga Tekniska högskolan
Duration: 2024-01-01 – 2024-07-01
Classification: 10207
Homepage: https://www.kth.se/profile/patric
Keywords:

Abstract

Within the research group we have three PhD students working on perception, 3D environment understanding, and representation. 3D perception is a multidisciplinary research field dedicated to extracting spatial information, mainly from two-dimensional images, in order to create three-dimensional representations of the visual world. By combining computer vision with geometric reasoning, it enables applications ranging from object recognition and scene reconstruction to autonomous navigation.

In addition to camera images, many tasks such as autonomous driving have multi-modal sensor data available, for example lidar and radar measurements. Fusing this multi-modal information into a reliable scene representation is essential in traffic scenarios with multiple agents, where the risk of collision is a major concern. The students are working on finding better neural-network-based representations of the data, with the goal of better modeling and predicting multi-agent scenarios.

Similarly, it is important for an autonomous agent to accurately localize itself in the environment. Localization and camera pose estimation are often extremely challenging, especially in ambiguous environments. Recently, various vision transformer architectures have been proposed for estimating the relative camera pose between different viewpoints of a common scene. Additionally, epipolar geometry can be used to further refine these predictions, as illustrated further below. Nevertheless, the use and combination of these two approaches are not yet well established in the field.

Recently, neural rendering has become a successful strategy for learning the 3D structure of a scene with neural networks, without the need for expensive 3D annotations. Using differentiable rendering, these methods render images of a scene from the neural scene representation, compute the error with respect to the ground-truth observations, and back-propagate this error to improve the learned 3D scene representation. The research goal for our group is to decrease the required training time and to extend these methods from static to dynamic scenes.
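To make the training principle above concrete, the following is a minimal sketch (in PyTorch) of such a differentiable-rendering optimization loop. The tiny MLP scene representation, the simple volume renderer, and the random stand-in ray data are illustrative assumptions, not a description of our actual pipeline.

import torch

# Assumed minimal setup: an MLP maps a 3D point to density and colour; images
# are formed by differentiable volume rendering along camera rays.
class SceneRepresentation(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(3, 128), torch.nn.ReLU(),
            torch.nn.Linear(128, 4))            # outputs: density + RGB

    def forward(self, points):                  # points: (N, S, 3)
        out = self.mlp(points)
        sigma = torch.relu(out[..., 0])         # non-negative density
        rgb = torch.sigmoid(out[..., 1:])       # colours in [0, 1]
        return sigma, rgb

def render(scene, origins, dirs, n_samples=32, near=0.5, far=3.0):
    """Differentiable volume rendering of a batch of rays."""
    t = torch.linspace(near, far, n_samples)                       # sample depths
    points = origins[:, None, :] + t[None, :, None] * dirs[:, None, :]
    sigma, rgb = scene(points)
    delta = (far - near) / n_samples
    alpha = 1.0 - torch.exp(-sigma * delta)                        # per-sample opacity
    trans = torch.cumprod(torch.cat(
        [torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=1), dim=1)[:, :-1]
    weights = alpha * trans
    return (weights[..., None] * rgb).sum(dim=1)                   # composited colour

scene = SceneRepresentation()
optimizer = torch.optim.Adam(scene.parameters(), lr=5e-4)

# Dummy rays and target colours, standing in for rays drawn from posed images.
origins = torch.zeros(1024, 3)
dirs = torch.nn.functional.normalize(torch.randn(1024, 3), dim=-1)
gt_rgb = torch.rand(1024, 3)

for step in range(100):
    pred_rgb = render(scene, origins, dirs)                        # render the scene
    loss = torch.nn.functional.mse_loss(pred_rgb, gt_rgb)          # photometric error
    optimizer.zero_grad()
    loss.backward()                                                # back-propagate the error
    optimizer.step()                                               # improve the 3D representation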
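Regarding the pose-estimation work mentioned above, the sketch below shows one standard way epipolar geometry can score, and thereby refine, a predicted relative pose via the essential matrix and the Sampson error. The function names, the synthetic correspondences, and the gradient-descent refinement idea are illustrative assumptions rather than a specific published method.

import torch

def skew(t):
    """Skew-symmetric matrix [t]_x such that [t]_x v = t x v."""
    zero = torch.zeros(())
    return torch.stack([
        torch.stack([zero, -t[2], t[1]]),
        torch.stack([t[2], zero, -t[0]]),
        torch.stack([-t[1], t[0], zero])])

def essential_matrix(R, t):
    """E = [t]_x R for a relative rotation R and (unit) translation t."""
    return skew(t) @ R

def sampson_error(E, x1, x2):
    """First-order geometric error of correspondences x1 <-> x2 (normalized
    homogeneous image coordinates, shape (N, 3)) under the epipolar
    constraint x2^T E x1 = 0."""
    Ex1 = x1 @ E.T            # rows are (E x1_i)^T
    Etx2 = x2 @ E             # rows are (E^T x2_i)^T
    num = (x2 * Ex1).sum(dim=1) ** 2
    den = Ex1[:, 0]**2 + Ex1[:, 1]**2 + Etx2[:, 0]**2 + Etx2[:, 1]**2
    return num / den

# Score a hypothesized relative pose on a few synthetic correspondences
# (illustrative only). A transformer-predicted (R, t) could serve as
# initialization, with the mean Sampson error over matched keypoints
# minimized by gradient descent to refine the pose.
R = torch.eye(3)
t = torch.tensor([1.0, 0.0, 0.0])
x1 = torch.randn(10, 3); x1[:, 2] = 1.0
x2 = torch.randn(10, 3); x2[:, 2] = 1.0
print(sampson_error(essential_matrix(R, t), x1, x2).mean())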