Natural Language Processing for Autonomous Driving Systems
Georg Hess <firstname.lastname@example.org>
Chalmers tekniska högskola
2022-12-02 – 2023-07-01
This project aims to learn a shared embedding space for text, images, and point clouds in automotive data. Specifically, we will use the pre-trained image and text encoders of a CLIP model and extend their embedding space to a third modality, namely lidar scans, by training a point cloud encoder in a self-supervised fashion on a large number of image-point cloud pairs. This way, we can learn the semantics of point clouds without any labeling or annotation cost.
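To make the training signal concrete, the following is a minimal sketch of one possible alignment objective, assuming a cosine-similarity loss between each lidar embedding and the frozen CLIP image embedding of its paired image. The abstract does not state which loss is used, so the loss choice and the function names `l2_normalize` and `cosine_alignment_loss` are illustrative assumptions, and embeddings are plain Python lists to keep the example dependency-free.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_alignment_loss(lidar_embeddings, image_embeddings):
    """Mean (1 - cosine similarity) over paired embeddings.

    lidar_embeddings: outputs of the trainable point cloud encoder.
    image_embeddings: outputs of the frozen image encoder for the paired images.
    The cosine loss is an assumed choice, not one confirmed by the source.
    """
    if len(lidar_embeddings) != len(image_embeddings):
        raise ValueError("each point cloud needs exactly one paired image")
    if not lidar_embeddings:
        raise ValueError("need at least one pair")
    total = 0.0
    for lidar_vec, image_vec in zip(lidar_embeddings, image_embeddings):
        a = l2_normalize(lidar_vec)
        b = l2_normalize(image_vec)
        total += 1.0 - sum(x * y for x, y in zip(a, b))
    return total / len(lidar_embeddings)
```

The loss is zero when every lidar embedding points in the same direction as its paired image embedding, so minimizing it pulls the point cloud encoder's outputs into the existing image-text embedding space without requiring any labels. In practice the embeddings would be tensors in a deep learning framework so that gradients can flow back into the point cloud encoder.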
We ran a successful pilot project with our previous Berzelius application, which resulted in a submission to CVPR, a top-tier computer vision conference, and we now hope to extend that work. Several questions remain open, e.g., whether this embedding alignment is a useful pre-training step that boosts the performance of perception models on object detection and/or tracking. If it is, it has the potential to reduce the need for costly and laborious manual labeling.
Further, we want to train larger models with more capacity and use multiple datasets to improve the robustness of our method.
We believe this research project has great potential, and we once again aim to publish our results at a top-tier machine learning conference.