Natural Language Processing for Autonomous Driving Systems
Georg Hess <email@example.com>
Chalmers tekniska högskola
2022-05-17 – 2022-12-01
This project aims to learn coherent embeddings for text, images, and point clouds on automotive data. Specifically, we will use a pre-trained image and text encoder (a CLIP-trained model) and extend their joint embedding space to a third modality, namely lidar scans, by training a point cloud encoder in a self-supervised fashion on a large number of image-point cloud pairs. This way we can learn the semantics of point clouds without any labeling/annotation cost.
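The core training objective described above can be sketched as a symmetric contrastive (InfoNCE-style) loss that pulls each point cloud embedding toward the frozen CLIP embedding of its paired camera image. This is a minimal NumPy sketch under assumed names (`info_nce_loss`, the embedding shapes, and the temperature value are illustrative, not the project's fixed design):

```python
import numpy as np

def info_nce_loss(pc_emb, img_emb, temperature=0.07):
    """Symmetric InfoNCE loss aligning point-cloud embeddings with the
    (frozen) CLIP image embeddings of the paired camera frames.

    pc_emb, img_emb: (N, D) arrays; row i of each comes from the same scene.
    Hypothetical sketch: names, shapes, and temperature are assumptions.
    """
    # L2-normalize so dot products become cosine similarities
    pc = pc_emb / np.linalg.norm(pc_emb, axis=1, keepdims=True)
    im = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    logits = pc @ im.T / temperature      # (N, N); matching pairs on diagonal
    labels = np.arange(len(pc))

    def xent(l):
        # cross-entropy with the diagonal as the correct class
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_p = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_p[labels, labels].mean()

    # average of point-cloud->image and image->point-cloud directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

In practice the image embeddings would come from the frozen CLIP image encoder, so only the point cloud encoder receives gradients; the loss above is what such a training step would minimize.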
If successful, this method would show how to extend existing text-image algorithms to a third modality, something that has not been done before. First, it would allow us to issue text queries that extract relevant scenes from a data lake, which is useful for dataset exploration and curation. For instance, ADAS practitioners could easily find "a scene with a stroller crossing the street" and verify their model's behavior on specific types of scenarios.
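Once lidar scans live in the same space as CLIP text embeddings, the retrieval step reduces to a cosine-similarity ranking. A minimal sketch, assuming the query and scene embeddings have already been computed (the function name and shapes are illustrative):

```python
import numpy as np

def retrieve_scenes(query_emb, scene_embs, top_k=5):
    """Rank lidar-scene embeddings by cosine similarity to a text-query
    embedding in the shared CLIP space. Returns indices, best match first.

    query_emb: (D,) text embedding; scene_embs: (N, D) scene embeddings.
    Hypothetical sketch: embeddings are assumed precomputed elsewhere.
    """
    q = query_emb / np.linalg.norm(query_emb)
    s = scene_embs / np.linalg.norm(scene_embs, axis=1, keepdims=True)
    sims = s @ q                      # (N,) cosine similarities
    return np.argsort(-sims)[:top_k]  # indices of the top_k closest scenes
```

A data-lake query such as the stroller example would then embed the text once and call `retrieve_scenes` against the precomputed scene index.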
Second, we hope to show that this is a useful pre-training step for perception models, i.e., one that boosts performance on object detection and/or tracking. This has the potential to reduce the need for costly and laborious manual labeling.
Last, inspired by recent text-to-image generation methods such as OpenAI's DALL·E 2, our method would enable point cloud to image generation, creating photo-realistic driving scenes from lidar scans. By extension, this could open new avenues for synthetic data generation.
We believe this research project has great potential, and we aim to publish our results at a top-tier ML conference.