Self-supervised and Spatio-Temporal Methods in Deep Learning
Karl Åström <email@example.com>
2021-10-25 – 2022-05-01
One of the fundamental limitations of many deep learning applications, such as perception systems in autonomous vehicles, is the need for massive amounts of manually annotated training and validation data. Generating such data is generally very expensive and time-consuming. The aim of this research project is to investigate methods that reduce this dependence on manual annotation. Self-supervised learning is a powerful paradigm that avoids manual annotation entirely: instead of being created by humans, the supervision signal is generated directly from the data. Most research on single-image self-supervised computer vision has focused on learning powerful global representations of the entire image. However, many applications, such as object detection, rely on distinguishing and separating fine details in the image. How best to formulate a self-supervised algorithm for these kinds of tasks is still an open question.
Another issue is that many problems cannot be directly formulated in a self-supervised manner. This is where the hybrid approach of semi-supervised learning can be very useful: it combines a small amount of human supervision with a large amount of self-supervision. How to mix the supervised and self-supervised objectives is, however, non-trivial, and the best approach depends heavily on the target application.
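As a rough illustration of what mixing the two objectives can look like, the sketch below adds a supervised cross-entropy term on a few labeled samples to a weighted consistency term on unlabeled samples. The function names, the choice of a consistency loss between two augmented views, and the `weight` parameter are all assumptions made for this example; the project description does not specify a particular formulation.

```python
import math

def cross_entropy(probs, label):
    """Supervised term: negative log-probability of the annotated class."""
    return -math.log(probs[label])

def consistency(probs_a, probs_b):
    """Self-supervised term (assumed here): mean squared difference between
    the predictions for two augmented views of the same unlabeled input."""
    return sum((a - b) ** 2 for a, b in zip(probs_a, probs_b)) / len(probs_a)

def semi_supervised_loss(labeled, unlabeled, weight=0.5):
    """Combine both terms.

    labeled:   list of (predicted_probs, label) pairs
    unlabeled: list of (probs_view_1, probs_view_2) pairs
    weight:    hypothetical mixing coefficient for the self-supervised term
    """
    sup = sum(cross_entropy(p, y) for p, y in labeled) / len(labeled)
    unsup = sum(consistency(a, b) for a, b in unlabeled) / len(unlabeled)
    return sup + weight * unsup

# One labeled sample and one unlabeled sample with two views.
labeled = [([0.5, 0.5], 0)]
unlabeled = [([1.0, 0.0], [0.0, 1.0])]
loss = semi_supervised_loss(labeled, unlabeled, weight=0.5)
```

The mixing weight is the part that typically needs tuning per application: setting it to zero recovers purely supervised training, while larger values let the unlabeled data dominate.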
Finally, a third approach is to use additional sources of information, such as more sensors or temporal relationships, to automatically generate high-quality annotations that can then be used to train the target algorithms in a fully supervised manner. This separation of concerns allows for vastly more powerful algorithms in the auto-annotation pipeline, which runs offline and is not bound by the real-time constraints of the target system. For example, it can look at future frames in a recorded sequence, use ensembles of large networks since computational constraints are relaxed, and apply heavy post-processing.
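To make the benefit of "looking into the future" concrete, the toy sketch below compares a causal estimate, which may only use the current and earlier frames, with an offline estimate that also uses later frames of a recorded sequence. The moving-average smoothing and the function names are assumptions chosen for brevity; a real auto-annotation pipeline would likely use far more elaborate models.

```python
def causal_smooth(values, window=3):
    """Online-style estimate: average of the current and previous frames only."""
    out = []
    for t in range(len(values)):
        past = values[max(0, t - window + 1): t + 1]
        out.append(sum(past) / len(past))
    return out

def offline_smooth(values, radius=1):
    """Offline estimate: average over both past and future frames."""
    out = []
    for t in range(len(values)):
        neighbours = values[max(0, t - radius): t + radius + 1]
        out.append(sum(neighbours) / len(neighbours))
    return out

# Hypothetical per-frame position of an object moving at constant speed.
positions = [0.0, 1.0, 2.0, 3.0, 4.0]
online = causal_smooth(positions)    # lags behind the true position
offline = offline_smooth(positions)  # centered window, no lag in the interior
```

On this linear track the causal estimate at frame 2 is 1.0, one unit behind the true position, while the offline estimate is 2.0, because the centered window balances past and future frames.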