Self-supervised and Spatio-Temporal Methods in Deep Learning
SNIC Project: Berzelius-2021-60
Project Type: LiU Berzelius
Principal Investigator: Karl Åström
Affiliation: Lunds universitet
Duration: 2021-10-25 – 2022-05-01
Classification: 10207


One of the fundamental limitations of many deep learning applications, such as perception systems in autonomous vehicles, is the need for massive amounts of manually annotated training and validation data. Generating such data is, in general, very expensive and time-consuming. The aim of this research project is to investigate methods that can reduce this problem. Self-supervised learning is a powerful paradigm that avoids the problem entirely: instead of being created manually by humans, the supervision signal is generated directly from the data. Most research on single-image self-supervised computer vision has focused on creating powerful global representations of the entire image. However, many applications, such as object detection, rely on distinguishing and separating fine details in the image. How best to formulate a self-supervised algorithm for these kinds of tasks is still an open question. Another issue is that many problems cannot be directly formulated in a self-supervised manner. This is where the hybrid approach of semi-supervised learning can be very useful: it mixes a small amount of human supervision with a large amount of self-supervision. The specific way of mixing supervised and self-supervised objectives is, however, non-trivial, and the optimal approach depends heavily on the target application. Finally, another approach is to use additional sources of information, such as more sensors or temporal relationships, to automatically generate high-quality annotations that can be used to train the target algorithms in a fully supervised manner. Separating annotation from the target algorithm in this way allows for vastly more powerful algorithms in the auto-annotation pipeline. For example, it is possible to look into the future, use ensembles of large networks because computational constraints are absent, apply heavy post-processing, etc.