Diffusion Based Video Prediction
Title: Diffusion Based Video Prediction
SNIC Project: Berzelius-2022-66
Project Type: LiU Berzelius
Principal Investigator: Stefan Bauer <stefan.a.bauer@gmail.com>
Affiliation: Kungliga Tekniska högskolan
Duration: 2022-04-04 – 2022-11-01
Classification: 10201
Homepage: https://arxiv.org/abs/2105.14257


Score-based methods, formulated as stochastic differential equations on a continuous time domain, have recently proven successful as non-adversarial generative models. In particular, they have achieved new state-of-the-art performance on image generation while offering theoretical guarantees. Training such models relies on denoising score matching, which can be viewed as training multi-scale denoising autoencoders. In this project, we augment the denoising score-matching framework to enable representation learning without any supervision signal. GANs and VAEs learn representations by directly transforming latent codes into data samples. In contrast, the introduced diffusion-based representation learning relies on a new formulation of the denoising score-matching objective and thus encodes the information needed for denoising. We illustrate how this difference allows for manual control of the level of detail encoded in the representation. Using the same approach, we propose to learn an infinite-dimensional latent code that improves on state-of-the-art representation learning models for semi-supervised image classification.

We now want to significantly extend this work to develop diffusion-based models for video prediction. With the existing proposal on Berzelius, we have already obtained very promising results on the relatively small MNIST dataset. In this proposal, we want to scale this approach to high-dimensional datasets, run ablations on the model architecture, and prepare the work for submission to NeurIPS or ICLR.
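To make the idea of "encoding the information needed for denoising" concrete, the toy sketch below estimates a conditional denoising score-matching loss in which the score model also receives a code z computed by an encoder from the clean sample x_0. It is not the project's implementation. It assumes a variance-exploding noise schedule (the values SIGMA_MIN and SIGMA_MAX are illustrative), the common weighting lambda(t) = sigma(t)^2, and hypothetical helpers (informative_encoder, blind_encoder, gaussian_score) chosen so the result is easy to reason about. Each time t corresponds to one noise scale, which is the sense in which denoising score matching trains multi-scale denoising autoencoders.

```python
import random

SIGMA_MIN, SIGMA_MAX = 0.01, 50.0  # assumed noise range of a variance-exploding SDE


def sigma(t: float) -> float:
    """Noise scale sigma(t) for t in [0, 1] (geometric schedule, an assumption)."""
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t


def dsm_loss(x0, encoder, score_model, n_samples=256, seed=0):
    """Monte Carlo estimate of a conditional denoising score-matching loss.

    Perturb x_t = x_0 + sigma(t) * eps, whose target score is -eps / sigma(t).
    With weighting lambda(t) = sigma(t)^2 the per-sample loss becomes
    || sigma(t) * s(x_t, z, t) + eps ||^2, where z = encoder(x_0).
    """
    rng = random.Random(seed)
    z = encoder(x0)  # representation computed from the *clean* sample
    total = 0.0
    for _ in range(n_samples):
        t = rng.uniform(1e-5, 1.0)  # sample a noise level (one "scale")
        s_t = sigma(t)
        eps = [rng.gauss(0.0, 1.0) for _ in x0]
        xt = [a + s_t * e for a, e in zip(x0, eps)]
        score = score_model(xt, z, t)
        total += sum((s_t * sc + e) ** 2 for sc, e in zip(score, eps))
    return total / n_samples


def informative_encoder(x0):
    """Hypothetical encoder that keeps all of x_0 in the code z."""
    return list(x0)


def blind_encoder(x0):
    """Hypothetical encoder that discards everything about x_0."""
    return [0.0] * len(x0)


def gaussian_score(xt, z, t):
    """Hypothetical score model: score of N(z, sigma(t)^2 I) at x_t."""
    s2 = sigma(t) ** 2
    return [-(a - b) / s2 for a, b in zip(xt, z)]


x0 = [0.5, -1.0, 2.0]
print(dsm_loss(x0, informative_encoder, gaussian_score))  # ~0: z holds what denoising needs
print(dsm_loss(x0, blind_encoder, gaussian_score))        # > 0: information about x_0 is missing
```

In this toy setting, the per-sample penalty for information missing from z is ||x_0 - z||^2 / sigma(t)^2, so it is largest at small noise levels. This hints at why the range or weighting of noise levels used during training could plausibly influence how much detail a learned representation retains.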