Cross-dataset generalization for video: are recurrent models less texture biased?
SNIC Project: Berzelius-2021-50
Project Type: LiU Berzelius
Principal Investigator: Sofia Broomé
Affiliation: Kungliga Tekniska högskolan
Duration: 2021-09-23 – 2022-04-01
Classification: 10207


Goal: To run experiments for a paper on the generalization abilities of deep video models. The paper is to be submitted to CVPR in November 2021, but I might need to run experiments for some time after that, possibly until March. The paper is an empirical investigation and comparison of the cross-dataset generalization abilities of two specific model types, recurrent CNNs and 3D CNNs, both of which are heavy to train. Studying cross-dataset generalization for video models also implies studying their texture bias, which is novel work in the video domain (Geirhos et al., ICLR 2019, did this for single-image data). I am studying the empirical consequences of the inherently different temporal modeling of these two model types.

Importance of project: The topic is novel and important since deep learning for video is far behind deep learning for single images. On a personal level, the project is important because it is the final paper to be included in my thesis next year.

Expected goal fulfilment: A finished write-up of the many experiments that the study requires, submitted to either CVPR22 (deadline November 2021) or ECCV22 (deadline March 2022).

Software and methods to be used: Python, PyTorch with GPU computation, and various other conda-installed libraries. For the heavier datasets, I depend on distributed computing (in my case, via PyTorch Lightning), which I have already tested during the Berzelius pilot phase. Ideally, if time allows, I will start using the Singularity framework for the software environment. Otherwise, I will continue to use conda.