Multimodal transformer for speech, text and images

Title:	Multimodal transformer for speech, text and images
DNr:	Berzelius-2022-44
Project Type:	LiU Berzelius
Principal Investigator:	Birger Moell <bmoell@kth.se>
Affiliation:	Kungliga Tekniska högskolan
Duration:	2022-04-25 – 2022-11-01
Classification:	10208
Homepage:	https://www.kth.se/is/tmh/division-of-speech-music-and-hearing-1.780110
Keywords:

Abstract

Our goal is to train a Swedish data2vec model, a multimodal transformer for text, speech and images, and to release it as open source. We believe a multimodal transformer is a potential next step towards more generalisable models and can be useful to the Swedish research community and the general public. data2vec paper: https://arxiv.org/abs/2202.03555

National Supercomputer Centre at Linköping University