Title: Multimodal deep learning
DNr: Berzelius-2025-117
Project Type: LiU Berzelius
Principal Investigator: Ekta Vats <ekta.vats@it.uu.se>
Affiliation: Uppsala universitet
Duration: 2025-05-01 – 2025-11-01
Classification: 10210
Homepage: https://www.ektavats.se
Keywords:

Abstract

The project focuses on developing multimodal vision-language models, which integrate and interpret information across modalities, particularly visual (images, videos) and linguistic (text) data. The goal is to build scalable, efficient multimodal models and the learning algorithms behind them, and to explore their potential on computer vision tasks that require a deep understanding of both visual content and language. The project centers on deep learning, image analysis, representation learning, and language modeling.
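As a concrete illustration of what "integrating visual and linguistic data" can mean in practice, the sketch below shows CLIP-style contrastive alignment: image and text features are projected into a shared embedding space and trained so that matching image-text pairs score highest. This is a minimal, self-contained toy (random features, random projections, illustrative dimensions and temperature), not the project's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Toy batch: 4 paired image/text feature vectors (stand-ins for encoder outputs).
image_feats = rng.normal(size=(4, 512))  # e.g. from a vision encoder
text_feats = rng.normal(size=(4, 256))   # e.g. from a text encoder

# Projections into a shared 128-d embedding space (random here; learned in practice).
W_img = rng.normal(size=(512, 128)) / np.sqrt(512)
W_txt = rng.normal(size=(256, 128)) / np.sqrt(256)

img_emb = l2_normalize(image_feats @ W_img)
txt_emb = l2_normalize(text_feats @ W_txt)

# Pairwise cosine similarities scaled by a temperature; the diagonal holds the
# similarities of matching image-text pairs.
logits = img_emb @ txt_emb.T / 0.07

def cross_entropy(logits, targets):
    # Row-wise softmax cross-entropy against the given target indices.
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(targets)), targets].mean()

# Symmetric InfoNCE loss: match each image to its text and each text to its image.
targets = np.arange(4)
loss = 0.5 * (cross_entropy(logits, targets) + cross_entropy(logits.T, targets))
print(float(loss))
```

Training would minimize this loss over many batches, pulling paired image and text embeddings together in the shared space while pushing mismatched pairs apart.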