Multimodal deep learning
Title: Multimodal deep learning
DNr: Berzelius-2025-117
Project Type: LiU Berzelius
Principal Investigator: Ekta Vats <ekta.vats@it.uu.se>
Affiliation: Uppsala universitet
Duration: 2025-05-01 – 2025-11-01
Classification: 10210
Homepage: https://www.ektavats.se
Keywords:
Abstract
The project focuses on developing multimodal vision-language models that integrate and interpret information across modalities, particularly visual (images, videos) and linguistic (text) data. The objective is to develop scalable and efficient multimodal models and their underlying learning algorithms, and to explore their potential on computer vision tasks that require a deep understanding of both visual content and language. The project places a strong emphasis on deep learning, image analysis, representation learning, and language modeling.