Scalable Multimodal Identity Modeling and Analysis
DNr: Berzelius-2026-152
Project Type: LiU Berzelius
Principal Investigator: Niklas Carlsson <niklas.carlsson@liu.se>
Affiliation: Linköpings universitet
Duration: 2026-05-01 – 2026-11-01
Classification: 10201
Homepage: https://www.ida.liu.se/~nikca89/publications.html
Keywords:

Abstract

Our primary goal is to develop scalable methods for multimodal identity modeling and analysis in complex visual data, with a particular focus on robust tracking and identification in challenging real-world scenarios. Building on recent advances in multi-object tracking and on our prior work on zero-shot tracking and global identity fusion, we aim to generalize these approaches into a broader framework that integrates video, text, and geometric representations. In this continuation phase, we extend our system beyond traditional tracking by incorporating human-centric UV texture modeling, 3D reconstruction, and cross-modal reasoning, enabling identity-consistent tracking across viewpoints, occlusions, and long temporal horizons. This unified framework supports applications in domains such as sports analytics, UAV- and CCTV-based surveillance, and large-scale video understanding, and connects to emerging research directions in privacy-aware multimodal learning and social-media data analysis.
Our approach leverages modern deep learning architectures, including Transformer-based models with customized memory and attention mechanisms, and is designed to scale across multiple tracking and generative modeling paradigms. We emphasize compatibility with existing tracking frameworks while advancing toward more general multimodal identity representations. The project relies heavily on large-scale GPU-based experimentation, including training on long video sequences and on multimodal datasets that combine visual, textual, and 3D information. Using the Berzelius supercomputing infrastructure, we will develop and evaluate these models at scale: training advanced multimodal models, optimizing data pipelines, and supporting multi-GPU experimentation for increasingly complex architectures.
The expected outcomes include improved robustness in identity tracking, new methods for multimodal representation learning, and a series of high-impact scientific publications. Overall, the project represents a transition from a task-specific tracking system to a broader, scalable research framework for multimodal identity understanding, providing a foundation for future work at the intersection of computer vision, machine learning, and privacy-aware data analysis.
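To illustrate the kind of memory-and-attention mechanism the abstract refers to, the following minimal sketch shows scaled dot-product attention of a single track query over a bank of stored identity embeddings. This is a hypothetical, dependency-free Python illustration of the general technique, not the project's actual implementation; the function names (`attend`, `softmax`) and the tiny embeddings are assumptions for the example.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, memory_keys, memory_values):
    """Scaled dot-product attention of one query over an identity memory bank.

    Returns a weighted mix of the stored value vectors, where the weights
    reflect similarity between the query and each stored identity key.
    """
    d = len(query)
    scores = [dot(query, k) / math.sqrt(d) for k in memory_keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, memory_values))
            for i in range(len(memory_values[0]))]

# Toy memory bank with two identities; the query resembles the first key,
# so the output is dominated by the first stored value vector.
out = attend([5.0, 0.0],
             [[1.0, 0.0], [0.0, 1.0]],
             [[10.0, 0.0], [0.0, 10.0]])
```

In a full Transformer-based tracker, the same pattern would operate on learned per-frame appearance embeddings, with the memory bank updated over long temporal horizons to keep identities consistent across occlusions and viewpoint changes.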