Representation Learning and Transfer Dynamics in Structured Data Regimes
Title: Representation Learning and Transfer Dynamics in Structured Data Regimes
DNr: NAISS 2025/5-275
Project Type: NAISS Medium Compute
Principal Investigator: Stefano Sarao Mannelli <s.saraomannelli@chalmers.se>
Affiliation: Chalmers tekniska högskola
Duration: 2025-05-28 – 2026-06-01
Classification: 10210 10308 10105
Homepage: https://stefsmlab.github.io/research/
Keywords:
Abstract
We request compute resources on both Alvis and Tetralith to support an ongoing research project focused on understanding the interplay between data structure and representation transfer in deep neural networks (DNNs). This work aims to improve our theoretical and practical understanding of transfer learning by analyzing how specific structural features—such as input geometry and input-output correlations—affect the learned representations and their reusability across tasks.
We train a wide range of architectures, including ResNet, VGG, and Vision Transformers, on CIFAR-10, ImageNet, CelebA, and structurally modified variants of these datasets. Training is repeated across controlled variations to systematically probe representation dynamics, making the experimental pipeline highly compute-intensive.
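For concreteness, a minimal sketch of a single fine-tuning run in this sweep is given below (PyTorch). The backbone, dataset variant, and hyperparameters are illustrative placeholders, not the project's exact configuration.

```python
# Minimal sketch of one transfer-learning run in the sweep (PyTorch).
# Backbone, dataset variant, and hyperparameters are illustrative placeholders.
import torch
import torchvision
from torch import nn
from torch.utils.data import DataLoader

def finetune(epochs=10, lr=1e-3, batch_size=256):
    # Pretrained backbone; VGG or ViT backbones are swapped in analogously.
    model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
    model.fc = nn.Linear(model.fc.in_features, 10)  # new head for the target task

    transform = torchvision.transforms.Compose([
        torchvision.transforms.Resize(224),
        torchvision.transforms.ToTensor(),
    ])
    # Structurally modified dataset variants would be substituted here.
    train_set = torchvision.datasets.CIFAR10(root="data", train=True,
                                             download=True, transform=transform)
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True, num_workers=4)

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model
```

Each such run is repeated across architectures, dataset variants, transfer settings, and random seeds, which is what makes the pipeline compute-intensive.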
GPU compute on Alvis (C3SE) is essential for the training phase. We anticipate medium to high utilization of the NAISS Medium Compute allocation—approximately 4,000 GPU-hours/month—leveraging the A100 and A40 GPUs for large-scale training of ViTs and deeper ResNets. These models benefit substantially from the memory bandwidth and parallelism of modern accelerators and must be retrained repeatedly across different transfer settings. The Alvis GPU cluster, specifically designed for AI/ML workloads, is well matched to our experimental demands.
CPU compute on Tetralith (NSC) is equally critical for the project. We expect to utilize a substantial but within-limit share of the Tetralith allocation (approximately 100,000–200,000 core-hours/month). This usage supports two main tasks:
1. Post-training evaluation using metrics such as centered kernel alignment (CKA), intrinsic dimensionality, and information imbalance. These analyses have large memory footprints and require high-throughput parallel processing, which is best handled by the high-memory, multi-core CPU nodes on Tetralith (see the CKA sketch after this list).
2. Theoretical model simulations involving integro-differential equations that describe the macroscopic training dynamics of neural networks. These computations demand fine-grained numerical solvers run across parameter grids and benefit from Tetralith's large-memory nodes, fast interconnect, and CPU parallelism (see the grid-sweep sketch after this list).
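For item 1, the sketch below shows the linear variant of CKA between two representation matrices (rows are examples, columns are features). It is an illustrative snippet of the kind of post-processing involved, with randomly generated feature matrices standing in for stored network activations; it is not the project's exact evaluation code.

```python
# Minimal sketch of linear CKA between two representation matrices
# (rows = examples, columns = features). Illustrative only.
import numpy as np

def linear_cka(X, Y):
    # Center each feature (column) across examples.
    Xc = X - X.mean(axis=0, keepdims=True)
    Yc = Y - Y.mean(axis=0, keepdims=True)
    # Linear CKA: ||Yc^T Xc||_F^2 / (||Xc^T Xc||_F * ||Yc^T Yc||_F).
    cross = np.linalg.norm(Yc.T @ Xc, ord="fro") ** 2
    norm_x = np.linalg.norm(Xc.T @ Xc, ord="fro")
    norm_y = np.linalg.norm(Yc.T @ Yc, ord="fro")
    return cross / (norm_x * norm_y)

# Example: compare layer activations of two trained networks on the same inputs.
rng = np.random.default_rng(0)
acts_a = rng.normal(size=(1000, 512))   # e.g. penultimate-layer features, model A
acts_b = rng.normal(size=(1000, 768))   # e.g. penultimate-layer features, model B
print(linear_cka(acts_a, acts_b))
```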
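For item 2, the computational pattern is a grid sweep of a numerical integrator over coupled equations for macroscopic order parameters. The sketch below shows only that pattern with a generic placeholder right-hand side; the project's actual integro-differential dynamics are more involved.

```python
# Minimal sketch of the CPU-side pattern for the theoretical models:
# integrate a system of order-parameter equations over a parameter grid.
# The right-hand side is a generic placeholder, not the project's dynamics.
import numpy as np
from multiprocessing import Pool
from scipy.integrate import solve_ivp

def dynamics(t, q, lr, noise):
    # Placeholder drift for two order parameters (e.g. teacher-student overlaps).
    r, m = q
    dr = lr * (m - r) - lr**2 * noise * r
    dm = lr * (1.0 - m) - lr**2 * noise * m
    return [dr, dm]

def run_point(params):
    lr, noise = params
    sol = solve_ivp(dynamics, t_span=(0.0, 100.0), y0=[0.01, 0.01],
                    args=(lr, noise), method="LSODA")
    return lr, noise, sol.y[:, -1]

if __name__ == "__main__":
    grid = [(lr, noise) for lr in np.linspace(0.01, 1.0, 20)
                        for noise in np.linspace(0.0, 0.5, 20)]
    with Pool() as pool:                 # one grid point per CPU core
        results = pool.map(run_point, grid)
```

Each grid point is independent, so these sweeps parallelise trivially across the CPU cores of a Tetralith node.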
Our research group includes one Assistant Professor (PI), two PhD students, and one postdoc. Our work spans the theoretical foundations and practical applications of deep learning. We require both GPU and CPU compute resources to support complementary aspects of our research—training deep neural networks at scale and solving the corresponding theoretical models that describe their behavior.
The combination of Alvis and Tetralith provides an ideal infrastructure for the dual nature of this research. Our use of both resources remains within the NAISS Medium Compute bounds (20,000 GPU-hours/month on Alvis and 400,000 core-hours/month on Tetralith) and reflects a medium-to-high but sustainable level of usage. The outcomes will directly support ongoing and upcoming publications at the intersection of deep learning theory, representation learning, and transfer learning.
To illustrate the scope and relevance of our work, below are three representative publications spanning from theoretical to applied perspectives:
1. A Theory of Initialisation's Impact on Specialisation (arXiv:2503.02526, to appear in ICLR 2025)
2. Probing transfer learning with a model of synthetic correlated datasets, Machine Learning: Science and Technology (MLST)
3. How to choose the right transfer learning protocol? A qualitative analysis in a controlled set-up, Transactions on Machine Learning Research (TMLR)