Title: Scalable Learning and Optimization for Graph-Structured Data and Reliable Systems
DNr: Berzelius-2026-108
Project Type: LiU Berzelius
Principal Investigator: Aristides Gionis <argioni@kth.se>
Affiliation: Kungliga Tekniska högskolan
Duration: 2026-03-26 – 2026-10-01
Classification: 10201
Keywords:

Abstract

Modern machine learning research increasingly relies on large pre-trained models and on algorithms operating on complex data structures such as graphs and high-dimensional embeddings. Our group's research focuses on scalable algorithms for graph-structured data, adaptive data analysis methods, and the reliability of large language models (LLMs). Exploring these directions requires extensive empirical evaluation on large datasets and computationally intensive training procedures, so GPU resources are essential for developing and validating our methods.

One major research direction focuses on scalable algorithms for graph data. Many problems in modern data science can naturally be modeled as graph problems, including community detection, nearest-neighbor search, and representation learning. Our work investigates algorithmic and learning-based approaches to improving the efficiency and robustness of such methods, involving experiments with graph-diffusion models, clustering techniques, and graph neural network architectures. Evaluating these approaches requires repeated large-scale experiments across different datasets and parameter settings.

A second direction studies adaptive data analysis and clustering in dynamic environments where data arrives over time. In these settings, models must be updated incrementally while remaining consistent with previously computed solutions. Understanding the trade-off between solution quality and stability requires systematic experimentation across multiple datasets and baselines.

We also investigate the reliability of large language models, particularly their tendency to produce high-confidence errors under misleading or complex contexts. Our work explores training strategies such as supervised fine-tuning and reinforcement learning to improve reasoning robustness and uncertainty calibration.
These experiments involve training and evaluating open-weight language models with billions of parameters, which requires substantial GPU computation. Access to a shared GPU server will allow our group to run extensive experiments, reproduce results across datasets, and explore computationally demanding methods that would otherwise be infeasible to evaluate. Our experiments rely on publicly available datasets and standard benchmarks, including reasoning and question-answering tasks. All data will be handled according to responsible research practices, and the resulting models and findings will be used exclusively for academic research and scientific publications.
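To give a concrete sense of the graph-diffusion primitives mentioned above, the following is a minimal sketch of personalized PageRank computed by power iteration. It is an illustrative example only, not the group's actual method: the adjacency-list format, function name, and parameter values (restart probability, iteration count) are assumptions made for the sketch.

```python
def personalized_pagerank(adj, seed, alpha=0.15, iters=100):
    """Power iteration for personalized PageRank.

    adj:   dict mapping each node to a list of its out-neighbors
    seed:  restart node for the personalization vector
    alpha: restart (teleport) probability back to the seed
    """
    nodes = list(adj)
    # Start with all probability mass on the seed node.
    p = {v: 0.0 for v in nodes}
    p[seed] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in nodes}
        for v in nodes:
            out = adj[v]
            if out:
                # Spread v's mass uniformly over its out-neighbors.
                share = p[v] / len(out)
                for u in out:
                    nxt[u] += share
            else:
                # Dangling node: return its mass to the seed.
                nxt[seed] += p[v]
        # Mix the diffused mass with a restart at the seed.
        p = {v: alpha * (v == seed) + (1 - alpha) * nxt[v] for v in nodes}
    return p
```

On small graphs this converges quickly: the scores sum to one and concentrate around the seed node, which is the locality property that diffusion-based community-detection heuristics exploit.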