Graph-based, spatial, temporal, and generative machine learning
DNr: Berzelius-2025-314
Project Type: LiU Berzelius
Principal Investigator: Fredrik Lindsten <fredrik.lindsten@liu.se>
Affiliation: Linköpings universitet
Duration: 2025-10-01 – 2026-04-01
Classification: 10210
Abstract
This is a joint proposal for 7 separate projects with the same PI, involving 5 PhD students and one postdoc.
These projects investigate method development related to graph-based, spatio-temporal, and generative machine learning, with important applications such as materials science and weather forecasting.
It is a continuation of previous projects that have used Berzelius and have resulted in publications in top-tier AI/ML venues such as NeurIPS, ICLR, ICML and AISTATS. We aim to continue publishing at top-tier venues within the area of AI/ML. The projects are outlined below.
Probabilistic Weather Modelling with Machine Learning:
Machine learning has recently shown promising potential for weather modelling. Traditionally, weather modelling has relied on numerical methods based on differential equations, which are slow and computationally expensive compared to machine learning based systems. In this project we will continue to improve on our previous work [1] on probabilistic limited area modelling. Extending this work to more realistic settings [2] with high-resolution real weather data will significantly increase the computational demands of both training and inference. Access to powerful GPUs and efficient multi-node training is therefore essential. The resources provided by Berzelius are particularly well suited to these requirements and will be a key enabler for the success of this project.
Additionally, we will work on using machine learning for the data assimilation task, which solves the filtering problem of combining observations and previous forecasts to provide initial conditions for forecasting models. We will also investigate the use of probabilistic machine learning models for ocean modelling, potentially coupling the two systems as is commonly done in numerical weather modelling. Inspired by the success of machine learning for short- to medium-range weather forecasts, we have also initiated a project on machine learning for sub-seasonal to seasonal forecasting. Together with SMHI we are working on downscaling global climate predictions. Using machine learning to obtain high-resolution regional climate predictions has the potential to be considerably faster and less computationally expensive than using a traditional regional climate model.
This project is funded by WASP through the NEST-project main: Multi-dimensional Alignment and Integration of Physical and Virtual Worlds (https://wasp-sweden.org/multi-dimensional-alignment-and-integration/).
In the different subprojects we have collaborators from SMHI, Danish Meteorological Institute, Finnish Center for Artificial Intelligence and California Institute of Technology.
[1] Erik Larsson, Joel Oskarsson, Tomas Landelius, and Fredrik Lindsten. Diffusion-LAM: Probabilistic Limited Area Weather Forecasting with Diffusion. In ICLR 2025 Workshop on Tackling Climate Change with Machine Learning, 2025. Available at: https://www.climatechange.ai/papers/iclr2025/36
[2] Simon Adamov, Joel Oskarsson, Leif Denby, Tomas Landelius, Kasper Hintz, Simon Christiansen, Irene Schicker, Carlos Osuna, Fredrik Lindsten, Oliver Fuhrer, Sebastian Schemm. Building Machine Learning Limited Area Models: Kilometer-Scale Weather Forecasting in Realistic Settings. preprint, under review, 2025.
By: Erik Larsson
Learning physical trajectories for flow-based generative models applied to weather forecasting:
Diffusion and flow-based models have demonstrated great success in modelling complex stochastic spatio-temporal problems. The key innovation in these methods is their simulation-free training objective, which regresses a neural network against a user-defined vector field. However, these vector fields are often defined through linear interpolations of the data and therefore do not produce physically meaningful trajectories for the intermediate steps. In this project we aim to build on these ideas to develop new ways of generating probabilistic forecasts that better capture physically meaningful trajectories. This project is a collaboration with a researcher at the University of Amsterdam, building on their method from ICML 2025 and the insights from our previous work from ICLR 2025.
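As an illustration of the simulation-free objective described above, the following is a minimal NumPy sketch of a flow-matching loss with the commonly used linear interpolation path. It is not the method developed in this project. The function names, the toy data, and the placeholder model are assumptions made for illustration only; in practice the velocity model would be a neural network trained by gradient descent on this loss.

```python
import numpy as np


def linear_path(x0, x1, t):
    """Point on the straight line from x0 to x1 at time t, and its velocity.

    x0, x1: arrays of shape (batch, ...); t: array of shape (batch,).
    """
    # Reshape t so that it broadcasts over all non-batch dimensions.
    t = t.reshape(-1, *([1] * (x0.ndim - 1)))
    x_t = (1.0 - t) * x0 + t * x1  # linear interpolation between the samples
    u_t = x1 - x0                  # time derivative of x_t (constant in t)
    return x_t, u_t


def flow_matching_loss(velocity_fn, x0, x1, t):
    """Mean squared error between the predicted and the target velocity."""
    x_t, u_t = linear_path(x0, x1, t)
    pred = velocity_fn(x_t, t)
    return float(np.mean((pred - u_t) ** 2))


rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 3))        # samples from the noise distribution
x1 = rng.standard_normal((4, 3)) + 2.0  # toy stand-in for data samples
t = rng.uniform(size=4)                 # one random time per sample pair


def zero_model(x, t):
    """Hypothetical placeholder model that always predicts zero velocity."""
    return np.zeros_like(x)


loss = flow_matching_loss(zero_model, x0, x1, t)
```

No trajectory is simulated during training: each loss evaluation only needs a pair of samples and a random time. The intermediate states `x_t` lie on straight lines between noise and data, which is why they generally do not correspond to physically meaningful states.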
This project is funded by WASP through the NEST-project main: Multi-dimensional Alignment and Integration of Physical and Virtual Worlds (https://wasp-sweden.org/multi-dimensional-alignment-and-integration/).
By: Martin Andrae
Solving inverse problems with pretrained generative models
Generative models provide a strong prior over data distributions and have been used to solve inverse problems. Many existing methods are developed for, and demonstrated to work well with, natural images. This project aims to develop posterior sampling algorithms specifically for scientific problems such as novel material generation and drug discovery. Current optimization-based methods are not well suited for these modalities, which presents an avenue for specialized method development. Training and inference with these modern generative models are computationally intensive, for which Berzelius could be a valuable infrastructure resource.
By: Adhithyan Kalaivanan
Guided discrete diffusion for inverse design of materials
Our previous work [1], a collaboration with the Theoretical Physics division at Linköping University, developed a new generative method for materials and demonstrated its effectiveness in a proof-of-concept. While the previous work focused on developing the method, we now aim to put the model to use and explore its true potential for discovering new materials with useful properties, beyond a proof-of-concept. This involves some method development for guiding the model towards materials with desired properties, as well as larger-scale experiments such as training on a larger dataset and generating more materials for qualitative evaluation.
By: Filip Ekström Kelvinius and Dong Qian
[1] Ekström Kelvinius, F., Andersson, O. B., Parackal, A. S., Qian, D., Armiento, R., & Lindsten, F. WyckoffDiff - A Generative Diffusion Model for Crystal Symmetry. In ICML 2025, 2025.
Geospatial embeddings in machine learning weather models
Recent work in geospatial AI has created learned embeddings of different locations on Earth [1, 2]. By harnessing large amounts of satellite observations, such pre-trained embeddings can encode detailed information about the characteristics of Earth's surface at specific locations. Meanwhile, a challenge for machine learning weather models is to accurately take into account complex information about the surface when predicting the dynamics of the atmosphere. This motivates the project: we will investigate if and how such learned geospatial embeddings can be used to improve the local forecasts of these weather models.
[1] Klemmer, Konstantin, et al. "SatCLIP: Global, general-purpose location embeddings with satellite imagery." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 39. No. 4. 2025.
[2] Brown, Christopher F., et al. "AlphaEarth Foundations: An embedding field model for accurate and efficient global mapping from sparse label data." arXiv preprint arXiv:2507.22291 (2025).
By: Shashi Nagarajan
Machine Learning for Source Area Mapping of Greenhouse Gas Emissions
In this project, we aim to implement and evaluate state-of-the-art machine learning (ML) models for the gas transport inference problem, with a particular focus on source-area mapping. This is a challenging computational task: gas transport must be estimated within a three-dimensional region of interest so that recorded concentrations can be traced backward in time to identify their origins. One strategy involves using computational fluid dynamics (CFD) simulations to approximate air transport under specified 3D geometries and wind conditions, for which Berzelius could be a valuable infrastructure resource.
By: Dong Qian
Denoising Diffusion-based Sequential Monte Carlo Sampler
Diffusion models are a class of generative models known for their state-of-the-art performance across different tasks. Their key idea is to employ a noise diffusion process to gradually transform a complex data distribution into a simpler one. Samples can then be generated by approximating the time-reversal of this forward diffusion process. In this project, we aim to leverage this framework to approximately sample from a given target distribution. Specifically, we consider a coupled system of an SDE and an ODE that govern the samples and their associated weights, respectively, with the weights serving to correct potential biases. This setup is closely related to Sequential Monte Carlo.
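To make the connection to Sequential Monte Carlo concrete, the sketch below shows a classical discrete-time SMC sampler with tempering, in which weighted particles are moved from a simple initial distribution to a target. This is the standard textbook construction, not the diffusion-based SDE/ODE system studied in this project. The one-dimensional Gaussian target, the geometric bridge between distributions, the random-walk Metropolis move, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def log_init(x):
    """Unnormalised log-density of the initial distribution N(0, 2^2)."""
    return -0.5 * (x / 2.0) ** 2


def log_target(x):
    """Unnormalised log-density of an assumed toy target N(3, 0.5^2)."""
    return -0.5 * ((x - 3.0) / 0.5) ** 2


def log_bridge(x, beta):
    """Geometric interpolation between the initial and target densities."""
    return (1.0 - beta) * log_init(x) + beta * log_target(x)


def normalise(log_w):
    """Turn log-weights into weights that sum to one."""
    w = np.exp(log_w - log_w.max())
    return w / w.sum()


def smc_sampler(n_particles=2000, n_steps=50, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x = 2.0 * rng.standard_normal(n_particles)  # exact draws from N(0, 2^2)
    log_w = np.zeros(n_particles)
    betas = np.linspace(0.0, 1.0, n_steps + 1)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Weight update: ratio of consecutive bridging densities.
        log_w += log_bridge(x, b) - log_bridge(x, b_prev)
        w = normalise(log_w)
        # Resample when the effective sample size falls below half.
        if 1.0 / np.sum(w ** 2) < 0.5 * n_particles:
            idx = rng.choice(n_particles, size=n_particles, p=w)
            x = x[idx]
            log_w = np.zeros(n_particles)
        # Random-walk Metropolis move that leaves the current bridge invariant.
        prop = x + step * rng.standard_normal(n_particles)
        log_acc = log_bridge(prop, b) - log_bridge(x, b)
        accept = np.log(rng.uniform(size=n_particles)) < log_acc
        x = np.where(accept, prop, x)
    return x, normalise(log_w)


particles, weights = smc_sampler()
mean_estimate = float(np.sum(weights * particles))  # estimate of the target mean
```

The weights play the same role as in the project description: they correct for the mismatch between where the particles currently are and the distribution they are meant to represent. Expectations under the target are then estimated as weighted averages over the particles.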
By: Dong Qian