Uncertainty for model based reinforcement learning
Title: Uncertainty for model based reinforcement learning
DNr: NAISS 2025/22-195
Project Type: NAISS Small Compute
Principal Investigator: Emilio Jorge <emilio.jorge@chalmers.se>
Affiliation: Chalmers tekniska högskola
Duration: 2025-02-13 – 2026-03-01
Classification: 10207
Keywords:

Abstract

We aim to develop novel approaches for reinforcement learning that appropriately reflect the underlying uncertainty. We are investigating approximate posterior sampling methods based on Langevin and Hamiltonian dynamics, for both neural networks and other representations, to guide agents' actions. Depending on the environment, either GPU or CPU resources are more suitable. For more advanced environments and larger neural networks, GPUs give a significant speedup and will be used. For smaller environments, CPUs are more suitable, since GPUs give little or no speedup.

The CPU experiments are mostly finished. They will probably only be needed if experiments from my paper that is currently under submission need to be rerun. Unfortunately, some CPU experiments are very slow: a single run may take 2–3 days, although it uses only 1–3 cores. I have reduced the requested CPU hours to reflect this.

The GPU experiments are of two kinds. All of them use a single GPU and do not require an unusual amount of VRAM, so the cheaper GPUs can be used. First, I want to compute isoperimetry constants for a few datasets. Each run takes roughly 4–12 hours depending on the dataset, and I need at least 10 repetitions for each of the 6 datasets, ideally for a few different configurations. Second, I am running reinforcement learning experiments. Their run time is less certain, but I estimate about 8 hours per run, with 10 repetitions for around 4 different environments, plus some additional configurations and experimentation as needed.

I expect to submit my thesis during the spring, so the project could be made shorter if that is beneficial to you, for example ending at the end of June.
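As a rough check of the GPU request, the per-configuration totals implied by the run times and repetition counts stated above can be computed directly. This is a minimal sketch: the helper name `gpu_hours` is hypothetical, the 4 environments and 8 hours per reinforcement learning run are the approximate figures from the text, and the number of configurations is left as a parameter because it is not fixed in the proposal.

```python
def gpu_hours(units: int, repetitions: int, hours_per_run: float, configurations: int = 1) -> float:
    """Hypothetical helper: single-GPU hours for a sweep of
    units (datasets or environments) x repetitions x configurations x hours per run."""
    return units * repetitions * configurations * hours_per_run


# Isoperimetry constants: 6 datasets, 10 repetitions, 4 to 12 hours per run.
iso_low = gpu_hours(6, 10, 4)
iso_high = gpu_hours(6, 10, 12)

# Reinforcement learning: about 4 environments, 10 repetitions, about 8 hours per run.
rl = gpu_hours(4, 10, 8)

print(f"Isoperimetry, per configuration: {iso_low:.0f}-{iso_high:.0f} GPU-hours")
print(f"RL, per configuration: {rl:.0f} GPU-hours")
print(f"Both, one configuration each: {iso_low + rl:.0f}-{iso_high + rl:.0f} GPU-hours")
```

With one configuration each, this gives 240–720 GPU-hours for the isoperimetry experiments and 320 GPU-hours for the reinforcement learning experiments; each additional configuration scales the corresponding total linearly.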