Maximum Entropy Diffusion Policies for Offline Reinforcement Learning
|Per Mattsson <firstname.lastname@example.org>
|2024-01-22 – 2024-08-01
The core idea is to construct a tractable stochastic differential equation (SDE) and sample actions from its reverse-time process, yielding a diffusion policy. Because the proposed SDE is tractable, the policy's log probability is available in closed form, along with an estimate of the clean action from any sampled diffusion step. The resulting entropy term adds diversity beyond the pre-collected actions, which leads to more robust value-function estimation and expressive policy generation for offline reinforcement learning.
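To make the "tractable SDE" idea concrete, the sketch below uses a standard VP/DDPM-style discretization as a stand-in (the paper's exact SDE and noise schedule are not specified here, so the schedule, function names, and dimensions are illustrative assumptions). It shows the three properties the abstract relies on: a closed-form Gaussian forward marginal, an exact log probability under that marginal, and a clean-action estimate recoverable from a noisy diffusion step.

```python
import numpy as np

# Illustrative linear noise schedule for a VP/DDPM-style discretized SDE
# (hypothetical values; not the paper's schedule).
T = 100
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # \bar{alpha}_t


def forward_sample(a0, t, rng):
    """Sample a_t ~ N(sqrt(abar_t) * a0, (1 - abar_t) * I), the closed-form
    forward marginal of the tractable SDE. Returns the sample and its noise."""
    eps = rng.standard_normal(a0.shape)
    a_t = np.sqrt(alpha_bars[t]) * a0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return a_t, eps


def log_prob(a_t, a0, t):
    """Exact Gaussian log-density of the forward marginal q(a_t | a0);
    tractability of the SDE is what makes this closed form available."""
    mean = np.sqrt(alpha_bars[t]) * a0
    var = 1.0 - alpha_bars[t]
    d = a_t.size
    return -0.5 * (d * np.log(2.0 * np.pi * var)
                   + np.sum((a_t - mean) ** 2) / var)


def estimate_a0(a_t, eps, t):
    """Invert the forward marginal: estimate the clean action from a noisy
    diffusion step and a noise estimate (here the true noise, for clarity)."""
    return (a_t - np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alpha_bars[t])
```

In a real diffusion policy, `eps` in `estimate_a0` would come from a learned noise-prediction network, and reverse-time sampling would iterate from `t = T-1` down to `0`; the point of the sketch is only that each ingredient has a closed form once the SDE is tractable.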