Maximum Entropy Diffusion Policies for Offline Reinforcement Learning
| Title: | Maximum Entropy Diffusion Policies for Offline Reinforcement Learning |
| DNr: | Berzelius-2024-29 |
| Project Type: | LiU Berzelius |
| Principal Investigator: | Per Mattsson <per.mattsson@it.uu.se> |
| Affiliation: | Uppsala universitet |
| Duration: | 2024-01-22 – 2024-08-01 |
| Classification: | 20202 |
| Keywords: | |
Abstract
The core idea is to construct a tractable stochastic differential equation (SDE) and then sample actions from its reverse-time process, which serves as a diffusion policy. Because the proposed SDE is tractable, we can compute the log-probability of the policy as well as an estimate of the action from any sampled diffusion step. The entropy term encourages diversity beyond the pre-collected actions, yielding more robust value-function estimation and expressive policy generation for offline reinforcement learning.
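As a minimal illustration of the sampling mechanism described above (not the project's actual method or code), the sketch below draws actions by integrating a reverse-time variance-preserving SDE with Euler-Maruyama. The "behaviour policy" is a hypothetical 1-D Gaussian, chosen because the score of the perturbed marginal is then available in closed form, so the reverse process is fully tractable and checkable; `mu0`, `sigma0`, `beta`, and `T` are illustrative parameters, not values from the project.

```python
import numpy as np

beta = 1.0               # constant diffusion coefficient beta(t) = beta (assumed)
T = 5.0                  # integration horizon (assumed)
mu0, sigma0 = 2.0, 0.5   # hypothetical behaviour-policy action distribution N(mu0, sigma0^2)

def alpha(t):
    # Signal scaling of the forward VP-SDE dx = -0.5*beta*x dt + sqrt(beta) dW:
    # alpha(t) = exp(-0.5 * beta * t)
    return np.exp(-0.5 * beta * t)

def score(x, t):
    # Closed-form score of the perturbed marginal, which is Gaussian with
    # mean mu0*alpha(t) and variance sigma0^2*alpha(t)^2 + (1 - alpha(t)^2).
    a = alpha(t)
    mean = mu0 * a
    var = sigma0**2 * a**2 + (1.0 - a**2)
    return -(x - mean) / var

def sample_actions(n, steps=1000, seed=0):
    # Integrate the reverse-time SDE
    #   dx = [-0.5*beta*x - beta * score(x, t)] dt + sqrt(beta) dW-bar
    # backward from t = T to t = 0 with Euler-Maruyama.
    rng = np.random.default_rng(seed)
    dt = T / steps
    x = rng.standard_normal(n)  # start from the N(0, 1) prior at t = T
    for i in range(steps):
        t = T - i * dt
        drift = -0.5 * beta * x - beta * score(x, t)
        x = x - drift * dt + np.sqrt(beta * dt) * rng.standard_normal(n)
    return x

actions = sample_actions(20000)
print(actions.mean(), actions.std())  # should approach mu0 = 2.0 and sigma0 = 0.5
```

Because the score is exact here, the sampled actions recover the behaviour distribution; in the actual setting the score is learned, and tractability of the SDE is what makes the policy's log-probability accessible.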