Maximum Entropy Diffusion Policies for Offline Reinforcement Learning
|Per Mattsson <firstname.lastname@example.org>
|2024-01-22 – 2024-08-01
The core idea is to construct a tractable stochastic differential equation (SDE) and sample actions from its reverse-time process, yielding a diffusion policy. Because the proposed SDE is tractable, the policy's log probability is available in closed form, along with an estimate of the clean action from any sampled diffusion step. The resulting entropy term adds diversity beyond the pre-collected actions, which leads to more robust value-function estimation and expressive policy generation for offline reinforcement learning.
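To make the "tractable SDE" idea concrete, the sketch below uses a standard VP/DDPM-style discretization as a stand-in (the paper's exact SDE and noise schedule are not specified here, so the schedule, function names, and dimensions are illustrative assumptions). It shows the three properties the abstract relies on: a closed-form Gaussian forward marginal, an exact log probability under that marginal, and a clean-action estimate recoverable from a noisy diffusion step.

```python
import numpy as np

# Illustrative linear noise schedule for a VP/DDPM-style discretized SDE
# (hypothetical values; not the paper's schedule).
T = 100
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # \bar{alpha}_t


def forward_sample(a0, t, rng):
    """Sample a_t ~ N(sqrt(abar_t) * a0, (1 - abar_t) * I), the closed-form
    forward marginal of the tractable SDE. Returns the sample and its noise."""
    eps = rng.standard_normal(a0.shape)
    a_t = np.sqrt(alpha_bars[t]) * a0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return a_t, eps


def log_prob(a_t, a0, t):
    """Exact Gaussian log-density of the forward marginal q(a_t | a0);
    tractability of the SDE is what makes this closed form available."""
    mean = np.sqrt(alpha_bars[t]) * a0
    var = 1.0 - alpha_bars[t]
    d = a_t.size
    return -0.5 * (d * np.log(2.0 * np.pi * var)
                   + np.sum((a_t - mean) ** 2) / var)


def estimate_a0(a_t, eps, t):
    """Invert the forward marginal: estimate the clean action from a noisy
    diffusion step and a noise estimate (here the true noise, for clarity)."""
    return (a_t - np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alpha_bars[t])
```

In a real diffusion policy, `eps` in `estimate_a0` would come from a learned noise-prediction network, and reverse-time sampling would iterate from `t = T-1` down to `0`; the point of the sketch is only that each ingredient has a closed form once the SDE is tractable.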