Integrating Human Feedback and LLMs in Reward Design for Autonomous Driving
DNr: Berzelius-2023-348
Project Type: LiU Berzelius
Principal Investigator: Amy Loutfi
Affiliation: Örebro universitet
Duration: 2023-12-20 – 2024-07-01
Classification: 10207


Reward design for autonomous driving is difficult: a designer must define multiple sub-rewards and assign each an appropriate importance weight. Traditional methods such as Inverse Reinforcement Learning suffer from limited interpretability, high data requirements, and overfitting. Our project hypothesizes that reward design can be modeled as an alignment problem, in which AI behavior must align with human notions of safety, reliability, and performance. We propose to use Large Language Models (LLMs) to generate interpretable and contextually relevant rewards for (simulated) autonomous driving. We expect these rewards to be more adaptable and transferable to similar setups, which addresses the challenges above. A key aspect of our approach is the iterative generation and refinement of reward functions by LLMs, guided by human feedback on the behavior trajectories those rewards produce. Our objectives are to align rewards with human preferences, incorporate safety constraints, mitigate unintended behavior, and improve the generalizability of the resulting systems. The project will draw on recent advances in LLMs and human-in-the-loop methods to make autonomous driving more efficient, safe, and reliable. Beyond autonomous vehicles, the approach opens avenues for further research in AI-driven technologies where human preferences and safety are paramount.
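
To make the idea of weighted sub-rewards and iterative refinement concrete, the following is a minimal Python sketch, not the project's implementation. Everything in it is an assumption made for illustration: the state fields, the three sub-rewards, and the `propose` and `collect_feedback` callables, which stand in for an LLM call and a human rating step respectively. For brevity the sketch refines only the weights, whereas the project description speaks of refining whole reward functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class DrivingState:
    """Hypothetical snapshot of a simulated vehicle (illustrative fields only)."""
    speed: float        # m/s
    speed_limit: float  # m/s
    lane_offset: float  # metres from the lane centre
    collided: bool


def speed_reward(s: DrivingState) -> float:
    # Penalise relative deviation from the speed limit.
    return -abs(s.speed - s.speed_limit) / s.speed_limit


def lane_reward(s: DrivingState) -> float:
    # Penalise distance from the lane centre.
    return -abs(s.lane_offset)


def collision_reward(s: DrivingState) -> float:
    # Fixed penalty when a collision is flagged.
    return -1.0 if s.collided else 0.0


# Named sub-rewards; the names double as keys of the weight dictionary.
SUB_REWARDS: Dict[str, Callable[[DrivingState], float]] = {
    "speed": speed_reward,
    "lane": lane_reward,
    "collision": collision_reward,
}


def total_reward(state: DrivingState, weights: Dict[str, float]) -> float:
    """Composite reward: weighted sum of the sub-rewards."""
    return sum(weights[name] * fn(state) for name, fn in SUB_REWARDS.items())


def refine(
    weights: Dict[str, float],
    propose: Callable[[Dict[str, float], str], Dict[str, float]],
    collect_feedback: Callable[[Dict[str, float]], str],
    rounds: int = 3,
) -> Dict[str, float]:
    """Alternate between a proposal step and a feedback step.

    `propose` is a placeholder for an LLM that revises the reward given
    textual feedback; `collect_feedback` is a placeholder for a human who
    reviews trajectories produced under the current reward.
    """
    feedback = ""
    for _ in range(rounds):
        weights = propose(weights, feedback)
        feedback = collect_feedback(weights)
    return weights
```

In this sketch the weight dictionary is the only thing that changes between rounds, which keeps each revision easy to inspect; a fuller version would also let the proposal step add, remove, or rewrite the sub-reward functions themselves.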