Title: Scalable Zero-Shot Reinforcement Learning
DNr: Berzelius-2024-467
Project Type: LiU Berzelius
Principal Investigator: Stefan Stojanovic <stesto@kth.se>
Affiliation: Kungliga Tekniska högskolan
Duration: 2024-11-30 – 2025-06-01
Classification: 10201
Keywords: zero-shot reinforcement learning, generalization, sample efficiency

Abstract

This project aims to advance zero-shot reinforcement learning (ZSRL), a recent and promising framework in which an agent, after an initial reward-free learning phase, can solve any task in a given environment immediately, without further planning or learning. This marks a significant shift away from traditional reward-driven reinforcement learning and paves the way for agents that adapt seamlessly to arbitrary instructions. Current RL methods, by contrast, are limited to narrow families of related tasks or rely on extensive task-specific planning. Our goal is to improve the robustness and scalability of ZSRL methods by developing algorithms that generalize effectively across the space of possible goals. A key challenge we aim to address is the sample efficiency of these algorithms, so that they can learn and adapt from fewer environment interactions. To demonstrate the effectiveness of our approaches, we plan to evaluate the proposed algorithms on a diverse set of environments, each of which demands a large number of interactions before results become practically relevant. Access to GPU resources is essential for this work, since training and evaluating ZSRL agents in large-scale environments requires substantial computational power. The outcomes of this project have the potential to push the boundaries of generalization in reinforcement learning, contributing to versatile, goal-driven agents that operate effectively in complex, dynamic environments.
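
For concreteness, the "reward-free pretraining, then instant task solving" protocol can be sketched with successor features, one standard route to zero-shot RL. This is a minimal illustrative example, not the method the project proposes to develop; all names, shapes, and the random stand-in for the learned representation are hypothetical.

import numpy as np

# Sketch of zero-shot task solving via successor features.
# Reward-free pretraining is assumed to have produced:
#   phi(s)     -- state features of dimension d
#   psi(s, a)  -- successor features E[sum_t gamma^t phi(s_t) | s, a]
# At test time a task is specified only by a weight vector w with
# reward r(s) = phi(s) @ w, and a greedy policy is read off directly,
# with no further learning or planning.

rng = np.random.default_rng(0)
n_states, n_actions, d = 10, 4, 8

# Stand-in for the representation learned during the reward-free phase.
psi = rng.normal(size=(n_states, n_actions, d))

def zero_shot_policy(state: int, w: np.ndarray) -> int:
    """Return the greedy action for the task encoded by w.

    Q(s, a) = psi(s, a) @ w, so a new task costs one matrix-vector
    product per state, with no further environment interaction.
    """
    q_values = psi[state] @ w  # shape (n_actions,)
    return int(np.argmax(q_values))

# A "new task" arrives as a weight vector; the agent acts immediately.
w_task = rng.normal(size=d)
action = zero_shot_policy(state=3, w=w_task)
print(f"zero-shot action for state 3: {action}")

In this sketch the pretrained representation psi carries all knowledge of the environment; specifying a task reduces to choosing w, which is what makes the test-time step "zero-shot" in the sense used above.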