Learning Dynamic Algorithms for Automated Planning
Title: Learning Dynamic Algorithms for Automated Planning
SNIC Project: SNIC 2022/5-341
Project Type: SNIC Medium Compute
Principal Investigator: Jendrik Seipp <jendrik.seipp@liu.se>
Affiliation: Linköpings universitet
Duration: 2022-07-01 – 2023-07-01
Classification: 10201
Homepage: https://rlplab.com/projects/


We apply for computational resources for a project that has recently been funded by WASP. Connecting the fields of model-based reasoning and data-driven learning has recently been identified as one of the key research goals in artificial intelligence. Our project will contribute to this endeavor, focusing on the area of automated planning. We will learn heuristic functions that guarantee optimal solutions and planning algorithms that dynamically adapt to the given task. Automated planning is the task of finding a sequence of actions that lets an agent achieve their goals. We focus on the classical planning setting, where the agent's world is fully observable and all action outcomes are deterministic. Today's classical planning systems are powerful enough to solve challenging, real-world planning tasks. The strongest method for solving classical planning tasks is state-space search, where we search for a solution path in a directed graph of world states. This method strongly relies on heuristics, that is, functions that estimate how far a state is away from the nearest goal. Planning researchers have manually devised a large collection of different heuristics and state-space search algorithms. These algorithms are able to solve challenging planning tasks, but they usually require a lot of work to be devised, understood and implemented. More importantly, the design spaces for heuristics and search algorithms are huge and it is infeasible to manually explore them in any meaningful way. In this project, we will use machine learning to learn heuristics and search algorithms automatically. For the first objective, we address one of the main drawbacks of heuristic functions approximated by machine learning models: they usually give no guarantees about the learned function. However, for optimal planning we need heuristic functions that never overestimate the true goal distance. Therefore, instead of learning the heuristic directly, we will learn policies for building heuristics that never overestimate goal distances by design. Ours will be the first study that learns a policy for building heuristics, not heuristics directly, and the first study that uses machine learning to build admissible heuristics. Both of these ideas are likely to spark other research that continues in this direction. For the second objective, we consider not only planning heuristics, but full planning algorithms. Today's planners are highly-parameterized systems and different planning tasks require different planner configurations to be solved efficiently. Consequently, researchers have developed domain-independent systems for automatically choosing which planner configuration to run for a new task. However, they all use the same static configuration for the whole solving process, even though different parts of the search space might require different algorithms. For example, it can be beneficial to switch between several different heuristics while solving a single planning task. Therefore, we will learn planning algorithms that dynamically adapt their behavior to different parts of the search space while they search for solutions. Ours will be the first principled study of a fully dynamic, domain-independent planner and it will be exciting to see how far we can push the state of the art with it.