Title: |
Next-Token Prediction for Planning - Training and Applications |
DNr: |
Berzelius-2024-399 |
Project Type: |
LiU Berzelius |
Principal Investigator: |
Jendrik Seipp <jendrik.seipp@liu.se> |
Affiliation: |
Linköpings universitet |
Duration: |
2024-10-08 – 2025-05-01 |
Classification: |
10201 |
Keywords: |
|
Abstract
Planning is a fundamental problem in AI, with applications in robotics, logistics, and many other fields. The aim of planning is to find a sequence of actions that transforms an initial state into a desired goal state. Traditionally, this has been done using classical planners. To enable these planners to handle larger problems, various approaches have been used, such as guiding the search with heuristics or pruning the search space.
However, with the prevalence of large language models (LLMs), attempts have been made to use pre-trained models for planning, but this has largely failed (https://arxiv.org/abs/2206.10498), even with specialized LLMs designed for reasoning (https://www.arxiv.org/abs/2409.13373). In particular, these LLM approaches tend to scale badly to longer plans and to domains without intuitive solutions. Similarly, recent work trained an LLM solely on planning tasks within fixed domains (https://ojs.aaai.org/index.php/ICAPS/article/view/31510), but the resulting model generalized poorly to new problem sizes and additionally suffered from overfitting during training. As planning problems generally have several possible solutions, we propose to train an LLM on datasets where *several* possible plans are included for each problem. We hypothesise that this will allow the LLM to generalize better to new problem sizes while also addressing the overfitting issues.
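To make the proposed dataset concrete, the following is a minimal sketch of how training sequences could be built when each problem is paired with several valid plans; the field names, the `<plan>` delimiters, and the Blocksworld-style toy task are illustrative assumptions, not a fixed format.

```python
# Sketch: emit one next-token-prediction training sequence per
# (problem, plan) pair, so the model sees *several* valid plans
# for the same problem. All field names are assumptions.

def make_training_sequences(problems):
    """problems: list of dicts, each with a task 'description' and a
    list of alternative valid 'plans' (action sequences)."""
    sequences = []
    for task in problems:
        for plan in task["plans"]:  # several plans per problem
            completion = " ".join(plan)
            sequences.append(f"{task['description']}\n<plan> {completion} </plan>")
    return sequences

# Toy Blocksworld-style problem with two alternative solutions.
problems = [{
    "description": "(init (on-table a) (on-table b)) (goal (on a b))",
    "plans": [
        ["(pick-up a)", "(stack a b)"],
        ["(pick-up b)", "(put-down b)", "(pick-up a)", "(stack a b)"],
    ],
}]

seqs = make_training_sequences(problems)
# One training sequence per alternative plan for the same problem.
```

Presenting multiple solutions for a single problem description is what distinguishes this setup from prior single-plan datasets; the same task text recurs with different completions.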
Additionally, many current approaches to using LLMs for planning are naive, hoping that the LLM predicts the entire plan correctly in one shot. We therefore propose to use the trained LLM as a tool alongside a classical planner, for example to initialize the planner's search or to prune actions that the LLM deems unlikely. This allows the classical planner to handle otherwise untenable problem sizes while guaranteeing plan correctness. As planners generally operate under strict time constraints, we will also investigate less capable but faster next-token prediction models to see whether the speed-up compensates for the weaker estimates.
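The pruning idea can be sketched as follows: at each state, the planner keeps only the successors whose actions the model scores highly, while the search itself still verifies correctness. The scoring function below is a uniform placeholder standing in for a real next-token-prediction model, and `top_k` is an assumed tuning parameter.

```python
# Sketch of LLM-guided action pruning during search. A real model
# would score each action as the likely next token(s) given an
# encoding of the state; here the scorer is a uniform placeholder.

def llm_action_probs(state, actions):
    # Placeholder scorer: uniform over applicable actions.
    return {a: 1.0 / len(actions) for a in actions}

def prune_actions(state, actions, top_k=2):
    """Keep the top_k actions ranked by model probability. Since the
    planner still checks applicability and goal satisfaction, pruning
    only risks completeness, never plan correctness."""
    probs = llm_action_probs(state, actions)
    ranked = sorted(actions, key=lambda a: probs[a], reverse=True)
    return ranked[:top_k]

applicable = ["(pick-up a)", "(pick-up b)", "(noop)"]
kept = prune_actions("s0", applicable, top_k=2)
```

Under this design, a weaker but faster scorer trades estimate quality for more node expansions per second, which is the trade-off the proposal intends to measure.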
We aim to publish at least one paper on the topic at an A* conference and to apply the techniques in the international Tuples competition (https://tuples.ai/competition-challenge/) hosted by Airbus. We will also release the project's code and models to the public to allow for further research.