Learning Domain Policies for Classical Planning
Two prominent approaches to solving a problem in classical planning are: using heuristic search to solve the given problem, or using a policy tailored to the domain of the problem. The first approach is often computationally expensive, whilst executing a policy is very efficient. However, finding a policy that generalises over an entire domain is frequently computationally infeasible. In other words, policy-based approaches require an expensive learning step whose result is then reused for all problems in the domain, a train-once, reuse-everywhere structure shared with most machine learning methods. Currently, policies are often learned with SAT-based methods that do not scale well.
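To make the contrast concrete, the following is a small illustrative sketch (not taken from the project): a hand-written policy for a toy Gripper-like domain, where a robot must carry balls to room B. The domain, state encoding, and rules are hypothetical, but they show the key property of a domain policy: it maps any state directly to an action, without search, and works for instances of every size.

```python
# Illustrative sketch: a hand-written policy for a toy Gripper-like domain.
# A domain policy selects an action from the current state alone, no search.

def gripper_policy(state):
    """State: dict with 'robot' (current room), 'holding' (ball or None),
    and 'balls' (ball -> room). Goal: every ball in room 'B'."""
    if state["holding"] is not None:
        if state["robot"] == "B":
            return ("drop", state["holding"])
        return ("move", "B")
    misplaced = [b for b, r in state["balls"].items() if r != "B"]
    if not misplaced:
        return None  # goal reached
    target = misplaced[0]
    if state["robot"] == state["balls"][target]:
        return ("pick", target)
    return ("move", state["balls"][target])

def apply(state, action):
    """Toy successor function for the sketch above."""
    s = {"robot": state["robot"], "holding": state["holding"],
         "balls": dict(state["balls"])}
    kind, arg = action
    if kind == "move":
        s["robot"] = arg
    elif kind == "pick":
        s["holding"] = arg
    elif kind == "drop":
        s["balls"][arg] = s["robot"]
        s["holding"] = None
    return s

# Executing the policy solves the instance step by step, with no backtracking.
state = {"robot": "A", "holding": None, "balls": {"b1": "A", "b2": "A"}}
while (a := gripper_policy(state)) is not None:
    state = apply(state, a)
print(state["balls"])  # every ball ends up in room "B"
```

The expensive part in practice is not executing such a policy but finding one automatically that is correct for all instances of the domain, which is what the learning step replaces.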
We showed in an earlier project that deep learning can be used successfully to learn both optimal and suboptimal policies for many domains. Additionally, since we used an architecture based on graph neural networks, the expressive power of the trained models is well understood both theoretically and empirically. We will continue this line of research in this project and investigate whether: (1) different outputs can be learned; (2) the expressive power can be increased; and (3) it can be combined with neurosymbolic programming.
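The core operation of the graph neural networks mentioned above can be sketched in a few lines. The following is a minimal, illustrative message-passing round over a graph encoding of a planning state; the scalar features, fixed weights, and toy graph are assumptions for the sketch (real models learn vector embeddings and update weights), but the sum-aggregate-and-combine step is the operation whose iteration bounds the models' expressive power.

```python
# Minimal message-passing sketch (illustrative only): one synchronous round
# of sum-aggregation over a directed graph, the core step of GNN-based
# policy models. Weights are fixed toy values; real models learn them.

def mp_round(features, edges, w_self=0.5, w_msg=0.5):
    """features: node -> scalar embedding. edges: list of (src, dst)."""
    incoming = {v: 0.0 for v in features}
    for src, dst in edges:
        incoming[dst] += features[src]           # aggregate messages
    return {v: w_self * h + w_msg * incoming[v]  # combine with own state
            for v, h in features.items()}

# Tiny state graph: objects a, b, c with relations a->b and b->c.
h = {"a": 1.0, "b": 0.0, "c": 0.0}
for _ in range(2):  # two rounds propagate information two hops
    h = mp_round(h, [("a", "b"), ("b", "c")])
print(h)  # node c now carries information that originated at node a
```

How many such rounds are applied, and how messages are aggregated, determines which properties of the state the model can distinguish, which is why the expressive power of these architectures can be characterised precisely.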