AI-powered automated theorem proving
DNr: Berzelius-2025-166
Project Type: LiU Berzelius
Principal Investigator: Hamed Nemati <hnnemati@kth.se>
Affiliation: Kungliga Tekniska högskolan
Duration: 2025-05-17 – 2025-12-01
Classification: 10210
Keywords:

Abstract

The principal goal of this project is to gain insight into the challenges of LLM-based theorem proving and to improve on current state-of-the-art approaches. We also seek to adapt LLM-based theorem proving to software verification, where possible. Meeting this goal would primarily advance the current understanding of AI reasoning and, in practical terms, yield better automated tools (in the form of LLMs) for proving mathematical theorems.

This project is motivated by the tremendous progress AI has enabled in scientific domains such as biology and physics; it is therefore natural to ask whether AI can also provide valuable assistance in our pursuit of the "mother of all sciences": mathematics. Our decision to focus on theorem proving (i.e., mathematics formalized using computers) stems from the fact that LLMs are known to "hallucinate", i.e., they produce false results with apparent confidence. This makes verification exercises that rely on an AI as an oracle costly and unreliable as the volume of data grows. To address this issue, computer-based proof assistants such as Lean and Isabelle/HOL can be used to verify whether a proof is correct. By ensuring that language models generate proofs that can be validated by a proof assistant, we avoid the risk of mistakenly accepting an incorrect proof.

We also choose to focus on LLM-based approaches, which in this context means that we take certain pre-trained open-source foundation models as a basis and improve their capability to generate formal proofs through supervised fine-tuning and/or other training routines. LLMs have so far proven to be the most productive approach to mathematical reasoning, as indicated by state-of-the-art results ([1], [2], [3]) on several benchmarks for formalized mathematics, such as miniF2F [4] and PutnamBench [5].

References

[1] Wang, Haiming, et al. "Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning." arXiv preprint arXiv:2504.11354 (2025).
[2] Dong, Kefan, and Tengyu Ma. "STP: Self-Play LLM Theorem Provers with Iterative Conjecturing and Proving." arXiv e-prints (2025): arXiv-2502.
[3] Lin, Yong, et al. "Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving." arXiv preprint arXiv:2502.07640 (2025).
[4] Zheng, Kunhao, Jesse Michael Han, and Stanislas Polu. "miniF2F: A Cross-System Benchmark for Formal Olympiad-Level Mathematics." arXiv preprint arXiv:2109.00110 (2021).
[5] Tsoukalas, George, et al. "PutnamBench: A Multilingual Competition-Mathematics Benchmark for Formal Theorem-Proving." AI for Math Workshop @ ICML 2024.
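To illustrate the kind of machine-checked validation the abstract describes, the following is a minimal sketch in Lean 4 (the theorem and its name are our own illustrative choice, not an artifact of this project). Lean's kernel accepts the proof only if every step is formally correct, which is exactly the property that makes a proof assistant a reliable filter for LLM-generated proofs.

```lean
-- Illustrative example: the sum of two even natural numbers is even.
-- Evenness is stated directly as the existence of a witness k with m = 2 * k.
theorem add_even (m n : Nat)
    (hm : ∃ k, m = 2 * k) (hn : ∃ k, n = 2 * k) :
    ∃ k, m + n = 2 * k := by
  obtain ⟨a, ha⟩ := hm   -- a with m = 2 * a
  obtain ⟨b, hb⟩ := hn   -- b with n = 2 * b
  -- The witness for m + n is a + b; rewriting by distributivity closes the goal.
  exact ⟨a + b, by rw [ha, hb, Nat.mul_add]⟩
```

A proof produced by an LLM is handled the same way: it is submitted to the kernel, and any incorrect step causes the whole proof to be rejected rather than silently accepted.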