Efficient Neural Symbolic Reasoning with Large Language Model
Title: Efficient Neural Symbolic Reasoning with Large Language Model
DNr: Berzelius-2025-110
Project Type: LiU Berzelius
Principal Investigator: Yue Nick Zhongqi <zhongqi@chalmers.se>
Affiliation: Chalmers tekniska högskola
Duration: 2025-03-19 – 2025-10-01
Classification: 10210
Keywords:
Abstract
Recent advances demonstrate that training large language models (LLMs) with reinforcement learning (RL), even without supervised fine-tuning, can significantly enhance their reasoning capabilities. Despite this promising direction, two critical limitations remain unresolved. First, large-scale RL training of LLMs is exceptionally compute-intensive, limiting scalability and accessibility. Second, while LLMs trained through RL exhibit emergent reasoning skills, they continue to commit elementary errors, particularly in basic arithmetic tasks. We hypothesize that such errors arise primarily from the LLMs' insufficient intrinsic understanding of symbolic operations (e.g., the arithmetic operators +, -, *, /). To address these challenges, this project proposes a novel approach that improves both the computational efficiency and the accuracy of RL-based training for LLMs. Our method introduces symbolic operation priors directly into the LLM, embedding fundamental knowledge about arithmetic and related symbolic manipulations. Furthermore, we propose restricting the RL action space specifically to symbolic operations and their associated operands. By limiting the complexity and scope of potential actions, we significantly reduce the computational demands of RL training. This targeted action space not only streamlines the decision-making process but also improves the precision of learned behaviors, enabling the LLM to better leverage symbolic reasoning skills for complex tasks. Overall, this research aims to establish a more efficient and accurate RL training paradigm for LLMs by combining symbolic priors with a constrained RL action space. This dual strategy promises to unlock higher-performance language models capable of robust reasoning across diverse domains, thus advancing the practical utility of LLMs in real-world applications.
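The constrained action space mentioned above could, under one possible reading, amount to masking the policy's output logits so that only symbolic tokens (digits, operators, a stop marker) can ever be sampled. The sketch below is purely illustrative and is not the project's actual implementation; the toy vocabulary and the `SYMBOLIC` subset are assumptions made for the example.

```python
# Illustrative sketch (assumption, not the proposed method's code): restrict an
# RL policy's action space by masking logits of non-symbolic tokens to -inf,
# so softmax sampling can only select symbolic actions.
import math
import random

VOCAB = ["the", "cat", "7", "3", "+", "-", "*", "/", "=", "<eos>"]
SYMBOLIC = {"7", "3", "+", "-", "*", "/", "=", "<eos>"}  # allowed action subset

def masked_sample(logits, rng=random):
    # Set logits of disallowed tokens to -inf, then sample from the softmax.
    masked = [l if tok in SYMBOLIC else float("-inf")
              for l, tok in zip(logits, VOCAB)]
    m = max(masked)
    exps = [math.exp(l - m) if l != float("-inf") else 0.0 for l in masked]
    total = sum(exps)
    probs = [e / total for e in exps]
    r, acc = rng.random(), 0.0
    for tok, p in zip(VOCAB, probs):
        acc += p
        if r < acc:
            return tok
    return "<eos>"  # numerical safety net for accumulated rounding

logits = [2.0] * len(VOCAB)  # uniform preference over the full vocabulary
assert masked_sample(logits) in SYMBOLIC  # masked tokens are never sampled
```

Because the mask zeroes the probability of every non-symbolic token before sampling, the policy's exploration during RL is confined to the smaller symbolic subset, which is one way the reduced computational demand described in the abstract could arise.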