Boosting Accuracy and Efficiency of Budget Forcing in LLMs via Reinforcement Learning for Mathematical Reasoning

Tarunokusumo, Ravindra Aribowo (2025) Boosting Accuracy and Efficiency of Budget Forcing in LLMs via Reinforcement Learning for Mathematical Reasoning. Bachelor's Thesis, Artificial Intelligence.

Abstract

Test-time scaling methods have rapidly gained popularity for their computational efficiency and their ability to improve the reasoning performance of Large Language Models (LLMs) without additional parameter training. One such method is budget forcing, a decoding intervention strategy that allocates extra compute budget for thinking and elicits the model's inherent self-correcting behavior. However, budget forcing relies on supervised fine-tuning (SFT) on long-context reasoning traces, which degrades performance on smaller models due to verbose responses. For this reason, we propose a framework that integrates reinforcement learning (RL) to improve token efficiency and boost the mathematical-reasoning performance of a 1.5B-parameter model. We demonstrate this using only 1.5K training samples and find that our SFT+RL model performs better on the GSM8K dataset across varying compute budgets. Our main findings show overall higher accuracy while reducing token usage by over 40% compared to the SFT model, revealing that RL can recover the losses caused by long-context training while improving performance in mathematical reasoning.
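The budget-forcing intervention described in the abstract can be sketched as a simple decoding loop: whenever the model tries to end its reasoning before the token budget is spent, the end-of-thinking marker is suppressed and a continuation cue (e.g. "Wait") is appended instead, prompting further self-correction. The sketch below is illustrative only; the function names, the `</think>` marker, and the toy model are assumptions, not the thesis implementation.

```python
def budget_force(model_step, prompt, budget, cue="Wait"):
    """Force the model to keep thinking until `budget` tokens are used.

    `model_step(prompt, tokens)` is a hypothetical one-token decoding call.
    If the model emits the end-of-thinking marker before the budget is
    exhausted, the marker is replaced by a continuation cue, eliciting
    the model's self-correcting behavior.
    """
    tokens = []
    while len(tokens) < budget:
        tok = model_step(prompt, tokens)
        if tok == "</think>":
            tokens.append(cue)   # suppress early stop, force more thinking
        else:
            tokens.append(tok)
    return tokens


def toy_model(prompt, tokens):
    # Stub "model": thinks for three steps, tries to stop, then
    # (when forced to continue) double-checks its answer.
    script = ["2+2", "=", "4", "</think>", "check:", "4", "ok"]
    return script[len(tokens)] if len(tokens) < len(script) else "</think>"


print(budget_force(toy_model, "What is 2+2?", budget=6))
# The early stop at step four is replaced by the "Wait" cue.
```

Under this view, the compute budget is a single decoding-time knob, which is why the thesis can evaluate the same model under varying budgets without retraining.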

Item Type: Thesis (Bachelor's Thesis)
Supervisor name: Fernandes Cunha, R. and Tashu, T.M.
Degree programme: Artificial Intelligence
Thesis type: Bachelor's Thesis
Language: English
Date Deposited: 08 Aug 2025 06:07
Last Modified: 08 Aug 2025 06:07
URI: https://fse.studenttheses.ub.rug.nl/id/eprint/36698
