Tarunokusumo, Ravindra Aribowo (2025) Boosting Accuracy and Efficiency of Budget Forcing in LLMs via Reinforcement Learning for Mathematical Reasoning. Bachelor's Thesis, Artificial Intelligence.
Abstract
Test-time scaling methods have rapidly grown in popularity for their computational efficiency and their ability to improve the reasoning performance of Large Language Models without updating model parameters. One such method is budget forcing, a decoding intervention strategy that allocates extra compute budget for thinking and elicits the model's inherent self-correcting behavior. However, budget forcing relies on supervised fine-tuning (SFT) on long-context reasoning traces, which degrades the performance of smaller models through verbose responses. We therefore propose a framework that integrates reinforcement learning (RL) to improve token efficiency and boost the performance of a 1.5B model on mathematical reasoning. Using only 1.5K training samples, we find that our SFT+RL model outperforms the SFT-only model on the GSM8K dataset across varying compute budgets. Our main findings show overall higher accuracy while reducing token usage by over 40% compared to the SFT model, revealing how RL can recover the losses incurred by long-context training and altogether improve performance in mathematical reasoning.
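The budget-forcing intervention the abstract describes can be illustrated as a decoding loop that suppresses the end-of-thinking marker until a minimum token budget is spent and truncates thinking at a maximum budget. The sketch below is a minimal simulation, not the thesis's actual implementation: the `</think>` marker, the "Wait" continuation token, the `min_budget`/`max_budget` parameters, and the stub model are all illustrative assumptions.

```python
# Hypothetical sketch of budget forcing as a decoding-time intervention.
# A toy "model" stands in for an LLM so the loop itself is runnable.

END_OF_THINKING = "</think>"  # assumed end-of-thinking marker

def stub_model(context):
    """Toy stand-in for an LLM: tries to stop thinking after 3 steps."""
    steps = sum(1 for t in context if t.startswith("step"))
    return END_OF_THINKING if steps >= 3 else f"step{steps + 1}"

def budget_forced_decode(model, min_budget, max_budget):
    """Decode a thinking trace under budget forcing.

    If the model emits the end-of-thinking marker before min_budget
    tokens, replace it with a continuation token ("Wait") to elicit
    further self-correction; hard-truncate at max_budget tokens.
    """
    trace = []
    while len(trace) < max_budget:
        token = model(trace)
        if token == END_OF_THINKING:
            if len(trace) >= min_budget:
                break                 # budget satisfied: allow thinking to end
            trace.append("Wait")      # force continued thinking
        else:
            trace.append(token)
    return trace

trace = budget_forced_decode(stub_model, min_budget=5, max_budget=8)
print(trace)  # → ['step1', 'step2', 'step3', 'Wait', 'Wait']
```

The trace shows the intervention in action: the stub model attempts to stop after three steps, but the loop injects "Wait" tokens until the minimum budget of five thinking tokens is reached.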
| Item Type: | Thesis (Bachelor's Thesis) |
|---|---|
| Supervisor name: | Fernandes Cunha, R. and Tashu, T.M. |
| Degree programme: | Artificial Intelligence |
| Thesis type: | Bachelor's Thesis |
| Language: | English |
| Date Deposited: | 08 Aug 2025 06:07 |
| Last Modified: | 08 Aug 2025 06:07 |
| URI: | https://fse.studenttheses.ub.rug.nl/id/eprint/36698 |
