Sawala, Lukasz (2025) Low-Latency Language-Action Foundation Models via Upside-Down RL. Bachelor's Thesis, Artificial Intelligence.
Abstract
This paper explores the Upside-Down Reinforcement Learning (UDRL) algorithm, an offline RL paradigm, introducing novel transformer-based architectures to create a scalable and controllable framework for efficient low-resource command-conditioned behavior in complex state-action spaces. Two architectures are proposed: UDRLt and UDRLt-MLP, both leveraging lightweight transformers for efficient control in continuous action spaces. Results show that UDRLt-MLP significantly outperforms the Decision Transformer baseline and achieves higher alignment with desired outcomes, even under out-of-distribution commands, while requiring only a fraction of computational resources. In more challenging transfer settings like AntMaze, fine-tuning and iterative self-improvement via rollout-based imitation partially recover performance, though limitations in dataset quality persist. A self-imitation algorithm is proposed to mitigate data scarcity issues. The findings highlight UDRL’s potential as a foundation for scalable and aligned control systems while identifying issues and future research directions.
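To make "command-conditioned behavior" concrete, the following is a minimal sketch of the hindsight relabeling step commonly used in UDRL: each logged step becomes a supervised sample whose command is the return actually obtained from that step onward and the number of steps remaining. It is an illustration under stated assumptions, not the thesis's implementation. The function name `relabel_episode`, the type aliases, and the choice to relabel only up to the end of the episode (rather than over arbitrary segments) are all hypothetical simplifications.

```python
from typing import List, Tuple

# Hypothetical type aliases, used only for this illustration.
State = Tuple[float, ...]
Action = Tuple[float, ...]
Command = Tuple[float, int]  # (desired return, desired horizon)


def relabel_episode(
    states: List[State],
    actions: List[Action],
    rewards: List[float],
) -> List[Tuple[State, Command, Action]]:
    """Turn one logged episode into (state, command, action) samples.

    Assumption: the command for step t is the return collected from t to
    the end of the episode, paired with the number of remaining steps.
    """
    if not (len(states) == len(actions) == len(rewards)):
        raise ValueError("states, actions and rewards must have equal length")

    samples = []
    return_to_go = sum(rewards)
    n = len(rewards)
    for t in range(n):
        command = (return_to_go, n - t)
        samples.append((states[t], command, actions[t]))
        # The reward at step t is no longer part of the future return.
        return_to_go -= rewards[t]
    return samples


# Toy episode with three steps.
demo_samples = relabel_episode(
    states=[(0.0,), (1.0,), (2.0,)],
    actions=[(0.5,), (-0.5,), (0.1,)],
    rewards=[1.0, 2.0, 3.0],
)
```

A policy trained by supervised learning on such samples maps (state, command) to action. At test time the command is chosen by the user, which is how out-of-distribution commands, as mentioned in the abstract, can be posed.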
| Item Type: | Thesis (Bachelor's Thesis) |
|---|---|
| Supervisor name: | Cardenas Cartagena, J. D. and Sabatelli, M. |
| Degree programme: | Artificial Intelligence |
| Thesis type: | Bachelor's Thesis |
| Language: | English |
| Date Deposited: | 20 Aug 2025 09:26 |
| Last Modified: | 20 Aug 2025 09:26 |
| URI: | https://fse.studenttheses.ub.rug.nl/id/eprint/36798 |
