Michalak, Kurt Felix (2025) Exploring Retrieval in Hybrid SSM-Transformers. Bachelor's Thesis, Artificial Intelligence.
Files:
Thesispaper.pdf (8MB, preview available)
akkoord michalak.pdf (180kB, restricted to registered users)
Abstract
Hybrid large language models (LLMs) that combine state-space models (SSMs) with transformer self-attention layers offer promising computational efficiency while maintaining performance. However, the specific roles of the different components remain unclear. This thesis investigates retrieval capabilities in hybrid LLMs through systematic experiments on RecurrentGemma-2B, RecurrentGemma-9B, and Jamba-Mini-1.6. Using the Needle-in-a-Haystack benchmark and attention manipulation techniques, we demonstrate that retrieval depends exclusively on the self-attention layers. Complete attention ablation causes total retrieval failure across all models, confirming that the SSM layers do not contribute to retrieval, and methods intended to improve SSM retrieval fail to recover it in the ablated models. Sparsification experiments reveal that the attention layers can be reduced substantially without major performance degradation. Systematic manipulation of the attention weights shows that successful retrieval requires exposure to the needle tokens during generation and sufficient context during the prefill or generation stage. These findings establish that self-attention layers serve as specialized retrieval modules, while SSM and MLP layers handle general language capabilities.
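To make the attention-ablation idea concrete, the sketch below zeroes the outputs of self-attention modules via PyTorch forward hooks and then probes retrieval with a toy Needle-in-a-Haystack prompt. This is an illustrative sketch, not the thesis's exact procedure: the Hugging Face checkpoint name `google/recurrentgemma-2b`, the class-name heuristic for finding attention modules, and the prompt construction are all assumptions made for the example.

```python
# Illustrative sketch (assumptions noted in comments): ablate self-attention
# layers in a hybrid SSM-transformer by zeroing their outputs, then run a toy
# needle-in-a-haystack retrieval probe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "google/recurrentgemma-2b"  # assumed checkpoint name, not from the thesis

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
model.eval()

def zero_attention_output(module, inputs, output):
    # Attention modules may return a tensor or a tuple whose first element is
    # the hidden-state tensor; zero only that tensor and keep the rest.
    if isinstance(output, tuple):
        return (torch.zeros_like(output[0]),) + output[1:]
    return torch.zeros_like(output)

# Heuristic (assumption): hook every module whose class name suggests self-attention.
handles = [
    m.register_forward_hook(zero_attention_output)
    for m in model.modules()
    if "attention" in type(m).__name__.lower()
]

# Minimal needle-in-a-haystack probe: bury a fact in filler text, then ask for it.
needle = "The secret code is 4417."
filler = "Lorem ipsum dolor sit amet. " * 200
prompt = filler + needle + " " + filler + "\nWhat is the secret code?"

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

# Remove the hooks to restore the unablated model.
for h in handles:
    h.remove()
```

With the hooks in place, the attention contribution is removed from every hybrid block, so a failure to answer the probe question mirrors the total retrieval failure reported in the abstract; removing the hooks restores the original behaviour for comparison.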
| Item Type: | Thesis (Bachelor's Thesis) |
|---|---|
| Supervisor name: | Jaeger, H. and Abreu, S. |
| Degree programme: | Artificial Intelligence |
| Thesis type: | Bachelor's Thesis |
| Language: | English |
| Date Deposited: | 01 Sep 2025 14:56 |
| Last Modified: | 01 Sep 2025 14:56 |
| URI: | https://fse.studenttheses.ub.rug.nl/id/eprint/36904 |