Michalak, Kurt Felix (2025) Exploring Retrieval in Hybrid SSM-Transformers. Bachelor's Thesis, Artificial Intelligence.
Files:
Thesispaper.pdf (8MB, preview available)
akkoord michalak.pdf (180kB, restricted to registered users)
Abstract
Hybrid large language models (LLMs) that combine state-space models (SSMs) with transformer self-attention layers offer promising computational efficiency while maintaining performance. However, the specific roles of the different components remain unclear. This thesis investigates retrieval capabilities in hybrid LLMs through systematic experiments on RecurrentGemma-2B, RecurrentGemma-9B, and Jamba-Mini-1.6. Using the Needle-in-a-Haystack benchmark and attention manipulation techniques, we demonstrate that retrieval depends exclusively on the self-attention layers. Complete attention ablation causes total retrieval failure across all models, confirming that the SSM layers do not contribute to retrieval, and methods intended to improve SSM retrieval fail to recover it in the ablated models. Sparsification experiments reveal that the attention layers can be reduced substantially without major performance degradation. Systematic manipulation of the attention weights shows that successful retrieval requires exposure to the needle tokens during generation and sufficient context during the prefill or generation stage. These findings establish that self-attention layers serve as specialized retrieval modules, while SSM and MLP layers handle general language capabilities.
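To make the attention-ablation idea concrete, the sketch below zeroes the outputs of self-attention modules via PyTorch forward hooks and then probes retrieval with a toy Needle-in-a-Haystack prompt. This is an illustrative sketch, not the thesis's exact procedure: the Hugging Face checkpoint name `google/recurrentgemma-2b`, the class-name heuristic for finding attention modules, and the prompt construction are all assumptions made for the example.

```python
# Illustrative sketch (assumptions noted in comments): ablate self-attention
# layers in a hybrid SSM-transformer by zeroing their outputs, then run a toy
# needle-in-a-haystack retrieval probe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "google/recurrentgemma-2b"  # assumed checkpoint name, not from the thesis

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
model.eval()

def zero_attention_output(module, inputs, output):
    # Attention modules may return a tensor or a tuple whose first element is
    # the hidden-state tensor; zero only that tensor and keep the rest.
    if isinstance(output, tuple):
        return (torch.zeros_like(output[0]),) + output[1:]
    return torch.zeros_like(output)

# Heuristic (assumption): hook every module whose class name suggests self-attention.
handles = [
    m.register_forward_hook(zero_attention_output)
    for m in model.modules()
    if "attention" in type(m).__name__.lower()
]

# Minimal needle-in-a-haystack probe: bury a fact in filler text, then ask for it.
needle = "The secret code is 4417."
filler = "Lorem ipsum dolor sit amet. " * 200
prompt = filler + needle + " " + filler + "\nWhat is the secret code?"

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

# Remove the hooks to restore the unablated model.
for h in handles:
    h.remove()
```

With the hooks in place, the attention contribution is removed from every hybrid block, so a failure to answer the probe question mirrors the total retrieval failure reported in the abstract; removing the hooks restores the original behaviour for comparison.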
| Item Type: | Thesis (Bachelor's Thesis) |
|---|---|
| Supervisor name: | Jaeger, H. and Abreu, S. |
| Degree programme: | Artificial Intelligence |
| Thesis type: | Bachelor's Thesis |
| Language: | English |
| Date Deposited: | 01 Sep 2025 14:56 |
| Last Modified: | 01 Sep 2025 14:56 |
| URI: | https://fse.studenttheses.ub.rug.nl/id/eprint/36904 |