Zani, Davide (2025) Exploring the Behavior of Outlier Channels in Hybrid-LLM Architectures. Bachelor's Thesis, Artificial Intelligence.
Abstract
Transformer-based Large Language Models excel through self-attention mechanisms that capture complex dependencies across sequences, but their computational cost scales quadratically with sequence length, which makes them inefficient. State Space Models (SSMs) offer an alternative with linear scaling, yet their fixed-size latent state limits their memory abilities. Hybrid architectures like RecurrentGemma emerge as a promising alternative, combining the efficiency of SSMs with the performance of attention. All of these architectures face a common challenge for quantization: outlier channels. While these have been studied in pure transformer and SSM architectures, their distribution and impact in hybrid models remain underexplored. This study investigates outlier channels in RecurrentGemma-2B using statistical analysis and controlled clipping experiments across six benchmarks. We found that most outlier channels are concentrated in MLP blocks and deeper layers, with late layers containing 44.4% more outliers than early layers. Clipping experiments revealed that aggressive outlier suppression caused severe performance degradation, while conservative thresholds maintained near-baseline performance. Mathematical reasoning tasks showed the greatest sensitivity to outlier removal, whereas common-sense reasoning tasks were more robust. The results demonstrate that outlier channels are essential for model functionality.
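The abstract does not state how outlier channels were identified or which clipping thresholds were used. As a rough illustration of the general idea only, the following minimal Python sketch flags channels whose peak magnitude is far above the median peak and then clamps those channels. The function names, the `ratio` cutoff, and the `limit` value are all hypothetical and are not taken from the thesis.

```python
from statistics import median


def find_outlier_channels(acts, ratio=6.0):
    """Return indices of channels whose peak magnitude exceeds ratio x the median peak.

    acts:  list of rows (one per token), each a list of per-channel activations.
    ratio: hypothetical cutoff chosen for illustration, not a value from the thesis.
    """
    num_channels = len(acts[0])
    # Peak absolute activation observed in each channel across all tokens.
    peaks = [max(abs(row[c]) for row in acts) for c in range(num_channels)]
    cutoff = ratio * median(peaks)
    return [c for c, peak in enumerate(peaks) if peak > cutoff]


def clip_channels(acts, channels, limit):
    """Clamp the given channels to [-limit, limit]; other channels are left unchanged."""
    chosen = set(channels)
    return [
        [max(-limit, min(limit, v)) if c in chosen else v for c, v in enumerate(row)]
        for row in acts
    ]


# Toy activations: 3 tokens x 4 channels, where channel 2 carries large values.
acts = [
    [0.5, -1.0, 40.0, 0.2],
    [-0.3, 0.8, -55.0, 0.1],
    [0.9, -0.4, 38.0, -0.6],
]
outliers = find_outlier_channels(acts)
clipped = clip_channels(acts, outliers, limit=5.0)
```

In this toy setting, a small `limit` plays the role of aggressive suppression and a large `limit` the role of a conservative threshold. Comparing benchmark scores under different limits is the kind of experiment the abstract describes, though the actual procedure may differ.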
| Item Type: | Thesis (Bachelor's Thesis) |
|---|---|
| Supervisor name: | Abreu, S. and Jaeger, H. |
| Degree programme: | Artificial Intelligence |
| Thesis type: | Bachelor's Thesis |
| Language: | English |
| Date Deposited: | 28 Jul 2025 11:01 |
| Last Modified: | 28 Jul 2025 11:01 |
| URI: | https://fse.studenttheses.ub.rug.nl/id/eprint/36560 |
