Postmus, Joris (2024) Steering Large Language Models using Conceptors: An Alternative to Point-Based Activation Engineering. Bachelor's Thesis, Artificial Intelligence.
Abstract
While large language models (LLMs) have revolutionized the field of artificial intelligence, reliably controlling their outputs remains a pressing challenge. This project aims to improve activation engineering, a proposed technique in which the outputs of pre-trained LLMs are controlled by directly manipulating the models’ activations at inference time. Traditionally, this manipulation involves adding a steering vector to the models’ activations at a specific stage in the processing pipeline. Instead of representing the steering target as a single point (vector) in high-dimensional space, we explore the use of conceptors, mathematical objects that can represent a set of activation vectors as an ellipsoidal region in that space. For steering purposes, we use conceptors as (soft) projection matrices that can tune a given activation vector toward a steering target. Because a conceptor captures a region rather than a single point, we hypothesize that conceptors provide more precise control over capturing and steering toward complex activation representations than point-based methods. Our experiments show that, compared to traditional point-based steering methods, conceptors achieve higher accuracy across multiple steering tasks, especially when combined with a performance enhancement called mean-centering. These findings suggest that conceptors are a promising tool for effectively controlling the outputs of LLMs, paving the way for further research to fully establish their utility.
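The contrast between point-based steering and conceptor steering can be illustrated with a small sketch. The abstract gives no formulas, so everything here is an assumption. The conceptor uses the standard definition C = R (R + α⁻² I)⁻¹, where R is the correlation matrix of the activations and α is the aperture. The toy data, the function names, the choice of the mean activation as the steering vector, and the scaling factor `beta` are hypothetical and are not taken from the thesis.

```python
import numpy as np

def compute_conceptor(X, aperture):
    """Standard conceptor C = R (R + aperture^-2 I)^-1 (assumed definition).

    X has shape (n_samples, d): one activation vector per row.
    """
    n, d = X.shape
    R = X.T @ X / n  # correlation matrix of the activations
    return R @ np.linalg.inv(R + aperture ** -2 * np.eye(d))

def steer_additive(h, v, beta):
    """Point-based steering: add a scaled steering vector to the activation."""
    return h + beta * v

def steer_conceptor(h, C, beta):
    """Conceptor steering: softly project the activation, then rescale."""
    return beta * (C @ h)

# Toy "activations" with high variance on the first axis (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) * np.array([5.0, 1.0, 0.1, 0.1])

C = compute_conceptor(X, aperture=1.0)
v = X.mean(axis=0)  # hypothetical steering vector: mean of the toy activations

h = np.ones(4)  # an activation vector to be steered
h_add = steer_additive(h, v, beta=1.0)
h_con = steer_conceptor(h, C, beta=1.0)

# Eigenvalues of C lie in [0, 1): directions with high variance in X are
# kept almost unchanged, directions with low variance are damped.
eigvals = np.linalg.eigvalsh(C)
```

In this sketch the additive method shifts every activation by the same vector, whereas the conceptor scales each direction according to how strongly it is represented in the sample activations, which is one way to read the "ellipsoidal region" description above.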
Item Type: | Thesis (Bachelor's Thesis) |
---|---|
Supervisor name: | Abreu, S. and Jaeger, H. |
Degree programme: | Artificial Intelligence |
Thesis type: | Bachelor's Thesis |
Language: | English |
Date Deposited: | 23 Aug 2024 10:20 |
Last Modified: | 11 Sep 2024 07:31 |
URI: | https://fse.studenttheses.ub.rug.nl/id/eprint/33973 |