Steering Large Language Models using Conceptors: An Alternative to Point-Based Activation Engineering

Postmus, Joris (2024) Steering Large Language Models using Conceptors: An Alternative to Point-Based Activation Engineering. Bachelor's Thesis, Artificial Intelligence.

Abstract

While large language models (LLMs) have revolutionized the field of artificial intelligence, reliably controlling their outputs remains a pressing challenge. This project aims to improve activation engineering, a proposed technique in which the outputs of pre-trained LLMs are controlled by directly manipulating the models’ activations at inference time. Traditionally, this manipulation consists of adding a steering vector to the models’ activations at a specific stage in the processing pipeline. Instead of representing the steering target as a single point (vector) in high-dimensional space, we explore the use of conceptors, mathematical objects that represent a set of activation vectors as an ellipsoidal region in that space. For steering purposes, we use conceptors as (soft) projection matrices that tune a given activation vector toward a steering target. Because a conceptor captures a region rather than a single point, we hypothesize that conceptors provide more precise control over capturing and steering toward complex activational representations than point-based methods do. Our experiments show that, compared to traditional point-based steering methods, conceptors achieve higher accuracy across multiple steering tasks, especially when combined with a performance enhancement called mean-centering. These findings suggest that conceptors are a promising tool for effectively controlling the outputs of LLMs, paving the way for further research to fully establish their utility.
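
To make the contrast in the abstract concrete, the following is a minimal NumPy sketch of the two steering styles, not the thesis's implementation. It assumes the standard conceptor definition C = R (R + aperture^-2 I)^-1, where R is the correlation matrix of the collected activations; the abstract itself does not state this formula. It also assumes the point-based steering vector is the mean of the collected activations. All function names and the `aperture` and `strength` parameters are hypothetical, and mean-centering is omitted.

```python
import numpy as np


def steering_vector(activations: np.ndarray) -> np.ndarray:
    """Point-based target (assumed here): the mean of the collected activations."""
    return activations.mean(axis=0)


def compute_conceptor(activations: np.ndarray, aperture: float) -> np.ndarray:
    """Conceptor for a set of activation vectors (rows of `activations`).

    Assumes the standard definition C = R (R + aperture^-2 I)^-1,
    with R = X^T X / n the correlation matrix of the activations.
    """
    n, d = activations.shape
    R = activations.T @ activations / n
    return R @ np.linalg.inv(R + aperture ** -2 * np.eye(d))


def steer_additive(h: np.ndarray, v: np.ndarray, strength: float = 1.0) -> np.ndarray:
    """Point-based steering: add a scaled steering vector to the activation."""
    return h + strength * v


def steer_conceptor(h: np.ndarray, C: np.ndarray, strength: float = 1.0) -> np.ndarray:
    """Conceptor steering: softly project the activation with C, then rescale."""
    return strength * (C @ h)
```

Under these assumptions the eigenvalues of C lie in [0, 1), so multiplying by C shrinks directions that carry little variance in the collected activations while keeping the dominant ones nearly intact. That is the sense in which it acts as a soft projection onto an ellipsoidal region rather than a shift toward a single point.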

Item Type: Thesis (Bachelor's Thesis)
Supervisor name: Abreu, S. and Jaeger, H.
Degree programme: Artificial Intelligence
Thesis type: Bachelor's Thesis
Language: English
Date Deposited: 23 Aug 2024 10:20
Last Modified: 11 Sep 2024 07:31
URI: https://fse.studenttheses.ub.rug.nl/id/eprint/33973
