Steering Vectors (Activation Engineering)

Steering vectors (activation engineering) are an interpretability and control technique that modifies model behavior at inference time by adding learned direction vectors to a model's internal activations. Because the intervention happens during the forward pass, researchers can amplify, suppress, or redirect specific behaviors and internal states without retraining, in effect writing directly to the model's "thoughts".

What Are Steering Vectors?

Why Steering Vectors Matter

Finding Steering Vectors

Method 1 — Contrastive Activation Difference:
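One common reading of this method is a difference of means: collect activations at one layer for prompts that show the behavior and for prompts that do not, then subtract the two averages. The sketch below assumes that reading. The function name `contrastive_steering_vector` and the synthetic activations are hypothetical stand-ins, since capturing real activations depends on the model and framework.

```python
import numpy as np

def contrastive_steering_vector(pos_acts, neg_acts, normalize=True):
    """Mean-difference direction between two sets of activations.

    pos_acts, neg_acts: arrays of shape (n_examples, d_model), for example
    activations at one layer for prompts with / without the behavior.
    """
    pos_acts = np.asarray(pos_acts, dtype=float)
    neg_acts = np.asarray(neg_acts, dtype=float)
    v = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    if normalize:
        norm = np.linalg.norm(v)
        if norm > 0:
            v = v / norm
    return v

# Toy stand-in for real activations (assumption): the "concept" shifts
# dimension 0 only, everything else is unit Gaussian noise.
rng = np.random.default_rng(0)
d_model = 8
concept = np.zeros(d_model)
concept[0] = 1.0
neg = rng.normal(size=(200, d_model))
pos = rng.normal(size=(200, d_model)) + 3.0 * concept

v_concept = contrastive_steering_vector(pos, neg)
```

Normalizing to unit length is a design choice rather than a requirement; it makes the scaling coefficient α directly comparable across concepts.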

Method 2 — Linear Probe Direction:
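A linear probe is a classifier trained on activations to predict whether a concept is present; its weight vector can then serve as a candidate steering direction. The following is a minimal sketch that assumes a logistic-regression probe fit by plain gradient descent. The function name `probe_direction`, the hyperparameters, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def probe_direction(acts, labels, lr=0.1, steps=500):
    """Fit a logistic-regression probe and return its unit-norm weight vector.

    acts: (n_examples, d_model) activations; labels: 0/1 concept labels.
    """
    acts = np.asarray(acts, dtype=float)
    labels = np.asarray(labels, dtype=float)
    n, d = acts.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        logits = np.clip(acts @ w + b, -30.0, 30.0)  # avoid overflow in exp
        probs = 1.0 / (1.0 + np.exp(-logits))
        err = probs - labels                          # gradient of the log loss
        w -= lr * (acts.T @ err) / n
        b -= lr * err.mean()
    return w / np.linalg.norm(w)

# Synthetic activations (assumption): the concept shifts dimension 0.
rng = np.random.default_rng(0)
d_model = 8
neg = rng.normal(size=(200, d_model))
pos = rng.normal(size=(200, d_model))
pos[:, 0] += 3.0
acts = np.concatenate([neg, pos])
labels = np.concatenate([np.zeros(200), np.ones(200)])

v_probe = probe_direction(acts, labels)
```

A probe direction and a mean-difference direction need not coincide on real activations, so it can be worth comparing the two before steering with either.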

Method 3 — SAE Feature Directions:
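With a sparse autoencoder (SAE) trained on a layer's activations, each learned feature has a decoder vector in activation space, and that vector can be used as a steering direction. The sketch below assumes a ReLU encoder and a decoder matrix of shape (n_features, d_model). The random, tied weights are a hypothetical stand-in for a trained SAE, and the helper names are illustrative.

```python
import numpy as np

def top_features(W_enc, b_enc, activation, k=3):
    """Encode one activation and return the k most active feature indices."""
    feats = np.maximum(0.0, activation @ W_enc + b_enc)  # ReLU encoder
    order = np.argsort(feats)[::-1][:k]
    return order, feats

def sae_feature_direction(W_dec, feature_idx):
    """Unit-norm decoder row for one SAE feature."""
    row = np.asarray(W_dec, dtype=float)[feature_idx]
    return row / np.linalg.norm(row)

# Stand-in SAE weights (assumption): random unit-norm decoder rows,
# encoder tied to the decoder transpose, zero encoder bias.
rng = np.random.default_rng(0)
d_model, n_features = 8, 32
W_dec = rng.normal(size=(n_features, d_model))
W_dec /= np.linalg.norm(W_dec, axis=1, keepdims=True)
W_enc = W_dec.T
b_enc = np.zeros(n_features)

# An activation built from feature 5 should activate feature 5 most strongly.
activation = 2.0 * W_dec[5]
top_idx, feats = top_features(W_enc, b_enc, activation, k=3)
v_concept = sae_feature_direction(W_dec, int(top_idx[0]))
```

In practice the feature index would come from inspecting which features fire on examples of the target concept, not from a constructed activation as in this toy case.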

Applying Steering Vectors

Addition: h_new = h_old + α × v_concept
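The addition rule above can be sketched directly on an array of hidden states. This example assumes `h_old` has shape (seq_len, d_model) and that the same vector is added at every token position; the function name `apply_steering` and the toy values are illustrative.

```python
import numpy as np

def apply_steering(h_old, v_concept, alpha):
    """Return h_new = h_old + alpha * v_concept, broadcast over positions."""
    h_old = np.asarray(h_old, dtype=float)
    v_concept = np.asarray(v_concept, dtype=float)
    return h_old + alpha * v_concept

rng = np.random.default_rng(0)
h_old = rng.normal(size=(5, 8))   # toy hidden states: 5 tokens, d_model = 8
v_concept = np.zeros(8)
v_concept[0] = 1.0                # unit-norm toy concept direction

h_amplified = apply_steering(h_old, v_concept, alpha=4.0)    # push toward the concept
h_suppressed = apply_steering(h_old, v_concept, alpha=-4.0)  # push away from it

# With a unit-norm vector, the projection onto v_concept moves by exactly alpha.
shift_up = (h_amplified - h_old) @ v_concept
shift_down = (h_suppressed - h_old) @ v_concept
```

In a real model the same operation would typically run inside a hook at the chosen layer (for instance a PyTorch forward hook), so that later layers see the modified activations.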

Layer Selection:
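One possible heuristic for choosing a layer, offered here as an assumption and not as an established recipe, is to sweep the candidate layers and keep the one where the contrastive direction best separates the two prompt sets. The separation score below is a standardized mean gap along that direction. The function names and synthetic per-layer activations are hypothetical.

```python
import numpy as np

def separation_score(pos_acts, neg_acts):
    """Standardized gap between the two classes along their mean-difference direction."""
    v = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    v = v / np.linalg.norm(v)
    pos_proj = pos_acts @ v
    neg_proj = neg_acts @ v
    pooled_std = np.sqrt(0.5 * (pos_proj.var() + neg_proj.var()))
    return (pos_proj.mean() - neg_proj.mean()) / pooled_std

def pick_layer(pos_by_layer, neg_by_layer):
    """Return the index of the layer with the highest separation score."""
    scores = [separation_score(p, n) for p, n in zip(pos_by_layer, neg_by_layer)]
    return int(np.argmax(scores)), scores

# Synthetic per-layer activations (assumption): the concept is encoded
# most strongly at layer index 2.
rng = np.random.default_rng(0)
d_model = 8
concept = np.zeros(d_model)
concept[0] = 1.0
strengths = [0.2, 1.0, 3.0, 1.5]
neg_by_layer = [rng.normal(size=(200, d_model)) for _ in strengths]
pos_by_layer = [rng.normal(size=(200, d_model)) + s * concept for s in strengths]

best_layer, scores = pick_layer(pos_by_layer, neg_by_layer)
```

A separation score is only a proxy. The layer that separates the classes best is not guaranteed to be the layer where steering changes outputs most cleanly, so the final choice would still need to be checked against actual generations.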

Demonstrated Results

Limitations and Challenges

Steering Vectors vs. Other Control Methods

Method               | Cost      | Permanence    | Precision | Safety Risk
System prompt        | Very low  | Per-session   | Low       | Low
Fine-tuning          | High      | Permanent     | Medium    | Medium
RLHF                 | Very high | Permanent     | High      | Medium
Steering vectors     | Very low  | Per-inference | Medium    | Low-Medium
SAE feature ablation | Low       | Per-inference | High      | Low

Steering vectors are a first hint of a cognitive remote control for AI systems. By showing that concepts, emotions, and behavioral tendencies can be amplified or suppressed through activation manipulation, steering vector research is laying a foundation for interpretability-based alignment tools that may one day enable precise, verifiable control over AI behavior without the opacity of behavioral fine-tuning.
