LoRA (Low-Rank Adaptation)

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that adapts a large pre-trained model to new tasks by injecting small, trainable low-rank decomposition matrices into each Transformer layer. The original weights stay frozen and only about 0.1-1% of the total parameters are trained, which achieves quality comparable to full-parameter fine-tuning at a fraction of the memory and compute cost.

The Low-Rank Hypothesis

Full fine-tuning updates every parameter in the model, but research shows that the weight changes (delta-W) during fine-tuning occupy a low-dimensional subspace. LoRA exploits this: instead of updating a d×d weight matrix W directly, it learns a low-rank decomposition delta-W = B × A, where B is d×r and A is r×d, with rank r << d (typically 8-64). This reduces the trainable parameters for that matrix from d² to 2dr, a reduction by a factor of d/(2r).
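The size of that reduction is easy to check with a few lines of arithmetic. The sketch below counts trainable parameters for a single square weight matrix; the hidden size d = 4096 and rank r = 8 are illustrative assumptions, not values taken from any particular model.

```python
def lora_param_counts(d: int, r: int) -> tuple[int, int, float]:
    """Return (full, lora, ratio) trainable-parameter counts for one d x d matrix."""
    full = d * d      # full fine-tuning updates every entry of W
    lora = 2 * d * r  # B is d x r and A is r x d, so d*r + r*d entries
    return full, lora, lora / full


# Assumed example sizes: d = 4096 (hidden size), r = 8 (LoRA rank).
full, lora, ratio = lora_param_counts(d=4096, r=8)
print(f"full fine-tuning: {full:,} parameters")
print(f"LoRA (r=8):       {lora:,} parameters")
print(f"LoRA / full:      {ratio:.2%}")
```

With these assumed sizes the ratio is 2r/d = 1/256, so the adapter for this matrix holds well under 1% of the parameters of the matrix it adapts.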

How LoRA Works

1. Freeze: All original model weights W are frozen (no gradients computed).
2. Inject: For selected weight matrices (typically query and value projections in attention, plus up/down projections in MLP), add parallel low-rank branches: output = Wx + (BA)x.
3. Train: Only matrices A and B are trained. A is initialized with random Gaussian values; B is initialized to zero (so the initial delta-W = 0, preserving the pre-trained model exactly).
4. Merge: After training, the learned delta-W = BA can be merged into the original weights: W_new = W + BA. The merged model has zero additional inference latency.
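The four steps can be traced on a toy example. The following is a minimal sketch in plain Python, not a training implementation: the sizes d = 6 and r = 2, the helper names (matvec, matmul, lora_forward), and the random values standing in for a trained B are all assumptions made for illustration. It follows the formula output = Wx + (BA)x literally and applies no extra scaling.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def lora_forward(W, A, B, x):
    """Step 2 (inject): frozen path Wx plus the low-rank branch B(Ax)."""
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    return [b + l for b, l in zip(base, low_rank)]


random.seed(0)
d, r = 6, 2  # toy sizes, assumed for illustration

# Step 1 (freeze): W stands in for a pre-trained matrix and is never modified.
W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]

# Step 3 (init): A is random Gaussian (r x d), B is zero (d x r), so BA = 0.
A = [[random.gauss(0, 0.01) for _ in range(d)] for _ in range(r)]
B = [[0.0] * r for _ in range(d)]

x = [random.gauss(0, 1) for _ in range(d)]
out_at_init = lora_forward(W, A, B, x)  # identical to Wx because B is zero

# Stand-in for training: give B arbitrary nonzero values (no optimizer here).
B = [[random.gauss(0, 0.1) for _ in range(r)] for _ in range(d)]

# Step 4 (merge): fold the branch into a single matrix, W_new = W + BA.
BA = matmul(B, A)
W_new = [[w + u for w, u in zip(rw, ru)] for rw, ru in zip(W, BA)]

out_branch = lora_forward(W, A, B, x)
out_merged = matvec(W_new, x)
max_gap = max(abs(a - b) for a, b in zip(out_branch, out_merged))
print("matches base model at init:", out_at_init == matvec(W, x))
print("max |branch - merged|:", max_gap)
```

The branch computes B(Ax) rather than forming BA first, which keeps the extra work at roughly 2dr multiplications per input. After merging, a single d×d matrix-vector product reproduces the same output up to floating-point rounding, which is why the merged model adds no inference latency.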

Key Hyperparameters

QLoRA

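Quantized LoRA (QLoRA) loads the frozen base model in 4-bit quantization (NF4 data type) while training the LoRA adapters in higher precision (16-bit, typically BF16). This enables fine-tuning a 65B parameter model on a single 48GB GPU. Regular 16-bit full fine-tuning of a model that size needs several hundred gigabytes of GPU memory (the QLoRA paper cites more than 780 GB), which means a multi-GPU server.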
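A rough back-of-the-envelope calculation shows why 4-bit storage is what makes the 48GB figure plausible. The sketch below counts memory for the base weights only; it deliberately ignores quantization constants, the LoRA adapters, activations, and optimizer state, so it is a lower bound rather than a measurement. The helper name weight_memory_gb is hypothetical.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory needed to store the weights alone, in gigabytes (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


n_params = 65e9  # the 65B parameter model from the text

mem_16bit = weight_memory_gb(n_params, 16)  # 16-bit weights
mem_4bit = weight_memory_gb(n_params, 4)    # 4-bit (NF4) weights

print(f"16-bit weights: {mem_16bit:.1f} GB")
print(f"4-bit weights:  {mem_4bit:.1f} GB")
print(f"fits in 48 GB (weights only): {mem_4bit < 48}")
```

Under these assumptions the 16-bit weights alone (130 GB) exceed a 48GB card, while the 4-bit weights (32.5 GB) leave headroom for the adapters and activations.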

Practical Advantages

LoRA made LLM customization far more accessible: billion-parameter models can be fine-tuned on consumer hardware, and because the base weights stay frozen, the pre-trained foundation is left intact.
