Retrieval-Augmented FT & Activation Steering (v0.62.0)

Five surfaces wired together: RAFT data format, RA-DIT two-stage pipeline, citation-faithful fine-tuning, activation steering vectors (CAA / ITI / RepE), and the GRACE codebook.

`data.format: raft` — UC Berkeley RAFT data

```yaml
data:
  train: ./raft_data.jsonl
  format: raft
  max_length: 4096
```

Per-row schema:

```json
{
  "query": "What is the capital of France?",
  "golden_doc": "France is a country in Europe...",
  "distractor_docs": ["Spain is...", "Germany is..."],
  "answer": "Paris"
}
```

The model is trained to answer from retrieved context that mixes the golden document with distractors, so it learns to pick out the relevant passage and ignore the rest instead of answering the query in isolation.
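
As a rough illustration of how one row might become a training example, the sketch below shuffles the golden and distractor documents into a numbered context and appends the query. The `build_raft_prompt` helper and the prompt template are hypothetical; the exact template the trainer uses is not documented here.

```python
import random

def build_raft_prompt(row, seed=0):
    """Assemble one prompt/completion pair from a RAFT row (hypothetical template)."""
    docs = [row["golden_doc"], *row["distractor_docs"]]
    # Shuffle so the position of the golden document carries no signal.
    random.Random(seed).shuffle(docs)
    context = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(docs))
    prompt = f"Context:\n{context}\n\nQuestion: {row['query']}\nAnswer:"
    return {"prompt": prompt, "completion": " " + row["answer"]}

row = {
    "query": "What is the capital of France?",
    "golden_doc": "France is a country in Europe...",
    "distractor_docs": ["Spain is...", "Germany is..."],
    "answer": "Paris",
}
example = build_raft_prompt(row)
print(example["prompt"])
```

Shuffling with a fixed seed keeps the example reproducible while still mixing the golden document in among the distractors.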

`training.ra_dit_stage` — two-stage pipeline

RA-DIT (Retrieval-Augmented Dual Instruction Tuning) chains:

```yaml
# Stage 1: contrastive retriever
task: embedding
training:
  ra_dit_stage: retriever
  ra_dit_retriever_model: sentence-transformers/all-MiniLM-L6-v2
```

```yaml
# Stage 2: RAFT generator
task: sft
data:
  format: raft
training:
  ra_dit_stage: generator
```

Both stages reuse the existing trainer wrappers. Live orchestration (a single `soup train` run that chains both stages) ships in v0.62.1.
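
To make the "contrastive retriever" stage more concrete, here is a minimal pure-Python sketch of an in-batch InfoNCE loss, one common contrastive objective for embedding models. Whether stage 1 uses exactly this loss is an assumption; `info_nce_loss` and the toy vectors are illustrative only.

```python
import math

def _normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def info_nce_loss(query_embs, doc_embs, temperature=0.05):
    """In-batch contrastive loss: doc_embs[i] is the positive for query_embs[i]."""
    queries = [_normalize(v) for v in query_embs]
    docs = [_normalize(v) for v in doc_embs]
    total = 0.0
    for i, q in enumerate(queries):
        # Cosine similarity of this query against every document in the batch.
        logits = [sum(a * b for a, b in zip(q, d)) / temperature for d in docs]
        peak = max(logits)  # subtract the max for numerical stability
        log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_z - logits[i]  # negative log-probability of the positive
    return total / len(queries)

queries = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned_docs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
misaligned_docs = [aligned_docs[1], aligned_docs[2], aligned_docs[0]]

aligned = info_nce_loss(queries, aligned_docs)
misaligned = info_nce_loss(queries, misaligned_docs)
print(aligned < misaligned)
```

The loss is low when each query sits closest to its own document and high when the pairing is scrambled, which is the signal a contrastive retriever trains on.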

`training.citation_faithful` — enforce source attribution

```yaml
data:
  format: raft
training:
  citation_faithful: true
  citation_style: bracket   # bracket | inline | footnote
  citation_recall_threshold: 0.85
```

When `citation_faithful: true` is set, the trainer masks the loss to emphasize citation spans, so the model learns to cite the retrieved documents it relies on. The final save is refused if citation recall falls below `citation_recall_threshold`.
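
The save gate can be pictured with a small sketch. It assumes the `bracket` style renders citations as `[n]` markers and that recall is the fraction of expected source ids that appear in the answer, averaged over examples; both definitions, and the `citation_recall` and `should_save` helpers, are assumptions rather than the trainer's actual implementation.

```python
import re

def citation_recall(answer, expected_ids):
    """Fraction of expected source ids that appear as [n] markers in the answer."""
    if not expected_ids:
        return 1.0
    cited = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    return len(cited & set(expected_ids)) / len(expected_ids)

def should_save(answers, expected, threshold=0.85):
    """Allow the save only when mean citation recall reaches the threshold."""
    scores = [citation_recall(a, e) for a, e in zip(answers, expected)]
    return sum(scores) / len(scores) >= threshold

answers = ["Paris is the capital [1].", "It borders Spain [2] and Germany."]
expected = [[1], [2, 3]]
print(should_save(answers, expected))  # mean recall 0.75 < 0.85, so False
```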

`soup steer` — activation steering vectors

```bash
soup steer train --base meta-llama/Llama-3.1-8B-Instruct \
  --method caa --name safety-v1 \
  --pairs ./safety_pairs.jsonl --layer 16

soup steer apply --name safety-v1 --strength 1.5
soup steer list
```

Three methods:

  • CAA (Contrastive Activation Addition) — add a learned vector to the residual stream.
  • ITI (Inference-Time Intervention) — shift specific attention heads.
  • RepE (Representation Engineering) — PCA-based direction extraction.

|strength| ≤ 10 is enforced. Vectors register as the new steering_vector artifact kind in the v0.26 Registry.
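
For intuition, CAA as usually described computes the steering vector as the mean activation on positive examples minus the mean activation on negative examples at the chosen layer, then adds a scaled copy to the residual stream. The sketch below works on toy lists rather than real model activations; `caa_vector` and `apply_steering` are hypothetical helpers, and only the `|strength| ≤ 10` bound comes from the text above.

```python
def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def caa_vector(pos_acts, neg_acts):
    """Steering direction: mean positive activation minus mean negative activation."""
    pos_mean, neg_mean = mean_vector(pos_acts), mean_vector(neg_acts)
    return [p - n for p, n in zip(pos_mean, neg_mean)]

def apply_steering(hidden, vector, strength):
    """Add the scaled steering vector to one hidden state, enforcing the bound."""
    if abs(strength) > 10:
        raise ValueError("|strength| must be <= 10")
    return [h + strength * v for h, v in zip(hidden, vector)]

pos = [[1.0, 2.0], [3.0, 4.0]]   # toy activations for positive prompts
neg = [[0.0, 1.0], [2.0, 1.0]]   # toy activations for negative prompts
vec = caa_vector(pos, neg)
steered = apply_steering([0.5, 0.5], vec, strength=1.5)
print(vec, steered)  # [1.0, 2.0] [2.0, 3.5]
```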

Pairs JSONL:

```json
{"positive": "You are a helpful AI.", "negative": "You are a harmful AI."}
```
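
Before training, it can help to sanity-check the pairs file. The following is a minimal sketch of such a check, assuming one JSON object per line with non-empty string `positive` and `negative` fields as in the example above; `load_pairs` is a hypothetical helper, not part of the `soup` CLI.

```python
import json

def load_pairs(lines):
    """Parse pairs JSONL lines into (positive, negative) tuples, validating each row."""
    pairs = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        for key in ("positive", "negative"):
            if not isinstance(record.get(key), str) or not record[key]:
                raise ValueError(f"line {lineno}: missing or empty '{key}'")
        pairs.append((record["positive"], record["negative"]))
    return pairs

sample = [
    '{"positive": "You are a helpful AI.", "negative": "You are a harmful AI."}',
    "",
]
pairs = load_pairs(sample)
print(pairs)
```

To check a file on disk, pass an open file handle, for example `load_pairs(open("safety_pairs.jsonl", encoding="utf-8"))`.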

`training.grace_codebook` — discrete latent codebook

```yaml
training:
  grace_codebook: true
  grace_codebook_size: 1024
```

GRACE (General Retrieval Adaptors for Continual Editing) discretizes the latent activation space into a learned codebook. It reduces overfitting on small datasets and is useful for applying thousands of sequential edits without norm blow-up. v0.62.0 ships the schema only; the live implementation lands in v0.62.1.
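
Discretizing against a codebook amounts to mapping an activation to its nearest code. The sketch below shows only that lookup step with L2 distance on toy 2-D vectors; how the codebook is learned and where it sits in the model are not covered, and `nearest_code` is a hypothetical helper.

```python
import math

def nearest_code(activation, codebook):
    """Return (index, code) of the codebook entry closest to the activation in L2 distance."""
    best = min(range(len(codebook)), key=lambda i: math.dist(activation, codebook[i]))
    return best, codebook[best]

# Toy codebook with three entries; a real one would hold grace_codebook_size vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
index, code = nearest_code([0.9, 1.2], codebook)
print(index, code)  # 1 [1.0, 1.0]
```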

New recipes

Three RAFT-style recipes shipped:

  • raft-llama3-8b — RAFT SFT generator on Llama 3.1 8B.
  • ra-dit-retriever — sentence-transformer contrastive stage.
  • ra-dit-llama3-8b — full RA-DIT stage-2 generator.

Numbers

+215 new tests in v0.62.0 (9571 → 9786). Security: 0 CRITICAL, 0 HIGH, 4 MEDIUM, 11 LOW.

See also

  • [Trace ecosystem](/docs/trace-ecosystem) — v0.63 soup ingest produces the JSONL that feeds the RAFT generator.
  • [Unlearning](/docs/unlearning) — surgical edits as a harder counterpart to soft steering.
  • [Data Forge](/docs/data-forge) — quality moat for the source docs that become golden / distractor pairs.