Stop tuning. Start training. Soup picks the hyperparameters for you.

The whole post-training stack in one CLI. Soup picks the method, writes the config, derives the evals from your own data, gates every save, halts on reward hacking, and ships with canary + auto-rollback — so you stop tuning and start training. 23 methods · 116 recipes · 17 quant formats · MLX + Apple adapter.

Apache-2.0 License · Python 3.9+ · No vendor lock-in

1. Install
$ pip install soup-cli
2. Configure
$ soup init
✓ Created soup.yaml
3. Train
$ soup train
> Training started...

Integrates with your entire ML stack

HuggingFace
Ollama
vLLM
DeepSpeed
Unsloth
ONNX
NVIDIA TensorRT
W&B
SGLang
FlashAttention
v0.70.0 — loop hardening — reward-hacking detector, ULD cross-tokenizer KD, MiniLLM reverse-KL, mid-epoch RL ckpt, iterative DPO, RAGEN echo-trap

The flywheel: plan → train → x-ray → merge → bisect → ship

Every other tool stops at "hit train and hope." Soup closes the post-training loop in one CLI: pick the base, plan + apply with drift detection, derive evals from your data, train with eval-gates + reward-hack + echo-trap detectors, A/B with Wald-SPRT, ship with canary + auto-rollback, x-ray the result, then bisect any regression. Everything stays on your laptop — no per-trace SaaS fees, no vendor lock-in.

v0.70.0 — Flagship · live kernels

Loop hardening — reward-hack detector, ULD, MiniLLM, mid-epoch RL ckpt, iterative DPO, echo-trap

Hosted vendors silently bill you for the GPU-hours when your RL policy games the reward model or your multi-turn agent echo-traps itself. v0.70 protects the training loop from the failure modes that cost a real GPU-hour, with six new surfaces and live math kernels. +803 tests across v0.68–v0.70 (11,021 → 11,824).

  • --reward-hack-detector info_rm|rm_ensemble — InfoRM cluster-separation + RM-ensemble variance, OK/WARN/HACK at 0.10/0.30 drop
  • --uld-strategy wasserstein|topk_align — cross-tokenizer KD, Llama→Mistral with no shared vocab
  • --minillm-enabled — reverse-KL distillation with all 3 stability tricks bundled
  • --rl-checkpoint-save-every-steps N — mid-epoch PPO/GRPO ckpt (TorchTune still punts this)
  • soup iterative-dpo --rounds N — sample → score → re-pair → retrain driver, frozen per-round artifacts
  • --echo-trap-enabled --echo-trap-halt — RAGEN n-gram repetition detector, halt on TRAP
soup train · loop hardening
$ soup train --task grpo --base-model meta-llama/Llama-3.1-8B \
    --reward-hack-detector info_rm --reward-hack-halt \
    --echo-trap-enabled --echo-trap-halt \
    --rl-checkpoint-save-every-steps 200
step 0200  reward=0.42  cluster_sep=0.91  echo=0.18  OK
step 0400  reward=0.61  cluster_sep=0.87  echo=0.22  OK
step 0600  reward=0.79  cluster_sep=0.74  echo=0.31  WARN
step 0800  reward=0.93  cluster_sep=0.58  echo=0.34  HACK
✗ halt: reward-hack verdict = HACK (drop 0.36 ≥ 0.30)
   → recovering to ckpt step-0400 (mid-epoch ckpt) · exit 2

$ soup iterative-dpo --rounds 4 --pairs-per-round 4000 ...
round 01  pairs=4000  win_rate=0.58  → ./iter-dpo/round-01/adapter
round 02  pairs=4000  win_rate=0.66  → ./iter-dpo/round-02/adapter
round 03  pairs=4000  win_rate=0.71  → ./iter-dpo/round-03/adapter
round 04  pairs=4000  win_rate=0.73  → ./iter-dpo/round-04/adapter
✓ promoted round-04 as registry://policy-v4
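
The OK/WARN/HACK bands in the first transcript can be reproduced with a few lines of Python. This is a minimal sketch, not Soup's detector: it assumes the "drop" is the relative fall in cluster_sep measured against the first logged value, an assumption inferred from the halt line (0.91 → 0.58 is a 0.36 relative drop). The function names are hypothetical.

```python
def relative_drop(baseline: float, current: float) -> float:
    """Fractional fall of `current` below `baseline` (assumed definition of "drop")."""
    return (baseline - current) / baseline


def verdict(drop: float, warn: float = 0.10, hack: float = 0.30) -> str:
    """Map a drop onto the OK/WARN/HACK bands quoted at 0.10/0.30."""
    if drop >= hack:
        return "HACK"
    if drop >= warn:
        return "WARN"
    return "OK"


# cluster_sep values copied from the transcript above, keyed by step.
cluster_sep = {200: 0.91, 400: 0.87, 600: 0.74, 800: 0.58}
baseline = cluster_sep[200]  # assumption: first logged value is the baseline

for step, sep in cluster_sep.items():
    drop = relative_drop(baseline, sep)
    print(f"step {step:04d}  drop={drop:.2f}  {verdict(drop)}")
```

Under that assumption the verdicts come out as OK, OK, WARN, HACK, matching the status column in the log.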
Decide · v0.54 + v0.64 + v0.69: Advise, plan/apply, soup build, soup expect

Decide before you spend a GPU hour

soup advise ranks PROMPT_ENG / RAG / SFT / DPO / GRPO from your data. soup tunability probes candidate bases and reports the Pareto frontier. soup plan / apply is Terraform-shaped drift detection (exit 3 on changed batch / dataset / base SHA). soup build (v0.69) is a dbt-shaped DAG of dataset transforms with re-tokenise-only-changed-rows. soup expect (v0.69) is a Great Expectations suite that fails on PII, refusal patterns, or out-of-range token lengths.
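
The drift check behind plan / apply can be pictured as a comparison between the values pinned at plan time and the values seen at apply time. The sketch below is illustrative only: the key names (`batch_size`, `dataset_sha`, `base_sha`) and both function names are hypothetical, and the only detail taken from the text above is the exit code 3 on drift.

```python
DRIFT_EXIT_CODE = 3  # exit code quoted above for changed batch / dataset / base SHA

# Hypothetical key names; Soup's real plan file format is not shown here.
TRACKED_KEYS = ("batch_size", "dataset_sha", "base_sha")


def detect_drift(planned: dict, current: dict) -> dict:
    """Return {key: (planned_value, current_value)} for every tracked key that changed."""
    return {
        key: (planned.get(key), current.get(key))
        for key in TRACKED_KEYS
        if planned.get(key) != current.get(key)
    }


def apply_exit_code(planned: dict, current: dict) -> int:
    """0 when the run matches the plan, DRIFT_EXIT_CODE when anything tracked moved."""
    return DRIFT_EXIT_CODE if detect_drift(planned, current) else 0


planned = {"batch_size": 16, "dataset_sha": "abc123", "base_sha": "def456"}
current = {"batch_size": 16, "dataset_sha": "abc999", "base_sha": "def456"}
print(detect_drift(planned, current))
print(apply_exit_code(planned, current))
```

Returning the changed keys alongside the exit code keeps the failure explainable in CI logs.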

Decide · v0.55 + v0.65 + v0.69: Eval design, eval depth, brain-rot

Evals from your data, 10 failure modes, AI-slop detector

soup eval design pulls TF-IDF dimensions from your training data and SHA-pins the suite. soup eval behavior diffs pre/post-FT on XSTest / HarmBench / JailbreakBench / ELEPHANT / SycEval. soup eval irt-subset Rasch-fits a ~10% cost-cut subset. soup data brain-rot (v0.69, arXiv 2510.13928) refuses to ship training data that's clickbait-y or low-diversity — OK/MINOR/MAJOR bands, --strict exits 3.
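
To make "TF-IDF dimensions" and "SHA-pins the suite" concrete, here is a rough sketch using only the standard library. It is an assumption-laden illustration, not the logic of soup eval design: whitespace tokenisation, summed TF-IDF scores, and the JSON layout of the pinned suite are all hypothetical choices.

```python
import hashlib
import json
import math
from collections import Counter


def top_tfidf_terms(docs, k=3):
    """Return the k terms with the highest TF-IDF score summed over all docs."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many docs each term appears at least once.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    scores = Counter()
    for toks in tokenized:
        if not toks:
            continue
        for term, count in Counter(toks).items():
            idf = math.log(n_docs / doc_freq[term])
            scores[term] += (count / len(toks)) * idf
    # Sort by score descending, then alphabetically so ties are stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def pin_suite(terms):
    """SHA-256 over a canonical JSON encoding, so any change to the suite changes the pin."""
    payload = json.dumps({"dimensions": terms}, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


docs = ["refund the order", "refund the invoice", "cancel the order"]
dimensions = top_tfidf_terms(docs, k=2)
print(dimensions, pin_suite(dimensions))
```

Terms that appear in every row (here "the") score zero, which is why TF-IDF surfaces what is distinctive about the data rather than what is merely frequent.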

Train · v0.50 + v0.70: GRPO Plus + reward-hack + echo-trap

7 RLHF variants, hardened against reward hacking & echo-traps

GRPO, GSPO, DAPO, Dr.GRPO, BNPO, two-sided, RFT — real loss kernels, EMA reference, replay buffer, TIS alerts. v0.70 adds --reward-hack-detector (InfoRM cluster-separation or RM-ensemble divergence, --reward-hack-halt on HACK verdict) and --echo-trap-enabled (RAGEN-style multi-turn n-gram repetition detector, OK/WARN/TRAP at 0.30/0.60). Plus Modality II — text / vision / audio / TTS / classifier / PRM in one config.
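
An n-gram repetition score with OK/WARN/TRAP bands at 0.30/0.60 could look roughly like the following. This is a simplified sketch: the metric used here (the share of n-grams that repeat an earlier n-gram in the same rollout) is an assumption, and neither RAGEN's nor Soup's exact formula is given on this page. Function names and the default `n=3` are hypothetical.

```python
from collections import Counter


def ngram_repetition(tokens, n=3):
    """Fraction of n-grams in `tokens` that repeat an n-gram seen earlier."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not grams:
        return 0.0
    counts = Counter(grams)
    repeated = sum(count - 1 for count in counts.values())
    return repeated / len(grams)


def echo_verdict(rate, warn=0.30, trap=0.60):
    """Map a repetition rate onto the OK/WARN/TRAP bands quoted at 0.30/0.60."""
    if rate >= trap:
        return "TRAP"
    if rate >= warn:
        return "WARN"
    return "OK"


looping = "a b c a b c a b c a b c".split()
varied = "the quick brown fox jumps over the lazy dog".split()
print(ngram_repetition(looping), echo_verdict(ngram_repetition(looping)))
print(ngram_repetition(varied), echo_verdict(ngram_repetition(varied)))
```

A rollout that keeps cycling through the same three tokens scores 0.7 and lands in TRAP, while the varied sentence scores 0.0.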

Train · v0.25 + v0.27 + v0.68 + v0.70: Multi-GPU, MLX, Apple adapter, ULD

Scale 1 GPU → cluster, run on a Mac, distil across vocabs

soup train --gpus N picks DeepSpeed ZeRO-3, ZeRO++, FSDP2+compile, or pipeline parallelism. Native MLX backend trains SFT / DPO / GRPO on M1–M4. v0.68 adds soup apple-adapter for HF safetensors ↔ MLX npz ↔ Apple FoundationModels (iOS 26) conversion + Merkle-root signing. v0.70 adds --uld-strategy wasserstein|topk_align for cross-tokenizer distillation (Llama→Mistral, no shared vocab) plus --minillm-enabled (3-trick stability).

Ship · v0.58 + v0.70: Soup Loop + iterative DPO + mid-epoch RL ckpt

Production flywheel as a daemon, iterative-DPO driver

soup loop runs production-traces → preference-pairs → eval-gated DPO → canary-deploy → auto-rollback with atomic state, SHA-256 traffic routing at ±0.01% granularity, and monthly-budget + daily-cap guardrails. v0.70 adds soup iterative-dpo (sample → RM-score → re-pair → retrain over N rounds) and --rl-checkpoint-save-every-steps so a PPO crash hops to the last mid-epoch ckpt instead of restarting the epoch (TorchTune still punts this).
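
Hash-based traffic routing at 0.01% granularity can be sketched by mapping each request into one of 10,000 buckets. The snippet below is an illustration under stated assumptions, not Soup's router: keying on a request ID string, taking the first 8 bytes of the digest, and the `route` function itself are all hypothetical.

```python
import hashlib

BUCKETS = 10_000  # 100% / 10,000 buckets = 0.01% per bucket


def route(request_id: str, canary_percent: float) -> str:
    """Deterministically send a request to 'canary' or 'stable'."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % BUCKETS
    canary_buckets = round(canary_percent * BUCKETS / 100)
    return "canary" if bucket < canary_buckets else "stable"


ids = [f"req-{i}" for i in range(20_000)]
share = sum(route(rid, 5.0) == "canary" for rid in ids) / len(ids)
print(f"canary share at 5%: {share:.4f}")
```

Because the decision depends only on the hash of the ID, the same request always lands on the same side, and raising the canary percentage only moves buckets from stable to canary, never the reverse.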

Ship · Smart Serving

Spec decoding, prefix cache, hot-swap LoRA

OpenAI-compatible server with auto-paired draft models, prefix cache for RAG, hot-swappable adapters, structured JSON/regex output, and OTLP tracing. /v1/messages speaks Anthropic shape on transformers + vLLM.

Operate · v0.56 + v0.66: Diagnose + X-rays

Model report card + mechanistic interpretability

soup diagnose scores 6 failure modes; v0.66 adds 4 more probes — soup probe sae-diff (Sparse Autoencoder feature movement), soup probe sleeper (defection classifier), soup probe interference (N×N adapter compat matrix, gates CI at ≥20%), soup probe pack. Plus live influence-blame for DataInf-style row attribution. soup why explains NaN / plateau / divergence in plain English.

Operate · v0.57 + v0.67: Adapter lifecycle finish

Git for LoRA: diff, CMA-ES merge, bisect, soup.lock, PRs

soup adapters: diff (Frobenius + SVD rank), merge (linear / TIES / DARE / SVD / CMA-ES), blame, branch / checkout, bisect (binary search regression history), pr (GitHub-shaped review markdown). VeRA / VB-LoRA vector banks store thousands of per-user adapters at MB-each. soup lock pins SHA256(base || dataset || env) for reproducible team runs.
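
The lock digest SHA256(base || dataset || env) can be illustrated as follows. This is a sketch, not the soup.lock format: the three string inputs and the function name are hypothetical, and the length prefix on each field is an addition of this example (plain concatenation would let "ab" + "c" collide with "a" + "bc").

```python
import hashlib


def lock_digest(base_sha: str, dataset_sha: str, env_fingerprint: str) -> str:
    """SHA-256 over the three inputs, each length-prefixed to keep field boundaries unambiguous."""
    hasher = hashlib.sha256()
    for part in (base_sha, dataset_sha, env_fingerprint):
        data = part.encode("utf-8")
        hasher.update(len(data).to_bytes(8, "big"))
        hasher.update(data)
    return hasher.hexdigest()


pinned = lock_digest("base-abc", "data-123", "py3.11-torch2.4")
print(pinned)
```

Two teammates who compute the same digest are training from the same base, dataset, and environment; any mismatch in one of the three shows up as a different digest.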

Secure · v0.59–v0.62 + v0.64 + v0.68: Governance, supply chain, personal flywheel

Procurement-ready ML + your own thumbs-up/down loop

soup bom emit ships CycloneDX 1.6 + SPDX 2.3 ML-BOMs; soup attest emit writes SLSA-3 in-toto attestations. soup adapters scan / sign / verify catches backdoors with Merkle-root tamper detection. task: unlearn (NPO / SimNPO / RMU), soup edit set (ROME / MEMIT / AlphaEdit), soup steer (CAA / ITI / RepE), soup license-advisor — ok/warn/block per (license, deploy-target, MAU). v0.68 adds soup local-rl: POSIX 0o600 SQLite, thumbs harvested to DPO pairs in one command.
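
Merkle-root tamper detection works by hashing each chunk of an artifact and then hashing pairs of hashes up to a single root, so changing any chunk changes the root. The sketch below shows the general technique only; the chunking, the leaf and node prefixes, and the duplicate-last-node rule are assumptions of this example, not Soup's signing format.

```python
import hashlib


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(chunks) -> str:
    """Hex Merkle root of a list of byte chunks (for example, adapter tensor shards)."""
    if not chunks:
        return _sha256(b"").hex()
    # Distinct prefixes for leaves and inner nodes avoid leaf/node confusion.
    level = [_sha256(b"\x00" + chunk) for chunk in chunks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: pair the last node with itself
        level = [
            _sha256(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


shards = [b"lora_A.weight", b"lora_B.weight", b"adapter_config"]
signed_root = merkle_root(shards)
tampered_root = merkle_root([b"lora_A.weight", b"lora_B.w3ight", b"adapter_config"])
print(signed_root != tampered_root)
```

Verification then reduces to recomputing the root from the files on disk and comparing it with the signed value.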

Most of the above is missing from LLaMA-Factory, Axolotl, and Unsloth.

soup migrate

Already using another tool? Switch in 30 seconds

One command converts your existing config. No rewriting, no guessing — just migrate and train.

LLaMA-Factory: auto-converted to Soup
Axolotl: auto-converted to Soup
Unsloth: auto-converted to Soup

See the difference

LLaMA-Factory config
llama3_lora_sft.yaml
model_name_or_path: meta-llama/Llama-3.1-8B
stage: sft
finetuning_type: lora
lora_rank: 64
lora_alpha: 16
lora_dropout: 0.05
lora_target: all
dataset: alpaca_en
template: llama3
cutoff_len: 2048
per_device_train_batch_size: 4
gradient_accumulation_steps: 4
num_train_epochs: 3
learning_rate: 2.0e-5
lr_scheduler_type: cosine
warmup_ratio: 0.1
quantization_bit: 4
output_dir: ./saves/llama3-lora
Soup config (auto-generated)
soup.yaml
base: meta-llama/Llama-3.1-8B
task: sft

data:
  train: ./data/alpaca_en.jsonl
  max_length: 2048

training:
  epochs: 3
  lr: 2e-5
  quantization: 4bit
  lora:
    r: 64
    alpha: 16

output: ./saves/llama3-lora

Soup auto-detects everything else — optimizer, scheduler, target modules, batch size.
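
The key mapping implied by the two configs above can be written out explicitly. This is a hand-written sketch of that one example, not the implementation of soup migrate: the function name is hypothetical, and the `./data/<dataset>.jsonl` path convention is inferred from the single sample shown.

```python
def migrate_llamafactory(cfg: dict) -> dict:
    """Translate the LLaMA-Factory keys shown above into the Soup layout shown above."""
    soup = {
        "base": cfg["model_name_or_path"],
        "task": cfg["stage"],
        "data": {
            # Assumption: dataset names map to local JSONL files, as in the sample.
            "train": f"./data/{cfg['dataset']}.jsonl",
            "max_length": cfg["cutoff_len"],
        },
        "training": {
            "epochs": cfg["num_train_epochs"],
            "lr": cfg["learning_rate"],
        },
        "output": cfg["output_dir"],
    }
    if "quantization_bit" in cfg:
        soup["training"]["quantization"] = f"{cfg['quantization_bit']}bit"
    if cfg.get("finetuning_type") == "lora":
        soup["training"]["lora"] = {"r": cfg["lora_rank"], "alpha": cfg["lora_alpha"]}
    return soup


llamafactory_cfg = {
    "model_name_or_path": "meta-llama/Llama-3.1-8B",
    "stage": "sft",
    "finetuning_type": "lora",
    "lora_rank": 64,
    "lora_alpha": 16,
    "dataset": "alpaca_en",
    "cutoff_len": 2048,
    "num_train_epochs": 3,
    "learning_rate": 2.0e-5,
    "quantization_bit": 4,
    "output_dir": "./saves/llama3-lora",
}
print(migrate_llamafactory(llamafactory_cfg))
```

Keys absent from the Soup side (dropout, target modules, batch size, scheduler, warmup) are simply dropped here, mirroring the statement that Soup detects them on its own.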

Built for the ML Stack you already use

First-class integrations with the tools powering production ML. Deploy anywhere, track everything.

Works with your favorite models

Llama 4 Scout
Llama 3.1
Llama 3.2 Vision
Qwen 3
Qwen 2.5
Qwen3-Coder
QwQ-32B
Gemma 3
MedGemma
EmbeddingGemma
Phi-4
DeepSeek R1
DeepSeek V3
DeepSeek-OCR
Mistral
Mixtral
Magistral
Voxtral
GPT-OSS
GLM 4.6
Kimi K2
MiniMax M2
Granite 4
LFM2
Pixtral
Qwen2-VL
InternVL 3.5
Qwen2-Audio
Whisper-large-v3
Orpheus TTS
Falcon-E BitNet
SmolLM2
CodeLlama

...and any model on Hugging Face Hub

Pulls production traces from

Langfuse
LangSmith
Helicone
OpenPipe
OpenTelemetry
OpenAI Stored Completions

via soup ingest --source <vendor> --logs <export.jsonl> — no per-trace fees, all offline (v0.63)

Deploy & Serve

Ollama: one-command local deploy
vLLM: prefix cache + spec decoding
SGLang: RadixAttention backend
llama.cpp: GGUF export + HF Spaces

Training & Infra

Unsloth: 2-5x faster training
Apple MLX: M1–M4 native SFT/DPO/GRPO
DeepSpeed: ZeRO 2/3 + ZeRO++ + MII
FlashAttention: v2/v3 + Multipack varlen

Ecosystem

HuggingFace: push models to Hub
OpenAI API: compatible server
W&B: plus MLflow, SwanLab, Trackio
TensorBoard: local metrics viz

Quant & Export

GGUF: UD ladder + IQ + ARM rungs
ONNX: ONNX Runtime deploy
TensorRT: high-throughput GPU
AWQ/GPTQ: plus HQQ, AQLM, EETQ, MXFP4, FP8, NVFP4, BitNet 1.58, TorchAO PTQ

Free forever. Apache-2.0 Licensed.

Your competitors are already fine-tuning.

Every day without custom models is a day your product falls behind. Soup gets you from zero to fine-tuned in under 5 minutes.

23 training methods · 116 ready recipes · <60s setup time

No credit card · No sign-up · No vendor lock-in · Works offline