Stop tuning. Start training. Soup picks the hyperparameters for you.
The whole post-training stack in one CLI. Soup picks the method, writes the config, derives the evals from your own data, gates every save, halts on reward hacking, and ships with canary + auto-rollback — so you stop tuning and start training. 23 methods · 116 recipes · 17 quant formats · MLX + Apple adapter.
Integrates with your entire ML stack
The flywheel: plan → train → x-ray → merge → bisect → ship
Every other tool stops at "hit train and hope." Soup closes the post-training loop in one CLI: pick the base, plan + apply with drift detection, derive evals from your data, train with eval-gates + reward-hack + echo-trap detectors, A/B with Wald-SPRT, ship with canary + auto-rollback, x-ray the result, then bisect any regression. Everything stays on your laptop — no per-trace SaaS fees, no vendor lock-in.
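For readers unfamiliar with the Wald-SPRT step mentioned above, here is a minimal sketch of the textbook sequential probability ratio test on win/loss outcomes. It is not Soup's implementation: the function name `sprt`, the hypotheses `p0` and `p1`, and the error rates `alpha` and `beta` are illustrative assumptions.

```python
import math


def sprt(outcomes, p0=0.5, p1=0.6, alpha=0.05, beta=0.05):
    """Wald's SPRT for Bernoulli outcomes (hypothetical helper, not Soup's API).

    H0: win probability is p0. H1: win probability is p1.
    Returns (decision, number_of_samples_consumed).
    """
    upper = math.log((1 - beta) / alpha)   # cross above: accept H1
    lower = math.log(beta / (1 - alpha))   # cross below: accept H0
    llr = 0.0                              # running log-likelihood ratio
    for n, win in enumerate(outcomes, start=1):
        if win:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept_h1", n
        if llr <= lower:
            return "accept_h0", n
    return "continue", len(outcomes)       # not enough evidence yet


decision, used = sprt([1] * 100)
print(decision, used)
```

The appeal of a sequential test for A/B comparisons is that it can stop as soon as the evidence crosses either boundary instead of waiting for a fixed sample size.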
Loop hardening — reward-hack detector, ULD, MiniLLM, mid-epoch RL ckpt, iterative DPO, echo-trap
Hosted vendors silently bill you for the GPU-hours when your RL policy games the reward model or your multi-turn agent echo-traps itself. v0.70 protects the training loop from the failure modes that cost real GPU-hours, with six new surfaces and live math kernels. +803 tests across v0.68–v0.70 (11,021 → 11,824).
- --reward-hack-detector info_rm|rm_ensemble: InfoRM cluster-separation + RM-ensemble variance, OK/WARN/HACK at 0.10/0.30 drop
- --uld-strategy wasserstein|topk_align: cross-tokenizer KD, Llama→Mistral with no shared vocab
- --minillm-enabled: reverse-KL distillation with all 3 stability tricks bundled
- --rl-checkpoint-save-every-steps N: mid-epoch PPO/GRPO ckpt (TorchTune still punts this)
- soup iterative-dpo --rounds N: sample → score → re-pair → retrain driver, frozen per-round artifacts
- --echo-trap-enabled --echo-trap-halt: RAGEN n-gram repetition detector, halt on TRAP
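To make the n-gram repetition idea behind the echo-trap detector concrete, here is a minimal sketch. It is an illustration only, not Soup's or RAGEN's implementation: the function names, the n-gram size of 3, and the 0.5 TRAP threshold are all assumptions.

```python
from collections import Counter


def ngram_repetition(tokens, n=3):
    """Fraction of n-grams that duplicate an earlier n-gram in the sequence."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0  # sequence shorter than n: nothing to compare
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as a repeat.
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(ngrams)


def echo_verdict(tokens, n=3, trap_threshold=0.5):
    """Return "TRAP" when the repetition score reaches the (assumed) threshold."""
    return "TRAP" if ngram_repetition(tokens, n) >= trap_threshold else "OK"


print(echo_verdict("i will check the file again i will check the file again".split()))
```

A halting flag would then stop training when a rollout is flagged, rather than continuing to spend compute on a policy that keeps repeating itself.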
$ soup train --task grpo --base-model meta-llama/Llama-3.1-8B \
--reward-hack-detector info_rm --reward-hack-halt \
--echo-trap-enabled --echo-trap-halt \
--rl-checkpoint-save-every-steps 200
step 0200 reward=0.42 cluster_sep=0.91 echo=0.18 OK
step 0400 reward=0.61 cluster_sep=0.87 echo=0.22 OK
step 0600 reward=0.79 cluster_sep=0.74 echo=0.31 WARN
step 0800 reward=0.93 cluster_sep=0.58 echo=0.34 HACK
✗ halt: reward-hack verdict = HACK (drop 0.36 ≥ 0.30)
→ recovering to ckpt step-0400 (mid-epoch ckpt) · exit 2
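The verdicts in the log above can be reproduced with a small threshold check. This is a sketch under a stated assumption: the drop is taken relative to the first logged cluster_sep value, which happens to match the 0.36 figure in the halt message. The page does not state Soup's actual formula, and the function name `reward_hack_verdict` is hypothetical.

```python
def reward_hack_verdict(baseline_sep, current_sep, warn_drop=0.10, hack_drop=0.30):
    """Classify a cluster-separation drop as OK, WARN, or HACK.

    Assumption: drop is relative to the baseline, i.e.
    (baseline - current) / baseline. Thresholds follow the 0.10/0.30 figures.
    """
    if baseline_sep <= 0:
        raise ValueError("baseline_sep must be positive")
    drop = (baseline_sep - current_sep) / baseline_sep
    if drop >= hack_drop:
        verdict = "HACK"
    elif drop >= warn_drop:
        verdict = "WARN"
    else:
        verdict = "OK"
    return verdict, drop


# (step, cluster_sep) pairs copied from the log above.
log = [(200, 0.91), (400, 0.87), (600, 0.74), (800, 0.58)]
baseline = log[0][1]
for step, sep in log:
    verdict, drop = reward_hack_verdict(baseline, sep)
    print(f"step {step:04d} cluster_sep={sep:.2f} drop={drop:.2f} {verdict}")
```

Under this assumption, step 0600 lands in WARN and step 0800 in HACK, which is where the run halts and recovers to the last checkpoint that was still OK.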
$ soup iterative-dpo --rounds 4 --pairs-per-round 4000 ...
round 01 pairs=4000 win_rate=0.58 → ./iter-dpo/round-01/adapter
round 02 pairs=4000 win_rate=0.66 → ./iter-dpo/round-02/adapter
round 03 pairs=4000 win_rate=0.71 → ./iter-dpo/round-03/adapter
round 04 pairs=4000 win_rate=0.73 → ./iter-dpo/round-04/adapter
✓ promoted round-04 as registry://policy-v4
LLaMA-Factory, Axolotl, and Unsloth lack most of the above.
Already using another tool? Switch in 30 seconds
One command converts your existing config. No rewriting, no guessing — just migrate and train.



See the difference
Before (LLaMA-Factory-style config):
model_name_or_path: meta-llama/Llama-3.1-8B
stage: sft
finetuning_type: lora
lora_rank: 64
lora_alpha: 16
lora_dropout: 0.05
lora_target: all
dataset: alpaca_en
template: llama3
cutoff_len: 2048
per_device_train_batch_size: 4
gradient_accumulation_steps: 4
num_train_epochs: 3
learning_rate: 2.0e-5
lr_scheduler_type: cosine
warmup_ratio: 0.1
quantization_bit: 4
output_dir: ./saves/llama3-lora
After (Soup config):
base: meta-llama/Llama-3.1-8B
task: sft
data:
  train: ./data/alpaca_en.jsonl
  max_length: 2048
training:
  epochs: 3
  lr: 2e-5
  quantization: 4bit
  lora:
    r: 64
    alpha: 16
output: ./saves/llama3-lora
Soup auto-detects everything else — optimizer, scheduler, target modules, batch size.
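The key mapping implied by the pair of configs above can be sketched in a few lines. This is not Soup's migration command: the function name `convert_config`, the nesting of `lora` and `quantization` under `training`, and the `./data/<dataset>.jsonl` path convention are assumptions made for illustration.

```python
def convert_config(src):
    """Map a flat LLaMA-Factory-style dict onto a nested Soup-style dict.

    Hypothetical helper; the nested layout is an assumption.
    """
    out = {
        "base": src["model_name_or_path"],
        "task": src["stage"],
        "data": {
            "train": f"./data/{src['dataset']}.jsonl",  # assumed path convention
            "max_length": src["cutoff_len"],
        },
        "training": {
            "epochs": src["num_train_epochs"],
            "lr": src["learning_rate"],
        },
        "output": src["output_dir"],
    }
    if "quantization_bit" in src:
        out["training"]["quantization"] = f"{src['quantization_bit']}bit"
    if src.get("finetuning_type") == "lora":
        out["training"]["lora"] = {"r": src["lora_rank"], "alpha": src["lora_alpha"]}
    return out


example = {
    "model_name_or_path": "meta-llama/Llama-3.1-8B",
    "stage": "sft",
    "finetuning_type": "lora",
    "lora_rank": 64,
    "lora_alpha": 16,
    "dataset": "alpaca_en",
    "cutoff_len": 2048,
    "num_train_epochs": 3,
    "learning_rate": 2.0e-5,
    "quantization_bit": 4,
    "output_dir": "./saves/llama3-lora",
}
converted = convert_config(example)
print(converted)
```

Keys with no counterpart in the shorter config (scheduler, warmup, batch size, target modules) are simply dropped here, on the premise that they are the settings left to auto-detection.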
Built for the ML stack you already use
First-class integrations with the tools powering production ML. Deploy anywhere, track everything.
Works with your favorite models
...and any model on Hugging Face Hub
Pulls production traces via soup ingest --source <vendor> --logs <export.jsonl>: no per-trace fees, all offline (v0.63)
Deploy & Serve · Training & Infra · Ecosystem · Quant & Export
Your competitors are already fine-tuning.
Every day without custom models is a day your product falls behind. Soup gets you from zero to fine-tuned in under 5 minutes.