CLI Reference
Core Commands
`soup init`
Create a config file with interactive wizard or template.
soup init [--template <template>] [--output <path>]
`soup train`
Start a training run.
soup train [--config <path>] [--resume auto|<checkpoint>] [--wandb] [--tensorboard] \
[--deepspeed zero2|zero3|zero2_offload|zero++|zero_pp] [--fsdp full_shard|shard_grad|full_offload] \
[--gpus auto|<N>] [--gate <suite>] \
[--push-as <user/repo>] [--hf-resume] [--yes]
- `--gpus` (v0.27): topology-aware multi-GPU launch hint (NVLink/PCIe detection)
- `--deepspeed zero++` (v0.27): ZeRO-3 with quantized weights and gradients
- `--gate` (v0.26): run an eval suite at epoch boundaries and halt on regression
- `--push-as` (v0.29): auto-push every `save_steps` checkpoint to HF Hub as a `checkpoint-<N>` branch
- `--hf-resume` (v0.29): pull the latest checkpoint branch from HF Hub and resume training
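The reference only says that `--push-as` writes `checkpoint-<N>` branches and that `--hf-resume` pulls the latest one. The sketch below shows one way "latest" could be chosen from a list of branch names, assuming it means the highest step number `N`. The function name `latest_checkpoint_branch` is hypothetical, and this is not taken from soup's implementation.

```python
import re
from typing import Iterable, Optional

# Matches branch names of the documented form checkpoint-<N>.
_CHECKPOINT_RE = re.compile(r"checkpoint-(\d+)")


def latest_checkpoint_branch(branches: Iterable[str]) -> Optional[str]:
    """Return the checkpoint-<N> branch with the largest N, or None.

    Assumption: "latest" means the highest step number, not the most
    recently pushed commit.
    """
    best_step = -1
    best_name = None
    for name in branches:
        match = _CHECKPOINT_RE.fullmatch(name)
        if match is None:
            continue  # skip main and any other non-checkpoint branch
        step = int(match.group(1))
        if step > best_step:
            best_step, best_name = step, name
    return best_name


if __name__ == "__main__":
    names = ["main", "checkpoint-500", "checkpoint-1500", "checkpoint-1000"]
    print(latest_checkpoint_branch(names))
```

The step is compared as an integer on purpose: a plain string sort would rank `checkpoint-9` above `checkpoint-10`.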
`soup chat`
Interactive chat with a trained model.
soup chat [--model <path>] [--base <model>] [--temperature <float>] [--max-tokens <int>]
`soup serve`
Start an OpenAI-compatible inference server.
soup serve [--model <path>] [--backend transformers|vllm|sglang] [--port <int>] [--host <host>] [--tensor-parallel <n>] [--gpu-memory <float>] [--speculative-decoding <draft-model>] [--spec-tokens <n>] [--adapters <name>=<path>]
`soup export`
Export model to deployment format.
soup export [--model <path>] [--format gguf|onnx|tensorrt|awq|gptq] [--quant <type>] [--output <path>]
`soup merge`
Merge LoRA adapter with base model.
soup merge [--adapter <path>] [--base <model>] [--output <path>] [--dtype <type>]
`soup push`
Upload model to HuggingFace Hub.
soup push [--model <path>] [--repo <user/repo>] [--private] [--collection <owner/slug-hash>]
- `--collection` (v0.29): add the pushed repo to an existing HF Collection
Token resolution (single source of truth): env HF_TOKEN / HUGGINGFACE_HUB_TOKEN > ~/.cache/huggingface/token > ~/.huggingface/token. The legacy --token flag is deprecated and delegates to this chain.
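The precedence chain above can be read as a simple first-match lookup. The following is a minimal sketch of that order, not soup's actual code: the function name `resolve_hf_token` is hypothetical, and it assumes `HF_TOKEN` is checked before `HUGGINGFACE_HUB_TOKEN` and that empty values are skipped.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_hf_token(
    env: Optional[Mapping[str, str]] = None,
    home: Optional[Path] = None,
) -> Optional[str]:
    """Return the first non-empty token in the documented order, or None."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else home

    # 1. Environment variables (assumed order: HF_TOKEN first).
    for var in ("HF_TOKEN", "HUGGINGFACE_HUB_TOKEN"):
        value = env.get(var, "").strip()
        if value:
            return value

    # 2. Token files: ~/.cache/huggingface/token, then ~/.huggingface/token.
    for path in (
        home / ".cache" / "huggingface" / "token",
        home / ".huggingface" / "token",
    ):
        if path.is_file():
            value = path.read_text(encoding="utf-8").strip()
            if value:
                return value

    return None  # nothing configured anywhere in the chain
```

Passing `env` and `home` explicitly keeps the lookup easy to exercise against a temporary directory instead of the real home directory.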
`soup eval`
Comprehensive model evaluation platform.
soup eval benchmark [--model <path>] [--benchmarks <list>]
soup eval custom [--model <path>] [--tasks <path>]
soup eval judge [--model <path>] [--prompts <path>] [--judge <model>]
soup eval auto [--config <path>]
soup eval compare <run1> <run2>
soup eval leaderboard
soup eval human [--model-a <path>] [--model-b <path>] [--prompts <path>]
`soup deploy`
Deploy models to inference runtimes.
# Ollama
soup deploy ollama [--model <path>] [--name <name>] [--system <prompt>] [--template <tpl>] [--parameter <key=val>]
soup deploy ollama --list
soup deploy ollama --remove <name>
# HuggingFace Spaces (new in v0.29)
soup deploy hf-space \
--model <user/repo> \
--space <user/space> \
--template gradio-chat|streamlit-chat
`soup deploy hf-space` validates `model_repo` via `validate_repo_id` before substituting it into app.py / README.md, so a crafted repo id cannot inject Python code into the deployed Space.
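To illustrate why validating before substitution matters, here is a minimal sketch of the validate-then-render pattern. Everything in it is an assumption made for illustration: the allow-list regex, the `check_repo_id` and `render_app` names, and the one-line template are hypothetical and do not reproduce soup's `validate_repo_id` or its real app.py template.

```python
import re
from string import Template

# Hypothetical allow-list: "<owner>/<name>", each part starting with a letter
# or digit and containing only letters, digits, ".", "_" and "-".
_REPO_ID_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*/[A-Za-z0-9][A-Za-z0-9._-]*")

# Stand-in for a generated app.py line that embeds the repo id in a string literal.
APP_TEMPLATE = Template('MODEL_REPO = "$model_repo"\n')


def check_repo_id(repo_id: str) -> str:
    """Return repo_id unchanged if it matches the allow-list, else raise."""
    if not _REPO_ID_RE.fullmatch(repo_id):
        raise ValueError(f"invalid repo id: {repo_id!r}")
    return repo_id


def render_app(model_repo: str) -> str:
    """Validate first, then substitute into the template."""
    return APP_TEMPLATE.substitute(model_repo=check_repo_id(model_repo))


if __name__ == "__main__":
    print(render_app("user/my-model"), end="")
```

Because quotes, newlines, and spaces are outside the allow-list, an id such as `x/y"` followed by extra Python is rejected before it can close the string literal in the rendered file.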
`soup infer`
Batch inference on a list of prompts.
soup infer [--model <path>] [--input <path>] [--output <path>] [--max-tokens <int>] [--temperature <float>]
`soup migrate`
Import config from competing tools.
soup migrate --from llamafactory|axolotl|unsloth <input-file> [--output <path>] [--dry-run]
`soup recipes`
Browse and use ready-made training configs.
soup recipes list # List all 116 recipes
soup recipes show <name> # Print recipe YAML
soup recipes use <name> [--output <path>] # Copy to soup.yaml
soup recipes search <query> # Search by model/task/size
Data Commands
soup data inspect <path> # View dataset stats
soup data validate <path> [--format <fmt>] # Check format compliance
soup data convert <path> --to <fmt> --output <f> # Convert between formats
soup data merge <f1> <f2> --output <f> [--shuffle] # Combine datasets
soup data dedup <path> [--threshold <float>] # Remove duplicates
soup data stats <path> # Extended statistics
soup data generate --prompt "..." --count <n> # Generate synthetic data
soup data generate --provider ollama|anthropic|vllm # Multi-provider support
soup data generate --template code|conversation|qa|preference|reasoning # Domain templates
soup data generate --quality-pipeline # Validate + filter + dedup
soup data filter <path> [--coherence <f>] [--perplexity <n>] # Quality filter
soup data sample <path> --strategy random|diverse|hard --count <n> # Sample subset
soup data split <path> --ratio 0.8,0.1,0.1 # Train/val/test split
soup data search <query> # Search HuggingFace Hub
soup data preview <dataset_id> # Preview remote dataset
soup data download <dataset_id> [--output <path>] [--samples <n>] # Download from HF
soup data register <name> --path <path> --format <fmt> # Register local dataset
soup data unregister <name> # Remove from registry
soup data registry # List registered datasets
soup data push --input <jsonl> --hf-dataset <user/repo> # Publish JSONL as HF dataset (v0.29)
soup data from-traces --input <logs> --format langchain|openai|soup-serve # Trace-to-preference (v0.26)
soup data review <pairs.jsonl> # Preview preference pairs (v0.26)
soup data augment --input <jsonl> --strategy rephrase|translate|style # LLM augment (v0.25)
Experiment Commands
soup runs # List all training runs
soup runs show <run_id> # Detailed run info + loss curve
soup runs compare <run1> <run2> # Side-by-side comparison
soup runs delete <run_id> # Remove from database
Other Commands
soup deploy ollama [--model <p>] [--name <n>] # Deploy GGUF to Ollama
soup profile [--model <m>] [--task <t>] # Estimate memory/speed/GPU
soup adapters list [--path <dir>] # Scan for LoRA adapters
soup adapters info <path> # Adapter metadata
soup adapters compare <a> <b> # Side-by-side comparison
soup sweep --config <path> --param key=v1,v2 # Hyperparameter search
soup diff --model-a <p> --model-b <p> # Compare two models
soup doctor [--nccl] # Check system + deps (+ NCCL multi-GPU bandwidth)
soup quickstart [--dry-run] [--yes] # One-command demo
soup ui [--port <int>] [--no-browser] # Launch web UI
soup version [--full] # Show version info
v0.54: Pre-flight Decision
soup advise <data.jsonl> --goal "<task>" # Classify task + rank PROMPT_ENG/RAG/SFT/DPO/GRPO
soup advise --probe # 100-step LoRA probe for empirical ROI
soup advise --record # Append to ~/.soup/advise_history.jsonl
soup advise explain # Print full rubric
soup advise compare <a.jsonl> <b.jsonl> # Compare two candidate datasets
v0.55: Eval Design
soup eval design <data.jsonl> --goal "..." # Goal-conditioned suite (TF-IDF + dispatch)
soup eval discover <data.jsonl> # Greedy farthest-first canaries + memorization probes
soup eval lock <design.json> # SHA-256-pin the suite as an artifact
soup eval coverage <design.json> --task <category> # Gap analysis vs v0.54 taxonomy
soup eval gate-install --baseline <run-id> # Install .git/hooks/pre-push regression gate
v0.56: Diagnose (Model Report Card)
soup diagnose <run-id> [--attach-to-registry <id>] # 6 failure-mode probes + OK/MINOR/MAJOR + SVG badge
soup train --diagnose-gate <evidence> # Refuse final save on MAJOR regression (exit 2)
v0.57 + v0.67: Adapter VCS
# v0.57 — git for LoRA
soup adapters diff <a> <b> # Per-layer Frobenius + relative drift + SVD effective rank
# v0.67 adds the cmaes strategy (CMA-ES evolutionary search); cmaes requires --eval and --budget
soup adapters merge <a> <b> [c...] -o <out> \
--strategy linear|ties|dare|svd|cmaes \
[--eval <suite>] [--budget 1h]
soup adapters blame --dataset <d> --layer <l> --budget 5m --shards 4
soup adapters branch <name> -c <config.yaml> --base <model> --dataset <data>
soup adapters checkout <name> -o <out.yaml>
soup adapters branches
# v0.67 additions
soup adapters pr <title> --base-sha <hex> --adapter <path> \
--eval <eval_delta.json> --samples <sample_diffs.json> \
--output <PR.md> # GitHub-shaped PR markdown
soup adapters bisect <ckpt1> <ckpt2> ... \
--eval-command "soup eval custom --checkpoint {ckpt} --suite <s.yaml>"
# Binary-search history for regression boundary
# Shared run lockfile (v0.67)
soup lock write --base-model <id> --base-sha <64hex> --dataset-sha <64hex> \
--env-hash <64hex> [--output soup.lock]
soup lock show [soup.lock]
soup lock check --base-model <id> --base-sha <64hex> --dataset-sha <64hex> \
--env-hash <64hex> [soup.lock] # Exit 3 on drift
v0.58: Production Data Flywheel
soup loop init <served-model> --eval <suite> --baseline registry://<id> \
--monthly-budget 50usd --max-runs-per-day 3
soup loop watch # Long-running daemon (SIGTERM-safe)
soup loop status
soup loop pause / soup loop resume
soup loop canary <adapter> --traffic 5% --autoroll-on-regress
soup loop replay [<iter-id>]
v0.59-v0.62: Governance, Supply Chain, Unlearn, RAG
# v0.59 — Governance & Provenance
soup bom emit --name <m> --version <v> --base-model <hf> --base-sha <sha> \
--config-sha <sha> --task <t> --license <spdx> --format cyclonedx|spdx|both
soup attest emit --stage extract|train|eval|export|publish \
--subject <artifact> --sha <sha> --builder <id> --invocation <cmd> \
--sign unsigned|ed25519|sigstore
soup audit-log tail / rotate
soup train --annex-xi <out.md> --repro-receipt <out.json>
# v0.60 — Supply Chain Security
soup adapters scan <dir> # Spectral LoRA backdoor scanner (exit 0/1/3)
soup adapters sign / verify [--strict] # Merkle-root tamper detection
soup adapters check-safetensors --strict # Refuse pickle/.bin/.pt
soup adapters merge --license <spdx> # SPDX-license compatibility gate
soup airgap-bundle --model <dir> --dataset <dir>... --wheel <dir>... --kernel <dir>...
# v0.61 — Unlearning & Knowledge Edit
# (configured via task: unlearn in soup.yaml)
soup eval unlearning <run-id> --benchmark tofu|muse|wmdp
soup edit set --base <m> --method rome|memit|alphaedit --subject "..." --target "..." [--plan-only]
soup edit diff registry://before_id registry://after_id --probes probes.jsonl --top-k 10
# v0.62 — RAG & Activation Steering
soup steer train --base <m> --method caa|iti|repe --name <id> --pairs <jsonl>
soup steer apply --name <id> --strength <float> # |strength| ≤ 10 enforced
soup steer list
v0.63: Production Trace Ecosystem
soup ingest --source langfuse|langsmith|helicone|openpipe|otel|openai-stored \
--logs <jsonl> --output <out.jsonl>
soup prune-prompt --input <jsonl> --output <out.jsonl> --min-frequency 0.95
soup data active-sample --input <jsonl> --budget N
soup ab --input <jsonl> --metric latency|judge_score|retry_rate \
[--alpha 0.05] [--beta 0.20] [--effect-size 0.1]
soup drift-alarm --reference <ref.jsonl> --live <live.jsonl> --threshold 0.2 \
[--slack-url ...] [--discord-url ...] # Exit 3 on drift
v0.64: Pre-flight & Tooling
soup tunability --dataset <path> [--candidates <names>] \
[--probe-steps 100] [--holdout-size 64] [--output <path>] [--plan-only] [--list]
soup plan --config <soup.yaml> [--state ./soup.tfstate]
soup apply --config <soup.yaml> [--state ./soup.tfstate] [--dry-run]
soup env lock [--output ./soup-env.lock]
soup env status [--lock ./soup-env.lock]
soup env check [--lock ./soup-env.lock] # Exit 3 on ABI drift
soup completions bash|zsh|fish
soup license-advisor --target b2c|defense|embedded \
[--license <spdx>] [--monthly-active-users <N>] # Exit 3 on block
v0.65: Eval Depth (Failure Modes 6 → 10)
soup eval behavior <run-id> --battery xstest|harmbench|jailbreakbench|elephant|syceval \
[--evidence <responses.json>] [--output <report.json>]
soup eval capability <run-id> --suite full|fast|math|code [--output <report.json>]
soup eval checklist <spec.yaml> [--evidence <responses.json>] [--output <report.json>]
soup eval irt-subset <responses.jsonl> --size full|small|tiny [--output <plan.json>]
v0.66: Post-train X-rays
soup probe sae-diff <sae.safetensors> <pre_acts.json> <post_acts.json> \
[--top-k 20] [--output <report.json>]
soup probe sleeper <base> [--evidence <activations.json>] [--output <result.json>]
soup probe interference <losses.json> [--output <matrix.json>] # Exit 2 if worst ≥20%
soup probe pack <base> [--list] [--output <manifest.json>]
Global Flags
soup --verbose <command> # Full tracebacks instead of friendly errors
> Note: `--verbose` is a global flag. It must go before the command name, not after.
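When soup is driven from a script, the placement rule is easy to get wrong by appending flags at the end. A small sketch of a wrapper that always puts the global flag first is shown below. The helper name `build_soup_argv` is hypothetical, and the example subcommand uses only flags listed in this reference.

```python
from typing import List, Sequence


def build_soup_argv(command: Sequence[str], verbose: bool = False) -> List[str]:
    """Build an argv list with the global --verbose flag before the command."""
    argv = ["soup"]
    if verbose:
        argv.append("--verbose")  # global flag: must precede the command name
    argv.extend(command)
    return argv


if __name__ == "__main__":
    print(build_soup_argv(["train", "--config", "soup.yaml"], verbose=True))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues.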