Quant Menu — 9 Quantization Formats (v0.38.0)
Pick the right quantization format for your base model and hardware. Soup loads the appropriate quantization_config and trains LoRA on top.
```yaml
# Train LoRA on top of a pre-quantized GPTQ checkpoint:
base: TheBloke/Llama-2-7B-Chat-GPTQ
training:
  quantization: gptq   # or: awq, hqq:4bit, aqlm, eetq, mxfp4, fp8

# FSDP + QLoRA — set quant_storage:
training:
  quantization: 4bit
  bnb_4bit_quant_storage: bfloat16
```
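Under the transformers backend, the default 4bit path corresponds roughly to the usual bitsandbytes + PEFT recipe. The sketch below shows the general idea only, not Soup's internals; the base model and LoRA hyperparameters are placeholders.

```python
# Sketch of what "quantization: 4bit" + LoRA roughly maps to under the
# transformers backend. Not Soup's actual implementation.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    # For FSDP + QLoRA, store quantized weights in bf16 (see the YAML above):
    bnb_4bit_quant_storage=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # placeholder base model
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))
```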
Format matrix
| Format | Bits | Use case | Optional dep |
|---|---|---|---|
| 4bit | 4 | Default. Best general LoRA training. | bitsandbytes |
| 8bit | 8 | Larger memory budget, more accurate gradients. | bitsandbytes |
| none | 16/32 | Full fine-tuning or DPO/PPO without quant. | — |
| gptq | 2/3/4/8 | Train LoRA on top of an existing GPTQ checkpoint. | gptqmodel |
| awq | 4 | Train LoRA on top of an existing AWQ checkpoint. | autoawq |
| hqq:Nbit | 1, 2, 3, 4, 5, 6, 8 | Wide bit range; compose with LoRA. | hqq |
| aqlm | 2 | Extreme compression. | aqlm |
| eetq | 8 | Fast 8-bit kernel for SM75+. | eetq |
| mxfp4 | 4 | Newer 4-bit type with better activation distribution. | bitsandbytes ≥ 0.45 |
| fp8 | — | Train fp16/bf16 on top of FP8-released checkpoints. | transformers ≥ 4.45 |
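Each format string ultimately has to resolve to a transformers quantization config, or defer to the config already stored in a pre-quantized checkpoint. A rough sketch of that dispatch for a few formats; the function name and mapping are assumptions, only the config classes are real transformers classes.

```python
# Hypothetical dispatch sketch, not Soup's actual code: BitsAndBytesConfig
# and HqqConfig are real transformers classes, the mapping is an assumption.
import torch
from transformers import BitsAndBytesConfig, HqqConfig

def quant_config_for(fmt: str):
    if fmt == "4bit":
        return BitsAndBytesConfig(load_in_4bit=True,
                                  bnb_4bit_compute_dtype=torch.bfloat16)
    if fmt in {"gptq", "awq", "fp8"}:
        # Pre-quantized checkpoints ship their own config in the repo;
        # returning None lets from_pretrained pick it up as-is.
        return None
    if fmt.startswith("hqq:"):
        nbits = int(fmt.split(":")[1].removesuffix("bit"))  # "hqq:4bit" -> 4
        return HqqConfig(nbits=nbits)
    raise ValueError(f"unhandled quantization format: {fmt}")
```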
Compatibility matrix
soup train runs check_quant_distributed_compat() at startup. HQQ / EETQ / AQLM hard-fail with FSDP and ZeRO-3; BNB 4-bit + FSDP without bnb_4bit_quant_storage emits a yellow warning.
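A minimal sketch of the shape of that check, assuming placeholder config fields for the quantization format and distributed strategy; only the rules themselves come from the paragraph above.

```python
# Illustrative sketch of check_quant_distributed_compat(); the field
# names on cfg are placeholders, only the rules match the text above.
import warnings

SHARDING_INCOMPATIBLE = {"hqq", "eetq", "aqlm"}

def check_quant_distributed_compat(cfg):
    quant = cfg.quantization.split(":")[0]          # "hqq:4bit" -> "hqq"
    sharded = cfg.distributed in {"fsdp", "zero3"}  # placeholder field

    if quant in SHARDING_INCOMPATIBLE and sharded:
        raise RuntimeError(
            f"quantization={cfg.quantization} is not supported with "
            f"{cfg.distributed}"
        )
    if quant == "4bit" and cfg.distributed == "fsdp" and \
            getattr(cfg, "bnb_4bit_quant_storage", None) is None:
        warnings.warn(
            "FSDP + 4bit without bnb_4bit_quant_storage; set it to "
            "bfloat16 (see the YAML example above)"
        )
```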
Pre-quantized + QAT
gptq / awq / hqq:* / aqlm / eetq / mxfp4 / fp8 all carry their own quantization scales, so combining them with quantization_aware (int8 QAT or 'fp8') is rejected at config load. A sketch of that rejection follows.
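The sketch below uses placeholder config field names; only the rule that pre-quantized formats cannot be combined with QAT comes from the paragraph above.

```python
# Illustrative sketch of the config-load rejection; field names are
# placeholders, the rule itself is described in the text above.
PREQUANTIZED = {"gptq", "awq", "hqq", "aqlm", "eetq", "mxfp4", "fp8"}

def reject_prequantized_plus_qat(cfg):
    quant = cfg.quantization.split(":")[0]       # "hqq:4bit" -> "hqq"
    if quant in PREQUANTIZED and getattr(cfg, "quantization_aware", None):
        raise ValueError(
            f"quantization={cfg.quantization} already carries its own "
            f"scales; quantization_aware={cfg.quantization_aware} cannot "
            "be applied on top"
        )
```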
Scope
Wired into the SFT trainer on the transformers backend in v0.38.0. Expansion to the other trainers is tracked for v0.38.1 (mirroring the v0.27.0 MII / v0.37.0 multipack stub-then-live pattern). The MLX backend raises a distinct error message naming the actual reason.