Correctness First (v0.36.0)
Four failure modes that used to fail silently in Soup now fail loudly.
Assistant-only loss masking
By default, Soup masks every non-assistant token with `-100` so the SFT loss reflects only what the model should *generate*. Toggle via `data.train_on_responses_only` (default `true`):
```yaml
data:
  train: data.jsonl
  train_on_responses_only: true  # default
  # OR per-message control:
  # train_on_messages_with_train_field: true
```

When the tokenizer ships a chat template with `{% generation %}` markers, the mask is exact. Without those markers, Soup falls back to an incremental tokenize-delta walk and documents the looseness.
`--trust-remote-code` opt-in
`soup train`, `chat`, `serve`, `data download`, and `eval auto` now require `--trust-remote-code` to load any HF model that ships custom Python (`auto_map` in `config.json`).
```shell
soup train --config soup.yaml --trust-remote-code
```

First-party orgs (Meta, Mistral, Qwen, Google, etc. — 15 in the allowlist) suppress the warning panel; everything else prints a REMOTE CODE WARNING panel before loading.
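The gating described above can be sketched as one decision function. The allowlist contents, function name, and return values here are illustrative assumptions, not Soup's actual code:

```python
# Sketch of the allowlist; the real list has 15 org prefixes.
FIRST_PARTY_ORGS = {"meta-llama", "mistralai", "Qwen", "google"}

def remote_code_action(model_id, config, trust_remote_code):
    """Decide how to load a HF model that may ship custom Python.

    config is the parsed config.json; an 'auto_map' key means the repo
    ships custom code that transformers would execute on load.
    """
    if "auto_map" not in config:
        return "load"               # plain weights, nothing to gate
    if not trust_remote_code:
        return "error"              # refuse: --trust-remote-code not given
    org = model_id.split("/")[0]
    if org in FIRST_PARTY_ORGS:
        return "load"               # first-party org: no warning panel
    return "warn-then-load"         # print REMOTE CODE WARNING panel first
```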
Chat-template hardening
Tokenizers without a chat template now raise a `ValueError` with a fix suggestion instead of silently building garbage `f"{role}: {content}"` strings.
```yaml
data:
  train: data.jsonl
  chat_template: chatml  # or: llama3, qwen2.5, mistral, gemma3, phi4, deepseek-r1, or a raw Jinja string
```

Raw Jinja strings are validated: null bytes, templates over 64 KB, and filesystem-touching directives (`{% include %}`, `{% import %}`, `{% from %}`, `{% macro %}`, `{% extends %}`) are rejected at config load.
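The three documented checks fit in a few lines. A minimal sketch of the validator, with an illustrative function name (not Soup's API):

```python
import re

FORBIDDEN_TAGS = ("include", "import", "from", "macro", "extends")
MAX_TEMPLATE_BYTES = 64 * 1024

def validate_chat_template(template: str) -> None:
    """Reject raw Jinja chat templates per the documented rules:
    no null bytes, no payloads over 64 KB, no filesystem-touching tags."""
    if "\x00" in template:
        raise ValueError("chat_template contains a null byte")
    if len(template.encode("utf-8")) > MAX_TEMPLATE_BYTES:
        raise ValueError("chat_template exceeds 64 KB")
    for tag in FORBIDDEN_TAGS:
        # match '{% include', '{%- include', '{%include', etc.
        if re.search(r"\{%-?\s*" + tag + r"\b", template):
            raise ValueError(f"chat_template uses forbidden directive '{tag}'")
```

Rejecting at config-load means a bad template fails before any tokenization or training step runs.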
OOM-probe auto batch size
```yaml
training:
  batch_size: auto                 # unchanged
  auto_batch_size_strategy: probe  # NEW: 'static' | 'probe' | 'auto' (default)
```

This replaces the static memory formula with a real try-halve-then-double-to-ceiling loop. The picked size is cached at `~/.soup/batch_cache.json`, keyed on (model, max_length, quantization, lora_r, gpu_name, gpu_memory_gb), so repeat runs short-circuit. The cache file gets best-effort `0o600` permissions after an atomic rename.
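The probe loop can be sketched as two phases: halve on OOM until one training step fits, then double back toward the ceiling while steps keep succeeding. A sketch under assumed names (`try_step`, the defaults, and `MemoryError` as the OOM signal are illustrative; Soup's internals may differ):

```python
def probe_batch_size(try_step, start=32, ceiling=256):
    """Find the largest batch size that fits via try-halve-then-double.

    try_step(batch_size) runs one trial training step and raises
    MemoryError on OOM.
    """
    size = start
    # Phase 1: halve until one step succeeds.
    while size >= 1:
        try:
            try_step(size)
            break
        except MemoryError:
            size //= 2
    if size < 1:
        raise RuntimeError("even batch_size=1 does not fit")
    # Phase 2: double while the next size still fits, up to the ceiling.
    while size * 2 <= ceiling:
        try:
            try_step(size * 2)
            size *= 2
        except MemoryError:
            break
    return size
```

Probing is slower than the old static formula on the first run, which is why the result is cached and keyed on everything that affects memory use.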