# Soup Cans v2 + Live LR Finder (v0.33.0)
v0.33 graduates several v0.27–v0.32 stubs into live functionality and ships meaningful follow-ups across the whole stack.
## Soup Cans v2: `soup can run` + `soup can publish`
```
# Run a .can end-to-end: extract → train → optional deploy
soup can run my-recipe.can --yes
soup can run my-recipe.can --yes --deploy --env-capture env.txt

# Publish to HF Hub as a dataset
soup can publish my-recipe.can --hf-hub user/my-recipe
```

`soup can run` requires an explicit `--yes` (mandatory consent: it auto-downloads data and auto-trains). The manifest format is bumped from 1 to 2; the change is additive (a new `deploy_targets` field), so both v1 and v2 cans still load.
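A minimal sketch of how an additive bump can keep v1 cans loading, assuming a JSON manifest; the loader and the `format_version` key name are hypothetical, only `deploy_targets` is from this release:

```python
import json

def load_can_manifest(path: str) -> dict:
    """Hypothetical loader sketch: accept both manifest formats."""
    with open(path) as f:
        manifest = json.load(f)
    version = manifest.get("format_version", 1)  # key name is an assumption
    if version not in (1, 2):
        raise ValueError(f"unsupported .can manifest format: {version}")
    # v2 is additive: default the new field so v1 cans load unchanged
    manifest.setdefault("deploy_targets", [])
    return manifest
```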
Security: the extract dir is containment-checked, and the GGUF `rglob` result for ollama deploy is realpath+commonpath-checked against the extract dir to prevent symlink escape; a `subprocess.TimeoutExpired` after the 24-hour cap returns rc=124 (the coreutils convention). `soup can publish` validates the `repo_id`, resolves the HF token via env vars / cache files, and caps commit messages to their first line and 200 characters (matching the v0.29 push policy).
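The symlink-escape check is the standard realpath+commonpath pattern the note names; a minimal sketch (function name hypothetical):

```python
import os

def is_contained(candidate: str, extract_dir: str) -> bool:
    """Resolve symlinks, then require the result to sit under the
    (also resolved) extract dir."""
    real = os.path.realpath(candidate)   # follows symlinks
    root = os.path.realpath(extract_dir)
    return os.path.commonpath([real, root]) == root

# e.g. a GGUF found by rglob only qualifies for ollama deploy if contained:
# [p for p in extract_dir.rglob("*.gguf") if is_contained(str(p), str(extract_dir))]
```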
## Live `--find-lr` training loop
The v0.32 stub curve is replaced with a real in-process LR-sweep training loop. A NaN/Inf loss terminates the sweep early, so the reported `diverged_at` is honest. When prerequisites are missing (no torch, config load failure, or an empty dataset), it falls back to a synthetic curve so CI without GPUs still produces a parseable report.
```
soup train --config soup.yaml --find-lr --find-lr-output ./lr_finder.json
```
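A minimal sketch of such a sweep loop, assuming a `train_step(lr)` callable that runs one optimizer step and returns the loss; the callable and the report keys other than `diverged_at` are assumptions:

```python
import math

def lr_sweep(train_step, lr_min=1e-7, lr_max=1.0, steps=100):
    """Ramp the LR exponentially, taking one real training step per point."""
    curve, diverged_at = [], None
    for i in range(steps):
        lr = lr_min * (lr_max / lr_min) ** (i / (steps - 1))
        loss = train_step(lr)
        if math.isnan(loss) or math.isinf(loss):
            diverged_at = lr   # record the first diverged LR, then stop
            break
        curve.append({"lr": lr, "loss": loss})
    return {"curve": curve, "diverged_at": diverged_at}
```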
## Spike-recovery hint file

Loss-spike recovery now writes a `spike_recovery.json` hint with the decayed LR for re-launch:
{"original_lr": 2e-4, "recovery_lr": 5e-5, "decay_factor": 0.25, "trigger_step": 482}Live optimizer-state rewind and live DataLoader rebuild remain follow-ups (HF Trainer / TRL upstream constraints).
## VRAM grad-accum advisory (live)
When VRAM pressure crosses the threshold, the advisory now prints a concrete recommended (batch, accum) pair that preserves the effective batch:

```
[advisory] VRAM at 94% — try batch=2, accum=8 (preserves effective batch=16)
```

It fires at most once per run, and not at all when CUDA is unavailable.
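The arithmetic is consistent with a halve-the-batch / double-the-accum rebalance; a sketch under that assumption (the helper is hypothetical):

```python
def rebalance(batch: int, accum: int) -> tuple[int, int]:
    """Halve the micro-batch, double the accumulation steps.

    The effective batch (batch * accum) is preserved exactly while peak
    activation memory roughly halves.
    """
    if batch < 2 or batch % 2:
        raise ValueError("micro-batch cannot be halved evenly")
    return batch // 2, accum * 2

# e.g. rebalance(4, 4) -> (2, 8): effective batch stays 4 * 4 == 2 * 8 == 16
```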
## Auto-reexec under `accelerate`
`soup train --gpus N` now auto-reexecs under `accelerate launch` instead of just printing the command. Critical flags (`--fsdp`, `--deepspeed`, `--resume`, `--wandb`, `--tensorboard`, `--yes`) are forwarded as separate argv elements. Use `--no-reexec` to opt out and just print the command.
```
soup train --config soup.yaml --gpus 4              # auto-reexec
soup train --config soup.yaml --gpus 4 --no-reexec  # print the command only
```
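A minimal sketch of the re-exec mechanics, assuming `os.execvp`; the helper name is hypothetical, and flags that carry a separate value token are elided for brevity:

```python
import os
import sys

CRITICAL = {"--fsdp", "--deepspeed", "--resume", "--wandb", "--tensorboard", "--yes"}

def reexec_under_accelerate(num_gpus: int) -> None:
    """Replace the current process with `accelerate launch`.

    Every flag is its own argv element, never joined into a shell string,
    so values containing spaces survive intact.
    """
    forwarded = [a for a in sys.argv[1:] if a.split("=", 1)[0] in CRITICAL]
    cmd = ["accelerate", "launch", "--num_processes", str(num_gpus),
           sys.argv[0], *forwarded]
    os.execvp(cmd[0], cmd)  # never returns: the accelerate child takes over
```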
## Registry artifact attach

```
# Attach an eval JSON to an existing registry entry
soup eval custom --tasks evals/sanity.jsonl --model ./output \
    --attach-to-registry chat-llama@v1

# Auto-attach exported artifact to registry
soup export --model ./output --format gguf --registry-id chat-llama@v1
```

`eval_results` and `tensorrt` are now valid artifact kinds. `lookup_entry_by_output_dir` emits a `ResourceWarning` when its 1000-row scan limit is hit (no silent miss).
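A sketch of what a warn-on-cap scan can look like; the function name and the 1000-row limit are from this release, while the row structure is an assumption:

```python
import warnings

SCAN_LIMIT = 1000

def lookup_entry_by_output_dir(rows, output_dir):
    """Bounded registry scan: a miss past the cap must not be silent."""
    for i, row in enumerate(rows):
        if i >= SCAN_LIMIT:
            warnings.warn(
                f"registry scan hit the {SCAN_LIMIT}-row limit before "
                f"matching {output_dir}; result may be a false miss",
                ResourceWarning,
            )
            return None
        if row.get("output_dir") == output_dir:
            return row
    return None
```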
## v0.28 features expanded to DPO + Pretrain trainers
`use_cut_ce`, `quantization_aware: "fp8"`, `kernel_auto_compose`, and `activation_offloading` now work on the DPO and Pretrain trainers (in addition to SFT, supported since v0.28). GRPO/KTO/ORPO/SimPO/IPO/PPO/RewardModel/Embedding still error at config load with a precise multi-trainer message; full expansion arrives in v0.35.
## RLVR OS-level isolation
GRPO's `code_exec` reward gains real OS-level sandboxing on top of the v0.25 RLIMIT/socket-patch baseline:
- Linux: best-effort `os.unshare(CLONE_NEWUSER | CLONE_NEWNET | CLONE_NEWPID)`; see the sketch after this list. Falls back silently to the RLIMIT + socket-patch baseline on hardened kernels (`unprivileged_userns_clone=0`).
- macOS: a `sandbox-exec` wrapper with a default-deny profile. `(allow mach-lookup)` is narrowed to a 3-name allowlist (SecurityServer / notification_center / opendirectoryd.libinfo) to prevent DNS/NSURLSession bypass.
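A minimal sketch of the best-effort Linux path, assuming Python 3.12+ (which exposes `os.unshare` and the `CLONE_*` constants); the helper name is hypothetical:

```python
import os

def try_namespace_isolation() -> bool:
    """Attempt user/net/pid namespace isolation; report whether it worked.

    Returns False on hardened kernels (e.g. unprivileged_userns_clone=0
    raises EPERM) so the caller can fall back silently to the
    RLIMIT + socket-patch baseline.
    """
    try:
        os.unshare(os.CLONE_NEWUSER | os.CLONE_NEWNET | os.CLONE_NEWPID)
        return True
    except (AttributeError, OSError):
        # AttributeError: pre-3.12 Python or non-Linux; OSError: denied
        return False
```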
## See also
- [Soup Cans](/docs/soup-cans) — pack, fork, run, publish
- [Training stability](/docs/training-stability) — LR finder + spike recovery
- [Multi-GPU](/docs/multi-gpu) — auto-reexec
- [Registry](/docs/registry) — artifact attach