`soup loop` — the production data flywheel, all from the CLI

NVIDIA's data-flywheel reference needs a multi-service stack. Observability vendors (Langwatch, Helicone, Galileo) monetize per trace and have no incentive to push customers downstream into training. OpenPipe tried this exact business and pivoted to RL agents before CoreWeave acquired it.

soup loop (v0.58.0) ships the whole flywheel as one CLI that runs on a laptop. It connects nine existing Soup features (v0.26 Trace-to-Preference, v0.26 Eval-Gated Training, v0.26 Registry lineage, v0.26 Quant-Lobotomy verdicts, v0.26 Soup Cans, v0.25 Autopilot, v0.54 Advise, v0.55 Eval Design, v0.56 Diagnose) into a single loop:

> production traces → preference pairs → eval-gated DPO → canary deploy → auto-rollback

Six subcommands

```bash
soup loop init  <served-model> --eval <suite> --baseline registry://<id> \
                --monthly-budget 50usd --max-runs-per-day 3
soup loop status
soup loop watch [--foreground|--detach] [--max-iterations N] [--poll-interval F]
soup loop pause / resume
soup loop canary <adapter> --traffic 5% [--autoroll-on-regress]
soup loop replay [<iteration-id>]
```

Control plane

The full loop state lives in a single file, .soup/loop.yaml. Despite the extension, the content is JSON. Writes are atomic (tempfile.mkstemp + os.replace) and guarded by cwd containment, symlink rejection through a direct os.lstat check, a 1 MiB size cap, and POSIX mode 0o600.
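The write path described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Soup's actual code: the function name write_state_atomic, the constant MAX_STATE_BYTES, and the exact order of the checks are hypothetical.

```python
import json
import os
import stat
import tempfile
from pathlib import Path

MAX_STATE_BYTES = 1024 * 1024  # 1 MiB cap, as stated in the text


def write_state_atomic(path: str, state: dict) -> None:
    """Hypothetical helper: write `state` as JSON via temp file + os.replace."""
    target = Path(path)
    cwd = Path.cwd().resolve()
    parent = target.parent.resolve()
    # cwd containment: the resolved parent directory must sit under cwd.
    if parent != cwd and cwd not in parent.parents:
        raise ValueError(f"{path} is outside the working directory")
    # Reject a symlink at the target without following it.
    try:
        if stat.S_ISLNK(os.lstat(target).st_mode):
            raise ValueError(f"{path} is a symlink")
    except FileNotFoundError:
        pass  # first write: nothing to inspect yet
    payload = json.dumps(state, indent=2).encode("utf-8")
    if len(payload) > MAX_STATE_BYTES:
        raise ValueError("state exceeds the 1 MiB cap")
    parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the same directory so os.replace stays atomic.
    fd, tmp = tempfile.mkstemp(dir=parent, prefix=".loop-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())
        if os.name == "posix":
            os.chmod(tmp, 0o600)  # mkstemp already uses 0o600; kept explicit
        os.replace(tmp, parent / target.name)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

Readers of the file see either the old state or the new one, never a partial write, because os.replace swaps the directory entry in a single step.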

Frozen LoopState schema:

  • served_model / eval_suite / baseline / status (one of {running, paused, stopped})
  • 6 counters (traces_harvested / pairs_distilled / runs_started / runs_shipped / runs_rolled_back / runs_skipped_by_budget)
  • canary + budget + daily-cap metadata
  • iteration metadata

to_dict returns a MappingProxyType so callers can't mutate state through it. Only with_status and bumped are sanctioned mutators.
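A frozen state object with a read-only view and two sanctioned mutators might look like the sketch below. The field names come from the schema above; the canary, budget, and iteration metadata are omitted, and the bumped(counter, by) signature and the validation details are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass, replace
from types import MappingProxyType
from typing import Mapping

VALID_STATUSES = frozenset({"running", "paused", "stopped"})
COUNTERS = (
    "traces_harvested", "pairs_distilled", "runs_started",
    "runs_shipped", "runs_rolled_back", "runs_skipped_by_budget",
)


@dataclass(frozen=True)
class LoopState:
    served_model: str
    eval_suite: str
    baseline: str
    status: str = "running"
    traces_harvested: int = 0
    pairs_distilled: int = 0
    runs_started: int = 0
    runs_shipped: int = 0
    runs_rolled_back: int = 0
    runs_skipped_by_budget: int = 0

    def to_dict(self) -> Mapping[str, object]:
        # Read-only view: item assignment on the result raises TypeError.
        return MappingProxyType(asdict(self))

    def with_status(self, status: str) -> "LoopState":
        if status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {status!r}")
        return replace(self, status=status)

    def bumped(self, counter: str, by: int = 1) -> "LoopState":
        # Assumed signature: returns a copy with one counter incremented.
        if counter not in COUNTERS:
            raise ValueError(f"unknown counter: {counter!r}")
        return replace(self, **{counter: getattr(self, counter) + by})
```

Both mutators return a new instance, so a caller holding the old LoopState never sees it change underneath them.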

Canary router

soup loop canary <adapter> --traffic 5% --autoroll-on-regress splits traffic via deterministic SHA-256 hash routing:

  • A 4-byte slice of the digest mod _HASH_MOD = 10_000 picks the bucket → 0.01% split granularity
  • route(policy, request_key) is a pure per-request function: the same key always lands on the same side
  • BucketStats.verdict() returns OK / MAJOR / UNKNOWN using the v0.26 Quant-Lobotomy thresholds (5-pct regression band, minimum 30 samples)
  • BucketStats is mutable and guarded by a threading.Lock so it can absorb live request streams
  • Cross-field validation: canary == stable is rejected, and traffic_pct must lie in [0, 100]
  • rollback returns a new policy with the canary cleared (sticky-on-rollback)
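The routing and verdict rules in the list above can be sketched like this. The CanaryPolicy shape, the record method, and the reading of the "5-pct regression band" as a 5-percentage-point drop in success rate are all assumptions for illustration; the real thresholds come from Quant-Lobotomy and may be defined differently.

```python
import hashlib
import threading
from dataclasses import dataclass, replace
from typing import Optional

_HASH_MOD = 10_000      # from the text: 0.01% granularity
MIN_SAMPLES = 30        # from the text
REGRESSION_BAND = 0.05  # assumption: 5 points of success rate


@dataclass(frozen=True)
class CanaryPolicy:  # hypothetical shape
    stable: str
    canary: Optional[str]
    traffic_pct: float

    def __post_init__(self) -> None:
        if not 0 <= self.traffic_pct <= 100:
            raise ValueError("traffic_pct must be within [0, 100]")
        if self.canary is not None and self.canary == self.stable:
            raise ValueError("canary must differ from stable")


def route(policy: CanaryPolicy, request_key: str) -> str:
    """Pure: the same key and policy always produce the same target."""
    if policy.canary is None:
        return policy.stable
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % _HASH_MOD
    threshold = round(policy.traffic_pct * _HASH_MOD / 100)  # 5% -> 500
    return policy.canary if bucket < threshold else policy.stable


def rollback(policy: CanaryPolicy) -> CanaryPolicy:
    # New policy with the canary cleared; the original is left untouched.
    return replace(policy, canary=None, traffic_pct=0.0)


class BucketStats:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._total = {"stable": 0, "canary": 0}
        self._ok = {"stable": 0, "canary": 0}

    def record(self, arm: str, success: bool) -> None:
        with self._lock:
            self._total[arm] += 1
            self._ok[arm] += int(success)

    def verdict(self) -> str:
        with self._lock:
            if min(self._total.values()) < MIN_SAMPLES:
                return "UNKNOWN"  # not enough data on one side yet
            stable_rate = self._ok["stable"] / self._total["stable"]
            canary_rate = self._ok["canary"] / self._total["canary"]
        return "MAJOR" if stable_rate - canary_rate > REGRESSION_BAND else "OK"
```

Hashing the request key rather than drawing a random number keeps a given user or session on one side of the split, which makes regressions easier to attribute.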

Budget guardrails

parse_budget_string accepts "50usd" / "50" / "50 USD" (≤ $1M hard cap). Each iteration runs through check_budget → frozen BudgetDecision(proceed, reason, projected_total_usd, runs_today). Check order:

1. daily-cap (UTC-day rollover via reset_daily_counter_if_new_day)

2. estimate-sanity — refuse implausible cost estimates

3. monthly-budget — projected total vs configured cap

Budget-skipped iterations produce no manifests — no half-records to confuse soup loop replay.
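The parsing rule and the three-step check order can be sketched as below. The keyword arguments of check_budget, the reason strings, and the definition of an "implausible" estimate (non-finite, negative, or above the hard cap) are assumptions for illustration; only the BudgetDecision field names and the check order come from the text.

```python
import math
import re
from dataclasses import dataclass

MAX_BUDGET_USD = 1_000_000.0  # hard cap from the text
_BUDGET_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(?:usd)?\s*$", re.IGNORECASE)


def parse_budget_string(text: str) -> float:
    match = _BUDGET_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized budget: {text!r}")
    amount = float(match.group(1))
    if amount > MAX_BUDGET_USD:
        raise ValueError("budget exceeds the $1M hard cap")
    return amount


@dataclass(frozen=True)
class BudgetDecision:
    proceed: bool
    reason: str
    projected_total_usd: float
    runs_today: int


def check_budget(*, spent_usd: float, estimate_usd: float,
                 monthly_budget_usd: float, runs_today: int,
                 max_runs_per_day: int) -> BudgetDecision:
    projected = spent_usd + estimate_usd
    # 1. daily cap
    if runs_today >= max_runs_per_day:
        return BudgetDecision(False, "daily-cap", projected, runs_today)
    # 2. estimate sanity (assumed definition of "implausible")
    if (not math.isfinite(estimate_usd) or estimate_usd < 0
            or estimate_usd > MAX_BUDGET_USD):
        return BudgetDecision(False, "estimate-sanity", projected, runs_today)
    # 3. monthly budget
    if projected > monthly_budget_usd:
        return BudgetDecision(False, "monthly-budget", projected, runs_today)
    return BudgetDecision(True, "ok", projected, runs_today)
```

Checking the daily cap first means a run that is blocked for two reasons always reports the cheaper-to-explain one.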

Watch daemon

soup loop watch orchestrates 5 stage callables: HarvestFn / TrainFn / GateFn / DeployFn / CostFn. The default stub bindings are no-ops; the operator wires live implementations (v0.26 traces, the v0.26 eval gate, and the v0.30 /v1/adapters/activate hot-swap endpoint) through WatchConfig.

  • run_once(state, config) is pure with respect to time (testable).
  • watch(config) is the long-running daemon. Installs SIGTERM / SIGINT handlers.
  • Reloads state every iteration so external pause / resume takes effect immediately without restarting the daemon.
  • --detach spawns python -m soup_cli.cli loop watch --foreground via argv-list subprocess.Popen — no shell, no string interpolation.
  • maybe_rollback fires only on "MAJOR" canary verdict.
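One iteration of the stage pipeline might be sketched as follows. Everything here is hypothetical scaffolding: the callable signatures, the WatchConfigSketch class, the dict-shaped record, and the rule that an empty harvest yields a SKIPPED gate verdict are assumptions, and the real run_once(state, config) also threads LoopState and budget checks that are left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class WatchConfigSketch:  # hypothetical stand-in for WatchConfig
    harvest: Callable[[], List[dict]]        # HarvestFn: preference pairs
    cost: Callable[[int], float]             # CostFn: pairs -> USD estimate
    train: Callable[[List[dict]], str]       # TrainFn: pairs -> run id
    gate: Callable[[str], str]               # GateFn: run id -> OK | MAJOR
    deploy: Callable[[str], Optional[str]]   # DeployFn: run id -> canary verdict


def maybe_rollback(canary_verdict: Optional[str]) -> bool:
    # Only a MAJOR verdict triggers a rollback; UNKNOWN and None do not.
    return canary_verdict == "MAJOR"


def run_once_sketch(config: WatchConfigSketch) -> dict:
    record = {
        "pairs_harvested": 0, "run_id": None, "gate_verdict": "SKIPPED",
        "canary_verdict": None, "shipped": False, "rolled_back": False,
        "estimated_cost_usd": 0.0,
    }
    pairs = config.harvest()
    record["pairs_harvested"] = len(pairs)
    if not pairs:
        return record  # nothing to train on (assumed meaning of SKIPPED)
    record["estimated_cost_usd"] = config.cost(len(pairs))
    record["run_id"] = config.train(pairs)
    record["gate_verdict"] = config.gate(record["run_id"])
    if record["gate_verdict"] != "OK":
        return record  # the eval gate blocks deployment
    record["canary_verdict"] = config.deploy(record["run_id"])
    record["shipped"] = True
    record["rolled_back"] = maybe_rollback(record["canary_verdict"])
    return record
```

Passing the stages in as plain callables is what makes the no-op defaults possible: a test can swap in lambdas and exercise every branch without a model, a registry, or a server.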

Iteration manifests

Every iteration writes a frozen IterationRecord to .soup-loops/<iteration_id>/iteration.json:

```text
iteration_id        20260515T231600-a1f3b2c4
started_at          2026-05-15T23:16:00Z
finished_at         2026-05-15T23:25:12Z
pairs_harvested     89
run_id              run-8f3
gate_verdict        OK            # OK | MAJOR | SKIPPED
canary_verdict      OK            # OK | MAJOR | UNKNOWN | None
shipped             true
rolled_back         false
estimated_cost_usd  2.40
notes               []
```

new_iteration_id combines a UTC timestamp with 8 hex characters from uuid.uuid4(). soup loop replay walks the directory in chronological order.
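The id format and the replay ordering can be sketched as below, assuming the timestamp layout shown in the sample manifest and that the hex suffix is the leading 8 characters of the UUID. The optional now parameter and the iter_manifests helper are hypothetical names added for illustration.

```python
import uuid
from datetime import datetime, timezone
from pathlib import Path
from typing import Iterator, Optional


def new_iteration_id(now: Optional[datetime] = None) -> str:
    # `now` is injectable for tests; it defaults to the current UTC time.
    now = now or datetime.now(timezone.utc)
    return f"{now.strftime('%Y%m%dT%H%M%S')}-{uuid.uuid4().hex[:8]}"


def iter_manifests(root: str) -> Iterator[Path]:
    """Yield iteration.json paths, oldest iteration first."""
    # Ids start with a fixed-width timestamp, so a plain lexicographic
    # sort of the directory names is also a chronological sort.
    for entry in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        manifest = entry / "iteration.json"
        if manifest.is_file():
            yield manifest
```

The random suffix only breaks ties between iterations that start within the same second; ordering between such iterations is arbitrary in this sketch.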

Known limitations

  • Stage callbacks default to no-op stubs — live trace ingestion, registry baseline auto-pick, and /v1/adapters/activate rollout are operator-driven via WatchConfig.
  • Soup Can packaging of each iteration is deferred to v0.58.1.
  • --detach spawns a single child process; full daemonization (double-fork, session leader, /dev/null fds) is deferred.

See also

  • [Trace-to-preference](/docs/trace-to-preference) — the HarvestFn half of the loop
  • [Eval-gated training](/docs/eval-gate) — the GateFn half
  • [Registry](/docs/registry) — where baseline and shipped runs live
  • [Diagnose](/docs/diagnose) — pair with soup train --diagnose-gate for a second safety net
  • [Advise](/docs/advise) — what to run *before* a loop iteration