v0.53.1 — Live writers for Quant Menu II

v0.53.0 shipped the closed allowlists and validators. v0.53.1 replaces every stub with a live implementation.

TorchAO PTQ export

```bash
soup export --format torchao --quant-config quant.yaml
```

Closed scheme allowlist with per-scheme kwarg validation:

  • Int4WeightOnly
  • Int8DynActInt4
  • Float8DynActFloat8
  • NVFP4 (Blackwell SM ≥ 12, runtime gate)
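A closed allowlist with per-scheme kwarg validation could look roughly like the sketch below. The scheme names come from the list above; the kwarg names (`group_size`), the function name `validate_quant_config`, and reading "SM ≥ 12" as a compute-capability major version are assumptions made for illustration, not soup's actual implementation.

```python
# Scheme names are taken from the release note; the allowed kwargs per
# scheme are hypothetical placeholders.
ALLOWED_SCHEMES = {
    "Int4WeightOnly": {"group_size"},
    "Int8DynActInt4": {"group_size"},
    "Float8DynActFloat8": set(),
    "NVFP4": set(),
}

# Assumption: the Blackwell gate is a check on the compute-capability major version.
NVFP4_MIN_SM_MAJOR = 12


def validate_quant_config(scheme, kwargs, sm_major=None):
    """Reject unknown schemes, unknown kwargs, and NVFP4 on older GPUs."""
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(
            f"unknown scheme {scheme!r}; allowed: {sorted(ALLOWED_SCHEMES)}"
        )
    unknown = set(kwargs) - ALLOWED_SCHEMES[scheme]
    if unknown:
        raise ValueError(f"unsupported kwargs for {scheme}: {sorted(unknown)}")
    # Runtime gate: only checked when NVFP4 is actually requested.
    if scheme == "NVFP4" and (sm_major is None or sm_major < NVFP4_MIN_SM_MAJOR):
        raise RuntimeError("NVFP4 requires compute capability major >= 12")
    return {"scheme": scheme, **kwargs}
```

Keeping the allowlist as a plain mapping means an unknown scheme or a misspelled kwarg fails before any model weights are loaded.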

Single-shot BNB-4bit merge

```bash
soup merge --save-format 4bit          # standard
soup merge --save-format 4bit_forced   # forced single-shot path
```

Writes a BNB-4bit quantized merged checkpoint without the wasteful dequant → merge → requant cycle (Unsloth merged_4bit recipe).

UD / IQ / Apple-ARM GGUF live

```bash
soup export --format gguf-ud --gguf-flavour UD-Q4_K_XL --calibration-data calib.jsonl
```

3-stage pipeline: convert_hf_to_gguf.py → imatrix → quantize. Calibration JSONL is required for any UD / IQ rung.
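The three stages can be pictured as an ordered list of commands. The sketch below only builds the argument lists and does not run anything. It assumes the stages map onto the llama.cpp tools `convert_hf_to_gguf.py`, `llama-imatrix`, and `llama-quantize`, that the calibration JSONL has already been flattened to a plain-text file, and that `quant_type` is a type name the quantize tool accepts. The function name, file names, and the mapping from a flavour such as UD-Q4_K_XL to quantize arguments are hypothetical and not described in this release.

```python
from pathlib import Path


def plan_calibrated_gguf_export(model_dir, calibration_txt, quant_type, workdir="out"):
    """Return the three stage commands as argv lists, in execution order."""
    # UD / IQ rungs need an importance matrix, so calibration data is mandatory.
    if not calibration_txt:
        raise ValueError("calibration data is required for UD / IQ rungs")

    out = Path(workdir)
    f16 = out / "model-f16.gguf"        # stage 1 output (hypothetical file name)
    imatrix = out / "imatrix.dat"       # stage 2 output (hypothetical file name)
    final = out / f"model-{quant_type}.gguf"

    return [
        # Stage 1: convert the Hugging Face checkpoint to an unquantized GGUF.
        ["python", "convert_hf_to_gguf.py", str(model_dir),
         "--outfile", str(f16), "--outtype", "f16"],
        # Stage 2: compute the importance matrix from the calibration text.
        ["llama-imatrix", "-m", str(f16), "-f", str(calibration_txt),
         "-o", str(imatrix)],
        # Stage 3: quantize, guided by the importance matrix.
        ["llama-quantize", "--imatrix", str(imatrix), str(f16), str(final),
         quant_type],
    ]
```

Each entry can be passed to `subprocess.run(cmd, check=True)` in order, so a failure in one stage stops the pipeline before the next one starts.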

Autopilot pre-quantized detection

detect_prequantized_format(model_id) recognises pre-quantized checkpoints from their names: TheBloke/...-GPTQ, -AWQ, -MLX, and GGUF. For these models, soup autopilot now recommends gptq instead of stacking 4bit on top of a pre-quantized checkpoint.
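Name-based detection of this kind can be sketched with a single regular expression. The function below reuses the name `detect_prequantized_format` purely as a stand-in: the matching rules (case-insensitive marker, delimited by `-`, `_`, `.`, or the ends of the repo name) and the return values are assumptions, and soup's real heuristics may differ.

```python
import re

# Markers named in the release note. Requiring a delimiter on both sides is
# an assumption, made so that a marker embedded in a longer word is ignored.
_PREQUANT_PATTERN = re.compile(
    r"(?:^|[-_.])(gptq|awq|mlx|gguf)(?:$|[-_.])", re.IGNORECASE
)


def detect_prequantized_format(model_id):
    """Return 'gptq', 'awq', 'mlx', 'gguf', or None, based on the repo name only."""
    # Drop the organisation prefix so that only the model name is inspected.
    name = model_id.rsplit("/", 1)[-1]
    match = _PREQUANT_PATTERN.search(name)
    return match.group(1).lower() if match else None
```

For example, `detect_prequantized_format("TheBloke/Llama-2-7B-GPTQ")` returns `"gptq"`, while a name without any marker returns `None`.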

Deploy autopilot scorecard

```bash
soup deploy autopilot --measure --tasks tasks.jsonl
```

Runs the selected recipe end-to-end and writes an OK / MINOR / MAJOR scorecard. Results are cached on disk, keyed on (model, hardware, tasks).
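A disk cache keyed on (model, hardware, tasks) might work along these lines. This is a minimal sketch: hashing the contents of the tasks file, the JSON-per-key layout, and the names `scorecard_cache_key` and `load_or_measure` are all assumptions, not soup's actual cache format.

```python
import hashlib
import json
from pathlib import Path


def scorecard_cache_key(model, hardware, tasks_path):
    """Derive a stable key from the model, the hardware, and the tasks file contents."""
    # Hash the file contents so that editing tasks.jsonl invalidates the cache.
    tasks_digest = hashlib.sha256(Path(tasks_path).read_bytes()).hexdigest()
    payload = json.dumps(
        {"model": model, "hardware": hardware, "tasks": tasks_digest},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def load_or_measure(cache_dir, key, measure):
    """Return the cached scorecard for key, or call measure() and store its result."""
    path = Path(cache_dir) / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())
    result = measure()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result))
    return result
```

With this layout, a repeated run on the same model, hardware, and task file reads the stored scorecard instead of measuring again.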

Tests

  • 7,610 → 7,722 (+112)

See also

  • [Quant Menu II reference](/docs/quant-menu-ii)
  • [Speed & Memory](/docs/training-speed-memory)