v0.53.1 — Live writers for Quant Menu II
v0.53.0 shipped the closed allowlists and validators. v0.53.1 lifts every stub to live wiring.
TorchAO PTQ export
```bash
soup export --format torchao --quant-config quant.yaml
```

Closed scheme allowlist with per-scheme kwarg validation:
- Int4WeightOnly
- Int8DynActInt4
- Float8DynActFloat8
- NVFP4 (Blackwell SM ≥ 12, runtime gate)
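A closed allowlist with per-scheme kwarg validation can be sketched roughly as below. This is an illustration only, not soup's actual validator: the `ALLOWED_KWARGS` table, the kwarg names `group_size` and `granularity`, and the `validate_scheme` function are hypothetical. Only the four scheme names are taken from the list above.

```python
# Hypothetical sketch: scheme names come from the release notes;
# the kwarg sets and the function name are illustrative assumptions.
ALLOWED_KWARGS = {
    "Int4WeightOnly": {"group_size"},
    "Int8DynActInt4": {"group_size"},
    "Float8DynActFloat8": {"granularity"},
    "NVFP4": set(),
}


def validate_scheme(scheme: str, kwargs: dict) -> None:
    """Reject unknown schemes and any kwarg not listed for the scheme."""
    if scheme not in ALLOWED_KWARGS:
        raise ValueError(
            f"unknown scheme {scheme!r}; allowed: {sorted(ALLOWED_KWARGS)}"
        )
    unexpected = set(kwargs) - ALLOWED_KWARGS[scheme]
    if unexpected:
        raise ValueError(
            f"scheme {scheme!r} does not accept: {sorted(unexpected)}"
        )
```

The point of a closed list is that anything not named explicitly fails fast at config-load time rather than surfacing later during export.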
Single-shot BNB-4bit merge
```bash
soup merge --save-format 4bit          # standard
soup merge --save-format 4bit_forced   # forced single-shot path
```

Writes a BNB-4bit quantized merged checkpoint without the wasteful dequant → merge → requant cycle (Unsloth merged_4bit recipe).
UD / IQ / Apple-ARM GGUF live
```bash
soup export --format gguf-ud --gguf-flavour UD-Q4_K_XL --calibration-data calib.jsonl
```

3-stage pipeline: convert_hf_to_gguf.py → imatrix → quantize. Calibration JSONL is required for any UD / IQ rung.
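The calibration requirement could be enforced with an early check along these lines. This is a sketch under assumptions: `require_calibration` is a hypothetical helper, and the prefix rule used to spot UD / IQ flavours (`UD-` or `IQ`) is a guess based on the `UD-Q4_K_XL` example, not the shipped logic.

```python
from typing import Optional


def require_calibration(flavour: str, calibration_data: Optional[str]) -> None:
    """Fail before the pipeline starts if a UD / IQ flavour lacks calibration data.

    Assumption: UD / IQ flavours are recognisable by their name prefix.
    """
    needs_calibration = flavour.startswith(("UD-", "IQ"))
    if needs_calibration and calibration_data is None:
        raise ValueError(f"{flavour} requires --calibration-data")
```

Checking before the first stage avoids running the conversion step only to fail at the imatrix stage.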
Autopilot pre-quantized detection
`detect_prequantized_format(model_id)` recognises `TheBloke/...-GPTQ`, `-AWQ`, `-MLX`, and GGUF names. `soup autopilot` now recommends `gptq` instead of stacking `4bit` on top of a pre-quantized checkpoint.
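Name-based detection of this kind might look like the sketch below. It is not the shipped implementation: the marker table, the token-splitting rule, and the returned tags are assumptions chosen for illustration. Only the function name and the four recognised formats come from the notes above.

```python
from typing import Optional

# Assumed markers, checked in order against the repo name's tokens.
_MARKERS = ("gptq", "awq", "mlx", "gguf")


def detect_prequantized_format(model_id: str) -> Optional[str]:
    """Return a format tag if the repo name carries a quantization marker."""
    # Drop the org prefix ("TheBloke/") and normalise separators.
    name = model_id.rsplit("/", 1)[-1].lower()
    tokens = name.replace("_", "-").split("-")
    for marker in _MARKERS:
        if marker in tokens:
            return marker
    return None
```

Matching on whole tokens rather than substrings keeps a name that merely contains the letters of a marker from being misclassified.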
Deploy autopilot scorecard
```bash
soup deploy autopilot --measure --tasks tasks.jsonl
```

Runs the picked recipe end-to-end and writes an OK / MINOR / MAJOR scorecard with a disk cache keyed on (model, hardware, tasks).
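One way to derive a disk-cache key from (model, hardware, tasks) is sketched below. The function name, the `.json` suffix, and the choice to key on the contents of the tasks file rather than its path are all assumptions. The actual cache layout is not described in these notes.

```python
import hashlib
import json
from pathlib import Path


def scorecard_cache_path(
    cache_dir: Path, model: str, hardware: str, tasks_file: Path
) -> Path:
    """Derive a stable cache file path from (model, hardware, tasks)."""
    # Hash the task file contents so that editing tasks invalidates the entry.
    tasks_digest = hashlib.sha256(tasks_file.read_bytes()).hexdigest()
    # JSON-encode the triple so field boundaries stay unambiguous.
    key = json.dumps([model, hardware, tasks_digest])
    return cache_dir / (hashlib.sha256(key.encode("utf-8")).hexdigest() + ".json")
```

With a key built this way, a repeated run with the same model, hardware, and task file resolves to the same cache entry, while changing any of the three produces a fresh one.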
Tests
- 7,610 → 7,722 (+112)
See also
- [Quant Menu II reference](/docs/quant-menu-ii)
- [Speed & Memory](/docs/training-speed-memory)