Optimizer Zoo (v0.41.0)
A closed allowlist of 33+ optimizers covering HF-native, bitsandbytes-backed, and v0.41 additions.
Supported optimizers
HF-native: adamw_torch, adamw_torch_fused, adafactor, sgd, and the rest of the HF set.
bitsandbytes-backed: adamw_bnb_8bit, paged_adamw_8bit, lion_8bit, etc.
New in v0.41:
- BAdam — block-coordinate descent for full-parameter fine-tuning
- APOLLO (apollo_adamw) — gradient-scaled training in low rank
- Adam-mini — half the optimizer memory of AdamW
- lomo / adalomo — low-memory optimization with optional adaptive scaling
- grokadamw — Grokfast-style AdamW
- schedule_free_adamw / schedule_free_sgd
- muon, dion, came_pytorch
- TorchAO — ao_adamw_fp8 / ao_adamw_4bit / ao_adamw_8bit
```yaml
training:
  optimizer: muon
```

validate_optimizer_name rejects non-string, empty, null-byte-containing, and >64-character inputs with actionable messages, and lowercases the name for deterministic lookup. Unknown names fail at config load.
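The name gate described above might look like the following sketch. The allowlist shown is abridged and the function body is an assumption for illustration, not the project's actual implementation.

```python
# Hypothetical sketch of validate_optimizer_name; the real allowlist
# and error messages may differ.
ALLOWED_OPTIMIZERS = {
    "adamw_torch", "adamw_torch_fused", "adafactor", "sgd",
    "adamw_bnb_8bit", "paged_adamw_8bit", "lion_8bit",
    "badam", "apollo_adamw", "adam_mini", "lomo", "adalomo",
    "grokadamw", "schedule_free_adamw", "schedule_free_sgd",
    "muon", "dion", "came_pytorch",
    "ao_adamw_fp8", "ao_adamw_4bit", "ao_adamw_8bit",
}

def validate_optimizer_name(name):
    if not isinstance(name, str):
        raise TypeError(f"optimizer must be a string, got {type(name).__name__}")
    if not name:
        raise ValueError("optimizer must not be empty")
    if "\x00" in name:
        raise ValueError("optimizer must not contain null bytes")
    if len(name) > 64:
        raise ValueError("optimizer name exceeds 64 characters")
    lowered = name.lower()  # lowercase for deterministic lookup
    if lowered not in ALLOWED_OPTIMIZERS:
        raise ValueError(f"unknown optimizer {lowered!r}; see the supported list")
    return lowered
```

Because the check runs at config load, a typo like `optimizer: mouon` fails before any model weights are touched.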
Per-module LR groups
Map regex patterns → learning rates with first-match-wins routing.
```yaml
training:
  lr: 2e-4  # base / fallback
  lr_groups:
    - { pattern: "model\\.embed_tokens", lr: 5e-5 }
    - { pattern: ".*lora_A.*", lr: 1e-3 }
    - { pattern: ".*lora_B.*", lr: 1e-3 }
```

Capped at 32 entries. Each pattern must compile via re.compile (a ReDoS probe runs). Each lr must be finite and in (0, 1]; NaN, ±inf, and booleans are rejected, as are duplicate patterns.
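A minimal sketch of the validation and routing just described, assuming the groups arrive as a list of dicts parsed from the YAML; validate_lr_groups and lr_for are illustrative names, not the project's API.

```python
# Hypothetical sketch of lr_groups validation and first-match-wins routing.
import re

MAX_LR_GROUPS = 32  # cap described above

def validate_lr_groups(groups):
    if len(groups) > MAX_LR_GROUPS:
        raise ValueError(f"lr_groups is capped at {MAX_LR_GROUPS} entries")
    seen = set()
    for group in groups:
        pattern, lr = group["pattern"], group["lr"]
        if pattern in seen:
            raise ValueError(f"duplicate lr_groups pattern {pattern!r}")
        seen.add(pattern)
        re.compile(pattern)  # raises re.error on an invalid regex
        if isinstance(lr, bool) or not isinstance(lr, (int, float)):
            raise TypeError(f"lr must be a number, got {type(lr).__name__}")
        if not 0.0 < lr <= 1.0:  # NaN and ±inf both fail this comparison
            raise ValueError(f"lr must be finite and in (0, 1], got {lr}")
    return groups

def lr_for(param_name, groups, base_lr):
    # First match wins: scan in declaration order, fall back to the base lr.
    for group in groups:
        if re.search(group["pattern"], param_name):
            return group["lr"]
    return base_lr
```

First-match-wins means order matters: a broad pattern placed first shadows narrower ones after it.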
LoftQ quantization-aware LoRA init
```yaml
training:
  lora:
    init_strategy: loftq  # random | pissa | olora | loftq
    loftq_iter: 5         # [1, 10]
    loftq_bits: 4         # 2 | 4 | 8
```

Builds the LoftQ config via peft (lazy import); incompatible with both DoRA and VeRA.
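The option checks above can be sketched as below; validate_loftq and its signature are hypothetical, and the returned dict stands in for what would feed peft's LoftQ configuration once peft is lazily imported.

```python
# Hypothetical sketch of the LoftQ option checks; the real config builder
# defers importing peft until loftq is actually requested.
VALID_INIT_STRATEGIES = {"random", "pissa", "olora", "loftq"}

def validate_loftq(init_strategy, loftq_iter, loftq_bits,
                   use_dora=False, use_vera=False):
    if init_strategy not in VALID_INIT_STRATEGIES:
        raise ValueError(f"unknown init_strategy {init_strategy!r}")
    if init_strategy != "loftq":
        return None  # LoftQ-specific knobs are ignored for other strategies
    if use_dora or use_vera:
        raise ValueError("loftq is incompatible with DoRA and VeRA")
    if not 1 <= loftq_iter <= 10:
        raise ValueError("loftq_iter must be in [1, 10]")
    if loftq_bits not in (2, 4, 8):
        raise ValueError("loftq_bits must be 2, 4, or 8")
    return {"loftq_iter": loftq_iter, "loftq_bits": loftq_bits}
```

Validating before the lazy import keeps bad configs failing fast even when peft is not installed.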
LLaMA Pro block expansion
```yaml
training:
  expand_layers: 4             # [1, 64]
  freeze_trainable_layers: 32  # required when expand_layers is set
```

The schema lands in v0.41.0; the live patch ships in v0.41.1 (same stub-then-live pattern as v0.27.0 MII / v0.37.0 multipack).
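A sketch of the schema-level coupling between the two keys, under the assumption that the check mirrors the bounds above; the function name is illustrative.

```python
# Hypothetical sketch of the block-expansion schema checks.
def validate_block_expansion(expand_layers=None, freeze_trainable_layers=None):
    if expand_layers is None:
        return  # feature not requested; nothing to check
    if not 1 <= expand_layers <= 64:
        raise ValueError("expand_layers must be in [1, 64]")
    if freeze_trainable_layers is None:
        raise ValueError(
            "freeze_trainable_layers is required when expand_layers is set"
        )
```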
Friendly load_in_X aliases
load_in_8bit: true remaps quantization to 8bit; load_in_16bit: true remaps it to none. The two aliases are mutually exclusive, and combining either with an explicit Quant Menu format raises an error rather than silently overriding it.
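The remapping rules can be sketched as a small resolver over the parsed config dict; resolve_quantization and the key names are illustrative assumptions, not the project's API.

```python
# Hypothetical sketch of the load_in_X alias resolution described above.
def resolve_quantization(cfg):
    load_8 = cfg.get("load_in_8bit", False)
    load_16 = cfg.get("load_in_16bit", False)
    explicit = cfg.get("quantization")
    if load_8 and load_16:
        raise ValueError("load_in_8bit and load_in_16bit are mutually exclusive")
    if (load_8 or load_16) and explicit is not None:
        # raise rather than silently override an explicit Quant Menu format
        raise ValueError("load_in_X aliases cannot be combined with quantization")
    if load_8:
        return "8bit"
    if load_16:
        return "none"
    return explicit
```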