# v0.53.2 — Live trainers for Modality II
v0.52.0 shipped the schema, allowlists, and validators. v0.53.2 lights up the trainers.
## Live trainers
### `task: distill`
`DistillTrainerWrapper` pairs a trainable student with a frozen teacher and applies a per-token KL divergence between them. Each side gets its own `trust_remote_code` flag.
```yaml
task: distill
base: meta-llama/Llama-3.2-1B
teacher_model: meta-llama/Llama-3.2-3B-Instruct
training:
  distill_divergence: forward_kl  # kl | forward_kl | reverse_kl | js
  distill_temperature: 2.0        # [0.05, 100]
```
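The per-token KL is easy to picture. A minimal sketch, assuming `(batch, seq_len, vocab)` logits and the usual T² gradient rescaling; the function name, signature, and masking are illustrative, not the wrapper's actual internals:

```python
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, mask, temperature=2.0):
    # mask: float tensor, 1.0 for tokens that count toward the loss.
    # Soften both distributions with the distill temperature.
    s_logp = F.log_softmax(student_logits / temperature, dim=-1)
    t_prob = F.softmax(teacher_logits / temperature, dim=-1)
    # Forward KL(teacher || student), summed over the vocab per token.
    per_token = (t_prob * (t_prob.clamp_min(1e-9).log() - s_logp)).sum(-1)
    # Average over unmasked tokens; the T^2 factor keeps gradient
    # magnitudes comparable across temperatures (the usual Hinton trick).
    return (per_token * mask).sum() / mask.sum() * temperature**2
```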
### `task: classifier | reranker | cross_encoder`

`ClassifierTrainerWrapper` covers single-label, multi-label, and cross-encoder heads.
```yaml
task: classifier
training:
  num_labels: 5
  label_names: [toxic, severe, threat, obscene, identity_hate]
```
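For reference, a sketch of how a head's loss might dispatch between single-label and multi-label modes, assuming HF-style conventions (class indices for single-label, multi-hot float labels for multi-label); `classifier_loss` is a hypothetical helper, not the wrapper's API:

```python
import torch.nn.functional as F

def classifier_loss(logits, labels, multi_label=False):
    if multi_label:
        # Multi-label: independent sigmoid per class over num_labels.
        return F.binary_cross_entropy_with_logits(logits, labels.float())
    # Single-label (and cross-encoder relevance heads framed as
    # classification): softmax cross-entropy over num_labels.
    return F.cross_entropy(logits, labels)
```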
## Reasoning effort dispatch

```yaml
training:
  reasoning_effort: high  # low | medium | high — gpt-oss family
```

Injects the `<|reasoning_effort|>` prefix into chat turns.
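A rough picture of that injection at the message level; the exact injection point (chat template vs. token stream) isn't specified in these notes, `inject_reasoning_effort` is a made-up helper, and applying it to assistant turns is an assumption:

```python
def inject_reasoning_effort(messages, effort="high"):
    # Prepend the effort tag to each assistant turn (assumption; the
    # real dispatch may target a different turn or the template itself).
    prefix = f"<|reasoning_effort|>{effort}"
    out = []
    for msg in messages:
        if msg["role"] == "assistant":
            msg = {**msg, "content": f"{prefix}\n{msg['content']}"}
        out.append(msg)
    return out
```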
## Assistant-only loss + EOT unmask
`train_on_eot: true` unmasks the EOT/EOS boundary token so the model learns end-of-turn behaviour without leaking the rest of the prompt into the loss.
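In label-masking terms, a minimal sketch assuming the usual `-100` ignore-index convention; every name here is invented for illustration, and a real implementation would likely restrict unmasking to EOT tokens that close assistant turns:

```python
IGNORE_INDEX = -100

def mask_labels(input_ids, assistant_mask, eot_id, train_on_eot=True):
    # Keep loss only on assistant tokens.
    labels = [
        tok if is_asst else IGNORE_INDEX
        for tok, is_asst in zip(input_ids, assistant_mask)
    ]
    if train_on_eot:
        # Also keep the EOT/EOS boundary token in the loss so the model
        # learns to stop, without pulling prompt tokens back in.
        labels = [
            tok if tok == eot_id else lab
            for tok, lab in zip(input_ids, labels)
        ]
    return labels
```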
## EBFT + GDPO live
EBFT (structured / strided) and GDPO (standard / length_normalized / margin) loss kernels both land as drop-in losses behind their respective task schemas.
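A speculative sketch of how the three GDPO variants could differ, assuming a DPO-style log-ratio objective with reference-model terms folded into the precomputed log-probs; the actual kernels may be defined quite differently, and EBFT's structured/strided modes aren't sketched because their shape isn't described here:

```python
import torch.nn.functional as F

def gdpo_loss(chosen_logp, rejected_logp, chosen_len, rejected_len,
              beta=0.1, variant="standard", margin=0.0):
    # "length_normalized": divide each sequence log-prob by its length.
    if variant == "length_normalized":
        chosen_logp = chosen_logp / chosen_len
        rejected_logp = rejected_logp / rejected_len
    delta = beta * (chosen_logp - rejected_logp)
    # "margin": subtract a fixed margin inside the sigmoid.
    if variant == "margin":
        delta = delta - margin
    return -F.logsigmoid(delta).mean()
```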
## Tests
- 7,722 → 7,842 (+120)
## See also
- [Modality II overview](/docs/modality-ii)
- [Training methods](/docs/training)