v0.53.2 — Live trainers for Modality II

v0.52.0 shipped the schema, allowlists, and validators. v0.53.2 lights up the trainers.

Live trainers

`task: distill`

`DistillTrainerWrapper` — student model + frozen teacher + per-token KL divergence, with independent `trust_remote_code` flags per side.

```yaml
task: distill
base: meta-llama/Llama-3.2-1B
teacher_model: meta-llama/Llama-3.2-3B-Instruct
training:
  distill_divergence: forward_kl   # kl | forward_kl | reverse_kl | js
  distill_temperature: 2.0         # [0.05, 100]
```
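For orientation, a minimal sketch of the per-token forward-KL term, assuming PyTorch logits from both models (names here are illustrative, not the wrapper's actual internals):

```python
import torch
import torch.nn.functional as F

def distill_kl_loss(student_logits, teacher_logits, labels, temperature=2.0):
    """Per-token forward KL(teacher || student), averaged over unmasked tokens.

    student_logits, teacher_logits: (batch, seq, vocab)
    labels: (batch, seq) with -100 on masked positions
    """
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    p_teacher = F.softmax(teacher_logits / t, dim=-1)
    # KL(teacher || student) summed over the vocab, one value per token.
    kl = (p_teacher * (p_teacher.clamp_min(1e-9).log() - log_p_student)).sum(-1)
    mask = (labels != -100).float()
    # t^2 scaling keeps gradient magnitudes comparable across temperatures
    # (the standard distillation trick).
    return (kl * mask).sum() / mask.sum().clamp_min(1.0) * (t * t)
```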

`task: classifier | reranker | cross_encoder`

`ClassifierTrainerWrapper` covers single-label, multi-label, and cross-encoder heads.

```yaml
task: classifier
training:
  num_labels: 5
  label_names: [toxic, severe, threat, obscene, identity_hate]
```
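At the loss level, the single-label vs multi-label split usually comes down to cross-entropy vs per-label BCE; a hedged sketch of that dispatch (assumed behaviour, not the wrapper's literal code):

```python
import torch.nn.functional as F

def classifier_loss(logits, targets, multi_label=False):
    """logits: (batch, num_labels).

    Single-label: targets are class indices -> cross-entropy.
    Multi-label:  targets are {0,1} vectors -> per-label BCE.
    """
    if multi_label:
        return F.binary_cross_entropy_with_logits(logits, targets.float())
    return F.cross_entropy(logits, targets)
```

A cross-encoder head is the same machinery applied to a concatenated (query, document) pair, typically with a single relevance logit.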

Reasoning effort dispatch

```yaml
training:
  reasoning_effort: high   # low | medium | high — gpt-oss family
```

Injects the `<|reasoning_effort|>` prefix into chat turns.
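Mechanically this is a string prepend before tokenization. A sketch under the assumption that the prefix lands on assistant turns (the exact placement is model-family specific, and the helper below is illustrative):

```python
def inject_reasoning_effort(messages, effort="high"):
    """Prepend a reasoning-effort control prefix to each assistant turn.

    Assumption: a simple per-turn text prefix; real placement varies
    by model family.
    """
    prefix = f"<|reasoning_effort|>{effort}"
    out = []
    for msg in messages:
        if msg["role"] == "assistant":
            msg = {**msg, "content": f"{prefix}\n{msg['content']}"}
        out.append(msg)
    return out
```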

Assistant-only loss + EOT unmask

`train_on_eot: true` unmasks the EOT / EOS boundary token so the model learns end-of-turn behaviour without leaking the rest of the prompt into the loss.
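In label-masking terms: assistant-only loss sets non-assistant positions to `-100`, and `train_on_eot` additionally leaves the turn-closing EOT id trainable. A sketch under those assumptions (helper name and signature are illustrative):

```python
def mask_labels(input_ids, assistant_mask, eot_id, train_on_eot=True):
    """Build labels for assistant-only loss.

    input_ids:      list[int], the full chat sequence
    assistant_mask: list[bool], True where the token is in an assistant turn
    eot_id:         the end-of-turn / EOS token id
    """
    labels = []
    for i, (tok, is_assistant) in enumerate(zip(input_ids, assistant_mask)):
        keep = is_assistant
        # Unmask the EOT that closes an assistant turn so the model learns
        # to stop, without training on the following prompt tokens.
        if train_on_eot and tok == eot_id and i > 0 and assistant_mask[i - 1]:
            keep = True
        labels.append(tok if keep else -100)
    return labels
```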

EBFT + GDPO live

EBFT (structured / strided) and GDPO (standard / length_normalized / margin) both land as drop-in loss kernels behind their respective task schemas.
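The GDPO formulation isn't restated here; as rough orientation only, the variant names map naturally onto a DPO-style pairwise loss where `length_normalized` divides log-probs by response length and `margin` widens the required preference gap. Everything below is an assumption-labelled sketch, not the shipped kernel:

```python
import torch
import torch.nn.functional as F

def preference_loss(logp_chosen, logp_rejected, len_chosen, len_rejected,
                    beta=0.1, margin=0.0, length_normalized=True):
    """DPO-family pairwise loss sketch (not GDPO's exact definition).

    logp_*: summed policy-minus-reference log-prob of each response
    len_*:  response token counts, used by the length_normalized variant
    """
    if length_normalized:
        logp_chosen = logp_chosen / len_chosen
        logp_rejected = logp_rejected / len_rejected
    # margin shifts the decision boundary, demanding a larger preference gap
    # before the pair stops contributing loss.
    return -F.logsigmoid(beta * (logp_chosen - logp_rejected) - margin).mean()
```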

Tests

  • 7,722 → 7,842 (+120)

See also

  • [Modality II overview](/docs/modality-ii)
  • [Training methods](/docs/training)