# Long Context (v0.49.0)

Three RoPE-scaling strategies (YaRN, dynamic NTK, and Llama 3.1-style NTK-aware), plus LongLoRA S² shifted-sparse attention, for 128k+ context fine-tuning.

## YaRN

```yaml
training:
  rope_scaling:
    type: yarn
    factor: 8.0
    original_max_position_embeddings: 8192
    beta_fast: 32
    beta_slow: 1
```

The full YaRN math kernel ships in v0.49.0.
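
The `beta_fast` / `beta_slow` values bound the ramp between the untouched high-frequency RoPE dimensions and the fully interpolated low-frequency ones. A minimal sketch of that ramp, following the published YaRN recipe rather than Soup's kernel (function names here are illustrative, and the attention-temperature part of YaRN is omitted):

```python
import math

import torch


def find_correction_dim(num_rotations, dim, base, original_max):
    # Frequency index whose wavelength completes `num_rotations` turns
    # over the original context window.
    return (dim * math.log(original_max / (num_rotations * 2 * math.pi))) / (2 * math.log(base))


def yarn_inv_freq(dim=128, base=10000.0, factor=8.0, original_max=8192,
                  beta_fast=32, beta_slow=1):
    # Plain RoPE inverse frequencies for a head dimension of `dim`.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))

    # Dimensions rotating more than beta_fast times over the original window
    # are kept as-is; those rotating fewer than beta_slow times are fully
    # interpolated (divided by `factor`); a linear ramp blends the band between.
    low = max(math.floor(find_correction_dim(beta_fast, dim, base, original_max)), 0)
    high = min(math.ceil(find_correction_dim(beta_slow, dim, base, original_max)), dim // 2 - 1)

    ramp = ((torch.arange(dim // 2).float() - low) / max(high - low, 1)).clamp(0, 1)
    keep_mask = 1.0 - ramp  # 1.0 = keep original frequency, 0.0 = interpolate
    return inv_freq * keep_mask + (inv_freq / factor) * (1.0 - keep_mask)
```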

## Dynamic NTK

```yaml
training:
  rope_scaling:
    type: dynamic
    factor: 4.0
```
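
Dynamic NTK leaves RoPE untouched until the sequence outgrows the model's original window, then enlarges the base so the longest wavelength still covers the current length. A sketch of the commonly used adjustment (as popularized by the dynamic-NTK rotary embeddings in open-source stacks; whether Soup computes it identically is an assumption):

```python
import torch


def dynamic_ntk_inv_freq(seq_len, dim=128, base=10000.0, factor=4.0, original_max=8192):
    # Within the original window this is plain RoPE; beyond it, grow the base
    # so the frequencies stretch with the current sequence length.
    if seq_len > original_max:
        base = base * ((factor * seq_len / original_max) - (factor - 1)) ** (dim / (dim - 2))
    return 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
```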

## LongLoRA S² shifted-sparse attention

```yaml
training:
  longlora_s2: true
  longlora_s2_group_size: 2048
```

The S² flags are schema-only in v0.49.0; the forward-pass override lands in v0.49.1.
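
`longlora_s2_group_size` sets the window G for S² attention: tokens attend only within groups of G tokens, and half the heads operate on groups shifted by G/2 so information still crosses group boundaries. A rough sketch of that token-shift step (illustrative only, not the v0.49.1 override):

```python
import torch


def s2_shift_and_group(x, group_size=2048):
    """Shift half the heads by half a group, then fold groups into the batch.

    x: (batch, seq_len, num_heads, head_dim), seq_len a multiple of group_size.
    Attention would then run per group on the returned view; this sketch stops
    before the attention call and omits the inverse shift applied afterwards.
    """
    b, s, h, d = x.shape
    shifted = x.clone()
    # Roll the second half of the heads by group_size // 2 positions so their
    # groups straddle the boundaries of the un-shifted groups.
    shifted[:, :, h // 2:] = torch.roll(x[:, :, h // 2:], shifts=-group_size // 2, dims=1)
    return shifted.reshape(b * (s // group_size), group_size, h, d)
```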

## Llama 3.1 NTK-aware scaling

Set the standard `rope_scaling: { type: llama3, ... }` and Soup wires up the Llama 3.1-style NTK-aware schedule.
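
For reference, the Llama 3.1-style schedule divides the low-frequency RoPE dimensions by the scaling factor, keeps the high-frequency ones, and smoothly blends the band in between. A sketch of that adjustment (parameter defaults mirror the published Llama 3.1 config and are assumptions about Soup's defaults):

```python
import math

import torch


def llama3_inv_freq(dim=128, base=500000.0, factor=8.0, low_freq_factor=1.0,
                    high_freq_factor=4.0, original_max=8192):
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    wavelen = 2 * math.pi / inv_freq

    low_freq_wavelen = original_max / low_freq_factor    # longer than this: scale fully
    high_freq_wavelen = original_max / high_freq_factor  # shorter than this: keep as-is
    smooth = (original_max / wavelen - low_freq_factor) / (high_freq_factor - low_freq_factor)

    scaled = torch.where(wavelen > low_freq_wavelen, inv_freq / factor, inv_freq)
    mid_band = (wavelen <= low_freq_wavelen) & (wavelen >= high_freq_wavelen)
    return torch.where(mid_band, (1 - smooth) * inv_freq / factor + smooth * inv_freq, scaled)
```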

Gates: `validate_longlora_compat` and `is_llama_model` reject incompatible architectures at config load. Pair with [Multipack](/docs/multipack) to keep variable-length samples efficient on long-context runs.
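
A hypothetical sketch of what such a config-load gate looks like; the real `validate_longlora_compat` and `is_llama_model` live in Soup and may differ in signature and coverage:

```python
def is_llama_model(model_type: str) -> bool:
    # Assumption: the gate keys off the architecture name in the model config.
    return model_type == "llama"


def validate_longlora_compat(cfg: dict, model_type: str) -> None:
    # Reject LongLoRA S^2 settings for non-Llama architectures before training starts.
    if cfg.get("longlora_s2") and not is_llama_model(model_type):
        raise ValueError(
            f"longlora_s2 requires a Llama-family model, got {model_type!r}"
        )
```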