# Apple Silicon MLX Backend

Soup v0.25.0 adds a native MLX training backend for M1–M4 Macs. SFT, DPO, and GRPO run on unified memory without CUDA, Rosetta, or x86 emulation.

## Install

```bash
pip install 'soup-cli[mlx]'
```

This pulls `mlx>=0.20` and `mlx-lm>=0.20` as optional dependencies.
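
A quick way to confirm the optional dependencies resolved is to probe for them from Python. This is a minimal sketch using only the standard library; `mlx_available` is an illustrative helper, not part of the soup CLI:

```python
import importlib.util
import platform

def mlx_available() -> bool:
    """True when running on Apple Silicon with both MLX packages importable."""
    on_apple_silicon = platform.system() == "Darwin" and platform.machine() == "arm64"
    has_mlx = importlib.util.find_spec("mlx") is not None
    has_mlx_lm = importlib.util.find_spec("mlx_lm") is not None
    return on_apple_silicon and has_mlx and has_mlx_lm
```

On an Intel Mac or a Linux box this returns `False` even with the packages installed, which mirrors the backend's hardware requirement.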

## Enable the backend

Set `backend: mlx` in `soup.yaml`, or pass `--backend mlx` to `soup train`:

```yaml
base: mlx-community/Llama-3.1-8B-Instruct-4bit
backend: mlx
task: sft

data:
  train: data.jsonl
  format: chatml

training:
  epochs: 3
  lr: 1e-4
  batch_size: 2
  lora:
    r: 16
    alpha: 32
```

## Supported tasks

| Task | MLX | Notes |
|---|:---:|---|
| `sft` | ✓ | LoRA and QLoRA (mlx-community 4-bit models) |
| `dpo` | ✓ | Frozen reference model |
| `grpo` | ✓ | Works with built-in reward functions |
| `ppo` | – | Not yet; use the CUDA backend |
| `reward_model` | – | Not yet |
| `embedding` | – | Not yet |
| `pretrain` | – | Not yet |
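
The support matrix can be encoded as a pre-flight check before launching a run. This is an illustrative sketch, not soup's own validation; `MLX_SUPPORTED_TASKS` and `check_mlx_task` are hypothetical names:

```python
from typing import Optional

# Tasks the MLX backend currently supports (per the table above).
MLX_SUPPORTED_TASKS = {"sft", "dpo", "grpo"}

def check_mlx_task(task: str) -> Optional[str]:
    """Return an error message if `task` cannot run on the MLX backend, else None."""
    if task not in MLX_SUPPORTED_TASKS:
        return f"task {task!r} is not supported on MLX; use the CUDA backend"
    return None
```

Failing fast on an unsupported task (e.g. `ppo`) is cheaper than discovering it after the model has been downloaded and loaded.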

## Diagnostics

```bash
soup doctor
```

On Apple Silicon this reports the MLX version, chip name, unified memory size, and a recommended batch size.
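
The batch-size recommendation scales with unified memory. The heuristic below is purely illustrative of that kind of calculation, under assumed constants (4-bit weights at roughly 0.5 GB per billion parameters, a few GB of OS headroom); it is not soup's actual logic:

```python
def rough_batch_size(unified_memory_gb: float, model_params_b: float = 8.0) -> int:
    """Illustrative heuristic: batch size from memory left after the model weights."""
    weights_gb = model_params_b * 0.5                    # 4-bit weights, rough estimate
    headroom_gb = unified_memory_gb - weights_gb - 4.0   # reserve ~4 GB for the OS
    return max(1, int(headroom_gb // 4))                 # ~4 GB per batch element
```

Under these assumptions a 16 GB machine lands at a small batch for an 8B model, which matches the 16 GB floor the MLX recipes below advertise.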

## MLX-native recipes

- `llama3.1-8b-sft-mlx` (M2 or later, 16 GB)
- `qwen3-8b-sft-mlx` (M2 or later, 16 GB)
- `gemma3-9b-sft-mlx` (M2 or later, 16 GB)

Use any of them with `soup recipes use <name>`.

## Limitations

- Single-device only; distributed MLX training is not supported.
- Base models must be in MLX format (typically from the mlx-community org on Hugging Face).
- bitsandbytes is unused on this backend; quantization comes from the MLX model itself.
- The Unsloth backend is not MLX-compatible; the two are separate execution paths.