Fine-tune Llama 3.1 with LoRA using Soup CLI

This guide shows how to fine-tune Meta Llama 3.1 8B with LoRA adapters on a custom dataset using Soup CLI. It takes you end-to-end, from install to inference, on a single GPU.

Why LoRA on Llama 3.1?

LoRA (Low-Rank Adaptation) freezes the base model and trains only a small set of adapter parameters (well under 1% of the total), which means:

  • Train the 8B-parameter Llama 3.1 on a single 24 GB GPU (RTX 4090, A10)
  • Checkpoints are tiny (~100MB instead of 16GB)
  • Faster training and easier experimentation
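To see why the adapters are so small, here is a back-of-envelope count of the trainable parameters this guide's config (r=16 on the attention projections) would add. The shapes are the published Llama 3.1 8B dimensions; the totals are my own arithmetic, not Soup CLI output:

```python
# Back-of-envelope LoRA parameter count for Llama 3.1 8B.
HIDDEN = 4096          # model hidden size
KV_DIM = 1024          # 8 KV heads x 128 head dim (grouped-query attention)
LAYERS = 32
R = 16                 # LoRA rank used in this guide

# A LoRA adapter on a (d_in, d_out) weight adds r * (d_in + d_out) parameters.
def lora_params(d_in, d_out, r=R):
    return r * (d_in + d_out)

per_layer = (
    lora_params(HIDDEN, HIDDEN)    # q_proj
    + lora_params(HIDDEN, KV_DIM)  # k_proj
    + lora_params(HIDDEN, KV_DIM)  # v_proj
    + lora_params(HIDDEN, HIDDEN)  # o_proj
)
trainable = per_layer * LAYERS
base = 8_030_000_000  # ~8.03B parameters in the base model

print(f"trainable: {trainable/1e6:.1f}M ({100*trainable/base:.2f}% of base)")
```

At rank 16 this works out to roughly 13.6M trainable parameters, under 0.2% of the base model.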

1. Install

```bash
pip install 'soup-cli[fast]'
```

The `[fast]` extra installs Unsloth, which gives a 2–5× training speedup on Llama-family models.

2. Prepare your dataset

Use the Alpaca format (JSON list of instruction/input/output triples):

```json
[
  {
    "instruction": "Summarize the following text.",
    "input": "Soup CLI is a fine-tuning toolkit...",
    "output": "Soup CLI is an open-source LLM fine-tuning tool."
  }
]
```

Save as train.json.
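If you are generating the file programmatically, a short snippet can write it and sanity-check the shape. This is a plain-Python sketch with hypothetical example data, not a Soup CLI API:

```python
import json

# Write a minimal Alpaca-format dataset: a JSON list of records, each with
# "instruction", "input", and "output" keys.
examples = [
    {
        "instruction": "Summarize the following text.",
        "input": "Soup CLI is a fine-tuning toolkit...",
        "output": "Soup CLI is an open-source LLM fine-tuning tool.",
    }
]

with open("train.json", "w") as f:
    json.dump(examples, f, indent=2)

# Validate: the file must be a list, and every record must carry the
# three Alpaca keys (extra keys are tolerated here).
with open("train.json") as f:
    data = json.load(f)
assert isinstance(data, list)
assert all({"instruction", "input", "output"} <= set(rec) for rec in data)
print(f"{len(data)} valid examples")
```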

3. Create the config

Save as llama31.yaml:

```yaml
base:
  model: meta-llama/Meta-Llama-3.1-8B-Instruct

task: sft

data:
  train: train.json
  format: alpaca

training:
  backend: unsloth
  epochs: 3
  learning_rate: 2.0e-4
  batch_size: 2
  gradient_accumulation_steps: 8
  lora:
    enabled: true
    r: 16
    alpha: 32
    dropout: 0.05
    target_modules: [q_proj, k_proj, v_proj, o_proj]
```

4. Train

```bash
soup train --config llama31.yaml
```

Soup auto-detects your GPU, enables FlashAttention where supported, and trains the LoRA adapters. Expect roughly 20 minutes for 1k examples on an RTX 4090.
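The step count follows from the config's batch settings. A quick sketch of the math, assuming the 1k-example dataset used in the timing estimate above:

```python
# Rough training-step math for the config in this guide.
examples = 1_000
batch_size = 2      # per-device batch size from the config
grad_accum = 8      # gradient_accumulation_steps from the config
epochs = 3

effective_batch = batch_size * grad_accum          # sequences per optimizer step
steps_per_epoch = -(-examples // effective_batch)  # ceiling division
total_steps = steps_per_epoch * epochs

print(effective_batch, steps_per_epoch, total_steps)  # 16 63 189
```

Gradient accumulation is what lets a 24 GB card simulate an effective batch of 16 while only holding 2 sequences in memory at a time.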

5. Chat with your fine-tuned model

```bash
soup chat --adapter ./runs/llama31/latest
```

6. Export for deployment

```bash
# Merge LoRA into base model and export GGUF for Ollama
soup export --adapter ./runs/llama31/latest --format gguf --quant q4_k_m
```
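For planning disk space, here is a rough size estimate for the merged Q4_K_M file. The ~4.85 bits-per-weight figure is an approximate llama.cpp average for Q4_K_M (it mixes 4-bit and higher-precision tensors), not an exact spec:

```python
# Rough on-disk size of an 8B model exported at Q4_K_M quantization.
params = 8.03e9          # ~8.03B parameters after merging LoRA into the base
bits_per_weight = 4.85   # approximate Q4_K_M average, an assumption

gguf_gb = params * bits_per_weight / 8 / 1e9
print(f"~{gguf_gb:.1f} GB")  # roughly 5 GB, vs ~16 GB for bf16 safetensors
```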

Troubleshooting

Out of memory? Enable QLoRA (4-bit base model):

```yaml
training:
  quant: 4bit
  lora:
    enabled: true
```
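To see why 4-bit loading helps, compare the memory the frozen base weights alone occupy. These are back-of-envelope figures; real usage adds activations, adapter optimizer state, and quantization overhead:

```python
# Weight-only GPU memory footprint of the frozen Llama 3.1 8B base model.
params = 8.03e9

bf16_gb = params * 2 / 1e9    # 2 bytes/weight in bf16 -> ~16 GB
nf4_gb = params * 0.5 / 1e9   # 0.5 bytes/weight at 4-bit -> ~4 GB

print(f"bf16: {bf16_gb:.1f} GB, 4-bit: {nf4_gb:.1f} GB")
```

At 4 bits the base weights alone drop from ~16 GB to ~4 GB, leaving headroom on a 24 GB (or even 12 GB) card for activations and the LoRA adapters.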

Slow training? Ensure `backend: unsloth` is set and that you installed `soup-cli[fast]`.

Next steps

  • [Export to GGUF and Ollama](/docs/export-to-gguf-ollama)
  • [Multi-GPU training with DeepSpeed](/docs/multi-gpu-deepspeed)
  • [Training methods reference](/docs/training)