Tool-Calling Fine-Tuning

Soup v0.25.0 adds an end-to-end pipeline for training models that call functions.

Data format

A new tool-calling format was added to soup_cli/data/formats.py:

json
{
  "messages": [{"role": "user", "content": "What's the weather in Tokyo?"}],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}
      }
    }
  ],
  "tool_calls": [
    {"function": {"name": "get_weather", "arguments": "{\"city\": \"Tokyo\"}"}}
  ]
}

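The detector recognizes this format only when messages, tools, and tool_calls are all present. During normalization, the tool definitions are embedded in the system message and each tool_calls entry is emitted as an assistant turn. As the example shows, arguments is a JSON-encoded string rather than a nested object.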
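The detection and normalization rules can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the code in soup_cli/data/formats.py: the function names is_tool_calling_row and normalize_row, the "Available tools:" prefix, and the choice to serialize each call as JSON in the assistant content are all hypothetical.

```python
import json

REQUIRED_KEYS = ("messages", "tools", "tool_calls")


def is_tool_calling_row(row: dict) -> bool:
    """Return True only when all three top-level keys are present."""
    return all(key in row for key in REQUIRED_KEYS)


def normalize_row(row: dict) -> list[dict]:
    """Flatten one tool-calling row into a plain list of chat messages."""
    # Tool definitions are embedded in a system message. The exact prompt
    # wording is an assumption; here the definitions are dumped as JSON.
    system = {
        "role": "system",
        "content": "Available tools:\n" + json.dumps(row["tools"]),
    }
    # Each tool call becomes its own assistant turn (serialization assumed).
    assistant_turns = [
        {"role": "assistant", "content": json.dumps(call["function"])}
        for call in row["tool_calls"]
    ]
    return [system, *row["messages"], *assistant_turns]
```

Applied to the example row above, normalize_row would yield a system message, the user message, and one assistant turn carrying the get_weather call.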

Generate tool-call training data

bash
soup data generate \
  --template tool-calling \
  --provider openai \
  --count 1000 \
  --out tool_calls.jsonl

The generation template lives in soup_cli/data/templates/tool_calling.py and can be configured for different API domains (weather, search, database, filesystem).
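Generated data is worth checking before training. The sketch below is a hypothetical standalone validator, not part of Soup; it assumes the output is JSON Lines with one row per line in the format shown earlier, and the names validate_row and validate_file are made up for illustration.

```python
import json


def validate_row(row) -> list[str]:
    """Return the problems found in one tool-calling row (empty if none)."""
    if not isinstance(row, dict):
        return ["row is not a JSON object"]
    problems = [
        f"missing key: {key}"
        for key in ("messages", "tools", "tool_calls")
        if key not in row
    ]
    if problems:
        return problems
    declared = {tool["function"]["name"] for tool in row["tools"]}
    for call in row["tool_calls"]:
        name = call["function"]["name"]
        if name not in declared:
            problems.append(f"call to undeclared tool: {name}")
        try:
            # arguments is expected to be a JSON-encoded string.
            args = json.loads(call["function"]["arguments"])
        except (TypeError, json.JSONDecodeError):
            problems.append(f"arguments for {name} are not valid JSON")
            continue
        if not isinstance(args, dict):
            problems.append(f"arguments for {name} are not a JSON object")
    return problems


def validate_file(path: str) -> dict[int, list[str]]:
    """Map 1-based line numbers to problems for every bad row in a JSONL file."""
    bad = {}
    with open(path, encoding="utf-8") as handle:
        for lineno, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                row = json.loads(line)
            except json.JSONDecodeError:
                bad[lineno] = ["line is not valid JSON"]
                continue
            problems = validate_row(row)
            if problems:
                bad[lineno] = problems
    return bad
```

Calling validate_file("tool_calls.jsonl") returns an empty dict when every row passes these checks.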

Train

Use a ready-made recipe:

bash
soup recipes use qwen3-8b-tools
soup train

Or start from a copy of a recipe and edit it:

yaml
base: Qwen/Qwen3-8B
task: sft

data:
  train: ./tool_calls.jsonl
  format: tool-calling

training:
  epochs: 3
  lr: 2e-4
  lora: { r: 16, alpha: 32 }

Evaluation

soup eval custom ships three tool-call scoring functions:

  • tool_call_match — exact function name + arguments
  • tool_call_name_match — function name only
  • tool_call_args_subset — partial credit for matching a subset of arguments
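To make the three metrics concrete, here is one plausible reading of them in Python. This is a sketch under assumptions, not Soup's implementation: the document does not say whether arguments are compared as raw strings or as parsed JSON, nor how partial credit is computed. The version below parses the arguments (assuming they decode to a JSON object) and scores the subset metric as the fraction of expected arguments reproduced exactly.

```python
import json


def _parse_call(call: dict) -> tuple[str, dict]:
    """Extract (name, arguments) from a call whose arguments is a JSON string."""
    function = call["function"]
    return function["name"], json.loads(function["arguments"])


def tool_call_name_match(predicted: dict, expected: dict) -> float:
    """1.0 when the function names agree, else 0.0."""
    return float(_parse_call(predicted)[0] == _parse_call(expected)[0])


def tool_call_match(predicted: dict, expected: dict) -> float:
    """1.0 when both name and parsed arguments agree, else 0.0."""
    return float(_parse_call(predicted) == _parse_call(expected))


def tool_call_args_subset(predicted: dict, expected: dict) -> float:
    """Fraction of expected arguments matched (assumed scoring rule)."""
    pred_name, pred_args = _parse_call(predicted)
    exp_name, exp_args = _parse_call(expected)
    if pred_name != exp_name:
        return 0.0  # wrong function: no partial credit in this sketch
    if not exp_args:
        return 1.0  # nothing to match
    hits = sum(1 for key, value in exp_args.items() if pred_args.get(key) == value)
    return hits / len(exp_args)
```

Comparing parsed arguments rather than raw strings means key order and whitespace differences in the JSON do not count as mismatches.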

Recipes

  • qwen3-8b-tools — Qwen 3 8B, 4-bit, LoRA r=16
  • llama4-scout-tools — Llama 4 Scout 17B, 4-bit, LoRA r=16

See also

  • [Data formats](/docs/data-formats)
  • [Autopilot](/docs/autopilot) — --goal tool-calling picks this format automatically