HuggingFace Hub Deep Integration

Soup v0.29.0 closes the loop between training and the Hub. Every checkpoint can auto-push as a branch, training can resume from the Hub after a crash, models deploy to a Gradio or Streamlit Space in one command, and self-hosted Hubs are supported via HF_ENDPOINT.

Auto-push every checkpoint

```bash
soup train --push-as alpamys/chat-llama
```

Every checkpoint written at save_steps intervals is uploaded to the HuggingFace Hub as a checkpoint-<N> branch; the main branch is updated only when training completes. A sticky _repo_failed flag short-circuits uploads after the first push failure (bad token, rate limit), so there is no log spam and no training crash.

The upload uses an explicit allow_patterns allowlist (*.safetensors, *.bin, *.pt, *.json, tokenizer*, trainer_state.json, training_args.bin, README.md) — an output_dir that overlaps with your project root cannot accidentally publish .env or source files.
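The effect of the allowlist can be reproduced locally with fnmatch. This is a sketch of the pattern semantics, not huggingface_hub's exact matcher, and `would_upload` is a hypothetical helper name:

```python
from fnmatch import fnmatch

# Allowlist as listed above.
ALLOW_PATTERNS = [
    "*.safetensors", "*.bin", "*.pt", "*.json",
    "tokenizer*", "trainer_state.json", "training_args.bin", "README.md",
]

def would_upload(filename):
    """Return True if a file would pass the upload allowlist."""
    return any(fnmatch(filename, pattern) for pattern in ALLOW_PATTERNS)

print(would_upload("model.safetensors"))  # True
print(would_upload(".env"))               # False
```

Because matching is an allowlist rather than a denylist, anything not explicitly enumerated (source files, dotfiles, secrets) is simply never uploaded.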

Resume from the Hub

```bash
soup train --push-as alpamys/chat-llama --hf-resume
```

--hf-resume pulls the latest checkpoint-<N> branch via snapshot_download, places it in output_dir (containment-checked against cwd), and resumes. local_dir_use_symlinks=False defeats symlink-based filesystem escapes on older huggingface_hub versions.
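The containment check can be sketched as follows. `is_contained` is a hypothetical helper name, not Soup's actual API; the idea is that the fully resolved target must stay under the given root (defaulting to cwd), which also defeats `..` traversal (requires Python 3.9+ for `Path.is_relative_to`):

```python
from pathlib import Path

def is_contained(path, root=None):
    """Illustrative containment check: resolved path must stay under root."""
    root_dir = Path(root if root is not None else Path.cwd()).resolve()
    target = Path(path).resolve()
    # is_relative_to also returns True when target == root_dir.
    return target.is_relative_to(root_dir)
```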

Use this when a spot GPU dies, a preemption kicks you off, or you want to migrate training between machines.

HF Collections

```bash
soup push --model ./output \
  --repo alpamys/chat-llama \
  --collection alpamys/kazakh-llms-abc123
```

Adds the pushed repo to an existing HuggingFace Collection. The collection slug must match the owner/slug-hash pattern; null bytes, whitespace, and .. are rejected, and the slug is capped at 256 characters.
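A sketch of that validation, assuming a plausible shape for the owner/slug-hash regex (the exact pattern Soup uses is not shown in these notes):

```python
import re

# Assumed approximation of the owner/slug-hash shape.
_SLUG_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*/[A-Za-z0-9._-]+-[a-f0-9]+$")

def validate_collection_slug(slug):
    if len(slug) > 256 or "\x00" in slug or ".." in slug:
        return False
    if any(ch.isspace() for ch in slug):
        return False
    return bool(_SLUG_RE.match(slug))
```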

Self-hosted Hub (`HF_ENDPOINT`)

```bash
HF_ENDPOINT=https://hf.internal.example.com soup train --push-as team/my-model
```

Every HF operation routes to your internal Hub. SSRF hardening:

  • Scheme allowlist (http/https only)
  • Plain HTTP permitted only for loopback (localhost / 127.0.0.1 / ::1)
  • 0.0.0.0 explicitly rejected
  • Private / link-local / cloud-metadata IPs (RFC1918, 169.254.x) rejected via ipaddress.ip_address
  • Null bytes rejected
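The checks above can be sketched like this. `check_endpoint` is a hypothetical name, and this version only inspects IP literals; a full defense would also vet resolved DNS addresses:

```python
import ipaddress
from urllib.parse import urlsplit

def check_endpoint(url):
    if "\x00" in url:                      # null bytes rejected
        return False
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):  # scheme allowlist
        return False
    host = parts.hostname or ""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None                          # a hostname, not an IP literal
    if ip is not None:
        if ip.is_unspecified:              # 0.0.0.0 / ::
            return False
        if not ip.is_loopback and (ip.is_private or ip.is_link_local):
            return False                   # RFC1918, 169.254.x metadata ranges
    if parts.scheme == "http":
        # Plain HTTP only for loopback (localhost / 127.0.0.1 / ::1).
        if host != "localhost" and (ip is None or not ip.is_loopback):
            return False
    return True
```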

Publish datasets

```bash
soup data push --input data.jsonl --hf-dataset alpamys/kazakh-chat
```

Uploads a local JSONL as a HuggingFace dataset repo. Input path must stay under cwd, repo_id is validated, and a token is resolved via the unified utils/hf.resolve_token chain. Commit messages are stripped to the first line and capped at 200 chars to prevent multi-line injection into public HF commit history.
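The commit-message sanitization is simple enough to show in full; this is an illustrative reimplementation, not Soup's source:

```python
def sanitize_commit_message(message, limit=200):
    """Keep only the first line and cap it at `limit` characters."""
    first_line = message.splitlines()[0] if message else ""
    return first_line[:limit]
```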

Deploy to HF Spaces

```bash
soup deploy hf-space \
  --model alpamys/chat-llama \
  --space alpamys/chat-llama-demo \
  --template gradio-chat   # or: streamlit-chat
```

Creates a Space wrapping your fine-tuned model. render_space_template validates model_repo via validate_repo_id before substituting into app.py / README.md — a crafted repo id cannot inject Python code into the deployed Space.

Model card v2

Generated READMEs now include:

  • Training config: task, base model, learning rate, optimizer
  • Optional eval scorecard from an eval_results.json
  • HTML-escaped data_lineage parameter
  • Markdown-active chars neutralized in task names and non-numeric score values (|, [, ], (, ), !, <, >, newlines, tabs) — no table-row / link / image / raw-HTML injection in the rendered README
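A sketch of that neutralization step. The character list comes from the notes above; the specific replacement entities are an assumption, chosen so the output can no longer form table rows, links, images, or raw HTML:

```python
# Markdown-active characters mapped to inert equivalents (assumed mapping).
_REPLACEMENTS = {
    "|": "&#124;", "[": "&#91;", "]": "&#93;", "(": "&#40;", ")": "&#41;",
    "!": "&#33;", "<": "&lt;", ">": "&gt;", "\n": " ", "\t": " ",
}

def neutralize_md(value):
    return "".join(_REPLACEMENTS.get(ch, ch) for ch in str(value))
```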

Token resolution

utils/hf.resolve_token is the single source of truth:

1. Env: HF_TOKEN or HUGGINGFACE_HUB_TOKEN

2. ~/.cache/huggingface/token

3. ~/.huggingface/token

Non-printable tokens are rejected. The legacy soup push --token flag now emits a deprecation warning and delegates to this chain.
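The chain is easy to picture as code. This is an illustrative reimplementation of the order described above, not Soup's actual `utils/hf.resolve_token`:

```python
import os
from pathlib import Path

def resolve_token():
    # 1. Environment variables, in priority order.
    token = os.environ.get("HF_TOKEN") or os.environ.get("HUGGINGFACE_HUB_TOKEN")
    # 2./3. Token files, first hit wins.
    if not token:
        for candidate in (Path.home() / ".cache" / "huggingface" / "token",
                          Path.home() / ".huggingface" / "token"):
            if candidate.is_file():
                token = candidate.read_text().strip()
                break
    # Non-printable tokens are rejected.
    if token and not token.isprintable():
        return None
    return token or None
```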

`repo_id` validation

Applied to soup push --repo, soup train --push-as, soup data push --hf-dataset, soup deploy hf-space --model/--space:

  • Alphanumeric + ._- only
  • Per-component ≤ 96 chars
  • Total ≤ 200 chars
  • Null bytes, whitespace, .., and a leading / rejected
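Putting those rules together, a sketch of the validator, assuming the usual two-component owner/name shape (illustrative, not Soup's exact code):

```python
import re

_COMPONENT_RE = re.compile(r"^[A-Za-z0-9._-]+$")

def validate_repo_id(repo_id):
    if "\x00" in repo_id or ".." in repo_id or len(repo_id) > 200:
        return False
    if repo_id.startswith("/") or any(ch.isspace() for ch in repo_id):
        return False
    parts = repo_id.split("/")
    if len(parts) != 2:
        return False
    # Each component: alphanumeric plus ._-, at most 96 chars.
    return all(_COMPONENT_RE.match(p) and len(p) <= 96 for p in parts)
```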

License migration

As of v0.29.0, Soup is Apache-2.0 (previously MIT). Downstream redistributors must retain the NOTICE file per §4(d).

See also

  • [Registry](/docs/registry) — track every pushed run locally
  • [Eval-gated training](/docs/eval-gate) — catch regressions before pushing to the Hub
  • [CLI reference](/docs/cli-reference)