pull-llamacpp-model
Pull a llamacpp Model
This machine uses kyuz0/amd-strix-halo-toolboxes:rocm-7.2 for llamacpp inference (AMD Strix Halo / gfx1151 — not supported by the official ROCm build). Harbor's pull mechanism starts an ephemeral container with --n-gpu-layers 0; the custom image fails in that context without ROCm device access. Use the standard CPU image just for pulling, then restore.
Steps
1. Switch to the standard CPU image
harbor config set llamacpp.image.rocm ghcr.io/ggml-org/llama.cpp:server
2. Pull the model
harbor pull <hf-owner/model-repo:quantization>
# Examples:
harbor pull bartowski/Qwen2.5-7B-Instruct-GGUF:Q4_K_M
harbor pull unsloth/Mistral-Small-3.1-24B-Instruct-2503-GGUF:UD-Q4_K_XL
Harbor detects the HuggingFace model spec and routes it through llamacpp automatically.
3. Restore the custom image
harbor config set llamacpp.image.rocm kyuz0/amd-strix-halo-toolboxes:rocm-7.2
Notes
- The restore step is mandatory — without it, llamacpp will not use the GPU on the next
harbor up llamacpp - Extra args (
-fa 1 -dio --no-mmap --ctx-size 64000 --fit off) are stored separately in config and are not affected - To verify the model after pulling:
harbor llamacpp model - To verify the image was restored:
harbor config get llamacpp.image.rocm
More from av/skills
run-llms
Comprehensive guide for setting up and running local LLMs using Harbor. Use when user wants to run LLMs locally, set up or troubleshoot Ollama, Open WebUI, llama.cpp, vLLM, SearXNG, Open Terminal, or similar local AI services. Covers full setup from Docker prerequisites through running models, per-service configuration, VRAM optimization, GPU troubleshooting, web search integration, code execution, profiles, tunnels, and advanced features. Includes decision trees for autonomous agent workflows and step-by-step troubleshooting playbooks.
16preact-buildless-frontend
Build-less ESM frontends that run directly in the browser without bundlers. Use this skill when creating static frontends, SPAs without build tools, prototypes, or when the user explicitly wants no Vite/Webpack/bundler. Covers import maps, CDN imports, cache-busting, hash routing, and performance patterns.
12turso-db
Install, configure, and work with Turso DB — an in-process SQLite-compatible relational database engine written in Rust. Use when the user needs to (1) install Turso DB, (2) create or query databases with the tursodb CLI shell, (3) use Turso from JavaScript/Node.js via @tursodatabase/database, (4) work with vector search or embeddings in Turso, (5) set up full-text search with FTS indexes, (6) configure transactions including MVCC concurrent transactions, (7) enable encryption at rest, or (8) use Change Data Capture (CDC) for audit logging.
8bugbash
Systematically explore and test any software project (CLI, API, Backend, Library, etc.) to find bugs, usability issues, and edge cases. Produces a structured report with full reproduction evidence (exact commands, inputs, logs, and tracebacks) for every issue.
5agent-integration-testing
Use when the user requests integration testing, feature validation, or test plan execution
4timeboxed-iterating
Use when the user specifies a task and a duration, and the work should be done iteratively by subagents over that time period
3