Local LLM Privacy Skill

Handle AI tasks involving private or sensitive data by routing them to a local Ollama model instead of the cloud. This protects user data by never sending it to external APIs.


Step 1 — Confirm the Privacy Requirement

Before doing anything, acknowledge why local processing matters here. Say something like:

"Since this data is sensitive, I'll try to handle it using a local model on your machine so nothing gets sent to the cloud."

Then proceed to Step 2.


Step 2 — Detect Ollama and Available Models

Run the following bash commands to check for Ollama:

# Check whether ollama is installed, then whether its server responds
if ! command -v ollama >/dev/null 2>&1; then echo "OLLAMA_NOT_FOUND"
elif ! ollama list 2>/dev/null; then echo "OLLAMA_NOT_RUNNING"; fi

Parse the output into three possible states:

| State | Meaning | Condition |
|---|---|---|
| A | Available | ollama list returns a model list |
| B | Installed but not running | ollama command exists but the connection is refused (OLLAMA_NOT_RUNNING); try ollama serve & and then retry |
| C | Not installed | OLLAMA_NOT_FOUND or command not found |

Step 3 — Model Selection (State A: Ollama running)

Read the model list carefully. Select the best available model for the task using the capability matrix below. If multiple models qualify, prefer larger/more capable ones.

Consult references/model-capabilities.md for the full model reference table.

3a. Task Type — Check First

Some models simply cannot do certain tasks regardless of size:

  • Image/vision tasks → requires a vision-capable model (llava, bakllava, moondream, minicpm-v, etc.). A text-only model (mistral, llama, phi, gemma, qwen text variants) cannot process images — tell the user immediately.
  • Code generation → prefer codellama, deepseek-coder, qwen2.5-coder, starcoder
  • Embeddings/semantic search → prefer nomic-embed-text, mxbai-embed, all-minilm
  • General text → any instruct/chat model works
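
The task-type check above can be sketched as a small shell function. This is a minimal illustration, not part of the skill itself: the name model_capability is hypothetical, the patterns only cover the families listed above, and any unrecognized model is assumed to be a text model, which may be wrong for newer multimodal families.

```bash
# Hypothetical helper: map an Ollama model name to a coarse capability class.
# Patterns mirror the families named in the list above; extend as needed.
model_capability() {
  local name
  name=${1%%:*}      # drop the tag, e.g. ":7b" or ":latest"
  name=${name##*/}   # drop a namespace prefix, e.g. "user/"
  case "$name" in
    llava*|bakllava*|moondream*|minicpm-v*)               echo "vision" ;;
    codellama*|deepseek-coder*|qwen2.5-coder*|starcoder*) echo "code" ;;
    nomic-embed-text*|mxbai-embed*|all-minilm*)           echo "embedding" ;;
    *)                                                    echo "text" ;;  # assumption: unknown means text-only
  esac
}
```

For example, model_capability "llava:13b" prints vision, so a vision task can be refused early when no installed model maps to that class.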

3b. Model Size — Check Second

Larger = more capable for complex tasks:

| Size Range | Example Models | Suitable For |
|---|---|---|
| < 3B | qwen2:1.5b, smollm | Simple Q&A, short summaries, keyword extraction only |
| 3B–7B | phi3:mini, llama3.2:3b, mistral:7b | Summaries, classification, basic analysis |
| 8B–13B | llama3.1:8b, mistral-nemo | Most professional tasks, structured extraction, code review |
| 14B–34B | phi3:medium, qwen2.5:14b, codellama:34b | Complex reasoning, nuanced writing, long documents |
| 70B+ | llama3.1:70b, qwen2.5:72b | Near cloud-quality, nearly any text task |

Infer size from model name tag: :1b/:2b → tiny, :7b/:8b → medium, :13b/:14b → large, :70b/:72b → very large. No tag or :latest → assume default for that family (usually 7–8B).
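
One way to read the size out of a model name is sketched below. The function name model_size_b is hypothetical, and the sketch assumes the tag begins with the parameter count followed by "b" (as in :8b or :7b-instruct). Tags such as :latest or :mini carry no count, so it prints unknown and the family default described above still has to be applied by hand.

```bash
# Hypothetical helper: print the parameter count (in billions) encoded in a
# model tag, or "unknown" when the tag does not start with "<number>b".
model_size_b() {
  local tag size
  case "$1" in
    *:*) tag=${1##*:} ;;   # text after the last colon
    *)   tag="" ;;         # no tag at all
  esac
  # Keep only a leading "<number>b", e.g. "7" from "7b-instruct-q4_0"
  size=$(printf '%s' "$tag" | tr '[:upper:]' '[:lower:]' |
         sed -n 's/^\([0-9][0-9.]*\)b.*$/\1/p')
  if [ -n "$size" ]; then
    echo "$size"
  else
    echo "unknown"
  fi
}
```

The printed number can then be compared against the size ranges in the table to decide whether a warning is needed.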

3c. When No Good Match Exists

If models are too small for the task:

"Your available local model ({model_name}, ~{size}B params) may struggle with this task because {reason}. Results may be incomplete or unreliable. Options: proceed anyway, pull a larger model (ollama pull llama3.1:8b), or use a cloud model."

If task needs vision but no vision model exists:

"This task involves images, but none of your local models support vision. Run ollama pull llava or ollama pull moondream to process images locally. Or I can use a cloud model if you consent."


Step 4 — Call the Local Model

Once a model is selected, send the task via the Ollama REST API:

Text generation:

curl -s http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<selected_model>",
    "prompt": "<constructed_prompt>",
    "stream": false
  }'

Chat-style (with history):

curl -s http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<selected_model>",
    "messages": [{"role": "user", "content": "<prompt>"}],
    "stream": false
  }'

Vision tasks (vision model required):

# Encode without line wraps; works with both GNU and BSD/macOS base64
BASE64_IMG=$(base64 < /path/to/image.jpg | tr -d '\n')
curl -s http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"llava\",
    \"prompt\": \"<prompt>\",
    \"images\": [\"$BASE64_IMG\"],
    \"stream\": false
  }"

Parse .response (or .message.content for chat) from the JSON output and present it to the user.
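
Prompts that contain quotes or newlines will break a hand-written JSON body, so it can help to build the payload and read the reply with a JSON tool. The sketch below assumes jq is installed, which the skill itself does not require; the function names build_generate_payload and extract_reply are hypothetical.

```bash
# Build the request body with jq so quotes and newlines in the prompt are escaped.
# Usage: build_generate_payload <model> <prompt>
build_generate_payload() {
  jq -c -n --arg model "$1" --arg prompt "$2" \
    '{model: $model, prompt: $prompt, stream: false}'
}

# Read a non-streaming reply on stdin and print the generated text.
# Uses .response (generate endpoint), else .message.content (chat endpoint).
extract_reply() {
  jq -r '.response // .message.content // empty'
}

# Usage (not executed here; needs a running Ollama server):
#   build_generate_payload "llama3.1:8b" "$prompt" |
#     curl -s http://localhost:11434/api/generate \
#       -H "Content-Type: application/json" --data-binary @- |
#     extract_reply
```

Piping the body through --data-binary @- also avoids placing sensitive text on the curl command line, where other local processes could see it in the process list.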


Step 5 — Fallback Flows

State B — Ollama installed but not running

ollama serve > /tmp/ollama.log 2>&1 &
sleep 3
ollama list

If it starts, continue from Step 3. If it fails, treat as State C.
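
A fixed sleep 3 may be too short on a slow machine and wasteful on a fast one. As an alternative sketch, a small retry loop can poll until the server answers. The helper name wait_until and the attempt and delay values in the usage comment are illustrative assumptions, not values taken from Ollama.

```bash
# Hypothetical helper: retry a probe command until it succeeds or attempts run out.
# Usage: wait_until <max_attempts> <delay_seconds> <command...>
wait_until() {
  local attempts=$1 delay=$2 i=0
  shift 2
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0            # probe succeeded
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1                # gave up after max_attempts
}

# Example (not executed here): start the server, then wait up to about 10 s.
#   ollama serve > /tmp/ollama.log 2>&1 &
#   if wait_until 10 1 ollama list; then echo "continue from Step 3"
#   else echo "treat as State C"; fi
```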

State C — Ollama not installed

Present the user with explicit choices — do not proceed with cloud without consent:

"Ollama doesn't appear to be installed, so I can't process your data locally right now. Here are your options:

  1. Install Ollama — Visit https://ollama.com/download (~2 min setup). Then come back and I'll use it automatically.
  2. Pull a model after install: ollama pull llama3.1:8b (text) or ollama pull llava (vision)
  3. Use a cloud model — I can process this with my standard capabilities, but the data will leave your device.

Which would you prefer?"


Step 6 — Output and Transparency

After every run, local or cloud, always disclose:

  • Which model was used (e.g., llama3.1:8b)
  • Whether it ran locally or in the cloud (with user consent)
  • Any quality caveats from model size limitations

Example footer:

Processed locally using mistral:7b on your machine. No data was sent to any external server.
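
To keep the disclosure consistent across runs, the footer could be generated rather than typed each time. The following is only a sketch: the function name disclosure_footer, its arguments, and the cloud wording are assumptions, and only the local wording comes from the example above.

```bash
# Hypothetical helper: print the transparency footer for a finished run.
# Usage: disclosure_footer <local|cloud> <model> [quality caveat]
disclosure_footer() {
  local mode=$1 model=$2 caveat=${3:-}
  if [ "$mode" = "local" ]; then
    printf 'Processed locally using %s on your machine. No data was sent to any external server.' "$model"
  else
    # Assumed wording for the cloud case; adjust to taste.
    printf 'Processed with a cloud model (%s) with your consent. Your data left this device.' "$model"
  fi
  if [ -n "$caveat" ]; then
    printf ' Note: %s' "$caveat"
  fi
  printf '\n'
}
```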


Quick Reference

| Scenario | Action |
|---|---|
| Has llava / moondream | Use for image tasks |
| Has llama3.1:8b+ | Good for most text tasks |
| Has only tiny model (< 3B) | Warn: simple tasks only |
| Has nomic-embed-text only | Embeddings only, not generation |
| Has deepseek-coder / qwen2.5-coder | Prefer for code tasks |
| No Ollama installed | Offer install guide or cloud opt-in |
| Vision task, no vision model | Explain gap, suggest ollama pull llava |

Core Rules

  • Never silently fall back to cloud — always ask first and get explicit consent.
  • Never assume a text model can do vision — check model family name before attempting.
  • Small model failures are silent — if output looks garbled/truncated, tell user and suggest a larger model.
  • Privacy guarantee — when local processing succeeds, confirm data stayed on-device.