lmstudio-subagents
LM Studio Local Models
Use LM Studio local models directly via API calls to offload tasks to free, local AI models. This skill equips agents to discover available models, select appropriate ones based on task requirements, and use them for cost-effective local processing without requiring pre-configuration in Clawdbot.
Why this skill exists (when to reach for it)
Use this skill to offload self-contained work to local/free models when quality is sufficient—saving paid tokens for tasks that truly need your primary model.
Great fits:
- Summarization, extraction, classification, rewriting
- “First-pass” code review or refactoring suggestions
- Drafting outlines, alternatives, and brainstorming
Avoid / be cautious:
- Tasks requiring web access, proprietary tools, or high-stakes correctness (use your primary model)
Key Terms
- model_key: The identifier used by lms commands (from lms ls). This is what you pass to lms load.
- model_identifier: The identifier used when loading with --identifier. Can be the same as model_key or a custom name. This is what you use in API calls to LM Studio.
- lm_studio_api_url: The base URL for LM Studio's API. Default is http://127.0.0.1:1234/v1. No Clawdbot config required - the skill works with LM Studio's default server.
Note: The description above contains all triggering information. The sections below provide implementation details for using the skill once triggered.
Prerequisites
- LM Studio installed with the lms CLI available on PATH
- LM Studio server running (default: http://127.0.0.1:1234)
- Models downloaded in LM Studio
- Node.js available (for the helper script; curl can be used as an alternative)
Complete Workflow
Step 0: Preflight (Required)
- Verify LM Studio CLI is available:
exec command:"lms --help"
- Verify the LM Studio server is running and reachable:
exec command:"lms server status --json"
Step 1: List Available Models
Get all downloaded models:
exec command:"lms ls --json"
Parse JSON to extract:
- model_key (e.g., meta-llama-3.1-8b-instruct or lmstudio-community/meta-llama-3.1-8b-instruct)
- Type (llm, vlm, embeddings)
- Size (disk space)
- Architecture (Llama, Qwen2, etc.)
- Parameters (model size)
Filter by type if needed:
- lms ls --json --llm - Only LLM models
- lms ls --json --embedding - Only embedding models
- lms ls --json --detailed - More detailed information
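As a sketch of the parsing step (not part of the skill's scripts), the snippet below shells out to lms ls --json from Node and keeps LLMs with enough context. It assumes the output is a JSON array and that fields are named as in the Model Selection Guide below (modelKey, type, maxContextLength, paramsString); check the real output of your LM Studio version before relying on these names.

// list-models.mjs - illustrative sketch; field names are assumptions.
import { execFileSync } from "node:child_process";

const models = JSON.parse(execFileSync("lms", ["ls", "--json"], { encoding: "utf8" }));
const candidates = models
  .filter((m) => m.type === "llm")                    // text-generation models only
  .filter((m) => (m.maxContextLength ?? 0) >= 4096);  // enough context for the task

for (const m of candidates) {
  console.log(m.modelKey, m.paramsString ?? "", m.maxContextLength);
}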
Step 2: Check Currently Loaded Models
Check what's already in memory:
exec command:"lms ps --json"
Parse JSON to see which models are currently loaded.
If a suitable model is already loaded (check by model_identifier), skip to Step 6 (call API).
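A minimal sketch of that check, assuming lms ps --json returns a JSON array whose entries expose an identifier field (verify against your version's output):

// check-loaded.mjs - illustrative sketch; the identifier below is a placeholder.
import { execFileSync } from "node:child_process";

const wanted = "meta-llama-3.1-8b-instruct"; // hypothetical model_identifier
const loaded = JSON.parse(execFileSync("lms", ["ps", "--json"], { encoding: "utf8" }));
const hit = loaded.some((m) => m.identifier === wanted);
console.log(hit ? "Already loaded - skip to Step 6" : "Not loaded - continue with Steps 3-5");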
Step 3: Model Selection
Analyze task requirements and select appropriate model:
Selection Criteria:
- Task complexity: Smaller models (1B-3B) for simple tasks, larger models (7B+) for complex tasks
- Context requirements: Match model's max context length to task needs
- Model capabilities: VLM models for vision tasks, embeddings for search, LLMs for text generation
- Memory constraints: Prefer already-loaded models when appropriate
- Model size: Balance capability needs with available memory
Model Selection:
- Pick a model_key from lms ls that matches task requirements.
- Use the model_key as the model_identifier when loading (or derive a clean identifier from it).
- Any model in LM Studio can be used - no configuration needed.
Step 4: Load Model
Before loading a large model, optionally estimate memory needs:
exec command:"lms load --estimate-only <model_key>"
Load the selected model into memory:
exec command:"lms load <model_key> --identifier \"<model_identifier>\" --ttl 3600"
Optional flags:
- --gpu=max|auto|0.0-1.0 - Control GPU offload (e.g., --gpu=0.5 for 50% GPU, --gpu=max for full GPU)
- --context-length=<N> - Set context length (e.g., --context-length=4096)
- --identifier="<name>" - Assign a custom identifier for API reference (use the model_key or derive a clean identifier)
- --ttl=<seconds> - Auto-unload after an inactivity period (recommended default to avoid thrash and cleanup races)
Important: The lms load command blocks until the model is fully loaded. For large models (70B+), this can take 3+ minutes. The command will return when loading completes.
Example:
exec command:"lms load meta-llama-3.1-8b-instruct --identifier \"meta-llama-3.1-8b-instruct\" --gpu=auto --context-length=4096 --ttl 3600"
Step 5: Verify Model Loaded (CRITICAL SAFETY STEP)
NEVER call the API without verifying the model is loaded.
Note: Since lms load blocks until loading completes, verification should be straightforward. However, verify anyway as a safety check.
Verify the model is actually in memory:
exec command:"lms ps --json"
Parse JSON response and check if model_identifier appears as a loaded identifier.
If model not found:
- This should be rare since lms load blocks until complete, but if it happens:
- Wait 2-3 seconds (model may still be finalizing)
- Retry verification: exec command:"lms ps --json"
- Repeat up to 3 attempts total
- If still not loaded after retries: ABORT with error message, do NOT call API
If model found: Proceed to call LM Studio API.
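The retry logic can be expressed as a small polling loop. The sketch below assumes the same identifier field as in the Step 2 sketch and only illustrates the abort-before-API behavior; it is not a required implementation.

// verify-loaded.mjs - illustrative sketch of the Step 5 retry loop.
import { execFileSync } from "node:child_process";
import { setTimeout as sleep } from "node:timers/promises";

async function waitForModel(identifier, attempts = 3, delayMs = 2500) {
  for (let i = 0; i < attempts; i++) {
    const loaded = JSON.parse(execFileSync("lms", ["ps", "--json"], { encoding: "utf8" }));
    if (loaded.some((m) => m.identifier === identifier)) return true;
    await sleep(delayMs); // model may still be finalizing
  }
  return false;
}

if (!(await waitForModel("meta-llama-3.1-8b-instruct"))) {
  throw new Error("Model failed to appear after load completion - do NOT call the API");
}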
Step 6: Call LM Studio API Directly
Call LM Studio's OpenAI-compatible API directly using the loaded model.
Option A: Using helper script (recommended for reliability)
exec command:"node {baseDir}/scripts/lmstudio-api.mjs <model_identifier> '<task description>' --temperature=0.7 --max-tokens=2000"
The script handles:
- Proper JSON encoding (no escaping issues)
- Error handling and retries
- Response validation (checks response.model matches request)
- Consistent output format
Option B: Direct curl call
API URL: Use default http://127.0.0.1:1234/v1 (LM Studio's standard default). No configuration needed.
Make API call:
exec command:"curl -X POST <lm_studio_api_url>/chat/completions \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer lmstudio' \
-d '{
\"model\": \"<model_identifier>\",
\"messages\": [{\"role\": \"user\", \"content\": \"<task description>\"}],
\"temperature\": 0.7,
\"max_tokens\": 2000
}'"
Parameters:
- model (required): The model_identifier used when loading (must match --identifier from Step 4)
- messages (required): Array of message objects with role and content
- temperature (optional): Sampling temperature (0.0-2.0, default 0.7)
- max_tokens (optional): Maximum tokens to generate (adjust based on task)
Response format:
- Parse JSON response
- Validate response.model field matches the requested model_identifier (LM Studio may use a different model if the requested one isn't loaded)
- Extract choices[0].message.content for the model's response
- Check for an error field in the response for error handling
Example (using script):
exec command:"node {baseDir}/scripts/lmstudio-api.mjs meta-llama-3.1-8b-instruct 'Summarize this document and extract key points' --temperature=0.7 --max-tokens=2000"
Example (using curl):
exec command:"curl -X POST http://127.0.0.1:1234/v1/chat/completions \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer lmstudio' \
-d '{
\"model\": \"meta-llama-3.1-8b-instruct\",
\"messages\": [{\"role\": \"user\", \"content\": \"Summarize this document and extract key points\"}],
\"temperature\": 0.7,
\"max_tokens\": 2000
}'"
Step 7: Format and Return Results
Extract and format the API response:
If using helper script:
- Parse JSON output from script (already validated)
- Extract the content field - this contains the model's response
- Optionally use the usage field for token statistics
- Format the result appropriately for the task context
- Return the formatted result to the user
If using curl directly:
- Parse the JSON response from the curl command
- Validate the response.model field - ensure it matches the requested model_identifier (important: LM Studio may auto-select models)
- Extract choices[0].message.content - this contains the model's response
- Check for errors: if the response contains an error field, handle appropriately
- If response.model doesn't match the request, log a warning but proceed (LM Studio behavior)
- Format the result appropriately for the task context
- Return the formatted result to the user
Error handling:
- If an error field is present: report the error message to the user
- If response.model doesn't match: log a warning, proceed with the response (LM Studio may have auto-selected a model)
- If the response structure is unexpected: log a warning and attempt to extract content
- If API call fails (non-200 status): report HTTP error
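For reference, a stripped-down sketch of this call-and-validate flow in Node (18+, with built-in fetch) is shown below. It is a simplified stand-in for scripts/lmstudio-api.mjs, not the script itself; the model identifier and prompt are placeholders.

// call-and-parse.mjs - illustrative sketch of Steps 6-7 combined.
const apiUrl = process.env.LM_STUDIO_API_URL ?? "http://127.0.0.1:1234/v1";
const modelIdentifier = "meta-llama-3.1-8b-instruct"; // must match --identifier from Step 4

const res = await fetch(`${apiUrl}/chat/completions`, {
  method: "POST",
  headers: { "Content-Type": "application/json", Authorization: "Bearer lmstudio" },
  body: JSON.stringify({
    model: modelIdentifier,
    messages: [{ role: "user", content: "Summarize this document and extract key points" }],
    temperature: 0.7,
    max_tokens: 2000,
  }),
});
if (!res.ok) throw new Error(`HTTP ${res.status} from LM Studio`); // non-200: report HTTP error

const data = await res.json();
if (data.error) throw new Error(`LM Studio error: ${data.error.message}`);
if (data.model !== modelIdentifier) {
  console.warn(`Warning: response used ${data.model}, requested ${modelIdentifier}`); // proceed anyway
}
console.log(data.choices?.[0]?.message?.content ?? "(no content in response)");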
Step 8: Unload Model (Cleanup)
Default policy: Rely on --ttl for automatic cleanup to avoid thrash and races. Unload explicitly when you hit memory pressure or the user requests immediate cleanup.
If unloading explicitly after API call completes:
exec command:"lms unload <model_identifier>"
Note: lms unload accepts either the model_key or the identifier. Since we loaded with --identifier, use the model_identifier for consistency.
Handle errors gracefully:
- If model already unloaded: No-op, continue
- If model still in use: Log warning, suggest manual cleanup later
- If unload fails: Log warning, suggest manual cleanup
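A sketch of this tolerant cleanup (the identifier is a placeholder; any failure is logged but never blocks completion):

// unload.mjs - illustrative sketch of non-blocking cleanup.
import { execFileSync } from "node:child_process";

const identifier = "meta-llama-3.1-8b-instruct"; // hypothetical model_identifier
try {
  execFileSync("lms", ["unload", identifier], { encoding: "utf8" });
} catch (err) {
  // Already unloaded or still in use: warn and move on; --ttl will reclaim it eventually.
  console.warn(`Unload failed (${err.message}); retry later with: lms unload ${identifier}`);
}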
Model Selection Guide
Decision inputs (what to look at)
Pull these from lms ls --json (and optionally lms ls --json --detailed):
- type: llm | vlm | embedding
- vision: boolean (if the task includes images, require vision=true)
- trainedForToolUse: boolean (prefer true when tool/function calling is important)
- maxContextLength: number (require enough context for long docs)
- paramsString / model size: rough proxy for cost/speed
Also check runtime state:
- lms ps --json for already-loaded candidates (prefer these to avoid load time and memory churn)
Heuristics (simple selection policy)
Use a constraints-first approach, then score:
- Hard constraints
  - If the task is vision/image-based → only consider models where vision=true
  - If you need embeddings → only consider type=embedding
  - If the task requires a minimum context window → only consider models with maxContextLength >= needed
- Preferences / scoring
  - Prefer models already loaded (lms ps) if they meet constraints
  - Prefer trainedForToolUse=true when the task benefits from structured tool use
  - Prefer smaller models for cheap/fast tasks; larger models for deeper reasoning
- Fallbacks
  - If no model meets constraints: either pick the closest match (and warn) or fall back to your primary model.
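The policy above can be sketched as a small filter-then-score function. Field names follow the Decision inputs list and should be treated as assumptions; the task object and weights are illustrative, not prescribed by the skill.

// select-model.mjs - illustrative constraints-first selection sketch.
import { execFileSync } from "node:child_process";

const task = { needsVision: false, needsTools: true, minContext: 8192 }; // example requirements

const models = JSON.parse(execFileSync("lms", ["ls", "--json"], { encoding: "utf8" }));
const loaded = JSON.parse(execFileSync("lms", ["ps", "--json"], { encoding: "utf8" }));
const loadedKeys = new Set(loaded.map((m) => m.modelKey ?? m.identifier));

// 1. Hard constraints
const eligible = models.filter((m) =>
  m.type === "llm" &&
  (!task.needsVision || m.vision === true) &&
  (m.maxContextLength ?? 0) >= task.minContext
);

// 2. Preferences: already loaded > tool-trained > smaller (cheaper/faster)
const score = (m) =>
  (loadedKeys.has(m.modelKey) ? 100 : 0) +
  (task.needsTools && m.trainedForToolUse ? 10 : 0) -
  (parseFloat(m.paramsString) || 0); // rough size penalty, e.g. "8B" -> 8

const best = [...eligible].sort((a, b) => score(b) - score(a))[0];

// 3. Fallback
console.log(best ? `Selected: ${best.modelKey}` : "No local model meets constraints - use the primary model");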
Memory optimization
- Check lms ps first - prefer already-loaded models when appropriate
- Use lms load --estimate-only <model_key> to preview requirements
- Use --ttl to avoid leaving large models resident indefinitely
Safety Checks
CRITICAL: Load Verification
Never call the API without verifying the model is loaded.
The verification step (Step 5) is mandatory. Without it:
- API call may fail with "model not available" errors
- Wasted resources making API calls that can't succeed
- Confusing error messages
Retry Logic
Load verification includes retry logic to handle eventual consistency:
- Initial check immediately after load
- Wait 2-3 seconds if not found
- Retry up to 3 total attempts
- Abort if still not loaded after retries
Model Identifier Consistency
Ensure consistent use of model identifiers:
- Use the model_key from lms ls for lms load
- Use the same model_identifier (from --identifier) for API calls
- The identifier used in API calls must match what was loaded
Error Handling
Model Not Found
Symptom: lms ls doesn't show the model, or lms load fails with "model not found"
Response:
- Error message: "Model not found in LM Studio"
- Suggest: "Download the model first using
lms get <model-key>or via LM Studio UI"
API Call Failed
Symptom: curl command returns non-200 status or error response
Response:
- Check HTTP status code in response
- If 404: Model not found or not loaded - verify model_identifier matches loaded model
- If 500: LM Studio server error - check server logs, try reloading model
- If connection refused: LM Studio server not running - start server first
- Extract error message from response JSON if available
- Suggest: "Verify model is loaded with
lms ps, check LM Studio server status, or try reloading the model"
Invalid API Response
Symptom: API call succeeds but response structure is unexpected or missing content
Response:
- Check if the response contains a choices array
- Check if choices[0].message.content exists
- If the structure is unexpected: Log a warning, attempt to extract any available content
- If completely malformed: Report error and suggest retrying the API call
Load Timeout
Symptom: lms load command hangs or takes extremely long
Response:
- lms load blocks until loading completes, which can take 3+ minutes for large models (70B+)
- The exec tool has a default timeout (1800 seconds / 30 minutes), which should be sufficient
- If timeout occurs: "Model load timed out - this may indicate insufficient memory or a corrupted model file"
- Suggest: "Try smaller model, free up memory by unloading other models, or verify model file integrity"
Load Verification Fails
Symptom: Load command succeeds but lms ps doesn't show model after retries
Response:
- This should be rare since lms load blocks until complete
- If it happens: Abort the workflow with error: "Model failed to appear after load completion"
- Do NOT call API
- Suggest: "Check LM Studio logs, verify the identifier matches what was loaded, try reloading"
Insufficient Memory
Symptom: lms load fails with memory-related errors
Response:
- Error message: "Insufficient memory to load model"
- Suggest: "Unload other models using
lms unload --allor select smaller model" - Use
lms load --estimate-onlyto preview requirements
API Call Fails After Verification
Symptom: Model verified as loaded but API call fails
Response:
- Report error to user
- Check if the model is still loaded: lms ps --json
- If the model disappeared: Reload the model and retry the API call
- If model still loaded but API fails: Check API URL, verify model_identifier matches exactly
- Still attempt to unload model (cleanup) if requested
Model Already Loaded
Symptom: lms ps shows model is already loaded
Response:
- Skip load step (Step 4)
- Proceed directly to verification (Step 5) and then API call (Step 6)
- This is an optimization, not an error
- Ensure the model_identifier matches what's already loaded
Unload Fails
Symptom: lms unload fails (model still in use, etc.)
Response:
- Log warning: "Failed to unload model "
- Suggest: "Model may still be in use, unload manually later with
lms unload <model-key>" - Continue workflow (unload failure doesn't block completion)
Examples
Simple Task: Document Summarization
# 1. List models
exec command:"lms ls --json --llm"
# 2. Check loaded
exec command:"lms ps --json"
# 3. Select small model (e.g., meta-llama-3.1-8b-instruct)
# 4. Load model
exec command:"lms load meta-llama-3.1-8b-instruct --identifier \"meta-llama-3.1-8b-instruct\" --ttl 3600"
# 5. Verify loaded
exec command:"lms ps --json"
# Parse and confirm model appears
# 6. Call LM Studio API (using helper script)
exec command:"node {baseDir}/scripts/lmstudio-api.mjs meta-llama-3.1-8b-instruct 'Summarize this document and extract 5 key points' --temperature=0.7 --max-tokens=2000"
# 7. Parse response and extract content field
# 8. Optional explicit unload after completion (otherwise rely on TTL)
exec command:"lms unload meta-llama-3.1-8b-instruct"
Complex Task: Codebase Analysis
# 1-2. List and check (same as above)
# 3. Select larger model (e.g., meta-llama-3.1-70b-instruct)
# 4. Load with context length
exec command:"lms load meta-llama-3.1-70b-instruct --identifier \"meta-llama-3.1-70b-instruct\" --context-length=8192 --gpu=auto --ttl 3600"
# 5. Verify loaded
exec command:"lms ps --json"
# 6. Call LM Studio API with longer context (using helper script)
exec command:"node {baseDir}/scripts/lmstudio-api.mjs meta-llama-3.1-70b-instruct 'Analyze the codebase architecture, identify main components, and suggest improvements' --temperature=0.3 --max-tokens=4000"
# 7. Parse response and format results
# 8. Optional unload (same as above)
Vision Task: Image Description
# 1. List VLM models
exec command:"lms ls --json"
# 2-3. Select VLM model (e.g., qwen2-vl-7b-instruct)
# 4. Load VLM model
exec command:"lms load qwen2-vl-7b-instruct --identifier \"qwen2-vl-7b-instruct\" --gpu=max --ttl 3600"
# 5. Verify loaded
exec command:"lms ps --json"
# 6. Call LM Studio API with image (if supported by model, using helper script)
exec command:"node {baseDir}/scripts/lmstudio-api.mjs qwen2-vl-7b-instruct 'Describe this image in detail, including objects, colors, composition, and any text visible' --temperature=0.7 --max-tokens=2000"
# 7-8. Parse response and unload
LM Studio API Details
Helper Script (Recommended)
The skill includes scripts/lmstudio-api.mjs for reliable API calls. This script is optional but recommended for better error handling and response validation.
Benefits:
- Proper JSON encoding (no escaping issues)
- Built-in error handling
- Response validation (checks response.model matches request)
- Consistent output format
- Environment variable support (LM_STUDIO_API_URL)
Usage:
node {baseDir}/scripts/lmstudio-api.mjs <model_identifier> '<task>' [--temperature=0.7] [--max-tokens=2000] [--api-url=http://127.0.0.1:1234/v1]
Output:
{
"content": "<model response>",
"model": "<model used>",
"usage": {"prompt_tokens": 100, "completion_tokens": 200, "total_tokens": 300}
}
Note: If Node.js is not available, you can use curl directly (see Option B in Step 6).
API Endpoint Format
LM Studio exposes an OpenAI-compatible API endpoint:
- Base URL: http://127.0.0.1:1234/v1 (default, no configuration required)
- Chat completions: POST /v1/chat/completions
- Models list: GET /v1/models
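The models endpoint is an easy way to see which identifiers the server itself will accept. A minimal sketch (Node 18+, built-in fetch), assuming the standard OpenAI-style { data: [{ id: ... }] } response shape:

// list-api-models.mjs - illustrative sketch of GET /v1/models.
const apiUrl = process.env.LM_STUDIO_API_URL ?? "http://127.0.0.1:1234/v1";
const res = await fetch(`${apiUrl}/models`, { headers: { Authorization: "Bearer lmstudio" } });
const body = await res.json();
console.log((body.data ?? []).map((m) => m.id)); // identifiers usable in the "model" field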
Determining API URL
The API URL defaults to http://127.0.0.1:1234/v1 (LM Studio's standard default). No configuration is required - the skill works out of the box with LM Studio's default server.
The helper script supports LM_STUDIO_API_URL environment variable if you need to override the default URL.
Request Format (OpenAI-Compatible)
{
"model": "<model_identifier>",
"messages": [
{"role": "user", "content": "<task description>"}
],
"temperature": 0.7,
"max_tokens": 2000
}
Required fields:
- model: Must match the identifier used when loading (the --identifier value)
- messages: Array of message objects with role ("user", "assistant", "system") and content
Optional fields:
- temperature: 0.0-2.0 (default 0.7)
- max_tokens: Maximum tokens to generate
- stream: true for streaming responses (not recommended for the exec tool)
- top_p: Nucleus sampling parameter
- frequency_penalty: -2.0 to 2.0
- presence_penalty: -2.0 to 2.0
Response Format
Success response:
{
"id": "chatcmpl-...",
"object": "chat.completion",
"created": 1234567890,
"model": "<model_identifier>",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "<model response>"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 100,
"completion_tokens": 200,
"total_tokens": 300
}
}
Error response:
{
"error": {
"message": "Error description",
"type": "invalid_request_error",
"code": "model_not_found"
}
}
Response Parsing
- Parse JSON response from curl command
- Check for an error field - if present, handle the error
- Validate the response.model field - ensure it matches the requested model_identifier (LM Studio may use a different model if the requested one isn't loaded)
- Extract choices[0].message.content for the model's response
- Optionally extract usage for token statistics
- Format and return the content to the user
Important: Always validate response.model matches the requested model. LM Studio may auto-select/auto-load models, so the API may succeed even if lms ps doesn't show your requested model. If response.model doesn't match, log a warning or handle appropriately.
Authentication
LM Studio API typically uses:
- Header: Authorization: Bearer lmstudio
- Some setups may not require authentication (check LM Studio server settings)
Notes
- Model identifier: Use the same identifier for --identifier when loading and model in API calls
- JSON output: Always use the --json flag for lms commands for machine-readable output
- Already loaded: Check lms ps first - if the model is already loaded, skip the load step to save time
- Cleanup policy: Prefer --ttl to avoid thrash; explicitly unload on memory pressure or when requested
- No config required: Models do not need to be pre-configured in Clawdbot - any model in LM Studio can be used
- Load time: lms load blocks until complete. Large models (70B+) can take 3+ minutes. This is normal and expected
- API compatibility: LM Studio uses an OpenAI-compatible API format, so standard OpenAI request/response patterns apply
- Model validation: Always validate that the response.model field matches the requested model_identifier. LM Studio may auto-select/auto-load models, so API calls may succeed even if lms ps doesn't show the requested model
- Model name validation: The LM Studio API may not reject unknown model names - it may use whatever model is currently loaded. Always validate the model exists via lms ls before making API calls
- Tested with: LM Studio version 0.3.39. Behavior may vary with different versions