lmstudio-subagents
LM Studio Local Models
Use LM Studio local models directly via API calls to offload tasks to free, local AI models. This skill equips agents to discover available models, select appropriate ones based on task requirements, and use them for cost-effective local processing without requiring pre-configuration in Clawdbot.
Why this skill exists (when to reach for it)
Use this skill to offload self-contained work to local/free models when quality is sufficient—saving paid tokens for tasks that truly need your primary model.
Great fits:
- Summarization, extraction, classification, rewriting
- “First-pass” code review or refactoring suggestions
- Drafting outlines, alternatives, and brainstorming
Avoid / be cautious:
- Tasks requiring web access, proprietary tools, or high-stakes correctness (use your primary model)
Key Terms
- model_key: The identifier used by lms commands (from lms ls). This is what you pass to lms load.
- model_identifier: The identifier used when loading with --identifier. Can be the same as model_key or a custom name. This is what you use in API calls to LM Studio.
- lm_studio_api_url: The base URL for LM Studio's API. Default is http://127.0.0.1:1234/v1. No Clawdbot config required - the skill works with LM Studio's default server.
Note: The description above contains all triggering information. The sections below provide implementation details for using the skill once triggered.
Prerequisites
- LM Studio installed with the lms CLI available on PATH
- LM Studio server running (default: http://127.0.0.1:1234)
- Models downloaded in LM Studio
- Node.js available (for the helper script; curl can be used as an alternative)
Complete Workflow
Step 0: Preflight (Required)
- Verify LM Studio CLI is available:
exec command:"lms --help"
- Verify the LM Studio server is running and reachable:
exec command:"lms server status --json"
Step 1: List Available Models
Get all downloaded models:
exec command:"lms ls --json"
Parse JSON to extract:
- model_key (e.g., meta-llama-3.1-8b-instruct or lmstudio-community/meta-llama-3.1-8b-instruct)
- Type (llm, vlm, embeddings)
- Size (disk space)
- Architecture (Llama, Qwen2, etc.)
- Parameters (model size)
Filter by type if needed:
- lms ls --json --llm - Only LLM models
- lms ls --json --embedding - Only embedding models
- lms ls --json --detailed - More detailed information
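As a sketch of the parsing step (not part of the skill's scripts), the snippet below shells out to lms ls --json from Node and keeps LLMs with enough context. It assumes the output is a JSON array and that fields are named as in the Model Selection Guide below (modelKey, type, maxContextLength, paramsString); check the real output of your LM Studio version before relying on these names.

// list-models.mjs - illustrative sketch; field names are assumptions.
import { execFileSync } from "node:child_process";

const models = JSON.parse(execFileSync("lms", ["ls", "--json"], { encoding: "utf8" }));
const candidates = models
  .filter((m) => m.type === "llm")                    // text-generation models only
  .filter((m) => (m.maxContextLength ?? 0) >= 4096);  // enough context for the task

for (const m of candidates) {
  console.log(m.modelKey, m.paramsString ?? "", m.maxContextLength);
}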
Step 2: Check Currently Loaded Models
Check what's already in memory:
exec command:"lms ps --json"
Parse JSON to see which models are currently loaded.
If a suitable model is already loaded (check by model_identifier), skip to Step 6 (call API).
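A minimal sketch of that check, assuming lms ps --json returns a JSON array whose entries expose an identifier field (verify against your version's output):

// check-loaded.mjs - illustrative sketch; the identifier below is a placeholder.
import { execFileSync } from "node:child_process";

const wanted = "meta-llama-3.1-8b-instruct"; // hypothetical model_identifier
const loaded = JSON.parse(execFileSync("lms", ["ps", "--json"], { encoding: "utf8" }));
const hit = loaded.some((m) => m.identifier === wanted);
console.log(hit ? "Already loaded - skip to Step 6" : "Not loaded - continue with Steps 3-5");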
Step 3: Model Selection
Analyze task requirements and select appropriate model:
Selection Criteria:
- Task complexity: Smaller models (1B-3B) for simple tasks, larger models (7B+) for complex tasks
- Context requirements: Match model's max context length to task needs
- Model capabilities: VLM models for vision tasks, embeddings for search, LLMs for text generation
- Memory constraints: Prefer already-loaded models when appropriate
- Model size: Balance capability needs with available memory
Model Selection:
- Pick a model_key from lms ls that matches task requirements.
- Use the model_key as the model_identifier when loading (or derive a clean identifier from it).
- Any model in LM Studio can be used - no configuration needed.
Step 4: Load Model
Before loading a large model, optionally estimate memory needs:
exec command:"lms load --estimate-only <model_key>"
Load the selected model into memory:
exec command:"lms load <model_key> --identifier \"<model_identifier>\" --ttl 3600"
Optional flags:
- --gpu=max|auto|0.0-1.0 - Control GPU offload (e.g., --gpu=0.5 for 50% GPU, --gpu=max for full GPU)
- --context-length=<N> - Set context length (e.g., --context-length=4096)
- --identifier="<name>" - Assign a custom identifier for API reference (use the model_key or derive a clean identifier)
- --ttl=<seconds> - Auto-unload after an inactivity period (recommended default to avoid thrash and cleanup races)
Important: The lms load command blocks until the model is fully loaded. For large models (70B+), this can take 3+ minutes. The command will return when loading completes.
Example:
exec command:"lms load meta-llama-3.1-8b-instruct --identifier \"meta-llama-3.1-8b-instruct\" --gpu=auto --context-length=4096 --ttl 3600"
Step 5: Verify Model Loaded (CRITICAL SAFETY STEP)
NEVER call the API without verifying the model is loaded.
Note: Since lms load blocks until loading completes, verification should be straightforward. However, verify anyway as a safety check.
Verify the model is actually in memory:
exec command:"lms ps --json"
Parse JSON response and check if model_identifier appears as a loaded identifier.
If model not found:
- This should be rare since lms load blocks until complete, but if it happens:
- Wait 2-3 seconds (model may still be finalizing)
- Retry verification: exec command:"lms ps --json"
- Repeat up to 3 attempts total
- If still not loaded after retries: ABORT with error message, do NOT call API
If model found: Proceed to call LM Studio API.
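The retry logic can be expressed as a small polling loop. The sketch below assumes the same identifier field as in the Step 2 sketch and only illustrates the abort-before-API behavior; it is not a required implementation.

// verify-loaded.mjs - illustrative sketch of the Step 5 retry loop.
import { execFileSync } from "node:child_process";
import { setTimeout as sleep } from "node:timers/promises";

async function waitForModel(identifier, attempts = 3, delayMs = 2500) {
  for (let i = 0; i < attempts; i++) {
    const loaded = JSON.parse(execFileSync("lms", ["ps", "--json"], { encoding: "utf8" }));
    if (loaded.some((m) => m.identifier === identifier)) return true;
    await sleep(delayMs); // model may still be finalizing
  }
  return false;
}

if (!(await waitForModel("meta-llama-3.1-8b-instruct"))) {
  throw new Error("Model failed to appear after load completion - do NOT call the API");
}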
Step 6: Call LM Studio API Directly
Call LM Studio's OpenAI-compatible API directly using the loaded model.
Option A: Using helper script (recommended for reliability)
exec command:"node {baseDir}/scripts/lmstudio-api.mjs <model_identifier> '<task description>' --temperature=0.7 --max-tokens=2000"
The script handles:
- Proper JSON encoding (no escaping issues)
- Error handling and retries
- Response validation (checks response.model matches request)
- Consistent output format
Option B: Direct curl call
API URL: Use default http://127.0.0.1:1234/v1 (LM Studio's standard default). No configuration needed.
Make API call:
exec command:"curl -X POST <lm_studio_api_url>/chat/completions \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer lmstudio' \
-d '{
\"model\": \"<model_identifier>\",
\"messages\": [{\"role\": \"user\", \"content\": \"<task description>\"}],
\"temperature\": 0.7,
\"max_tokens\": 2000
}'"
Parameters:
- model (required): The model_identifier used when loading (must match --identifier from Step 4)
- messages (required): Array of message objects with role and content
- temperature (optional): Sampling temperature (0.0-2.0, default 0.7)
- max_tokens (optional): Maximum tokens to generate (adjust based on task)
Response format:
- Parse JSON response
- Validate response.model field matches the requested model_identifier (LM Studio may use a different model if the requested one isn't loaded)
- Extract choices[0].message.content for the model's response
- Check for an error field in the response for error handling
Example (using script):
exec command:"node {baseDir}/scripts/lmstudio-api.mjs meta-llama-3.1-8b-instruct 'Summarize this document and extract key points' --temperature=0.7 --max-tokens=2000"
Example (using curl):
exec command:"curl -X POST http://127.0.0.1:1234/v1/chat/completions \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer lmstudio' \
-d '{
\"model\": \"meta-llama-3.1-8b-instruct\",
\"messages\": [{\"role\": \"user\", \"content\": \"Summarize this document and extract key points\"}],
\"temperature\": 0.7,
\"max_tokens\": 2000
}'"
Step 7: Format and Return Results
Extract and format the API response:
If using helper script:
- Parse JSON output from script (already validated)
- Extract the content field - this contains the model's response
- Optionally use the usage field for token statistics
- Format the result appropriately for the task context
- Return the formatted result to the user
If using curl directly:
- Parse the JSON response from the curl command
- Validate the response.model field - ensure it matches the requested model_identifier (important: LM Studio may auto-select models)
- Extract choices[0].message.content - this contains the model's response
- Check for errors: if the response contains an error field, handle appropriately
- If response.model doesn't match the request, log a warning but proceed (LM Studio behavior)
- Format the result appropriately for the task context
- Return the formatted result to the user
Error handling:
- If an error field is present: report the error message to the user
- If response.model doesn't match: log a warning, proceed with the response (LM Studio may have auto-selected a model)
- If the response structure is unexpected: log a warning and attempt to extract content
- If API call fails (non-200 status): report HTTP error
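For reference, a stripped-down sketch of this call-and-validate flow in Node (18+, with built-in fetch) is shown below. It is a simplified stand-in for scripts/lmstudio-api.mjs, not the script itself; the model identifier and prompt are placeholders.

// call-and-parse.mjs - illustrative sketch of Steps 6-7 combined.
const apiUrl = process.env.LM_STUDIO_API_URL ?? "http://127.0.0.1:1234/v1";
const modelIdentifier = "meta-llama-3.1-8b-instruct"; // must match --identifier from Step 4

const res = await fetch(`${apiUrl}/chat/completions`, {
  method: "POST",
  headers: { "Content-Type": "application/json", Authorization: "Bearer lmstudio" },
  body: JSON.stringify({
    model: modelIdentifier,
    messages: [{ role: "user", content: "Summarize this document and extract key points" }],
    temperature: 0.7,
    max_tokens: 2000,
  }),
});
if (!res.ok) throw new Error(`HTTP ${res.status} from LM Studio`); // non-200: report HTTP error

const data = await res.json();
if (data.error) throw new Error(`LM Studio error: ${data.error.message}`);
if (data.model !== modelIdentifier) {
  console.warn(`Warning: response used ${data.model}, requested ${modelIdentifier}`); // proceed anyway
}
console.log(data.choices?.[0]?.message?.content ?? "(no content in response)");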
Step 8: Unload Model (Cleanup)
Default policy: Rely on --ttl for automatic cleanup to avoid thrash and races. Unload explicitly when you hit memory pressure or the user requests immediate cleanup.
If unloading explicitly after API call completes:
exec command:"lms unload <model_identifier>"
Note: lms unload accepts either the model_key or the identifier. Since we loaded with --identifier, use the model_identifier for consistency.
Handle errors gracefully:
- If model already unloaded: No-op, continue
- If model still in use: Log warning, suggest manual cleanup later
- If unload fails: Log warning, suggest manual cleanup
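A sketch of this tolerant cleanup (the identifier is a placeholder; any failure is logged but never blocks completion):

// unload.mjs - illustrative sketch of non-blocking cleanup.
import { execFileSync } from "node:child_process";

const identifier = "meta-llama-3.1-8b-instruct"; // hypothetical model_identifier
try {
  execFileSync("lms", ["unload", identifier], { encoding: "utf8" });
} catch (err) {
  // Already unloaded or still in use: warn and move on; --ttl will reclaim it eventually.
  console.warn(`Unload failed (${err.message}); retry later with: lms unload ${identifier}`);
}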
Model Selection Guide
Decision inputs (what to look at)
Pull these from lms ls --json (and optionally lms ls --json --detailed):
- type: llm | vlm | embedding
- vision: boolean (if the task includes images, require vision=true)
- trainedForToolUse: boolean (prefer true when tool/function calling is important)
- maxContextLength: number (require enough context for long docs)
- paramsString / model size: rough proxy for cost/speed
Also check runtime state:
- lms ps --json for already-loaded candidates (prefer these to avoid load time and memory churn)
Heuristics (simple selection policy)
Use a constraints-first approach, then score:
- Hard constraints
  - If the task is vision/image-based → only consider models where vision=true
  - If you need embeddings → only consider type=embedding
  - If the task requires a minimum context window → only consider models with maxContextLength >= needed
- Preferences / scoring
  - Prefer models already loaded (lms ps) if they meet constraints
  - Prefer trainedForToolUse=true when the task benefits from structured tool use
  - Prefer smaller models for cheap/fast tasks; larger models for deeper reasoning
- Fallbacks
  - If no model meets constraints: either pick the closest match (and warn) or fall back to your primary model.
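The policy above can be sketched as a small filter-then-score function. Field names follow the Decision inputs list and should be treated as assumptions; the task object and weights are illustrative, not prescribed by the skill.

// select-model.mjs - illustrative constraints-first selection sketch.
import { execFileSync } from "node:child_process";

const task = { needsVision: false, needsTools: true, minContext: 8192 }; // example requirements

const models = JSON.parse(execFileSync("lms", ["ls", "--json"], { encoding: "utf8" }));
const loaded = JSON.parse(execFileSync("lms", ["ps", "--json"], { encoding: "utf8" }));
const loadedKeys = new Set(loaded.map((m) => m.modelKey ?? m.identifier));

// 1. Hard constraints
const eligible = models.filter((m) =>
  m.type === "llm" &&
  (!task.needsVision || m.vision === true) &&
  (m.maxContextLength ?? 0) >= task.minContext
);

// 2. Preferences: already loaded > tool-trained > smaller (cheaper/faster)
const score = (m) =>
  (loadedKeys.has(m.modelKey) ? 100 : 0) +
  (task.needsTools && m.trainedForToolUse ? 10 : 0) -
  (parseFloat(m.paramsString) || 0); // rough size penalty, e.g. "8B" -> 8

const best = [...eligible].sort((a, b) => score(b) - score(a))[0];

// 3. Fallback
console.log(best ? `Selected: ${best.modelKey}` : "No local model meets constraints - use the primary model");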
Memory optimization
- Check lms ps first - prefer already-loaded models when appropriate
- Use lms load --estimate-only <model_key> to preview requirements
- Use --ttl to avoid leaving large models resident indefinitely
Safety Checks
CRITICAL: Load Verification
Never call the API without verifying the model is loaded.
The verification step (Step 5) is mandatory. Without it:
- API call may fail with "model not available" errors
- Wasted resources making API calls that can't succeed
- Confusing error messages
Retry Logic
Load verification includes retry logic to handle eventual consistency:
- Initial check immediately after load
- Wait 2-3 seconds if not found
- Retry up to 3 total attempts
- Abort if still not loaded after retries
Model Identifier Consistency
Ensure consistent use of model identifiers:
- Use the model_key from lms ls for lms load
- Use the same model_identifier (from --identifier) for API calls
- The identifier used in API calls must match what was loaded
Error Handling
Model Not Found
Symptom: lms ls doesn't show the model, or lms load fails with "model not found"
Response:
- Error message: "Model not found in LM Studio"
- Suggest: "Download the model first using
lms get <model-key>or via LM Studio UI"
API Call Failed
Symptom: curl command returns non-200 status or error response
Response:
- Check HTTP status code in response
- If 404: Model not found or not loaded - verify model_identifier matches loaded model
- If 500: LM Studio server error - check server logs, try reloading model
- If connection refused: LM Studio server not running - start server first
- Extract error message from response JSON if available
- Suggest: "Verify model is loaded with
lms ps, check LM Studio server status, or try reloading the model"
Invalid API Response
Symptom: API call succeeds but response structure is unexpected or missing content
Response:
- Check if the response contains a choices array
- Check if choices[0].message.content exists
- If the structure is unexpected: Log a warning, attempt to extract any available content
- If completely malformed: Report error and suggest retrying the API call
Load Timeout
Symptom: lms load command hangs or takes extremely long
Response:
- lms load blocks until loading completes, which can take 3+ minutes for large models (70B+)
- The exec tool has a default timeout (1800 seconds / 30 minutes), which should be sufficient
- If timeout occurs: "Model load timed out - this may indicate insufficient memory or a corrupted model file"
- Suggest: "Try smaller model, free up memory by unloading other models, or verify model file integrity"
Load Verification Fails
Symptom: Load command succeeds but lms ps doesn't show model after retries
Response:
- This should be rare since lms load blocks until complete
- If it happens: Abort the workflow with error: "Model failed to appear after load completion"
- Do NOT call API
- Suggest: "Check LM Studio logs, verify the identifier matches what was loaded, try reloading"
Insufficient Memory
Symptom: lms load fails with memory-related errors
Response:
- Error message: "Insufficient memory to load model"
- Suggest: "Unload other models using
lms unload --allor select smaller model" - Use
lms load --estimate-onlyto preview requirements
API Call Fails After Verification
Symptom: Model verified as loaded but API call fails
Response:
- Report error to user
- Check if the model is still loaded: lms ps --json
- If the model disappeared: Reload the model and retry the API call
- If model still loaded but API fails: Check API URL, verify model_identifier matches exactly
- Still attempt to unload model (cleanup) if requested
Model Already Loaded
Symptom: lms ps shows model is already loaded
Response:
- Skip load step (Step 4)
- Proceed directly to verification (Step 5) and then API call (Step 6)
- This is an optimization, not an error
- Ensure the model_identifier matches what's already loaded
Unload Fails
Symptom: lms unload fails (model still in use, etc.)
Response:
- Log warning: "Failed to unload model "
- Suggest: "Model may still be in use, unload manually later with
lms unload <model-key>" - Continue workflow (unload failure doesn't block completion)
Examples
Simple Task: Document Summarization
# 1. List models
exec command:"lms ls --json --llm"
# 2. Check loaded
exec command:"lms ps --json"
# 3. Select small model (e.g., meta-llama-3.1-8b-instruct)
# 4. Load model
exec command:"lms load meta-llama-3.1-8b-instruct --identifier \"meta-llama-3.1-8b-instruct\" --ttl 3600"
# 5. Verify loaded
exec command:"lms ps --json"
# Parse and confirm model appears
# 6. Call LM Studio API (using helper script)
exec command:"node {baseDir}/scripts/lmstudio-api.mjs meta-llama-3.1-8b-instruct 'Summarize this document and extract 5 key points' --temperature=0.7 --max-tokens=2000"
# 7. Parse response and extract content field
# 8. Optional explicit unload after completion (otherwise rely on TTL)
exec command:"lms unload meta-llama-3.1-8b-instruct"
Complex Task: Codebase Analysis
# 1-2. List and check (same as above)
# 3. Select larger model (e.g., meta-llama-3.1-70b-instruct)
# 4. Load with context length
exec command:"lms load meta-llama-3.1-70b-instruct --identifier \"meta-llama-3.1-70b-instruct\" --context-length=8192 --gpu=auto --ttl 3600"
# 5. Verify loaded
exec command:"lms ps --json"
# 6. Call LM Studio API with longer context (using helper script)
exec command:"node {baseDir}/scripts/lmstudio-api.mjs meta-llama-3.1-70b-instruct 'Analyze the codebase architecture, identify main components, and suggest improvements' --temperature=0.3 --max-tokens=4000"
# 7. Parse response and format results
# 8. Optional unload (same as above)
Vision Task: Image Description
# 1. List VLM models
exec command:"lms ls --json"
# 2-3. Select VLM model (e.g., qwen2-vl-7b-instruct)
# 4. Load VLM model
exec command:"lms load qwen2-vl-7b-instruct --identifier \"qwen2-vl-7b-instruct\" --gpu=max --ttl 3600"
# 5. Verify loaded
exec command:"lms ps --json"
# 6. Call LM Studio API with image (if supported by model, using helper script)
exec command:"node {baseDir}/scripts/lmstudio-api.mjs qwen2-vl-7b-instruct 'Describe this image in detail, including objects, colors, composition, and any text visible' --temperature=0.7 --max-tokens=2000"
# 7-8. Parse response and unload
LM Studio API Details
Helper Script (Recommended)
The skill includes scripts/lmstudio-api.mjs for reliable API calls. This script is optional but recommended for better error handling and response validation.
Benefits:
- Proper JSON encoding (no escaping issues)
- Built-in error handling
- Response validation (checks response.model matches request)
- Consistent output format
- Environment variable support (LM_STUDIO_API_URL)
Usage:
node {baseDir}/scripts/lmstudio-api.mjs <model_identifier> '<task>' [--temperature=0.7] [--max-tokens=2000] [--api-url=http://127.0.0.1:1234/v1]
Output:
{
"content": "<model response>",
"model": "<model used>",
"usage": {"prompt_tokens": 100, "completion_tokens": 200, "total_tokens": 300}
}
Note: If Node.js is not available, you can use curl directly (see Option B in Step 6).
API Endpoint Format
LM Studio exposes an OpenAI-compatible API endpoint:
- Base URL: http://127.0.0.1:1234/v1 (default, no configuration required)
- Chat completions: POST /v1/chat/completions
- Models list: GET /v1/models
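The models endpoint is an easy way to see which identifiers the server itself will accept. A minimal sketch (Node 18+, built-in fetch), assuming the standard OpenAI-style { data: [{ id: ... }] } response shape:

// list-api-models.mjs - illustrative sketch of GET /v1/models.
const apiUrl = process.env.LM_STUDIO_API_URL ?? "http://127.0.0.1:1234/v1";
const res = await fetch(`${apiUrl}/models`, { headers: { Authorization: "Bearer lmstudio" } });
const body = await res.json();
console.log((body.data ?? []).map((m) => m.id)); // identifiers usable in the "model" field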
Determining API URL
The API URL defaults to http://127.0.0.1:1234/v1 (LM Studio's standard default). No configuration is required - the skill works out of the box with LM Studio's default server.
The helper script supports LM_STUDIO_API_URL environment variable if you need to override the default URL.
Request Format (OpenAI-Compatible)
{
"model": "<model_identifier>",
"messages": [
{"role": "user", "content": "<task description>"}
],
"temperature": 0.7,
"max_tokens": 2000
}
Required fields:
- model: Must match the identifier used when loading (the --identifier value)
- messages: Array of message objects with role ("user", "assistant", "system") and content
Optional fields:
- temperature: 0.0-2.0 (default 0.7)
- max_tokens: Maximum tokens to generate
- stream: true for streaming responses (not recommended for the exec tool)
- top_p: Nucleus sampling parameter
- frequency_penalty: -2.0 to 2.0
- presence_penalty: -2.0 to 2.0
Response Format
Success response:
{
"id": "chatcmpl-...",
"object": "chat.completion",
"created": 1234567890,
"model": "<model_identifier>",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "<model response>"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 100,
"completion_tokens": 200,
"total_tokens": 300
}
}
Error response:
{
"error": {
"message": "Error description",
"type": "invalid_request_error",
"code": "model_not_found"
}
}
Response Parsing
- Parse JSON response from curl command
- Check for an error field - if present, handle the error
- Validate the response.model field - ensure it matches the requested model_identifier (LM Studio may use a different model if the requested one isn't loaded)
- Extract choices[0].message.content for the model's response
- Optionally extract usage for token statistics
- Format and return the content to the user
Important: Always validate response.model matches the requested model. LM Studio may auto-select/auto-load models, so the API may succeed even if lms ps doesn't show your requested model. If response.model doesn't match, log a warning or handle appropriately.
Authentication
LM Studio API typically uses:
- Header: Authorization: Bearer lmstudio
- Some setups may not require authentication (check LM Studio server settings)
Notes
- Model identifier: Use the same identifier for --identifier when loading and model in API calls
- JSON output: Always use the --json flag for lms commands for machine-readable output
- Already loaded: Check lms ps first - if the model is already loaded, skip the load step to save time
- Cleanup policy: Prefer --ttl to avoid thrash; explicitly unload on memory pressure or when requested
- No config required: Models do not need to be pre-configured in Clawdbot - any model in LM Studio can be used
- Load time: lms load blocks until complete. Large models (70B+) can take 3+ minutes. This is normal and expected
- API compatibility: LM Studio uses an OpenAI-compatible API format, so standard OpenAI request/response patterns apply
- Model validation: Always validate that the response.model field matches the requested model_identifier. LM Studio may auto-select/auto-load models, so API calls may succeed even if lms ps doesn't show the requested model
- Model name validation: The LM Studio API may not reject unknown model names - it may use whatever model is currently loaded. Always validate the model exists via lms ls before making API calls
- Tested with: LM Studio version 0.3.39. Behavior may vary with different versions