openclaw-smart-model-switching-glm
SKILL.md
Smart Model Switching
Three-tier z.ai (GLM) routing: Flash β Standard β Plus / 32B
Start with the cheapest model. Escalate only when needed. Designed to minimize API cost without sacrificing correctness.
The Golden Rule
If a human would need more than 30 seconds of focused thinking, escalate from Flash to Standard.
If the task involves architecture, complex tradeoffs, or deep reasoning, escalate to Plus / 32B.
Model Reality (Relative)
| Tier | Example Models | Purpose |
|---|---|---|
| Flash | GLM-4.5-Flash, GLM-4.7-Flash | Fastest & cheapest |
| Standard | GLM-4.6, GLM-4.7 | Strong reasoning & code |
| Plus / 32B | GLM-4-Plus, GLM-4-32B-128K | Heavy reasoning & architecture |
Bottom line: Wrong model selection wastes money OR time. Flash for simple, Standard for normal work, Plus/32B for complex decisions.
π FLASH β Default for Simple Tasks
Stay on Flash for:
- Factual Q&A β βwhat is Xβ, βwho is Yβ, βwhen did Zβ
- Quick lookups β definitions, unit conversions, short translations
- Status checks β monitoring, file reads, session state
- Heartbeats β periodic checks, OK responses
- Memory & reminders
- Casual conversation β greetings, acknowledgments
- Simple file ops β read, list, basic writes
- One-liner tasks β anything answerable in 1β2 sentences
- Cron jobs (always Flash by default)
NEVER do these on Flash
- β Write code longer than 10 lines
- β Create comparison tables
- β Write more than 3 paragraphs
- β Do multi-step analysis
- β Write reports or proposals
π STANDARD β Core Workhorse
Escalate to Standard for:
Code & Technical
- Code generation β functions, scripts, features
- Debugging β normal bug investigation
- Code review β PRs, refactors
- Documentation β README, comments, guides
Analysis & Planning
- Comparisons and evaluations
- Planning β roadmaps, task breakdowns
- Research synthesis
- Multi-step reasoning
Writing & Content
- Long-form writing (>3 paragraphs)
- Summaries of long documents
- Structured output β tables, outlines
Most real user conversations belong here.
β€οΈ PLUS / 32B β Complex Reasoning Only
Escalate to Plus / 32B for:
Architecture & Design
- System and service architecture
- Database schema design
- Distributed or multi-tenant systems
- Major refactors across multiple files
Deep Analysis
- Complex debugging (race conditions, subtle bugs)
- Security reviews
- Performance optimization strategy
- Root cause analysis
Strategic & Judgment-Based Work
- Strategic planning
- Nuanced judgment and ambiguity
- Deep or multi-source research
- Critical production decisions
π Implementation
For Subagents
// Routine monitoring
sessions_spawn(task="Check backup status", model="GLM-4.5-Flash")
// Standard code work
sessions_spawn(task="Build the REST API endpoint", model="GLM-4.7")
// Architecture decisions
sessions_spawn(task="Design the database schema for multi-tenancy", model="GLM-4-Plus")
For Cron Jobs
json
Copy code
{
"payload": {
"kind": "agentTurn",
"model": "GLM-4.5-Flash"
}
}
Always use Flash for cron unless the task genuinely needs reasoning.
π Quick Decision Tree
pgsql
Copy code
Is it a greeting, lookup, status check, or 1β2 sentence answer?
YES β FLASH
NO β
Is it code, analysis, planning, writing, or multi-step?
YES β STANDARD
NO β
Is it architecture, deep reasoning, or a critical decision?
YES β PLUS / 32B
NO β Default to STANDARD, escalate if struggling
π Quick Reference Card
less
Copy code
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β SMART MODEL SWITCHING β
β Flash β Standard β Plus / 32B β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β π FLASH (cheapest) β
β β’ Greetings, status checks, quick lookups β
β β’ Factual Q&A, reminders β
β β’ Simple file ops, 1β2 sentence answers β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β π STANDARD (workhorse) β
β β’ Code > 10 lines, debugging β
β β’ Analysis, comparisons, planning β
β β’ Reports, long writing β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β€οΈ PLUS / 32B (complex) β
β β’ Architecture decisions β
β β’ Complex debugging, multi-file refactoring β
β β’ Strategic planning, deep research β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β π‘ RULE: >30 sec human thinking β escalate β
β π° START CHEAP β SCALE ONLY WHEN NEEDED β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Built for z.ai (GLM) setups.
Weekly Installs
1
First Seen
Feb 28, 2026
Installed on
mcpjam1
amp1
cline1
openclaw1
opencode1
cursor1