Smart Model Switching

Three-tier z.ai (GLM) routing: Flash → Standard → Plus / 32B

Start with the cheapest model. Escalate only when needed. Designed to minimize API cost without sacrificing correctness.


The Golden Rule

If a human would need more than 30 seconds of focused thinking, escalate from Flash to Standard.
If the task involves architecture, complex tradeoffs, or deep reasoning, escalate to Plus / 32B.


Model Reality (Relative)

| Tier | Example Models | Purpose |
|---|---|---|
| Flash | GLM-4.5-Flash, GLM-4.7-Flash | Fastest & cheapest |
| Standard | GLM-4.6, GLM-4.7 | Strong reasoning & code |
| Plus / 32B | GLM-4-Plus, GLM-4-32B-128K | Heavy reasoning & architecture |

Bottom line: Wrong model selection wastes money OR time. Flash for simple, Standard for normal work, Plus/32B for complex decisions.
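
The tier table can be captured as a small lookup structure. The following is a minimal Python sketch, not part of the skill itself: `Tier`, `TIER_MODELS`, `DEFAULT_MODEL`, and `escalate` are hypothetical names, and the model IDs are copied from the table above, so verify them against what your z.ai account actually exposes.

```python
from enum import Enum


class Tier(Enum):
    FLASH = "flash"
    STANDARD = "standard"
    PLUS = "plus"  # the "Plus / 32B" tier


# Example model IDs copied from the tier table (availability is an assumption).
TIER_MODELS = {
    Tier.FLASH: ["GLM-4.5-Flash", "GLM-4.7-Flash"],
    Tier.STANDARD: ["GLM-4.6", "GLM-4.7"],
    Tier.PLUS: ["GLM-4-Plus", "GLM-4-32B-128K"],
}

# One pick per tier, matching the spawn examples in the Implementation section.
DEFAULT_MODEL = {
    Tier.FLASH: "GLM-4.5-Flash",
    Tier.STANDARD: "GLM-4.7",
    Tier.PLUS: "GLM-4-Plus",
}

# Escalation order, cheapest first.
ESCALATION_ORDER = [Tier.FLASH, Tier.STANDARD, Tier.PLUS]


def escalate(tier: Tier) -> Tier:
    """Move one tier up; the top tier stays where it is."""
    index = ESCALATION_ORDER.index(tier)
    return ESCALATION_ORDER[min(index + 1, len(ESCALATION_ORDER) - 1)]
```

Keeping the escalation order in one list makes "start cheap, escalate only when needed" a one-step operation instead of scattered string comparisons.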


💚 FLASH: Default for Simple Tasks

Stay on Flash for:

  • Factual Q&A: "what is X", "who is Y", "when did Z"
  • Quick lookups: definitions, unit conversions, short translations
  • Status checks: monitoring, file reads, session state
  • Heartbeats: periodic checks, OK responses
  • Memory & reminders
  • Casual conversation: greetings, acknowledgments
  • Simple file ops: read, list, basic writes
  • One-liner tasks: anything answerable in 1–2 sentences
  • Cron jobs (always Flash by default)

NEVER do these on Flash

  • ❌ Write code longer than 10 lines
  • ❌ Create comparison tables
  • ❌ Write more than 3 paragraphs
  • ❌ Do multi-step analysis
  • ❌ Write reports or proposals
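
The limits above can be checked mechanically before a task is sent to Flash. This is a rough sketch under stated assumptions: `ExpectedOutput` and `flash_violations` are hypothetical names, and how you estimate the expected size of a reply is left to the caller.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExpectedOutput:
    """Caller's rough estimate of what a reply will need (hypothetical fields)."""
    code_lines: int = 0
    paragraphs: int = 0
    needs_comparison_table: bool = False
    multi_step_analysis: bool = False
    is_report_or_proposal: bool = False


def flash_violations(expected: ExpectedOutput) -> List[str]:
    """Return the 'never on Flash' rules that the expected output would break."""
    violations = []
    if expected.code_lines > 10:
        violations.append("code longer than 10 lines")
    if expected.needs_comparison_table:
        violations.append("comparison table")
    if expected.paragraphs > 3:
        violations.append("more than 3 paragraphs")
    if expected.multi_step_analysis:
        violations.append("multi-step analysis")
    if expected.is_report_or_proposal:
        violations.append("report or proposal")
    return violations
```

An empty list means the task can stay on Flash; any entry is a reason to escalate to Standard.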

💛 STANDARD: Core Workhorse

Escalate to Standard for:

Code & Technical

  • Code generation: functions, scripts, features
  • Debugging: normal bug investigation
  • Code review: PRs, refactors
  • Documentation: README, comments, guides

Analysis & Planning

  • Comparisons and evaluations
  • Planning: roadmaps, task breakdowns
  • Research synthesis
  • Multi-step reasoning

Writing & Content

  • Long-form writing (>3 paragraphs)
  • Summaries of long documents
  • Structured output: tables, outlines

Most real user conversations belong here.


❤️ PLUS / 32B: Complex Reasoning Only

Escalate to Plus / 32B for:

Architecture & Design

  • System and service architecture
  • Database schema design
  • Distributed or multi-tenant systems
  • Major refactors across multiple files

Deep Analysis

  • Complex debugging (race conditions, subtle bugs)
  • Security reviews
  • Performance optimization strategy
  • Root cause analysis

Strategic & Judgment-Based Work

  • Strategic planning
  • Nuanced judgment and ambiguity
  • Deep or multi-source research
  • Critical production decisions

🔄 Implementation

For Subagents

```
// Routine monitoring
sessions_spawn(task="Check backup status", model="GLM-4.5-Flash")

// Standard code work
sessions_spawn(task="Build the REST API endpoint", model="GLM-4.7")

// Architecture decisions
sessions_spawn(task="Design the database schema for multi-tenancy", model="GLM-4-Plus")
```

For Cron Jobs

```json
{
  "payload": {
    "kind": "agentTurn",
    "model": "GLM-4.5-Flash"
  }
}
```

Always use Flash for cron unless the task genuinely needs reasoning.
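
If cron payloads are generated by a script, the Flash default can be made explicit in code. A minimal sketch: `cron_payload` and its `needs_reasoning` flag are hypothetical, only the `kind` and `model` fields come from the JSON above, and choosing GLM-4.7 as the fallback for reasoning-heavy jobs is an assumption.

```python
import json


def cron_payload(needs_reasoning: bool = False) -> dict:
    """Build a cron payload that uses Flash unless reasoning is explicitly requested."""
    # Assumption: reasoning-heavy cron jobs move up one tier to a Standard model.
    model = "GLM-4.7" if needs_reasoning else "GLM-4.5-Flash"
    return {"payload": {"kind": "agentTurn", "model": model}}


def cron_payload_json(needs_reasoning: bool = False) -> str:
    """Serialize the payload in the same shape as the JSON example."""
    return json.dumps(cron_payload(needs_reasoning), indent=2)
```

Making the flag opt-in keeps the cheap model as the path of least resistance, so a more expensive cron job has to be a deliberate choice.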

📊 Quick Decision Tree

```text
Is it a greeting, lookup, status check, or 1–2 sentence answer?
  YES → FLASH
  NO ↓

Is it code, analysis, planning, writing, or multi-step?
  YES → STANDARD
  NO ↓

Is it architecture, deep reasoning, or a critical decision?
  YES → PLUS / 32B
  NO → Default to STANDARD, escalate if struggling
```
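
The decision tree maps directly onto a short function. The sketch below assumes the task has already been labelled with a single kind. The label strings and the `choose_tier` name are hypothetical, and how a task gets classified is outside its scope.

```python
# Hypothetical task labels, grouped by the three questions in the tree.
FLASH_KINDS = {"greeting", "lookup", "status_check", "short_answer"}
STANDARD_KINDS = {"code", "analysis", "planning", "writing", "multi_step"}
PLUS_KINDS = {"architecture", "deep_reasoning", "critical_decision"}


def choose_tier(task_kind: str) -> str:
    """Walk the decision tree top to bottom and return the tier name."""
    if task_kind in FLASH_KINDS:
        return "FLASH"
    if task_kind in STANDARD_KINDS:
        return "STANDARD"
    if task_kind in PLUS_KINDS:
        return "PLUS / 32B"
    # Nothing matched: default to STANDARD and escalate if the model struggles.
    return "STANDARD"
```

For example, `choose_tier("status_check")` yields the Flash tier, while an unrecognized label falls through to Standard, mirroring the last branch of the tree.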
📋 Quick Reference Card

```text
┌─────────────────────────────────────────────────────────────┐
│                  SMART MODEL SWITCHING                      │
│              Flash → Standard → Plus / 32B                  │
├─────────────────────────────────────────────────────────────┤
│  💚 FLASH (cheapest)                                        │
│  • Greetings, status checks, quick lookups                  │
│  • Factual Q&A, reminders                                   │
│  • Simple file ops, 1–2 sentence answers                    │
├─────────────────────────────────────────────────────────────┤
│  💛 STANDARD (workhorse)                                    │
│  • Code > 10 lines, debugging                               │
│  • Analysis, comparisons, planning                          │
│  • Reports, long writing                                    │
├─────────────────────────────────────────────────────────────┤
│  ❤️ PLUS / 32B (complex)                                    │
│  • Architecture decisions                                   │
│  • Complex debugging, multi-file refactoring                │
│  • Strategic planning, deep research                        │
├─────────────────────────────────────────────────────────────┤
│  💡 RULE: >30 sec human thinking → escalate                 │
│  💰 START CHEAP → SCALE ONLY WHEN NEEDED                    │
└─────────────────────────────────────────────────────────────┘
```

Built for z.ai (GLM) setups.
First seen: Feb 28, 2026