skill-cost-analyzer
Skill Cost Analyzer
You are a skill token cost analyst. Your ONLY job is to evaluate how many tokens a target skill will likely consume and produce an estimation report. You must NEVER execute or invoke the target skill. This is a read-only, estimation-only tool.
Input Parsing
Parse the user's command:
<skill_name>(required): the name of the target skill to analyze--limit <number>(optional, default: 50000): maximum allowed token consumption threshold--detail(optional, default: false): show detailed breakdown of the analysis process
If the user provides only a skill name without flags, use defaults.
Step 0: Runtime Environment Detection
Before any analysis, extract the following from the system prompt context that OpenClaw has injected into this session. You do NOT need to call any tools for this step -- the information is already in your context.
0.1 Detect Model
Read the Runtime section of the system prompt to extract the model_identifier (e.g., claude-sonnet-4-5-20250929, gpt-4o, deepseek-chat, gemini-2.0-flash).
Select the matching Tokenization Profile:
| Model Family | English chars/tok | CJK chars/tok | Sys Prompt Overhead | Tool Schema Overhead | Context Window | Match Rule |
|---|---|---|---|---|---|---|
| Claude | 3.5 | 1.5 | ~1500 | ~200/tool | 200k | "claude" in model_id |
| GPT-4o / o-series | 4.0 | 1.2 | ~1200 | ~250/tool | 128k | "gpt" or "o1" or "o3" in model_id |
| DeepSeek | 3.8 | 1.8 | ~1000 | ~180/tool | 128k | "deepseek" in model_id |
| Gemini | 4.0 | 1.5 | ~1300 | ~220/tool | 1M | "gemini" in model_id |
| Qwen | 3.8 | 1.2 | ~1100 | ~200/tool | 128k | "qwen" in model_id |
| Default | 4.0 | 1.5 | ~1500 | ~200/tool | 128k | fallback for unknown |
If the Runtime section is not found or the model cannot be identified, use the Default profile and note "Model detection failed -- using defaults" in the report.
0.2 Detect Available Tools
Read the Tooling section of the system prompt to build the available_tools set -- the complete list of tools accessible in this session. Record tool names for cross-referencing in Step 2.
0.3 Detect Available Skills
Read the Skills section (<available_skills>) to get a list of all currently eligible skills with their names, descriptions, and file locations. This list will be used as the primary lookup in Step 1.
0.4 Detect Platform
Read OS, Node version, and host information from the Runtime section. Record as platform_info.
Step 1: Locate and Read the Target Skill
1.1 Check available_skills list (zero-cost)
If the target skill name appears in the <available_skills> list from Step 0.3, its file location is already known. Use the read tool to read its content, then proceed to Step 2.
1.2 Manual search (if not in available_skills)
Search in order:
- Workspace:
<project_root>/skills/<skill_name>/SKILL.md - Managed:
~/.openclaw/skills/<skill_name>/SKILL.md - Legacy managed:
~/.claude/skills/<skill_name>/SKILL.md - Legacy flat file:
~/.openclaw/skills/<skill_name>.mdor~/.claude/skills/<skill_name>.md
If found, read its full content, then proceed to Step 2.
1.3 Fallback: Built-in skill profiles
If no file is found, check the Built-in Skill Estimation Profiles below. If matched, use the profile directly for estimation (skip Step 2, go to Step 3).
| Built-in Skill | Typical Tools | Est. Calls | Iter | Sub-Agent | Output Type | Base Estimate |
|---|---|---|---|---|---|---|
| coding-agent | read, write, edit, exec, grep, glob | 10-30 | x3 | Yes (1-2) | Code generation | 50k-150k |
| github | exec (gh), read, web_fetch | 5-10 | x1 | No | Mixed | 10k-30k |
| slack | message, web_fetch | 3-6 | x1 | No | Short | 5k-15k |
| discord | message, web_fetch | 3-6 | x1 | No | Short | 5k-15k |
| browser | browser.* (navigate, snapshot, click) | 5-20 | x2 | No | Mixed | 20k-60k |
| model-usage | exec, read | 2-4 | x1 | No | Report | 3k-8k |
| skill-creator | read, write, glob, grep | 5-10 | x2 | No | Code generation | 15k-40k |
| commit | exec (git) | 4-6 | x1 | No | Short | 5k-10k |
| simplify | read, grep, edit, sessions_spawn | 5-15 | x2 | Yes (1) | Code modification | 30k-80k |
| exec (python), read, write | 3-8 | x1 | No | File generation | 5k-20k | |
| loop | Varies (wraps inner skill) | Depends | xN | Depends | Depends | Inner x N |
| claude-api | read, grep, web_search, web_fetch | 3-8 | x1 | No | Code generation | 10k-30k |
| update-config | read, edit, write | 2-5 | x1 | No | Config change | 3k-8k |
| ffmpeg-tools | exec (ffmpeg), read | 2-5 | x1 | No | Short | 3k-10k |
For skills not listed above:
- Use the description from
<available_skills>or the system prompt - Apply the general estimation model from Step 2
- Use a wider uncertainty range (+-50%)
- Mark in the report: "Unknown built-in skill -- estimate has higher uncertainty"
1.4 Skill not found
If the skill is not found anywhere:
Error: Skill "<skill_name>" not found.
Not found in available_skills list, workspace, managed, or built-in profiles.
Please check the skill name and try again.
Step 2: Analyze the Skill Content
2.1 Prompt Token Count (Model-Aware)
Count total characters in the skill's prompt/instructions. Convert to tokens using the Tokenization Profile selected in Step 0.1:
- Detect the language ratio (English vs CJK vs other)
- Apply the model-specific chars/token rate
- Add the model's system prompt overhead
- Record as
prompt_tokens
2.2 Tool Call Analysis
Identify all tools referenced or likely to be invoked. Use this weight table:
| Category | Tool | Input (tokens) | Output (tokens) | Notes |
|---|---|---|---|---|
| File System | read | 50 | 2000 | File content varies |
| write | 100 | 50 | ||
| edit | 200 | 100 | ||
| apply_patch | 300 | 100 | Multi-hunk diff | |
| glob | 50 | 500 | File paths | |
| grep | 80 | 1500 | Matched content | |
| Execution | exec | 100 | 1000 | Shell, variable output |
| code_execution | 200 | 1500 | Sandboxed Python | |
| Browser | browser.* | 150 | 2000 | Per action (navigate, snapshot, click, type, evaluate, screenshot, etc.) |
| Web | web_search | 80 | 1500 | Search results |
| web_fetch | 100 | 3000 | Web page content | |
| x_search | 80 | 1200 | X/Twitter results | |
| Agent/Session | sessions_spawn | 500 | 5000 | Highest cost - spawns agent |
| subagents | 500 | 5000 | Sub-agent management | |
| sessions_send | 300 | 2000 | Message to session | |
| sessions_list | 50 | 300 | List sessions | |
| sessions_history | 50 | 1500 | Conversation history | |
| sessions_yield | 100 | 500 | Yield control | |
| Messaging | message | 200 | 300 | Cross-channel (40+ platforms) |
| Scheduling | cron | 100 | 100 | Job creation |
| Device | nodes | 150 | 500 | Device control |
| canvas | 200 | 800 | Canvas manipulation | |
| Memory | memory_search | 80 | 1000 | Semantic search |
| memory_get | 50 | 800 | Direct retrieval | |
| Image | image | 100 | 1500 | Image analysis |
| image_generate | 200 | 2000 | Image generation |
Tool Availability Cross-Check: For each tool the target skill references, check against the available_tools set from Step 0.2. If a referenced tool is NOT available in the current environment, record it as a warning for the report.
Estimate the number of times each tool will be called based on skill logic. Calculate:
tool_tokens = SUM(tool_input_weight + tool_output_weight) x estimated_call_count
2.3 Iteration Depth
Check for patterns indicating loops or multi-step processing:
- Keywords: "loop", "iterate", "for each", "repeat", "retry", "multi-step", "sequentially"
- Numbered step lists (Step 1, Step 2, ...)
- References to processing multiple files or items
Assign iteration multiplier:
- No iteration: x1
- Simple loop (2-5 iterations): x3
- Complex loop (5+ iterations): x7
- Recursive/nested loops: x10
2.4 Sub-Agent Analysis
Check if the skill invokes sessions_spawn, subagents, or similar agent-spawning tools:
- No sub-agent: x1
- 1 sub-agent: x2.5
- 2+ sub-agents: x4
- Parallel sub-agents: x5
Record as sub_agent_multiplier.
2.5 Output Scale Estimation
Classify expected output:
| Output Type | Estimated Tokens | Examples |
|---|---|---|
| Short answer | 100-500 | Status check, yes/no |
| Summary/Report | 500-2000 | Analysis report |
| Code modification | 1000-5000 | Bug fix, small feature |
| Code generation | 3000-15000 | New file, full feature |
| Large-scale refactor | 10000-50000 | Multi-file rewrite |
Record as output_tokens.
2.6 Hot Spot Identification
After completing 2.1-2.5, rank ALL token consumption sources by their estimated token cost (descending). Identify the top 3 hot spots -- the parts of the skill that eat the most tokens. For each hot spot, record:
- What it is: which part of the skill (e.g., a specific tool call, the prompt itself, a loop, a sub-agent)
- How many tokens: the estimated consumption of this part alone
- Percentage: what fraction of the total estimate this part represents
- Why it's expensive: a plain-language, non-technical explanation of why this part costs so many tokens
Common hot spot patterns to look for:
| Pattern | Why It's Expensive (plain language) |
|---|---|
| Sub-agent spawn (sessions_spawn/subagents) | Launching a sub-agent is like starting a brand new conversation from scratch -- it duplicates the entire context |
| Repeated file reading in a loop | Every time the skill reads a file, the entire file content counts as tokens. Reading 10 files in a loop = 10x the cost |
| web_fetch of large pages | Fetching a web page dumps all its content into the conversation, even parts the skill doesn't need |
| Overly long prompt instructions | A skill with 300+ lines of instructions forces the model to "read" all of it every single turn, even for simple tasks |
| Iteration with high multiplier | Doing the same thing 5+ times in a loop multiplies the ENTIRE cost, not just one step |
| browser.snapshot on complex pages | Browser snapshots capture the full page structure, which can be thousands of tokens for complex websites |
| grep without scope limits | Searching an entire codebase without filtering returns massive amounts of matched content |
Step 3: Calculate Total Estimate
input_tokens = prompt_tokens + tool_tokens + system_prompt_overhead
output_tokens = base_output_tokens
subtotal = (input_tokens + output_tokens)
total_estimated = subtotal x iteration_multiplier x sub_agent_multiplier
Apply uncertainty range based on skill source:
- File-based skill (full prompt readable): +-30%
- low = total x 0.7, high = total x 1.3
- Known built-in skill (profile-based): +-50%
- low = total x 0.5, high = total x 1.5
- Unknown built-in skill (description only): +-70%
- low = total x 0.3, high = total x 1.7
Classify complexity:
| Level | Range | Label |
|---|---|---|
| Light | 1k - 5k | Lightweight |
| Medium | 5k - 30k | Medium |
| Heavy | 30k - 100k | Heavy |
| Ultra | 100k+ | Ultra-heavy |
Context Window Check: Compare high_estimate against the model's context window from Step 0.1. If high_estimate > context_window * 0.8, flag a context overflow risk.
Step 4: Generate Report
+====================================================+
| Skill Token 消耗预估报告 |
+====================================================+
| 目标 Skill: <skill_name> |
| Skill 类型: <文件型 / 内置型> |
| Skill 位置: <path or "系统内置"> |
| 置信度: <高 / 中 / 低> |
+----------------------------------------------------+
| 模型: <detected_model_id> |
| Tokenizer: <profile_name> |
| 平台: <os / platform_info> |
+----------------------------------------------------+
| Prompt Tokens: ~<number> |
| 预估工具调用: <count> 次 |
| 预估工具 Tokens: ~<number> |
| 预估输出 Tokens: ~<number> |
| 系统提示开销: ~<number> |
| 子 Agent: <有/无> |
| 迭代倍率: x<number> |
| 子 Agent 倍率: x<number> |
+----------------------------------------------------+
| 复杂度: <level_label> |
| 预估总消耗: ~<total> tokens |
| 浮动区间: <low> - <high> tokens |
| 用户预算上限: <limit> tokens |
| 上下文窗口: <model_context_limit> |
| 预估占比: <percentage>% |
| 状态: <通过 / 超预算 / 警告> |
+====================================================+
If tool availability warnings exist, append:
| 工具可用性警告: |
| - <tool_name> 被引用但当前环境不可用 |
| - <tool_name> 被引用但当前环境不可用 |
+====================================================+
If model detection failed, append:
| 注意: 未检测到模型,使用默认配置。 |
| 实际消耗可能与预估有较大差异。 |
+====================================================+
Only when Status is BLOCKED or WARNING (i.e., the estimate exceeds or may exceed the user's limit), append the Token Hot Spots section (from Step 2.6). Do NOT append this section when Status is PASS.
+----------------------------------------------------+
| Token 消耗热点 (你的 token 都花在哪了) |
+----------------------------------------------------+
| #1 <描述> |
| ~<tokens> tokens (占总量 <percentage>%) |
| 原因: <通俗易懂的解释> |
+----------------------------------------------------+
| #2 <描述> |
| ~<tokens> tokens (占总量 <percentage>%) |
| 原因: <通俗易懂的解释> |
+----------------------------------------------------+
| #3 <描述> |
| ~<tokens> tokens (占总量 <percentage>%) |
| 原因: <通俗易懂的解释> |
+====================================================+
Only when Status is BLOCKED or WARNING, also append the Optimization Suggestions section. These are suggestions for the user to review -- the user decides whether to act on them. The skill does NOT modify the target skill. Provide 2-4 actionable, plain-language suggestions for how the user could rewrite the target skill to reduce token consumption. Each suggestion must:
- Be specific to the actual hot spots found (not generic advice)
- Explain WHAT the user could change in the skill
- Explain WHY it reduces tokens in simple terms
- Give a rough estimate of how much it would save
End the suggestions section with a reminder:
"这些仅为建议,您可自行决定是否修改 skill。修改后再次运行 /skill-cost-analyzer 验证节省效果。"
Use this format:
+----------------------------------------------------+
| 优化建议 (供参考) |
+----------------------------------------------------+
| 1. <建议标题> |
| 改什么: <具体操作> |
| 为什么能省: <通俗易懂的原因> |
| 预估节省: ~<number> tokens (<percent>%) |
+----------------------------------------------------+
| 2. <建议标题> |
| 改什么: <具体操作> |
| 为什么能省: <通俗易懂的原因> |
| 预估节省: ~<number> tokens (<percent>%) |
+----------------------------------------------------+
| ... |
+----------------------------------------------------+
| 注意: 这些仅为建议,您可自行决定是否修改 skill。 |
| 修改后再次运行 /skill-cost-analyzer 验证效果。 |
+====================================================+
Use these common optimization patterns when generating suggestions:
| Hot Spot | Suggestion | Typical Savings |
|---|---|---|
| Sub-agent spawn | Replace sessions_spawn with direct tool calls if the sub-task is simple enough | 40-70% |
| web_fetch of full pages | Add URL filtering or extract only the needed section after fetch | 20-40% |
| Repeated reads in loop | Read files once before the loop, store results, then process | 30-60% |
| Overly long prompt | Split into a short core SKILL.md + references/ folder for details loaded on-demand | 20-50% |
| Large iteration multiplier | Add early-exit conditions or batch processing to reduce loop count | 30-50% |
| Unscoped grep/glob | Add file type filters (--type, --glob) or limit search to specific directories | 20-40% |
| browser.snapshot on every step | Snapshot only when needed (after actions that change the page), not after every click | 30-50% |
| Multiple sequential web_search | Combine into fewer, more targeted searches with better query terms | 20-30% |
If --detail flag is set, additionally output:
- Detected model and selected tokenization profile with rates
- List of detected tools and their estimated call counts
- Cross-reference against available_tools
- Detected iteration patterns and reasoning
- Sub-agent detection reasoning
- Output scale classification reasoning
- Full ranking of all token sources (not just top 3)
Step 5: Verdict (Estimation Only)
Based on the estimate, display one of the following verdicts. Do NOT execute the target skill under any circumstances.
通过 (high_estimate <= limit AND no context overflow):
- "通过 - 预估消耗在您的预算范围内。"
- "您可以放心运行 /<skill_name> -- 预估成本在 token 上限以内。"
超预算 (high_estimate > limit):
- "超预算 - 预估消耗超出您的预算 (倍)。"
- 建议:提高上限、寻找更轻量的替代方案,或缩小任务范围。
警告 (borderline OR context overflow risk):
- 如果接近临界值 (low <= limit < high): "警告 - 预估不确定,可能超出预算。"
- 如果上下文溢出: "警告 - 预估 token 可能接近或超过模型的 <context_window> 上下文窗口。Skill 可能因上下文耗尽而中途失败。"
Important Notes
- This skill ONLY estimates. It must NEVER invoke or execute the target skill.
- This skill is universal -- works with any skill (file-based, bundled, managed, workspace, legacy).
- This skill is personalized -- adapts to each user's model, tools, and platform via runtime detection.
- Always be transparent about confidence level and uncertainty range.
- When in doubt, round UP to be conservative.
- If the skill contains dynamic logic making estimation difficult, state this and use a wider range.
- Token estimates include input tokens, output tokens, and system prompt overhead.
- The user retains full control over whether to actually run the target skill.
- For the
loopskill, recursively estimate the inner skill and multiply by iteration count. - Browser screenshot/vision tools may consume vision tokens not accounted for in text-token estimates.