Model Capability Advisor

Recommend model capability choices for: $ARGUMENTS

Overview

  • Implementation status: code-backed
  • Local entry script: <bundle-root>/model-capability-advisor/run.py
  • Primary purpose: score provided model names against workflow needs and return quick/deep recommendation pairs
  • Research layer: meta-workflow optimization (cross-cutting concern for agent execution design)
  • Workflow role: meta-planning support for agent execution design, not a market-analysis surface
  • Local executor guarantee: produce heuristic workflow-fit advice from the local capability profiles it knows about

Use When

  • The user asks which model names fit a workflow.
  • The caller wants quick/deep pairing advice for analysis, chat, batch, backtest, or paper.
  • The user wants warnings about likely model tradeoffs before assigning a workflow.
  • The user wants to optimize agent execution for speed vs. depth.
  • The user wants to understand model capability differences for quantitative workflows.

Do Not Use When

  • The caller needs verification that a provider is actually configured or reachable.
  • The user expects benchmark-backed precision or live provider comparison; this skill is heuristic and local.
  • The question is about investment analysis rather than agent execution design.
  • The user wants actual performance benchmarks rather than capability profiles.

Inputs

  • Optional positional target.
  • Optional --workflow; defaults to analysis.
  • Optional repeatable --quick MODEL.
  • Optional repeatable --deep MODEL.
  • Scope note:
    • the local advisor reasons over known capability profiles and workflow heuristics
    • the agent should say when a recommendation is only based on naming heuristics or incomplete local profiles

Execution

Step 1: Define workflow requirements

Identify specific needs for the target workflow:

Workflow types and requirements:

Analysis workflow:

  • Speed priority: Fast iteration for exploratory analysis
  • Depth priority: Deep reasoning for thesis development
  • Key capabilities: Long context, structured output, evidence synthesis
  • Typical use: analysis, market-brief, stock-data

Chat workflow:

  • Speed priority: Low latency for interactive conversation
  • Depth priority: Contextual understanding for multi-turn dialogue
  • Key capabilities: Context retention, clarification, follow-up
  • Typical use: strategy-chat, interactive Q&A

Batch workflow:

  • Speed priority: High throughput for screening many symbols
  • Depth priority: Consistent quality across batch
  • Key capabilities: Parallel processing, cost efficiency
  • Typical use: market-screen, decision-dashboard, watchlist-import

Backtest workflow:

  • Speed priority: Fast iteration for parameter tuning
  • Depth priority: Rigorous evaluation for validation
  • Key capabilities: Numerical reasoning, statistical analysis
  • Typical use: backtest-evaluator, performance analysis

Paper trading workflow:

  • Speed priority: Real-time decision making
  • Depth priority: Risk assessment and validation
  • Key capabilities: Numerical precision, rule adherence
  • Typical use: paper-trading, strategy-design, decision-support
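
Taken together, these requirements can be captured as a small lookup table. A minimal sketch, assuming plain dicts; the key names and workflow identifiers here are illustrative, not the executor's actual schema:

# Illustrative only -- run.py may use a different internal schema.
WORKFLOW_REQUIREMENTS = {
    "analysis": {
        "speed": "fast iteration for exploratory analysis",
        "depth": "deep reasoning for thesis development",
        "capabilities": ["long context", "structured output", "evidence synthesis"],
    },
    "chat": {
        "speed": "low latency for interactive conversation",
        "depth": "contextual understanding for multi-turn dialogue",
        "capabilities": ["context retention", "clarification", "follow-up"],
    },
    "batch": {
        "speed": "high throughput for screening many symbols",
        "depth": "consistent quality across the batch",
        "capabilities": ["parallel processing", "cost efficiency"],
    },
    "backtest": {
        "speed": "fast iteration for parameter tuning",
        "depth": "rigorous evaluation for validation",
        "capabilities": ["numerical reasoning", "statistical analysis"],
    },
    "paper": {
        "speed": "real-time decision making",
        "depth": "risk assessment and validation",
        "capabilities": ["numerical precision", "rule adherence"],
    },
}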

Step 2: Define model capability profiles

Establish capability dimensions for model evaluation:

Capability dimensions:

Speed:

  • Tokens per second (throughput)
  • Time to first token (latency)
  • Total response time
  • Cost per token

Reasoning depth:

  • Complex problem solving
  • Multi-step reasoning
  • Abstract thinking
  • Nuanced judgment

Context handling:

  • Maximum context window
  • Context retention quality
  • Long-range dependency handling
  • Context compression efficiency

Structured output:

  • JSON/XML generation quality
  • Table formatting
  • Numerical precision
  • Consistency across outputs

Tool use:

  • Function calling reliability
  • Parameter extraction accuracy
  • Error handling
  • Multi-tool orchestration

Domain knowledge:

  • Financial domain understanding
  • Quantitative reasoning
  • Statistical analysis
  • A-share market specifics
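
These six dimensions can be modeled as a small profile record. A minimal sketch, assuming the 0-100 integer scores introduced in Step 3; the example values are placeholders, not measurements:

from dataclasses import dataclass

@dataclass
class CapabilityProfile:
    speed: int      # throughput, latency, total response time, cost per token
    reasoning: int  # complex problem solving, multi-step and abstract reasoning
    context: int    # window size, retention quality, long-range dependencies
    structure: int  # JSON/XML quality, tables, numerical precision, consistency
    tools: int      # function-calling reliability, parameter extraction
    domain: int     # financial, quantitative, and A-share market knowledge

# Placeholder values for illustration -- not a measured benchmark.
fast_generalist = CapabilityProfile(
    speed=90, reasoning=60, context=70, structure=75, tools=65, domain=55
)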

Step 3: Score candidate models

Evaluate available models against workflow requirements:

Model scoring methodology:

For each model, score (0-100) on each dimension:

  • Speed: Based on known throughput and latency
  • Reasoning: Based on benchmark performance and complexity handling
  • Context: Based on window size and retention quality
  • Structure: Based on output formatting reliability
  • Tools: Based on function calling accuracy
  • Domain: Based on financial knowledge demonstrations

Workflow-specific weighting:

Analysis workflow weights:

  • Reasoning: 40%
  • Context: 25%
  • Structure: 15%
  • Domain: 15%
  • Speed: 5%

Chat workflow weights:

  • Speed: 35%
  • Context: 30%
  • Reasoning: 20%
  • Structure: 10%
  • Tools: 5%

Batch workflow weights:

  • Speed: 50%
  • Structure: 20%
  • Tools: 15%
  • Reasoning: 10%
  • Context: 5%

Backtest workflow weights:

  • Reasoning: 35%
  • Structure: 30%
  • Domain: 20%
  • Speed: 10%
  • Context: 5%

Paper trading workflow weights:

  • Reasoning: 30%
  • Tools: 25%
  • Structure: 25%
  • Speed: 15%
  • Domain: 5%

Composite score calculation:

  • Weighted sum of dimension scores
  • Normalize to 0-100 scale
  • Rank models by composite score
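
Because each workflow's weights above sum to 100%, the weighted sum of 0-100 dimension scores already lands on a 0-100 scale, so no separate normalization pass is needed. A minimal sketch of the calculation, assuming profiles are plain dicts and treating any dimension a workflow omits as weight zero:

# Weights from Step 3; dimensions a workflow omits default to weight 0.
WORKFLOW_WEIGHTS = {
    "analysis": {"reasoning": 40, "context": 25, "structure": 15, "domain": 15, "speed": 5},
    "chat":     {"speed": 35, "context": 30, "reasoning": 20, "structure": 10, "tools": 5},
    "batch":    {"speed": 50, "structure": 20, "tools": 15, "reasoning": 10, "context": 5},
    "backtest": {"reasoning": 35, "structure": 30, "domain": 20, "speed": 10, "context": 5},
    "paper":    {"reasoning": 30, "tools": 25, "structure": 25, "speed": 15, "domain": 5},
}

def composite_score(profile: dict[str, int], workflow: str) -> float:
    """Weighted sum of 0-100 dimension scores; weights sum to 100."""
    weights = WORKFLOW_WEIGHTS[workflow]
    return sum(profile.get(dim, 0) * w for dim, w in weights.items()) / 100

# Ranking candidates (placeholder profiles for illustration, not benchmarks):
candidates = {
    "model-a": {"speed": 90, "reasoning": 60, "context": 70, "structure": 75, "tools": 65, "domain": 55},
    "model-b": {"speed": 55, "reasoning": 90, "context": 85, "structure": 85, "tools": 80, "domain": 70},
}
ranked = sorted(candidates, key=lambda m: composite_score(candidates[m], "analysis"), reverse=True)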

Step 4: Generate quick/deep recommendations

Select optimal model pairing:

Quick model selection:

  • Prioritize speed and cost efficiency
  • Acceptable reasoning depth for routine tasks
  • Good enough for screening, monitoring, batch operations
  • Typical candidates: Gemini Flash, Claude Haiku, GPT-3.5

Deep model selection:

  • Prioritize reasoning depth and accuracy
  • Acceptable speed for critical decisions
  • Required for thesis development, risk assessment, validation
  • Typical candidates: Claude Opus, GPT-4, Gemini Pro

Pairing strategy:

  • Use quick model for initial screening and routine tasks
  • Escalate to deep model for critical decisions and complex analysis
  • Balance cost vs. quality based on workflow stage
  • Consider fallback options if primary unavailable

Escalation triggers:

  • High uncertainty in quick model output
  • Critical decision point (large position, high risk)
  • Complex reasoning required (multi-factor analysis)
  • Validation needed (cross-check quick model)
  • User explicitly requests deeper analysis
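
A minimal sketch of gating escalation on these triggers; the flag names and the 0.7 confidence floor are assumptions chosen for illustration, not tuned values:

def should_escalate(quick_confidence: float,
                    position_is_large: bool,
                    multi_factor: bool,
                    needs_validation: bool,
                    user_requested_deep: bool,
                    confidence_floor: float = 0.7) -> bool:
    """Route to the deep model when any escalation trigger fires."""
    return (
        quick_confidence < confidence_floor   # high uncertainty in quick output
        or position_is_large                  # critical decision point
        or multi_factor                       # complex reasoning required
        or needs_validation                   # cross-check the quick model
        or user_requested_deep                # explicit user request
    )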

Step 5: Identify model tradeoffs

Highlight important considerations:

Speed vs. depth tradeoffs:

  • Quick models: 5-10× faster but may miss nuances
  • Deep models: More thorough but higher latency and cost
  • Batch operations: Speed matters more than individual quality
  • Critical decisions: Depth matters more than speed

Context window tradeoffs:

  • Large context: Can process more data but slower and costlier
  • Small context: Faster and cheaper but may miss connections
  • Long documents: Require large context models
  • Iterative workflows: Can use smaller context with state management

Cost tradeoffs:

  • Quick models: $0.001-0.01 per 1K tokens
  • Deep models: $0.01-0.10 per 1K tokens
  • Batch operations: Cost scales linearly with volume
  • Critical decisions: Cost is secondary to quality
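
As a rough worked example using midpoints of those ranges (illustrative rates, not provider quotes): screening 500 symbols at about 2,000 tokens each consumes roughly 1M tokens. At $0.005 per 1K tokens a quick model handles the pass for about $5, while the same pass on a deep model at $0.05 per 1K tokens costs about $50, a 10× gap that grows linearly with batch size.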

Reliability tradeoffs:

  • Established models: More reliable but may be slower or costlier
  • Newer models: Faster and cheaper but less proven
  • Tool use: Some models better at function calling
  • Structured output: Some models better at JSON/table generation

Step 6: Generate capability advice report

Organize findings into structured report:

Part 1: Workflow Requirements

  • Target workflow type
  • Speed vs. depth priority
  • Key capability requirements
  • Typical use cases

Part 2: Quick Model Recommendation

  • Recommended model name
  • Composite score for workflow
  • Dimension scores (speed, reasoning, context, etc.)
  • Strengths for this workflow
  • Limitations to be aware of

Part 3: Deep Model Recommendation

  • Recommended model name
  • Composite score for workflow
  • Dimension scores
  • Strengths for this workflow
  • Limitations to be aware of

Part 4: Pairing Rationale

  • Why this quick/deep pairing
  • When to use quick vs. deep
  • Escalation triggers
  • Cost/quality tradeoff

Part 5: Alternative Candidates

  • Other viable quick models (with scores)
  • Other viable deep models (with scores)
  • Fallback options if primary unavailable
  • Experimental models to consider

Part 6: Model Tradeoffs

  • Speed vs. depth considerations
  • Context window considerations
  • Cost considerations
  • Reliability considerations

Part 7: Important Unknowns

  • Provider availability (not verified)
  • Actual latency in host environment (not measured)
  • Tool-call reliability (not tested)
  • Cost in production (estimates only)
  • Model updates (profiles may be outdated)

Part 8: Recommendations

  • Use quick model for: [specific tasks]
  • Use deep model for: [specific tasks]
  • Monitor performance and adjust
  • Consider benchmarking in production

Step 7: Run the local executor

python3 <bundle-root>/model-capability-advisor/run.py --workflow analysis --quick gemini-flash --deep claude-sonnet
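
Both --quick and --deep are repeatable, so several candidates can be scored in one pass. The model-name spellings below are illustrative; pass whatever identifiers your providers actually expose:

python3 <bundle-root>/model-capability-advisor/run.py --workflow batch --quick gemini-flash --quick claude-haiku --deep claude-opus --deep gemini-pro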

Step 8: Deliver as heuristic planning support

When delivering results, maintain proper framing:

Advice interpretation:

  • This is heuristic workflow-fit advice, not empirical benchmarking
  • Scores are based on known capability profiles, not production testing
  • Recommendations assume typical use cases, not specific edge cases
  • Actual performance may vary based on host environment

Verification needed:

  • Provider availability (is model actually configured?)
  • Actual latency (measure in production environment)
  • Tool-call reliability (test with actual workflows)
  • Cost in production (monitor actual usage)
  • Model updates (check for newer versions)

Limitation disclosure:

  • Profiles may be outdated (models update frequently)
  • Heuristics may not match specific use case
  • No guarantee of availability or performance
  • Recommendations are starting point, not final answer
  • Empirical testing recommended for high-stakes workflows

Output Contract

  • Success format: readable text beginning with # Model Capability Advice (<workflow>).
  • Sections: selected quick model, selected deep model, rationale, optional warnings, and candidate profiles.
  • Caller-facing delivery standard:
    • Eight-part structure: Workflow requirements, quick model recommendation, deep model recommendation, pairing rationale, alternative candidates, model tradeoffs, important unknowns, recommendations
    • Heuristic framing: Label result as workflow-fit advice, not empirical benchmarking
    • Scoring transparency: Provide dimension scores and composite scores for each model
    • Pairing rationale: Explain why this quick/deep pairing and when to use each
    • Tradeoff disclosure: Speed vs. depth, context, cost, reliability considerations
    • Unknown identification: Provider availability, actual latency, tool reliability not verified
    • Verification guidance: State what needs to be verified in production
    • No authoritative claims: Avoid implying recommended model is installed, configured, or benchmark-dominant unless verified
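
An illustrative opening for a successful run; the model names, scores, and section text below are placeholders that show the shape of the contract, not real executor output:

# Model Capability Advice (analysis)

Quick model: gemini-flash (composite 72/100)
Deep model: claude-sonnet (composite 88/100)
Rationale: quick model covers screening; deep model covers thesis development.
Warnings: recommendation based on local capability profiles only.
Candidate profiles: per-dimension scores for each candidate follow.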

Failure Handling

  • Parse and argument errors: non-zero exit with a readable command-error message.
  • Missing candidates do not cause failure; the output degrades into workflow-level guidance.
  • Unknown workflow type: provide general guidance and suggest closest known workflow.
  • No models provided: recommend based on typical model profiles for workflow.
  • Conflicting requirements: highlight tradeoffs and recommend multiple options.
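
A hedged sketch of a caller honoring this contract: run the executor, surface the readable error on a non-zero exit, and degrade to workflow-level guidance. The fallback message is the caller's own wording, not executor output:

import subprocess

# Path placeholder mirrors the docs above; substitute the real bundle root.
cmd = ["python3", "<bundle-root>/model-capability-advisor/run.py",
       "--workflow", "analysis", "--quick", "gemini-flash", "--deep", "claude-sonnet"]
result = subprocess.run(cmd, capture_output=True, text=True)
if result.returncode != 0:
    # Parse/argument errors exit non-zero with a readable command-error message.
    print(result.stderr or result.stdout)
    print("Falling back to workflow-level guidance without model scoring.")
else:
    print(result.stdout)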

Key Rules

  • Keep this skill advisory rather than authoritative.
  • Do not imply that a recommended model is installed or configured unless separately verified.
  • Use it to improve workflow design, not to replace empirical benchmarking when high-stakes performance matters.
  • Scoring must be transparent. Provide dimension scores and weighting methodology.
  • Tradeoffs must be explicit. Speed vs. depth, cost vs. quality, etc.
  • Unknowns must be identified. Provider availability, actual latency, tool reliability.
  • Verification must be recommended. State what needs testing in production.
  • Profiles may be outdated. Acknowledge that models update frequently.
  • Heuristic framing is mandatory. This is planning support, not empirical truth.

Composition

  • Often used before analysis, strategy-chat, backtest-evaluator, or paper-trading.
  • Works as a planning helper rather than a market-analysis skill.
  • Can inform agent execution design across all workflow stages.
  • Should be rerun when new models become available or requirements change.
  • Results can inform cost optimization and performance tuning.