Test Reasoning Serialization

Validates that ReasoningConfig fields are correctly serialized into provider-specific JSON for OpenRouter, Anthropic, GitHub Copilot, and Codex.

Quick Start

Run all tests with the bundled script:

./scripts/test-reasoning.sh

The script builds forge in debug mode, runs each provider/model combination, captures the outgoing HTTP request body via FORGE_DEBUG_REQUESTS, and asserts that the provider-specific JSON fields in that body are correct.
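The flow described above could be organized roughly as follows. This is a minimal sketch, not the bundled script itself: the function names (build_forge, run_case, for_each_effort) and the FORGE_BIN override are hypothetical, and the real scripts/test-reasoning.sh may be structured differently. Only the environment variables and the binary path come from this document.

```shell
# Hypothetical sketch of the build/run loop; names below are illustrative only.

# Path to the debug binary; overridable so the loop can be exercised with a stub.
FORGE_BIN="${FORGE_BIN:-target/debug/forge}"

# Build in debug mode (cargo's default profile). Not called automatically here.
build_forge() {
  cargo build
}

# Run one provider/model/effort combination with request capture enabled.
run_case() {
  local provider="$1" model="$2" effort="$3"
  FORGE_DEBUG_REQUESTS="forge.request.json" \
  FORGE_SESSION__PROVIDER_ID="$provider" \
  FORGE_SESSION__MODEL_ID="$model" \
  FORGE_REASONING__EFFORT="$effort" \
  "$FORGE_BIN" -p "Hello!"
}

# Iterate over the effort levels that the coverage table lists for
# open_router, github_copilot, and codex. Stops at the first failing run.
for_each_effort() {
  local provider="$1" model="$2" effort
  for effort in none minimal low medium high xhigh; do
    run_case "$provider" "$model" "$effort" || return 1
  done
}
```

For example, `for_each_effort open_router openai/o4-mini` would run six requests, one per effort level; each run would be followed by an assertion on the captured request body.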

Running a Single Test Manually

FORGE_DEBUG_REQUESTS="forge.request.json" \
FORGE_SESSION__PROVIDER_ID=<provider_id> \
FORGE_SESSION__MODEL_ID=<model_id> \
FORGE_REASONING__EFFORT=<effort> \
target/debug/forge -p "Hello!"

Then inspect .forge/forge.request.json for the expected fields.
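Inspection can also be automated. The helper below is a small sketch that assumes jq is installed; assert_json_field is a hypothetical name and is not part of forge or the bundled script. It compares a single JSON path in the captured request against an expected value.

```shell
# Hypothetical helper: check one field of a captured request body with jq.
# Usage: assert_json_field <file> <jq-path> <expected-value>
assert_json_field() {
  local file="$1" path="$2" expected="$3" actual
  # -r prints strings without quotes; a missing field prints "null".
  actual=$(jq -r "$path" "$file") || return 1
  if [ "$actual" != "$expected" ]; then
    echo "FAIL: $path is '$actual', expected '$expected'" >&2
    return 1
  fi
  echo "ok: $path = $expected"
}
```

After a manual run with FORGE_REASONING__EFFORT=high against open_router, the check might look like `assert_json_field .forge/forge.request.json '.reasoning.effort' high`.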

Test Coverage

Provider        Model                       Config fields                               Expected JSON field
open_router     openai/o4-mini              effort: none|minimal|low|medium|high|xhigh  reasoning.effort
open_router     openai/o4-mini              max_tokens: 4000                            reasoning.max_tokens
open_router     openai/o4-mini              effort: high + exclude: true                reasoning.effort + .exclude
open_router     openai/o4-mini              enabled: true                               reasoning.enabled
open_router     anthropic/claude-opus-4-5   max_tokens: 4000                            reasoning.max_tokens
open_router     moonshotai/kimi-k2          max_tokens: 4000                            reasoning.max_tokens
open_router     moonshotai/kimi-k2          effort: high                                reasoning.effort
open_router     minimax/minimax-m2          max_tokens: 4000                            reasoning.max_tokens
open_router     minimax/minimax-m2          effort: high                                reasoning.effort
anthropic       claude-opus-4-6             effort: low|medium|high|max                 output_config.effort
anthropic       claude-3-7-sonnet-20250219  enabled: true + max_tokens: 8000            thinking.type + budget_tokens
github_copilot  o4-mini                     effort: none|minimal|low|medium|high|xhigh  reasoning_effort (top-level)
codex           gpt-5.1-codex               effort: none|minimal|low|medium|high|xhigh  reasoning.effort + .summary
codex           gpt-5.1-codex               effort: medium + exclude: true              reasoning.summary = "concise"
all providers   one model each              effort: invalid                             non-zero exit, no request written
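The effort rows of the table can be read as a mapping from provider to the JSON path where the effort value should appear. The sketch below encodes that mapping as jq-style paths; effort_path is a hypothetical helper and covers only the effort rows (for anthropic, only the claude-opus-4-6 row), not the max_tokens, enabled, or exclude cases.

```shell
# Hypothetical helper: where the effort value is expected in the request body,
# per provider, following the effort rows of the coverage table.
effort_path() {
  case "$1" in
    open_router|codex) echo '.reasoning.effort' ;;      # nested under "reasoning"
    anthropic)         echo '.output_config.effort' ;;  # claude-opus-4-6 row
    github_copilot)    echo '.reasoning_effort' ;;      # top-level field
    *)
      echo "unknown provider: $1" >&2
      return 1
      ;;
  esac
}
```

A path returned here could be passed to a jq-based check, which keeps the provider-specific knowledge in one place.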

Tests for unconfigured providers are skipped automatically. Invalid-effort tests run regardless of credentials, because the rejection happens at config parse time, before any provider interaction.
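The invalid-effort check has two parts: the command must exit non-zero, and no request file may exist afterwards. A minimal sketch of that assertion follows; expect_rejection is a hypothetical name, and the bundled script may implement the check differently.

```shell
# Hypothetical helper: assert that a command fails and writes no request file.
# Usage: expect_rejection <request-file> <command> [args...]
expect_rejection() {
  local request_file="$1"
  shift
  # Remove any leftover capture from an earlier run so the check is meaningful.
  rm -f "$request_file"
  if "$@" >/dev/null 2>&1; then
    echo "FAIL: command exited 0, expected a non-zero exit" >&2
    return 1
  fi
  if [ -e "$request_file" ]; then
    echo "FAIL: a request was written to $request_file" >&2
    return 1
  fi
  echo "ok: rejected before any request was written"
}

# Example invocation (not executed here), using env to pass the variables:
#   expect_rejection .forge/forge.request.json \
#     env FORGE_DEBUG_REQUESTS="forge.request.json" \
#         FORGE_REASONING__EFFORT=invalid \
#         target/debug/forge -p "Hello!"
```

Because the failure is expected at config parse time, this check should behave the same whether or not provider credentials are present.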
