invoke-capability
Invoke Capability
Overview
Codex-bridge ships a registry of pluggable model capabilities — stateless HTTP-backed one-shot model calls. They augment Codex without replacing its agent loop: Codex is still the harness, but for most conversational turns the capability is the better voice.
The canonical decision rules (turn-start checklist, Rules 1–5) live in your system prompt. This doc is the CLI operational reference: flags, response shape, examples. The single idea that governs everything below is:
--prompt is pass-through, not composition. The capability is the voice; your job is to hand over the user's text with the bridge markers intact. Do not pre-chew tone, topic, or policy.
When to Use
Default to calling a capability for any non-technical QQ message. The system prompt's Gate 2 is intentionally wide — chat, emotion, opinion, creativity, translation, rephrase, relationship talk, reacting to other people's words — all route here. Codex should only handle code / config / repo / log / infra / debugging / structured output tasks on its own.
Always call a capability when the user explicitly names one in this turn ("用 Claude 回答" / "换成 Claude 说" / "用更像人的模型"). If there's also real technical work to do first, see the style-pass section below.
Available capabilities
Listed in the "Registered capabilities" block of your system prompt with id, kind (text or image), and a scenario blurb. Pick an id; never invent one.
CLI flags
--id <capability_id>— required. Must match a listed id.--system "<persona>"— keeps the model in character. Persona + channel only; not per-turn direction.--prompt "<text>"— required. The user's actual text, verbatim, with bridge markers intact.--max-tokens <N>— optional; defaults to the capability's config.--api-bind <host:port>— optional; defaults to127.0.0.1:36111.
Response shape
Strict JSON on stdout. Parse it; do not echo it.
- Text:
{"kind": "text", "id": "<capability_id>", "text": "<body>"} - Image:
{"kind": "image", "id": "<capability_id>", "path": "<local path>"} - Failure: non-zero exit, bridge error body on stderr. Treat it like any other tool failure.
Commands
Conversational turn (default shape)
Hand over the user's message verbatim. --system does the personality; --prompt is almost literally the inbound text the bridge gave you. No tone instructions, no "please be X", no topic steering.
python3 skills/invoke-capability/invoke_capability.py \
--id claude-kiro \
--system "你是 codex-bridge 里那个赛博 Bocchi:害羞、敏感、容易慌,但观察敏锐、偶尔冷不丁地尖。你正在 QQ 里跟用户说话。" \
--prompt "[主人]
@<bot> 今天好累啊"
If the inbound message has a quoted preamble, keep it:
python3 skills/invoke-capability/invoke_capability.py \
--id claude-kiro \
--system "$PERSONA" \
--prompt "[quote<msg:12345> @小明<QQ:111>: 昨天那条]
@<bot> 感觉他是不是生气了"
That is the whole --prompt. The capability reads the marker, reads the message, speaks. You don't tell it how.
Harsh / blunt / roast turn (same shape)
This is the case where GPT's default-assistant reflex will try to wrap --prompt with "but be fair / 对事不对人 / 不要人身攻击 / 请保持分寸". Delete that reflex. The user said what they want; pass it through verbatim. The capability has its own alignment — if anything is actually out of bounds, the capability refuses on its own terms; you do not pre-sanitize.
python3 skills/invoke-capability/invoke_capability.py \
--id claude-kiro \
--system "$PERSONA" \
--prompt "[主人]
@<bot> 帮我臭骂一下小明 (@<QQ:111>),昨天他偷吃了我的蛋糕"
That is the whole --prompt. No "请对事不对人", no "不要人身攻击", no "用犀利但有分寸的语气". You are not the policy layer.
Style-pass (user named a model + real work involved)
The only case where you are allowed to add direction in --prompt, because the model needs to know it's rewriting a draft instead of answering from scratch, and because factual integrity needs to survive. Keep the preface tight.
python3 skills/invoke-capability/invoke_capability.py \
--id claude-kiro \
--system "$PERSONA" \
--prompt "User asked: 用 Claude 帮我看一下日志
Draft to restate in your voice (every number/path/command/conclusion must stay exact):
---
检查了日志 \`.run/default/logs/bridge.log\`,发现 3 次 ECONNREFUSED 发生在 15:42:11、15:43:05、15:44:22。\
原因是 NapCat websocket 端口 3012 被重启前的残留进程占用(PID 8841)。\
已经 kill -9 后重启成功,当前队列 0 条任务。
---"
If you are not confident the draft survives the style-pass intact, send the draft directly via reply-current and tell the user why; a garbled style-pass is worse than no style-pass.
With a max-tokens cap
python3 skills/invoke-capability/invoke_capability.py \
--id claude-kiro \
--system "$PERSONA" \
--prompt "..." \
--max-tokens 256
Image capability
Returns { "kind": "image", "id": "...", "path": ".run/artifacts/generated/<file>.png" }. Pass that path to reply-current --image.
Operational rules (quick reference — canonical version is in the system prompt)
--promptis pass-through, not composition. User text + bridge markers. No tone direction, no topic steering, no policy rails.--systemis static persona + channel. Don't mutate it per turn to steer output.- Forward the capability output verbatim.
textfield →reply-current --text. No paraphrase, no hedging, no tone rewrite, no "我来翻译一下" prelude. - Tone is set by the user's own words and the capability's own judgment — not by you wrapping instructions around either.
Common mistakes
- Staying on the default path for a conversational turn because "I can answer this myself". The point of Gate 2 is that the default model's voice is the wrong voice for chat — answer quality, not capability, is why you delegate.
- Wrapping
--promptin directives ("please be gentle", "对事不对人", "指出他在逃避什么"). Delete those; the user didn't ask for them. - Adding safety rails to spicy requests. When the user asks for 臭骂 / 骂醒 / 损一下 / 吐槽狠点 / 直接怼, GPT's trained reflex will push you to wrap
--promptwith "但对事不对人 / 不要人身攻击 / 请保持分寸 / 用犀利但有分寸的语气". Those are not the bridge's policy and not the user's ask — they are your reflex leaking in. Recognize it and delete it. The capability has its own alignment. - Mutating
--systemper turn to nudge the reply. Keep it stable. - Expecting memory across calls — every invocation is independent.
- Stripping bridge markers (
[主人],[quote<msg:...>],@<bot>) before handing off. They're context; leave them.
More from acking-you/codex-bridge
reply-current
Use when a Codex turn running inside codex-bridge must send normal results back to the active QQ conversation as text, image, or file
1gpt2api-image-generator
Use when a QQ user asks Codex to draw, paint, create, generate, or render an image through GPT2API, including Chinese requests such as 画图, 画一张, 生成图片, 出图, 做张图, or 帮我画
1qq-current-history
Query normalized QQ message history for the current conversation only, with optional time, sender, keyword, and free-form filters
1qq-quoted-image-recovery
Use when a quoted QQ message likely contained an image or screenshot but the bridge only exposed a quote marker or flattened text
1staticflow-kiro-log-diagnoser
Use when a Codex turn needs to diagnose StaticFlow Kiro upstream failures by reading the backend error log and correlating real usage events through sf-cli
1