Kage — pentest sandbox

When to use this skill

  • kage <target> — black-box engagement
  • pentest <target> / security audit on <target> — same as above
  • kage audit <local path> — white-box source audit only
  • kage greybox <target> <source-path> — black-box + source context

<target> can be a domain, URL, wildcard (*.example.com), or a local source path. In greybox mode, Turn 0 runs the bundled audit-context-building methodology on the source first, then feeds trust-boundary and auth-flow context into Turns 2–3.

Container model

Every tool runs inside a per-engagement Kali container via the $K shim. Each working directory gets its own container (name derived from $PWD), so two engagements run simultaneously without cross-contamination.

SKILL_DIR="$HOME/.claude/skills/kage"
K="$SKILL_DIR/scripts/k"
"$K" <cmd>            # runs <cmd> inside this engagement's container
"$K" ls | reset | prune | nuke   # management subcommands

Probes live at /skill/scripts/*.py inside the container (read-only bind mount). Results go to /workspace/results/<target>/ (bind-mounted from your CWD).
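The per-directory container naming can be illustrated with a short sketch (the hash scheme here is an assumption; the bundled shim may derive names differently):

```shell
# Hypothetical sketch of per-directory container naming: the same
# directory always maps to the same container name, and two different
# engagement directories never collide.
container_name() {
  printf 'kage-%s' "$(printf '%s' "$1" | cksum | cut -d' ' -f1)"
}
a=$(container_name /home/u/engagement-alpha)
b=$(container_name /home/u/engagement-beta)
echo "$a"
[ "$a" != "$b" ] && echo "distinct containers per directory"
```

Because the name is a pure function of the path, re-running `$K` from the same directory reattaches to the existing container rather than creating a new one.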

Never run pentest tools directly on the host.


Turn 0 — Setup

SKILL_DIR="$HOME/.claude/skills/kage"
K="$SKILL_DIR/scripts/k"
"$K" whoami            # warms the sandbox; surfaces docker errors

TARGET="example.com"   # derive from user prompt; slugify for paths

# Host-side mkdir so dirs are owned by your UID (host Write tool needs this).
mkdir -p "results/$TARGET"/{recon,vulns,testing,exploits,chains,verification,judging,reports}

Read ./creds.md from the user's CWD if present. If absent: ask whether to proceed black-box, or point at the template in assets/creds.sample.md.
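That pre-flight reduces to a simple existence check; a sketch (branch labels are illustrative):

```shell
# Sketch of the creds.md pre-flight: authenticated testing only when
# the file exists, otherwise fall back to black-box.
cd "$(mktemp -d)"                 # stand-in for the user's CWD
check_creds() { [ -f ./creds.md ] && echo authenticated || echo black-box; }
check_creds                       # no creds.md yet -> black-box
: > creds.md                      # simulate the user providing creds
check_creds                       # -> authenticated
```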

Greybox pre-flight (only if mode is greybox)

Apply the audit-context-building methodology to <source-path>. For dense modules, dispatch the function-analyzer sub-agent — multiple in parallel if warranted.

Output results/$TARGET/context.md covering: trust boundaries, auth flow, data flow, high-value entry points, known-sensitive parameters. All Turn-2 testers and Turn-3a exploiter read context.md alongside their usual inputs.

Write results/$TARGET/engagement.json with {target, scope_type, started_at, rules_of_engagement}.
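A minimal sketch of that record (field values are illustrative; the schema is just the four keys named above):

```shell
# Sketch: write engagement.json with the four required keys.
cd "$(mktemp -d)"
TARGET=example.com
mkdir -p "results/$TARGET"
cat > "results/$TARGET/engagement.json" <<EOF
{
  "target": "$TARGET",
  "scope_type": "blackbox",
  "started_at": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
  "rules_of_engagement": "no DoS; stay in declared scope"
}
EOF
cat "results/$TARGET/engagement.json"
```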


Turn 1 — Recon (two phases)

Discovery runs fast (1–3 min) as a streaming pipe. Vuln scanning is long (5–15 min) and isolated in a sub-agent so nuclei's verbose output doesn't pollute the main context.

Phase 1 — Discovery

R="results/$TARGET"
"$K" bash -c '
  set -e
  cd /workspace
  R="results/'"$TARGET"'"
  (subfinder -d "'"$TARGET"'" -silent \
     | tee "$R/recon/subs.txt" \
     | httpx -silent -title -tech-detect -status-code \
     | tee "$R/recon/live.txt") &
  (gau --subs "'"$TARGET"'" > "$R/recon/wayback.txt") &
  (until [ -s "$R/recon/live.txt" ]; do sleep 1; done
   katana -u "$R/recon/live.txt" -d 3 -jc -silent -o "$R/recon/crawl.txt") &
  (python3 /skill/scripts/dorks.py -d "'"$TARGET"'" --output "$R/recon/dorks.json") &
  wait'

In parallel, dispatch port-scanner: HOSTS_FILE=$R/recon/live.txt, OUTDIR=$R/recon/ports/.

If GITHUB_TOKEN is set, also run scripts/gitmail.py:

"$K" bash -c "GITHUB_TOKEN=\"$GH_TOKEN\" python3 /skill/scripts/gitmail.py \
   -O <guessed-org> -r -s --verified-only \
   -o /workspace/$R/recon/github.json"

Wait for pipe + port-scanner (+ gitmail if run) before Phase 2.

Phase 2 — Vuln scan

Dispatch vuln-scanner with LIVE_HOSTS_FILE=$R/recon/live.txt, OUTPUT=$R/vulns/nuclei.txt.

Wait for vuln-scanner before Turn 2.

Summary

Write recon/summary.md: subdomain count, live-host count, high-value ports, nuclei severity counts + top hits, auth endpoints, ID-bearing API paths, URL-accepting parameters.
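The headline counts fall straight out of the recon artifacts; a sketch with stand-in data in place of real subfinder/httpx output:

```shell
# Sketch: compute summary.md headline counts (sample data stands in
# for real recon output; the summary layout is illustrative).
cd "$(mktemp -d)"
mkdir -p recon
printf 'a.example.com\nb.example.com\nc.example.com\n' > recon/subs.txt
printf 'https://a.example.com [200]\nhttps://b.example.com [403]\n' > recon/live.txt
subs=$(grep -c . recon/subs.txt)    # count of non-empty lines
live=$(grep -c . recon/live.txt)
printf '## Recon summary\n- subdomains: %s\n- live hosts: %s\n' \
  "$subs" "$live" > recon/summary.md
cat recon/summary.md
```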

Stop here if the user said recon-only.


Turn 2 — Deep testing (parallel fan-out)

Read recon/summary.md. For every trigger that fires, spawn the matching tester sub-agents simultaneously — emit all dispatches in one message. Each tester issues its own $K calls into the shared container (docker exec is concurrent) and writes to its own testing/<class>/ path.

| Trigger in recon | Dispatch |
|---|---|
| login / SSO / OAuth / JWT endpoints | auth-tester |
| API endpoints with object IDs + 2 accounts | idor-tester |
| 403s, admin panels, role-gated routes + ≥2 roles | access-control-tester |
| url= / redirect= / proxy= / fetch= / webhook= params | ssrf-tester |
| parameterised URLs, search fields, forms, GraphQL vars | injection-tester |
| user-facing HTML (comments, profile, search) | client-side-tester |
| OpenAPI/Swagger URL, /graphql, REST routes | api-tester |
| payment / coupon / redeem / invite / signup endpoints | logic-tester |
| crawler surface thin; hidden paths likely | content-discovery |
| JS bundles discovered in Turn 1 | js-secret-scanner |
| any live web host | headers-tester |
| needs 2+ provisioned accounts (self-service signup) | use agentmail to spin up disposable inboxes (requires AGENTMAIL_API_KEY) |

In greybox mode, every tester also reads results/$TARGET/context.md and targets source-known weak points over generic scans.

Wait for all testers to return. Aggregate their JSON into a candidate-finding inventory. Every probe emits a shared findings[] / summary{} shape; all HTTP goes through scripts/tls.py (or scripts/browser.py for Cloudflare targets).

5-minute rule. If a lead doesn't prove itself within 5 minutes, move on. No theoretical bugs.


Turn 3 — Exploit, verify, chain, judge

Sequential. Each step feeds the next.

3a. Exploit — filter BEFORE writing a PoC

Read references/judging.md. For each candidate finding:

  • Drop outright if it matches an auto-drop pattern (missing headers without PoC, version banners, self-XSS, logout CSRF, TRACE/TRACK, TLS cipher issues, SPF/DKIM/DMARC, info-leak 500s, rate-limit absent without an abuse scenario). Do NOT build a PoC for these.
  • Drop if the impact statement contains "could potentially" or depends on a condition that can't be verified black-box. In greybox mode, check context.md first — source may confirm/refute.
  • Build a PoC only when attacker action produces an observable, reproducible server-side outcome with real impact (data access, code execution, auth bypass, cost amplification, state change, financial).

For each survivor, write a curl_cffi PoC at results/$TARGET/exploits/<slug>.py. Drop anything that doesn't reproduce first run.
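The first-run filter can be sketched like this, using trivial stand-in scripts (the real PoCs are curl_cffi Python files; the sketch assumes each PoC exits nonzero when it fails to reproduce):

```shell
# Sketch: keep only PoCs that reproduce on their first run.
# Stand-in scripts simulate one working and one flaky PoC.
cd "$(mktemp -d)"
mkdir -p exploits
printf '#!/bin/sh\nexit 0\n' > exploits/idor-user-export.sh
printf '#!/bin/sh\nexit 1\n' > exploits/flaky-ssrf.sh
chmod +x exploits/*.sh
for poc in exploits/*.sh; do
  "$poc" || { echo "dropping $poc (did not reproduce)"; rm -- "$poc"; }
done
ls exploits                       # only the reproducing PoC survives
```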

3b. Verify — parallel

For each PoC, dispatch verifier; emit all verifier Tasks in one message. Each gets its own OUTDIR=results/$TARGET/verification/F<NNN>/, fresh auth, clean session, 3× reproducibility.

Aggregate into verification/verified_findings.json.
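One way to sketch the aggregation, assuming each verifier drops a one-line result.json in its OUTDIR (file and field names are illustrative):

```shell
# Sketch: merge per-finding verifier output into one JSON array.
cd "$(mktemp -d)"
mkdir -p verification/F001 verification/F002
echo '{"id":"F001","reproduced":3}' > verification/F001/result.json
echo '{"id":"F002","reproduced":3}' > verification/F002/result.json
{ printf '['; cat verification/F*/result.json | paste -sd, -; printf ']\n'; } \
  > verification/verified_findings.json
cat verification/verified_findings.json
```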

3c. Chain

Dispatch chain-builder on verified findings. It maps the 7 canonical patterns in references/chains.md and looks for creative combinations.

3d. Judge

Dispatch judge. It applies the 4-gate filter from references/judging.md, scores confidence + CVSS, deduplicates, enforces the exclusion list.

Outputs: judging/approved_findings.json (the only findings that reach the report), judgment.md (audit trail), dropped_findings.md.


Turn 4 — Audit report

Dispatch report-writer with: TARGET, ENGAGEMENT_JSON, APPROVED_FINDINGS_JSON, TEMPLATE=$SKILL_DIR/references/audit-report-template.md, OUTPUT=results/$TARGET/audit-report.md.

The agent fills {{placeholders}} from engagement + approved findings. It does NOT re-filter — the judge already did that. Print a summary table to the user when done.
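Placeholder filling might look like this sketch (the template text and placeholder names here are illustrative, not the bundled template):

```shell
# Sketch: fill {{placeholders}} from engagement + approved findings.
cd "$(mktemp -d)"
cat > template.md <<'EOF'
# Audit report: {{target}}
Findings approved: {{finding_count}}
EOF
sed -e 's/{{target}}/example.com/' \
    -e 's/{{finding_count}}/4/' template.md > audit-report.md
cat audit-report.md
```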


Failure modes to surface (don't swallow)

  • Docker not installed / not running → print the shim's error, stop.
  • Target unreachable / DNS fails.
  • Rate-limited or WAF-blocked → slow down, rotate fingerprint (tls.py --impersonate <name>), or swap to scripts/browser.py.
  • creds.md missing when the user asked for authenticated testing.

Reference docs (load on demand)

  • references/judging.md (auto-drop patterns and the 4-gate filter)
  • references/chains.md (the 7 canonical chain patterns)
  • references/audit-report-template.md (report skeleton)
  • assets/creds.sample.md (credential file template)
