Context Cartography
Design what goes into an agent's context window — and what stays out.
The Boundary Rule
Designing or redesigning a context window? → Use this skill.
Context window is full and you need to cut? → Start at the TRIAGE entry point.
Need to validate whether your design works? → Use EDD after this skill.
One-off prompt or exploratory prototyping? → Don't use this skill.
Two Entry Points
Entry A — Full Design (greenfield/major redesign): All six steps: TASK → SURVEY → PRIORITIZE → SIZE → STRUCTURE → CUT.
Entry B — Triage (context too big): Start at CUT. If cuts reveal the wrong information is prioritized, work backwards through PRIORITIZE and SIZE.
The Flow
Step 1 — TASK
What will the agent do with this context? Be specific. Not "code assistance" but "review pull requests for a Python microservices codebase, checking style, correctness, and security."
Write it as a single sentence. If you need two, you may need two context configurations.
Output: One-sentence task definition.
Step 2 — SURVEY
Enumerate all candidate context sources. Don't filter yet. Common sources:
- System instructions / role definition
- Tool definitions and schemas
- Code files (source, tests, configs)
- Documentation (internal, API docs, specs)
- Style guides / conventions
- Architecture descriptions
- Git history / recent changes
- Conversation history
- Retrieved documents (RAG)
- Examples (few-shot)
- Error logs / stack traces
- External references (URLs, schemas)
Output: Complete list with estimated token cost for each.
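The survey's token estimates don't need a real tokenizer to be useful. A minimal sketch, assuming the common ~4-characters-per-token heuristic (an approximation; actual counts vary by model and tokenizer):

```python
# Rough survey helper. The 4-chars-per-token ratio is a heuristic for
# English text and code, not an exact count.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def survey(sources: dict[str, str]) -> list[tuple[str, int]]:
    """Return (source name, estimated token cost), largest first."""
    return sorted(
        ((name, estimate_tokens(body)) for name, body in sources.items()),
        key=lambda pair: -pair[1],
    )
```

Run it over every candidate before filtering; the big-ticket items are usually where the SIZE step pays off most.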
Step 3 — PRIORITIZE
| Priority | Meaning | Rule |
|---|---|---|
| P0 — Essential | Agent cannot do the task without this | Always include |
| P1 — Important | Significantly improves quality | Include if budget allows |
| P2 — Useful | Helps with edge cases or polish | Include only if space remains |
| P3 — Irrelevant | Does not affect this task | Never include |
The non-obvious calls: P1 items that seem essential but aren't (false P0s), and P2 items that turn out to be load-bearing (hidden P0s). The pattern catalog flags both.
Output: Prioritized list with P0/P1/P2/P3 ratings.
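The table's inclusion rules reduce to a greedy fill. A sketch under an assumed `Source` shape (not a prescribed format):

```python
from dataclasses import dataclass

@dataclass
class Source:
    name: str
    priority: str  # "P0" | "P1" | "P2" | "P3"
    tokens: int

def fill_budget(sources: list[Source], budget: int) -> list[Source]:
    """P0 always goes in; P1 if budget allows; P2 only with space left; P3 never."""
    chosen: list[Source] = []
    used = 0
    for tier in ("P0", "P1", "P2"):
        for src in (s for s in sources if s.priority == tier):
            if tier == "P0" or used + src.tokens <= budget:
                chosen.append(src)
                used += src.tokens
    return chosen
```

Note that P0 is added unconditionally, matching the "Always include" rule; if P0 alone blows the budget, that's a SIZE problem, not a priority problem.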
Step 4 — SIZE
For each P0 and P1 source, choose detail level:
Will the agent MODIFY this content? → Full source
Will the agent CALL or REFERENCE it? → Signatures + docstrings
Need to know it EXISTS and what it does? → One-line summary
Need the SHAPE of the system? → Structural overview
Common mistakes: full source for files only referenced (waste); only summaries for files being modified (insufficient); uniform detail level (no signal hierarchy).
Output: Each P0/P1 source with size decision.
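For the "signatures + docstrings" level, Python's `ast` module can do the extraction mechanically. A sketch for plain functions (classes and methods would need the same treatment):

```python
import ast

def signatures(source: str) -> list[str]:
    """Extract each function's signature plus the first docstring line."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            entry = f"def {node.name}({args})"
            doc = ast.get_docstring(node)
            if doc:
                entry += f"  # {doc.splitlines()[0]}"
            lines.append(entry)
    return lines
```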
Step 5 — STRUCTURE
- Task-relevant context goes closest to the instruction. Most important info last (near user message) or first (system prompt opening).
- Label sections explicitly. Headers describe WHAT and WHY. Not `## Code` but `## Source code to review — check style, correctness, security`.
- Separate instructions from reference. Mixing causes the agent to treat reference material as instructions, or vice versa.
- Use consistent formatting. One structure (markdown, XML, delimiters) throughout.
- Self-documenting. A reviewer should be able to tell why each section is included from its header.
Output: Structured layout with section headers and ordering.
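Assembly is then mechanical: ordered (header, body) pairs, each header stating WHAT and WHY. A sketch, assuming markdown headers as the one consistent format (the section names are illustrative):

```python
def assemble(sections: list[tuple[str, str]]) -> str:
    """Join labeled sections in order; most important content goes last."""
    return "\n\n".join(f"## {header}\n{body}" for header, body in sections)

# Illustrative ordering: reference material first, the diff last,
# closest to the instruction.
prompt = assemble([
    ("Style guide: conventions this review must enforce", "<style guide here>"),
    ("Source code to review: check style, correctness, security", "<diff here>"),
])
```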
Step 6 — CUT
- List every context element (from STRUCTURE)
- State the specific agent behavior each supports
- Can't state behavior in one sentence → cut candidate
- Two elements support the same behavior → keep the more token-efficient one
- For each cut candidate, verify via EDD ablation
Additive alternative: Start with ONLY P0, add P1 one at a time, measure each. Sidesteps loss aversion.
Shadow context: Don't delete cut items. Move to a shadow set; re-test when production anomalies occur — a cut item may be load-bearing for edge cases evals don't cover.
Output: Final context window + shadow context list.
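The additive alternative can be sketched as a loop; `run_eval` here is a hypothetical stand-in for your EDD harness, returning a pass rate in [0, 1]:

```python
def additive_build(p0, p1_candidates, run_eval):
    """Start from P0 only; keep each P1 source only if it measurably helps."""
    context = list(p0)
    baseline = run_eval(context)
    shadow = []                      # cut items, retained for later re-testing
    for candidate in p1_candidates:
        score = run_eval(context + [candidate])
        if score > baseline:
            context.append(candidate)
            baseline = score
        else:
            shadow.append(candidate)
    return context, shadow
```

Because each candidate must prove a measured gain to enter, nothing has to be "taken away" later, which is what sidesteps loss aversion.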
Pattern Catalog
Concrete prioritization patterns. Each lists what matters more/less than developers expect.
Code Generation
| Source | Priority | Surprise factor |
|---|---|---|
| Task specification | P0 | Expected |
| Files new code interacts with (full) | P0 | Expected |
| Existing test patterns | P0 | Often missed — without these, code follows generic patterns |
| Naming conventions / style guide | P1 | Often missed — models default to training distribution |
| Type definitions / interfaces | P1 | Expected |
| Architectural constraints | P1 | Expected |
| Unrelated modules | P3 | Often included — adds noise |
| Full git history | P3 | Often included — recent diff may be P1, full log is noise |
Code Review
| Source | Priority | Surprise factor |
|---|---|---|
| Diff / changed files (full) | P0 | Expected |
| Style guide / linting rules | P0 | Often P1'd — without it, reviews are generic |
| Test files for changed code | P0 | Often missed — can't assess coverage without seeing tests |
| Architectural constraints | P1 | Expected |
| Related affected files | P1 | Expected |
| PR description / ticket | P1 | Often missed — reviewer lacks intent, reviews syntax not semantics |
| Unrelated modules | P3 | Expected |
| Build / CI config | P3 | Often included |
Debugging
| Source | Priority | Surprise factor |
|---|---|---|
| Error output / stack trace | P0 | Expected |
| Source files in trace (full) | P0 | Expected |
| Recent changes (git diff) | P0 | Often missed — "what changed?" is the #1 question |
| Related test files | P1 | Expected |
| Dependency versions / env | P1 | Often missed — version mismatches cause subtle bugs |
| Reproduction steps | P1 | Expected |
| Unrelated source files | P3 | Expected |
| Full git log | P3 | Recent diff is P0; full history is noise |
Multi-Step Planning / Architecture
| Source | Priority | Surprise factor |
|---|---|---|
| High-level architecture | P0 | Expected |
| File/directory structure | P0 | Often missed — needs system shape, not file contents |
| Constraints (perf, compat, deadlines) | P0 | Often missed — plans without constraints are wish lists |
| Dependency graph | P1 | Expected |
| Existing patterns | P1 | Expected |
| Full source of any file | P2 | Often P0'd — planning needs structure, not implementation |
| Test files | P2 | Not relevant until implementation |
Q&A Over Documentation
| Source | Priority | Surprise factor |
|---|---|---|
| Relevant doc sections | P0 | Expected |
| Glossary / terminology | P0 | Often missed — domain jargon causes hallucination |
| Examples from docs | P1 | Expected |
| Source code | P2 | Often P0'd — only relevant if question is about implementation |
| Full doc corpus | P3 | Often included — retrieval should select, not dump |
Test Writing
| Source | Priority | Surprise factor |
|---|---|---|
| Implementation under test (full) | P0 | Expected |
| Existing test files (patterns) | P0 | Often missed — without these, tests follow generic framework patterns |
| Test framework docs / helpers | P1 | Expected |
| Edge cases from specs/tickets | P1 | Often missed — happy path only without spec-derived edges |
| Mock/fixture patterns | P1 | Often missed — agent invents new patterns |
| Unrelated source files | P3 | Expected |
The Context Manifest
The skill's output artifact. Feeds into EDD for validation.
```yaml
task: "Review pull requests for Python microservices — style, correctness, security"
token_budget: 8000
date: 2026-03-19
sources:
  - name: "PR diff"
    priority: P0
    size: full
    tokens_est: 2000
    supports: "Agent can see what changed"
  - name: "Style guide"
    priority: P0
    size: full
    tokens_est: 800
    supports: "Agent checks project-specific conventions, not generic ones"
  - name: "Test files for changed code"
    priority: P0
    size: full
    tokens_est: 1500
    supports: "Agent can assess test coverage of changes"
  - name: "PR description / ticket"
    priority: P1
    size: full
    tokens_est: 300
    supports: "Agent reviews intent, not just syntax"
  - name: "Architecture doc"
    priority: P1
    size: summary
    tokens_est: 400
    supports: "Agent catches architectural violations"
shadow:
  - name: "CI config"
    reason_cut: "Did not change review assertions in ablation test"
  - name: "Full git log"
    reason_cut: "Recent diff (in PR diff) was sufficient"
structure:
  order:
    - "System instructions (role + review criteria)"
    - "Style guide"
    - "Architecture summary"
    - "PR description"
    - "Test files"
    - "PR diff (closest to instruction)"
  rationale: "Task-relevant content (diff) placed last, near user message. Reference material earlier for lookup."
```
The manifest is versionable, diffable, testable (EDD writes assertions against `supports` claims), and auditable.
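Because the manifest is structured data, simple checks can run in CI. A sketch of a budget check over a parsed manifest, shown here as a plain dict to avoid a YAML-parsing dependency:

```python
def check_budget(manifest: dict) -> tuple[int, int]:
    """Return (total estimated tokens, remaining headroom against the budget)."""
    total = sum(src["tokens_est"] for src in manifest["sources"])
    return total, manifest["token_budget"] - total

manifest = {
    "token_budget": 8000,
    "sources": [
        {"name": "PR diff", "tokens_est": 2000},
        {"name": "Style guide", "tokens_est": 800},
        {"name": "Test files for changed code", "tokens_est": 1500},
        {"name": "PR description / ticket", "tokens_est": 300},
        {"name": "Architecture doc", "tokens_est": 400},
    ],
}
total, headroom = check_budget(manifest)  # 5000 used, 3000 headroom
```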
Integration with EDD and Context-Eval
| Phase | Skill | What happens |
|---|---|---|
| Design | context-cartography | Produce a context manifest |
| Validate | EDD | Write assertions from supports claims, run evals |
| Measure | context-eval | Pass rates, benefit-per-kilotoken, deadwood |
| Iterate | Loop back | Eval failures → update manifest → re-run |
If a source claims "Agent checks project-specific conventions" but the eval shows generic conventions, that source isn't earning its tokens — redesign or cut.
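That cut-or-keep decision is itself testable: ablate the source and compare pass rates. A sketch, where `run_eval` is again a hypothetical stand-in for the EDD harness:

```python
def earns_tokens(run_eval, context, source, min_gain=0.0):
    """A source earns its tokens if removing it drops the eval pass rate."""
    with_source = run_eval(context)
    without = run_eval([item for item in context if item != source])
    return with_source - without > min_gain
```

If this returns False, move the source to shadow context rather than deleting it outright.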
Maintenance: When to Resurvey
- Task scope changes — agent asked to do things the manifest wasn't designed for
- Model upgrade — new model may need less or differently-structured context
- Eval scores plateau or regress — current design hit ceiling
- Token budget changes — what was cut may now fit
Treat the manifest like code: review, version, test.
Anti-Patterns
| Anti-pattern | Symptom | Fix |
|---|---|---|
| Context stuffing | Include everything "just in case" | Start P0 only, add P1 one at a time with measurement |
| Uniform sizing | Every file at full detail | Use SIZE decision tree |
| Missing legend | No headers or labels | Label everything with WHAT and WHY |
| Priority inversion | P2 items at full detail while P0 summarized | Highest detail to most important info |
| Stale manifest | Designed 6 months ago, task evolved | Resurvey |
| Solo without validation | Designed but never tested | Always follow with EDD |
| Cargo-culting patterns | Copying catalog patterns without adapting | Patterns are starting points |