Context Cartography
Design what goes into an agent's context window — and what stays out.
The Boundary Rule
Designing or redesigning a context window? → Use this skill.
Context window is full and you need to cut? → Start at the TRIAGE entry point.
Need to validate whether your design works? → Use EDD after this skill.
One-off prompt or exploratory prototyping? → Don't use this skill.
Two Entry Points
Entry A — Full Design (greenfield/major redesign): All six steps: TASK → SURVEY → PRIORITIZE → SIZE → STRUCTURE → CUT.
Entry B — Triage (context too big): Start at CUT. If cuts reveal the wrong information is prioritized, work backwards through PRIORITIZE and SIZE.
The Flow
Step 1 — TASK
What will the agent do with this context? Be specific. Not "code assistance" but "review pull requests for a Python microservices codebase, checking style, correctness, and security."
Write it as a single sentence. If you need two, you may need two context configurations.
Output: One-sentence task definition.
Step 2 — SURVEY
Enumerate all candidate context sources. Don't filter yet. Common sources:
- System instructions / role definition
- Tool definitions and schemas
- Code files (source, tests, configs)
- Documentation (internal, API docs, specs)
- Style guides / conventions
- Architecture descriptions
- Git history / recent changes
- Conversation history
- Retrieved documents (RAG)
- Examples (few-shot)
- Error logs / stack traces
- External references (URLs, schemas)
Output: Complete list with estimated token cost for each.
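The survey's token estimates don't need a real tokenizer to be useful. A minimal sketch, assuming the common ~4-characters-per-token heuristic (an approximation; actual counts vary by model and tokenizer):

```python
# Rough survey helper. The 4-chars-per-token ratio is a heuristic for
# English text and code, not an exact count.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def survey(sources: dict[str, str]) -> list[tuple[str, int]]:
    """Return (source name, estimated token cost), largest first."""
    return sorted(
        ((name, estimate_tokens(body)) for name, body in sources.items()),
        key=lambda pair: -pair[1],
    )
```

Run it over every candidate before filtering; the big-ticket items are usually where the SIZE step pays off most.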
Step 3 — PRIORITIZE
| Priority | Meaning | Rule |
|---|---|---|
| P0 — Essential | Agent cannot do the task without this | Always include |
| P1 — Important | Significantly improves quality | Include if budget allows |
| P2 — Useful | Helps with edge cases or polish | Include only if space remains |
| P3 — Irrelevant | Does not affect this task | Never include |
The non-obvious calls: P1 items that seem essential but aren't (false P0s), and P2 items that turn out to be load-bearing (hidden P0s). The pattern catalog flags both.
Output: Prioritized list with P0/P1/P2/P3 ratings.
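The table's inclusion rules reduce to a greedy fill. A sketch under an assumed `Source` shape (not a prescribed format):

```python
from dataclasses import dataclass

@dataclass
class Source:
    name: str
    priority: str  # "P0" | "P1" | "P2" | "P3"
    tokens: int

def fill_budget(sources: list[Source], budget: int) -> list[Source]:
    """P0 always goes in; P1 if budget allows; P2 only with space left; P3 never."""
    chosen: list[Source] = []
    used = 0
    for tier in ("P0", "P1", "P2"):
        for src in (s for s in sources if s.priority == tier):
            if tier == "P0" or used + src.tokens <= budget:
                chosen.append(src)
                used += src.tokens
    return chosen
```

Note that P0 is added unconditionally, matching the "Always include" rule; if P0 alone blows the budget, that's a SIZE problem, not a priority problem.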
Step 4 — SIZE
For each P0 and P1 source, choose detail level:
Will the agent MODIFY this content? → Full source
Will the agent CALL or REFERENCE it? → Signatures + docstrings
Need to know it EXISTS and what it does? → One-line summary
Need the SHAPE of the system? → Structural overview
Common mistakes: full source for files only referenced (waste); only summaries for files being modified (insufficient); uniform detail level (no signal hierarchy).
Output: Each P0/P1 source with size decision.
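For the "signatures + docstrings" level, Python's `ast` module can do the extraction mechanically. A sketch for plain functions (classes and methods would need the same treatment):

```python
import ast

def signatures(source: str) -> list[str]:
    """Extract each function's signature plus the first docstring line."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            entry = f"def {node.name}({args})"
            doc = ast.get_docstring(node)
            if doc:
                entry += f"  # {doc.splitlines()[0]}"
            lines.append(entry)
    return lines
```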
Step 5 — STRUCTURE
- Task-relevant context goes closest to the instruction. Most important info last (near user message) or first (system prompt opening).
- Label sections explicitly. Headers describe WHAT and WHY. Not `## Code` but `## Source code to review — check style, correctness, security`.
- Separate instructions from reference. Mixing causes the agent to treat reference material as instructions, or vice versa.
- Use consistent formatting. One structure (markdown, XML, delimiters) throughout.
- Self-documenting. A reviewer should be able to tell why each section is included from its header.
Output: Structured layout with section headers and ordering.
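Assembly is then mechanical: ordered (header, body) pairs, each header stating WHAT and WHY. A sketch, assuming markdown headers as the one consistent format (the section names are illustrative):

```python
def assemble(sections: list[tuple[str, str]]) -> str:
    """Join labeled sections in order; most important content goes last."""
    return "\n\n".join(f"## {header}\n{body}" for header, body in sections)

# Illustrative ordering: reference material first, the diff last,
# closest to the instruction.
prompt = assemble([
    ("Style guide: conventions this review must enforce", "<style guide here>"),
    ("Source code to review: check style, correctness, security", "<diff here>"),
])
```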
Step 6 — CUT
- List every context element (from STRUCTURE)
- State the specific agent behavior each supports
- Can't state behavior in one sentence → cut candidate
- Two elements support the same behavior → keep the more token-efficient one
- For each cut candidate, verify via EDD ablation
Additive alternative: Start with ONLY P0, add P1 one at a time, measure each. Sidesteps loss aversion.
Shadow context: Don't delete cut items. Move to a shadow set; re-test when production anomalies occur — a cut item may be load-bearing for edge cases evals don't cover.
Output: Final context window + shadow context list.
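The additive alternative can be sketched as a loop; `run_eval` here is a hypothetical stand-in for your EDD harness, returning a pass rate in [0, 1]:

```python
def additive_build(p0, p1_candidates, run_eval):
    """Start from P0 only; keep each P1 source only if it measurably helps."""
    context = list(p0)
    baseline = run_eval(context)
    shadow = []                      # cut items, retained for later re-testing
    for candidate in p1_candidates:
        score = run_eval(context + [candidate])
        if score > baseline:
            context.append(candidate)
            baseline = score
        else:
            shadow.append(candidate)
    return context, shadow
```

Because each candidate must prove a measured gain to enter, nothing has to be "taken away" later, which is what sidesteps loss aversion.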
Pattern Catalog
Concrete prioritization patterns. Each lists what matters more/less than developers expect.
Code Generation
| Source | Priority | Surprise factor |
|---|---|---|
| Task specification | P0 | Expected |
| Files new code interacts with (full) | P0 | Expected |
| Existing test patterns | P0 | Often missed — without these, code follows generic patterns |
| Naming conventions / style guide | P1 | Often missed — models default to training distribution |
| Type definitions / interfaces | P1 | Expected |
| Architectural constraints | P1 | Expected |
| Unrelated modules | P3 | Often included — adds noise |
| Full git history | P3 | Often included — recent diff may be P1, full log is noise |
Code Review
| Source | Priority | Surprise factor |
|---|---|---|
| Diff / changed files (full) | P0 | Expected |
| Style guide / linting rules | P0 | Often P1'd — without it, reviews are generic |
| Test files for changed code | P0 | Often missed — can't assess coverage without seeing tests |
| Architectural constraints | P1 | Expected |
| Related affected files | P1 | Expected |
| PR description / ticket | P1 | Often missed — reviewer lacks intent, reviews syntax not semantics |
| Unrelated modules | P3 | Expected |
| Build / CI config | P3 | Often included |
Debugging
| Source | Priority | Surprise factor |
|---|---|---|
| Error output / stack trace | P0 | Expected |
| Source files in trace (full) | P0 | Expected |
| Recent changes (git diff) | P0 | Often missed — "what changed?" is the #1 question |
| Related test files | P1 | Expected |
| Dependency versions / env | P1 | Often missed — version mismatches cause subtle bugs |
| Reproduction steps | P1 | Expected |
| Unrelated source files | P3 | Expected |
| Full git log | P3 | Recent diff is P0; full history is noise |
Multi-Step Planning / Architecture
| Source | Priority | Surprise factor |
|---|---|---|
| High-level architecture | P0 | Expected |
| File/directory structure | P0 | Often missed — needs system shape, not file contents |
| Constraints (perf, compat, deadlines) | P0 | Often missed — plans without constraints are wish lists |
| Dependency graph | P1 | Expected |
| Existing patterns | P1 | Expected |
| Full source of any file | P2 | Often P0'd — planning needs structure, not implementation |
| Test files | P2 | Not relevant until implementation |
Q&A Over Documentation
| Source | Priority | Surprise factor |
|---|---|---|
| Relevant doc sections | P0 | Expected |
| Glossary / terminology | P0 | Often missed — domain jargon causes hallucination |
| Examples from docs | P1 | Expected |
| Source code | P2 | Often P0'd — only relevant if question is about implementation |
| Full doc corpus | P3 | Often included — retrieval should select, not dump |
Test Writing
| Source | Priority | Surprise factor |
|---|---|---|
| Implementation under test (full) | P0 | Expected |
| Existing test files (patterns) | P0 | Often missed — without these, tests follow generic framework patterns |
| Test framework docs / helpers | P1 | Expected |
| Edge cases from specs/tickets | P1 | Often missed — happy path only without spec-derived edges |
| Mock/fixture patterns | P1 | Often missed — agent invents new patterns |
| Unrelated source files | P3 | Expected |
The Context Manifest
The skill's output artifact. Feeds into EDD for validation.
```yaml
task: "Review pull requests for Python microservices — style, correctness, security"
token_budget: 8000
date: 2026-03-19
sources:
  - name: "PR diff"
    priority: P0
    size: full
    tokens_est: 2000
    supports: "Agent can see what changed"
  - name: "Style guide"
    priority: P0
    size: full
    tokens_est: 800
    supports: "Agent checks project-specific conventions, not generic ones"
  - name: "Test files for changed code"
    priority: P0
    size: full
    tokens_est: 1500
    supports: "Agent can assess test coverage of changes"
  - name: "PR description / ticket"
    priority: P1
    size: full
    tokens_est: 300
    supports: "Agent reviews intent, not just syntax"
  - name: "Architecture doc"
    priority: P1
    size: summary
    tokens_est: 400
    supports: "Agent catches architectural violations"
shadow:
  - name: "CI config"
    reason_cut: "Did not change review assertions in ablation test"
  - name: "Full git log"
    reason_cut: "Recent diff (in PR diff) was sufficient"
structure:
  order:
    - "System instructions (role + review criteria)"
    - "Style guide"
    - "Architecture summary"
    - "PR description"
    - "Test files"
    - "PR diff (closest to instruction)"
  rationale: "Task-relevant content (diff) placed last, near user message. Reference material earlier for lookup."
```
The manifest is versionable, diffable, testable (EDD writes assertions against `supports` claims), and auditable.
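Because the manifest is structured data, simple checks can run in CI. A sketch of a budget check over a parsed manifest, shown here as a plain dict to avoid a YAML-parsing dependency:

```python
def check_budget(manifest: dict) -> tuple[int, int]:
    """Return (total estimated tokens, remaining headroom against the budget)."""
    total = sum(src["tokens_est"] for src in manifest["sources"])
    return total, manifest["token_budget"] - total

manifest = {
    "token_budget": 8000,
    "sources": [
        {"name": "PR diff", "tokens_est": 2000},
        {"name": "Style guide", "tokens_est": 800},
        {"name": "Test files for changed code", "tokens_est": 1500},
        {"name": "PR description / ticket", "tokens_est": 300},
        {"name": "Architecture doc", "tokens_est": 400},
    ],
}
total, headroom = check_budget(manifest)  # 5000 used, 3000 headroom
```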
Integration with EDD and Context-Eval
| Phase | Skill | What happens |
|---|---|---|
| Design | context-cartography | Produce a context manifest |
| Validate | EDD | Write assertions from supports claims, run evals |
| Measure | context-eval | Pass rates, benefit-per-kilotoken, deadwood |
| Iterate | Loop back | Eval failures → update manifest → re-run |
If a source claims "Agent checks project-specific conventions" but the eval shows generic conventions, that source isn't earning its tokens — redesign or cut.
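That cut-or-keep decision is itself testable: ablate the source and compare pass rates. A sketch, where `run_eval` is again a hypothetical stand-in for the EDD harness:

```python
def earns_tokens(run_eval, context, source, min_gain=0.0):
    """A source earns its tokens if removing it drops the eval pass rate."""
    with_source = run_eval(context)
    without = run_eval([item for item in context if item != source])
    return with_source - without > min_gain
```

If this returns False, move the source to shadow context rather than deleting it outright.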
Maintenance: When to Resurvey
- Task scope changes — agent asked to do things the manifest wasn't designed for
- Model upgrade — new model may need less or differently-structured context
- Eval scores plateau or regress — current design hit ceiling
- Token budget changes — what was cut may now fit
Treat the manifest like code: review, version, test.
Anti-Patterns
| Anti-pattern | Symptom | Fix |
|---|---|---|
| Context stuffing | Include everything "just in case" | Start P0 only, add P1 one at a time with measurement |
| Uniform sizing | Every file at full detail | Use SIZE decision tree |
| Missing legend | No headers or labels | Label everything with WHAT and WHY |
| Priority inversion | P2 items at full detail while P0 summarized | Highest detail to most important info |
| Stale manifest | Designed 6 months ago, task evolved | Resurvey |
| Solo without validation | Designed but never tested | Always follow with EDD |
| Cargo-culting patterns | Copying catalog patterns without adapting | Patterns are starting points |