Distill
A skill for iteratively compressing complex systems down to their essential primitives — the minimal set of abstractions that captures the full behavioral essence while discarding accidental complexity.
Use distill when the user needs a cleaner mental model of something complicated.
If the real need is to generate non-obvious hypotheses, cross-domain analogies, or
mechanism transfers, prefer lateral-thinking instead.
Think of it like Andrej Karpathy reducing an automated research system to three files
(train.py, prepare.py, program.md). The goal is not summarization — it's
re-expression in minimal form.
Core Concepts
Primitives: The irreducible building blocks that can't be decomposed further without losing essential behavior. A good primitive set is:
- Complete — you can reconstruct the full system's behavior from just these pieces
- Orthogonal — each primitive captures something the others don't
- Minimal — removing any one primitive loses something essential
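One way to make these three properties concrete is to model each primitive as the set of behaviors it accounts for. The sketch below is only an illustration under that simplifying assumption; the `check_primitive_set` helper and the behavior labels are hypothetical and not part of the skill itself.

```python
def check_primitive_set(primitives, required):
    """Check a candidate primitive set against the three properties.

    primitives: dict mapping primitive name -> set of behaviors it explains
    required:   set of behaviors the full system must exhibit
    """

    def union_without(skip=None):
        # Behaviors explained by every primitive except `skip`.
        return set().union(*(b for n, b in primitives.items() if n != skip))

    # Complete: every required behavior is explained by some primitive.
    complete = required <= union_without()

    # Orthogonal: each primitive explains something the others don't.
    orthogonal = all(primitives[n] - union_without(n) for n in primitives)

    # Minimal: dropping any one primitive leaves a required behavior unexplained.
    minimal = all(not required <= union_without(n) for n in primitives)

    return {"complete": complete, "orthogonal": orthogonal, "minimal": minimal}


# Hypothetical behavior labels for a document-compression pipeline.
candidate = {
    "ingestion": {"accept_pdf", "accept_url"},
    "chunking": {"split_into_units"},
    "extraction": {"pull_claims"},
    "synthesis": {"combine_output"},
}
required_behaviors = set().union(*candidate.values())
report = check_primitive_set(candidate, required_behaviors)
```

In practice the judgment is qualitative, but writing the behaviors down this way can expose a primitive that explains nothing unique.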
Distillation vs. Summarization: Summarization preserves information at lower fidelity. Distillation re-expresses the essence in a new, cleaner form. The output of distillation is often more useful than the original because it strips away accidental complexity.
Accidental vs. Essential Complexity (per Fred Brooks): Essential complexity is inherent to the problem. Accidental complexity comes from the implementation. Distillation separates them.
Workflow
The distillation process has two phases: Orient and Compress. Orient sets the target, and Compress repeats until the user is satisfied with the decomposition.
Phase 1: Orient
Before proposing any decomposition, understand what the user cares about. This determines what counts as "essential."
Ask clarifying questions like:
- What are you trying to do with this understanding? (Build on top of it? Rewrite it? Teach it? Make decisions about it?)
- What layer are you most interested in? (Business logic? Data flow? API surface? Conceptual model?)
- Is there a part you already understand well vs. a part that's opaque?
- What's your current mental model, even if rough?
Keep this phase lightweight — 2-4 questions max. The goal is to calibrate, not to conduct a full interview. If the user's intent is obvious from context, skip straight to Phase 2.
Phase 2: Compress
This is the iterative core. Each turn follows this pattern:
Step 1: Propose Primitives
Present a candidate decomposition. Format depends on context (see Output Formats below), but always include:
- The primitive set — named, with a one-sentence description of each
- Proposed granularity — how many primitives, and why this number
- What was discarded — what you treated as accidental complexity and removed
- Confidence flags — where you're least sure about the decomposition
Example of a good proposal:
## Proposed Primitives (4)
1. **Ingestion** — accepts raw input (PDF, URL, repo path) and normalizes to
a common internal representation
2. **Chunking** — splits normalized input into semantically meaningful units
3. **Extraction** — pulls structured claims/facts/abstractions from each chunk
4. **Synthesis** — combines extracted pieces into the final compressed output
Discarded as accidental: file format handling, caching, logging, CLI argument parsing
Uncertain about: whether Chunking and Extraction are truly separate primitives
or two aspects of the same operation. Interested in your read.
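If proposals are generated programmatically, the four required parts can be held in a small structure and rendered in the format above. This is a minimal sketch; the `Primitive` and `Proposal` names and their fields are assumptions made for illustration, not something the skill defines.

```python
from dataclasses import dataclass, field


@dataclass
class Primitive:
    name: str
    description: str  # one sentence


@dataclass
class Proposal:
    primitives: list
    granularity_note: str  # why this number of primitives
    discarded: list = field(default_factory=list)  # accidental complexity removed
    uncertain: list = field(default_factory=list)  # confidence flags

    def render(self) -> str:
        lines = [f"## Proposed Primitives ({len(self.primitives)})"]
        for i, p in enumerate(self.primitives, start=1):
            lines.append(f"{i}. **{p.name}** - {p.description}")
        lines.append(f"Granularity: {self.granularity_note}")
        lines.append("Discarded as accidental: " + ", ".join(self.discarded))
        for item in self.uncertain:
            lines.append(f"Uncertain about: {item}")
        return "\n".join(lines)


proposal = Proposal(
    primitives=[
        Primitive("Ingestion", "accepts raw input and normalizes it"),
        Primitive("Synthesis", "combines extracted pieces into the output"),
    ],
    granularity_note="two stages, because everything else is plumbing",
    discarded=["caching", "logging"],
    uncertain=["whether normalization deserves its own primitive"],
)
text = proposal.render()
```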
Step 2: Invite Pushback
Explicitly ask the user to challenge the decomposition:
- "Does this match your intuition? What feels wrong?"
- "Are any of these actually the same primitive in disguise?"
- "Am I missing a primitive, or is one of these not truly essential?"
Step 3: Refine
Based on feedback, propose a revised decomposition. Show what changed and why. Repeat until the user says it feels right.
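The "show what changed" part of a revision can be computed mechanically, leaving only the "why" to be written by hand. A minimal sketch, assuming each iteration is stored as a mapping from primitive name to description (the `diff_primitives` helper and the merged primitive name are hypothetical):

```python
def diff_primitives(previous, revised):
    """Summarize what changed between two iterations.

    Both arguments map primitive name -> one-sentence description.
    """
    before, after = set(previous), set(revised)
    return {
        "added": sorted(after - before),
        "removed": sorted(before - after),
        # Same name, different description.
        "reworded": sorted(n for n in before & after if previous[n] != revised[n]),
    }


v1 = {
    "Ingestion": "accepts raw input and normalizes it",
    "Chunking": "splits input into meaningful units",
    "Extraction": "pulls structured claims from each chunk",
    "Synthesis": "combines extracted pieces",
}
# Hypothetical revision: the user decided Chunking and Extraction are one operation.
v2 = {
    "Ingestion": "accepts raw input and normalizes it",
    "Structuring": "splits input into units and extracts claims from each",
    "Synthesis": "combines structured claims into the final output",
}
changes = diff_primitives(v1, v2)
```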
Convergence Signals
You're done when:
- The user confirms the primitive set matches their intuition
- Each primitive feels irreducible — you can't merge or remove any
- The user can explain the system to someone using only these primitives
- (For code) you could sketch a minimal implementation from just the primitive set
Output Formats
Choose the output format based on what the user needs. When in doubt, ask.
Conceptual Map (default for non-code inputs)
A structured document listing the primitives, their relationships, and how they compose to produce the full system's behavior.
# [System Name] — Distilled
## Primitives
1. **Name** — description
2. **Name** — description
## Relationships
- Primitive A feeds into Primitive B via [mechanism]
- Primitives C and D are independent but both required for [outcome]
## Reconstruction
Given these primitives, here's how the full system works: [narrative]
## What Was Discarded
- [thing] — accidental complexity because [reason]
Minimal Implementation (default for codebases)
A set of files (like Karpathy's 3 files) that capture the essential behavior. These should be:
- Actually runnable (or close to it)
- Named to reflect the primitives they embody
- Stripped of all accidental complexity
- Commented to explain what each piece maps to in the original
Behavioral Spec (when user wants a SKILL.md or similar)
A specification that captures what the system does without prescribing how. Useful when the distillation will be used to guide an agent or a rewrite.
Hybrid
For complex systems, combine formats: a conceptual map plus a minimal implementation, or a behavioral spec with a reference implementation.
Input-Specific Strategies
Codebases
- Start with the entry points — what gets called first?
- Trace the critical path for the most common operation
- Identify data structures that everything revolves around
- Look for the "God objects" — they often contain multiple primitives fused together
- Separate domain logic from infrastructure (HTTP, DB, auth, logging)
For multi-repo / polyglot codebases: look for the conceptual primitives that cross language boundaries, not the file-level structure.
Research Papers / Technical Documents
- What's the core claim or contribution?
- What's the minimal setup needed to understand that claim?
- What's the method, stripped of notation and formalism?
- What prior work is essential context vs. just literature review?
Transcripts / Conversations
- What decisions were made?
- What were the real alternatives considered (not just mentioned)?
- What constraints shaped the decisions?
- What's the underlying model/framework the participants are reasoning from?
Blog Posts / Articles
- What's the one idea that, if you understood it, you'd understand the whole piece?
- What evidence actually supports it vs. is just color?
- Is there an implicit framework the author is using?
Anti-Patterns
Avoid these common failure modes:
- Summarizing instead of distilling — if your output is just a shorter version of the input with the same structure, you're summarizing, not distilling
- Too many primitives — if you have more than 7, you probably haven't compressed enough. The sweet spot is usually 3-5.
- Confusing implementation with essence — "uses PostgreSQL" is implementation; "needs durable ordered storage" is essence
- Premature convergence — don't lock in after one pass. The first decomposition is usually wrong in interesting ways.
- Symmetry bias — don't force primitives to be at the same level of abstraction. Sometimes one primitive is genuinely bigger than the others.
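The PostgreSQL example above can be made concrete in code: the essence becomes an interface, and the storage engine becomes one of several possible implementations behind it. The sketch below is illustrative only. `OrderedLog`, `InMemoryLog`, and `replay` are hypothetical names, and the in-memory stand-in is deliberately not durable; it only shows the shape of the contract.

```python
from typing import Protocol


class OrderedLog(Protocol):
    """Essence: append records and read them back in write order."""

    def append(self, record: str) -> int: ...

    def read_from(self, offset: int) -> list[str]: ...


class InMemoryLog:
    """Stand-in implementation for illustration; a real one would persist data."""

    def __init__(self) -> None:
        self._records: list[str] = []

    def append(self, record: str) -> int:
        self._records.append(record)
        return len(self._records) - 1  # offset of the record just written

    def read_from(self, offset: int) -> list[str]:
        return self._records[offset:]


def replay(log: OrderedLog, offset: int = 0) -> list[str]:
    # Domain logic depends only on the contract, not on the storage engine.
    return [record.upper() for record in log.read_from(offset)]


log = InMemoryLog()
first_offset = log.append("created")
log.append("updated")
events = replay(log, offset=1)
```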
Session Management
Distillation often spans multiple turns. Keep a running state:
- Current primitive set (numbered, for easy reference)
- Open questions
- What's been discarded and why
- Iteration count
If the session gets long, offer to write the current state to a file so the user can resume later or hand it to another agent session.
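A minimal sketch of such a state file, assuming JSON on disk. The `SessionState` class, its field names, and the file layout are assumptions for illustration; the skill does not prescribe a format.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class SessionState:
    primitives: list = field(default_factory=list)      # current set, in numbered order
    open_questions: list = field(default_factory=list)
    discarded: dict = field(default_factory=dict)        # thing -> reason it was dropped
    iteration: int = 0

    def save(self, path) -> None:
        Path(path).write_text(json.dumps(asdict(self), indent=2), encoding="utf-8")

    @classmethod
    def load(cls, path) -> "SessionState":
        return cls(**json.loads(Path(path).read_text(encoding="utf-8")))


state = SessionState(
    primitives=["Ingestion", "Structuring", "Synthesis"],
    open_questions=["Is Synthesis one primitive or two?"],
    discarded={"caching": "performance detail, not behavior"},
    iteration=2,
)
```

Calling `state.save("distill-state.json")` and later `SessionState.load("distill-state.json")` would let another session pick up where this one stopped; the file name here is arbitrary.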
Update Check
On first use in a session, silently check for a newer version:
- Fetch https://raw.githubusercontent.com/abpai/skills/main/versions.json.
- Compare the version for distill against this file's metadata.version.
- If the remote version is newer, pause before the main task and ask:
  "distill update available (local {X.Y} → remote {A.B}). Would you like me to update it for you first? I can run npx skills update distill for you."
- If the user says yes, run the update before continuing.
- If the user says no, continue with the current local version.
- If the fetch fails or web access is unavailable, skip silently.
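The check above could be sketched as follows. Two things here are assumptions, not facts from the source: that versions.json is a flat object mapping skill names to version strings, and that versions are dotted numbers such as "1.2". The function names are hypothetical as well.

```python
import json
import urllib.request

VERSIONS_URL = "https://raw.githubusercontent.com/abpai/skills/main/versions.json"


def parse_version(text):
    # Assumes dotted numeric versions such as "1.2"; anything else raises ValueError.
    return tuple(int(part) for part in text.strip().split("."))


def fetch_versions(url=VERSIONS_URL, timeout=5):
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def check_for_update(local_version, skill="distill", fetch=fetch_versions):
    """Return the update prompt, or None if no update applies or the check fails."""
    try:
        remote_version = fetch()[skill]  # assumed shape: {"distill": "1.3", ...}
        if parse_version(remote_version) > parse_version(local_version):
            return (
                f"{skill} update available "
                f"(local {local_version} → remote {remote_version})."
            )
    except Exception:
        # Fetch failed, no web access, or unexpected format: skip silently.
        pass
    return None
```

Passing the fetcher as a parameter keeps the comparison logic testable without network access. Comparing parsed tuples rather than raw strings avoids treating "1.10" as older than "1.9".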