lib-distill
lib-distill — Source Code Library to Claude Code Skill
Transform a source code library into a compact, actionable Claude Code skill.
Invocation
/lib-distill <path-to-library>
The <path-to-library> must be a directory containing source code with a recognizable manifest (package.json, Cargo.toml, setup.py, pyproject.toml, or go.mod).
Pipeline Overview
| Phase | Name | Output | Checkpoint? |
|---|---|---|---|
| 1 | Survey | survey.md — file inventory, size tier, chunk plan |
Yes |
| 2 | Extraction | extractions/<NN>-<slug>.md — structured data per chunk |
No |
| 3 | Synthesis | synthesis/<concern>.md — merged by implementation concern |
Yes |
| 4 | Compression | references/<concern>.md — compact skill reference files |
No |
| 5 | Package | SKILL.md + validation |
Yes |
Phase 1 — Survey
1.1 Parse Input
- Resolve
<path-to-library>to absolute path; abort if not a directory. - Derive
lib-namefrom the directory basename (e.g.,/path/to/signify-ts→signify-ts). - Create staging directory:
scripts/staging/distill-<lib-name>/with subdirsextractions/andsynthesis/.
1.2 Detect Language
Probe for manifest files in priority order:
| Manifest | Language | Source glob |
|---|---|---|
Cargo.toml |
Rust | **/*.rs |
package.json |
TypeScript/JavaScript | **/*.ts, **/*.js (exclude .d.ts) |
setup.py or pyproject.toml |
Python | **/*.py |
go.mod |
Go | **/*.go |
If multiple manifests exist, ask the user which language to target.
1.3 Inventory Source Files
- Glob all source files matching the language.
- Exclude patterns:
test*,*_test.*,*spec*,__pycache__,node_modules,target/,dist/,build/,vendor/,generated/,*.min.*, migration files. - Record each file: relative path, line count.
- Sort by line count descending.
1.4 Classify Size Tier
| Tier | Total Lines | Chunking Strategy | Reference Files |
|---|---|---|---|
| Small | <2,000 | Single pass (1-2 chunks) | 2-3 |
| Medium | 2,000-10,000 | Per-module chunks | 3-4 |
| Large | >10,000 | Per-file for large files, grouped for small | 4-6 |
1.5 Identify Public API Surface
Language-specific grep for exports:
| Language | Export markers |
|---|---|
| Python | __all__, classes/functions without _ prefix at module level |
| Rust | pub fn, pub struct, pub enum, pub trait, pub type, pub mod |
| TypeScript | export, export default, export type, export interface |
| Go | Capitalized names at package level |
Record each public API item: name, kind (class/function/type/const), source file, line number.
1.6 Classify Files by Content Type
Assign each source file one primary category:
| Category | Heuristic |
|---|---|
api-surface |
High density of exports, public classes/functions |
data-models |
Structs, interfaces, type definitions, enums, codex tables |
patterns |
Multi-step workflows, builder patterns, orchestration |
error-handling |
Error types, exception classes, result types |
configuration |
Config structs, defaults, env vars, constants |
skip |
Internal utilities, generated code, trivial glue |
1.7 Group into Chunks
- Target chunk size: 500-1500 lines (wider than spec-distill because code is more structured).
- Align chunks to module/file boundaries — never split a file across chunks.
- Each chunk gets: an ID (
01,02, ...), a slug (module or concern name), file list, total lines, primary content type.
1.8 Propose Reference File Groupings
Based on size tier, propose which extraction categories map to which output reference files:
- Small:
api.md,patterns.md(maybetypes.md) - Medium:
api.md,types.md,patterns.md,errors.md - Large:
api.md,types.md,patterns.md,errors.md,config.md, possibly splitapi.mdby module
1.9 Write Survey & Checkpoint
Write scripts/staging/distill-<lib-name>/survey.md containing:
- Library name, language, path, total files, total lines, size tier
- File inventory table (path, lines, content type)
- Public API summary (count by kind)
- Chunk plan table (ID, slug, files, lines, content type)
- Proposed reference file groupings
INTERACTIVE CHECKPOINT:
Phase 1 complete. Survey written to scripts/staging/distill-<lib-name>/survey.md
Summary:
- Library: <lib-name> (<language>)
- Size: <total-lines> lines across <file-count> files → <tier> tier
- Chunks: <chunk-count> chunks planned
- Reference files: <proposed list>
Review the survey and let me know:
1. Approve and continue to extraction
2. Adjust chunk groupings or file classifications
3. Exclude specific files or directories
Phase 2 — Extraction
2.1 Launch Parallel Extraction
For each chunk, launch a Task agent (batch 4-6 at a time) with:
Agent prompt:
You are extracting structured data from source code for a library distillation.
Library: <lib-name> (<language>)
Chunk: <chunk-id> — <slug>
Content type: <primary-category>
Files: <file-list>
Read each file listed. Extract into the 6 categories defined in the extraction template.
Write output to: scripts/staging/distill-<lib-name>/extractions/<chunk-id>-<slug>.md
Use the extraction template at:
.claude/skills/lib-distill/references/extraction-template.md
Each agent reads its assigned files and writes structured extraction using the template.
2.2 Verify Outputs
After all agents complete:
- Check that every expected
extractions/<chunk-id>-<slug>.mdexists. - Verify each file is non-empty and has at least 2 of the 6 category headers.
- Re-run any failed chunks (max 1 retry per chunk).
No interactive checkpoint — proceed directly to Phase 3.
Phase 3 — Synthesis
3.1 Merge Extractions
Read all extraction files. Merge content by implementation concern:
| Concern | Fed by categories |
|---|---|
| API Reference | Class & Type Signatures, Exports & Constants |
| Data Models | Class & Type Signatures (struct/enum focus) |
| Usage Patterns | Usage Patterns, Initialization & Lifecycle |
| Error Handling | Error Handling |
| Configuration | Configuration & Defaults |
3.2 Deduplicate and Cross-Reference
- Remove duplicate entries (same class/function extracted from re-exports).
- Resolve
[DEP: <module>]markers — link dependent types to their definitions. - Build inheritance/implementation trees where applicable.
- Flag
[UNCLEAR: ...]for ambiguous patterns or undocumented behavior.
3.3 Write Synthesis Files
Write scripts/staging/distill-<lib-name>/synthesis/<concern>.md for each concern.
3.4 Checkpoint
INTERACTIVE CHECKPOINT:
Phase 3 complete. Synthesis files written.
| Concern | Items | Cross-refs resolved | Unclear markers |
|---------|-------|--------------------|-----------------|
| <concern> | <count> | <count> | <count> |
...
Review synthesis files in scripts/staging/distill-<lib-name>/synthesis/
Let me know:
1. Approve and continue to compression
2. Flag items to investigate further
3. Resolve [UNCLEAR:] markers
Phase 4 — Compression
4.1 Apply Compression Formats
For each synthesis file, apply the appropriate compression format from references/compression-formats.md:
| Concern | Primary format | Secondary format |
|---|---|---|
| API Reference | Compact Signature Tables | Field Tables |
| Data Models | Field Tables, Code Templates | Import Maps |
| Usage Patterns | Checklists, Code Templates | Anti-Pattern Lists |
| Error Handling | Field Tables | Decision Trees |
| Configuration | Field Tables | Checklists |
4.2 Compression Rules
Always drop:
- Obvious docstrings and inline comments
- Private/internal methods and fields
- Import boilerplate and re-exports
- Test utilities and test helpers
- Trivial getters/setters
Always keep:
- Every public class + constructor signature
- Every exported function signature
- Every enum variant / codex entry
- Every error type + when it's thrown
- Every config default value
- Lifecycle methods (init, close, dispose)
Target: ~1 line of output per 5-10 lines of source code.
4.3 Write Reference Files
Write compressed output directly to .claude/skills/<lib-name>/references/<concern>.md.
- Target 5-15KB per file.
- If a file exceeds 15KB, split by sub-concern (e.g.,
api-core.md,api-network.md).
No interactive checkpoint — proceed to Phase 5.
Phase 5 — Package
5.1 Generate SKILL.md
Create .claude/skills/<lib-name>/SKILL.md with this structure:
---
name: <lib-name>
description: >
<1-2 sentence description of the library and when this skill activates>
---
# <lib-name> — <tagline>
## Overview
<2-3 paragraphs: what the library does, where it fits in the ecosystem,
key design philosophy>
## Quick Reference
<Table of the 10-20 most-used classes/functions with one-line descriptions>
## Import Guide
<How to import/use the library in each supported context>
## Reference Files
| File | Contents | Size |
|------|----------|------|
| references/api.md | ... | ...KB |
| ... | ... | ... |
## Usage Patterns
<3-5 compact code snippets showing common workflows>
## Anti-Patterns
<3-5 common mistakes with corrections>
Target: 4-8KB for SKILL.md itself.
5.2 Validate
Run these checks:
- SKILL.md has valid YAML frontmatter with
nameanddescription - Total skill size is 25-40KB
- No single reference file exceeds 15KB
- All files referenced in the index actually exist
- Public API coverage: at least 80% of Phase 1 public API items appear in output
- No raw source code blocks longer than 20 lines (compress further if found)
5.3 Final Checkpoint
INTERACTIVE CHECKPOINT:
Phase 5 complete. Skill package ready.
.claude/skills/<lib-name>/
├── SKILL.md (<size>KB)
└── references/
├── <file>.md (<size>KB)
...
Total: <total>KB
Validation:
- [x/✗] YAML frontmatter valid
- [x/✗] Total size in 25-40KB range
- [x/✗] No file >15KB
- [x/✗] All references exist
- [x/✗] API coverage ≥80%
- [x/✗] No oversized code blocks
Let me know:
1. Approve — clean staging and finalize
2. Adjust — specify what to change
On approval, delete scripts/staging/distill-<lib-name>/.
Error Recovery
- Phase 2 agent failure: Retry once, then extract the failing chunk inline (without Task agent).
- Oversized output: Re-compress with stricter heuristics (drop method bodies, keep only signatures).
- Undersized output (<25KB): Include more usage examples, expand anti-patterns, add edge case notes.
- Missing public APIs: Re-read source files for the missing items and append to relevant reference file.