lib-distill

SKILL.md

lib-distill — Source Code Library to Claude Code Skill

Transform a source code library into a compact, actionable Claude Code skill.

Invocation

/lib-distill <path-to-library>

The <path-to-library> must be a directory containing source code with a recognizable manifest (package.json, Cargo.toml, setup.py, pyproject.toml, or go.mod).

Pipeline Overview

Phase Name Output Checkpoint?
1 Survey survey.md — file inventory, size tier, chunk plan Yes
2 Extraction extractions/<NN>-<slug>.md — structured data per chunk No
3 Synthesis synthesis/<concern>.md — merged by implementation concern Yes
4 Compression references/<concern>.md — compact skill reference files No
5 Package SKILL.md + validation Yes

Phase 1 — Survey

1.1 Parse Input

  • Resolve <path-to-library> to absolute path; abort if not a directory.
  • Derive lib-name from the directory basename (e.g., /path/to/signify-tssignify-ts).
  • Create staging directory: scripts/staging/distill-<lib-name>/ with subdirs extractions/ and synthesis/.

1.2 Detect Language

Probe for manifest files in priority order:

Manifest Language Source glob
Cargo.toml Rust **/*.rs
package.json TypeScript/JavaScript **/*.ts, **/*.js (exclude .d.ts)
setup.py or pyproject.toml Python **/*.py
go.mod Go **/*.go

If multiple manifests exist, ask the user which language to target.

1.3 Inventory Source Files

  • Glob all source files matching the language.
  • Exclude patterns: test*, *_test.*, *spec*, __pycache__, node_modules, target/, dist/, build/, vendor/, generated/, *.min.*, migration files.
  • Record each file: relative path, line count.
  • Sort by line count descending.

1.4 Classify Size Tier

Tier Total Lines Chunking Strategy Reference Files
Small <2,000 Single pass (1-2 chunks) 2-3
Medium 2,000-10,000 Per-module chunks 3-4
Large >10,000 Per-file for large files, grouped for small 4-6

1.5 Identify Public API Surface

Language-specific grep for exports:

Language Export markers
Python __all__, classes/functions without _ prefix at module level
Rust pub fn, pub struct, pub enum, pub trait, pub type, pub mod
TypeScript export, export default, export type, export interface
Go Capitalized names at package level

Record each public API item: name, kind (class/function/type/const), source file, line number.

1.6 Classify Files by Content Type

Assign each source file one primary category:

Category Heuristic
api-surface High density of exports, public classes/functions
data-models Structs, interfaces, type definitions, enums, codex tables
patterns Multi-step workflows, builder patterns, orchestration
error-handling Error types, exception classes, result types
configuration Config structs, defaults, env vars, constants
skip Internal utilities, generated code, trivial glue

1.7 Group into Chunks

  • Target chunk size: 500-1500 lines (wider than spec-distill because code is more structured).
  • Align chunks to module/file boundaries — never split a file across chunks.
  • Each chunk gets: an ID (01, 02, ...), a slug (module or concern name), file list, total lines, primary content type.

1.8 Propose Reference File Groupings

Based on size tier, propose which extraction categories map to which output reference files:

  • Small: api.md, patterns.md (maybe types.md)
  • Medium: api.md, types.md, patterns.md, errors.md
  • Large: api.md, types.md, patterns.md, errors.md, config.md, possibly split api.md by module

1.9 Write Survey & Checkpoint

Write scripts/staging/distill-<lib-name>/survey.md containing:

  • Library name, language, path, total files, total lines, size tier
  • File inventory table (path, lines, content type)
  • Public API summary (count by kind)
  • Chunk plan table (ID, slug, files, lines, content type)
  • Proposed reference file groupings

INTERACTIVE CHECKPOINT:

Phase 1 complete. Survey written to scripts/staging/distill-<lib-name>/survey.md

Summary:
- Library: <lib-name> (<language>)
- Size: <total-lines> lines across <file-count> files → <tier> tier
- Chunks: <chunk-count> chunks planned
- Reference files: <proposed list>

Review the survey and let me know:
1. Approve and continue to extraction
2. Adjust chunk groupings or file classifications
3. Exclude specific files or directories

Phase 2 — Extraction

2.1 Launch Parallel Extraction

For each chunk, launch a Task agent (batch 4-6 at a time) with:

Agent prompt:

You are extracting structured data from source code for a library distillation.

Library: <lib-name> (<language>)
Chunk: <chunk-id> — <slug>
Content type: <primary-category>
Files: <file-list>

Read each file listed. Extract into the 6 categories defined in the extraction template.
Write output to: scripts/staging/distill-<lib-name>/extractions/<chunk-id>-<slug>.md

Use the extraction template at:
.claude/skills/lib-distill/references/extraction-template.md

Each agent reads its assigned files and writes structured extraction using the template.

2.2 Verify Outputs

After all agents complete:

  • Check that every expected extractions/<chunk-id>-<slug>.md exists.
  • Verify each file is non-empty and has at least 2 of the 6 category headers.
  • Re-run any failed chunks (max 1 retry per chunk).

No interactive checkpoint — proceed directly to Phase 3.


Phase 3 — Synthesis

3.1 Merge Extractions

Read all extraction files. Merge content by implementation concern:

Concern Fed by categories
API Reference Class & Type Signatures, Exports & Constants
Data Models Class & Type Signatures (struct/enum focus)
Usage Patterns Usage Patterns, Initialization & Lifecycle
Error Handling Error Handling
Configuration Configuration & Defaults

3.2 Deduplicate and Cross-Reference

  • Remove duplicate entries (same class/function extracted from re-exports).
  • Resolve [DEP: <module>] markers — link dependent types to their definitions.
  • Build inheritance/implementation trees where applicable.
  • Flag [UNCLEAR: ...] for ambiguous patterns or undocumented behavior.

3.3 Write Synthesis Files

Write scripts/staging/distill-<lib-name>/synthesis/<concern>.md for each concern.

3.4 Checkpoint

INTERACTIVE CHECKPOINT:

Phase 3 complete. Synthesis files written.

| Concern | Items | Cross-refs resolved | Unclear markers |
|---------|-------|--------------------|-----------------|
| <concern> | <count> | <count> | <count> |
...

Review synthesis files in scripts/staging/distill-<lib-name>/synthesis/
Let me know:
1. Approve and continue to compression
2. Flag items to investigate further
3. Resolve [UNCLEAR:] markers

Phase 4 — Compression

4.1 Apply Compression Formats

For each synthesis file, apply the appropriate compression format from references/compression-formats.md:

Concern Primary format Secondary format
API Reference Compact Signature Tables Field Tables
Data Models Field Tables, Code Templates Import Maps
Usage Patterns Checklists, Code Templates Anti-Pattern Lists
Error Handling Field Tables Decision Trees
Configuration Field Tables Checklists

4.2 Compression Rules

Always drop:

  • Obvious docstrings and inline comments
  • Private/internal methods and fields
  • Import boilerplate and re-exports
  • Test utilities and test helpers
  • Trivial getters/setters

Always keep:

  • Every public class + constructor signature
  • Every exported function signature
  • Every enum variant / codex entry
  • Every error type + when it's thrown
  • Every config default value
  • Lifecycle methods (init, close, dispose)

Target: ~1 line of output per 5-10 lines of source code.

4.3 Write Reference Files

Write compressed output directly to .claude/skills/<lib-name>/references/<concern>.md.

  • Target 5-15KB per file.
  • If a file exceeds 15KB, split by sub-concern (e.g., api-core.md, api-network.md).

No interactive checkpoint — proceed to Phase 5.


Phase 5 — Package

5.1 Generate SKILL.md

Create .claude/skills/<lib-name>/SKILL.md with this structure:

---
name: <lib-name>
description: >
  <1-2 sentence description of the library and when this skill activates>
---

# <lib-name> — <tagline>

## Overview
<2-3 paragraphs: what the library does, where it fits in the ecosystem,
key design philosophy>

## Quick Reference
<Table of the 10-20 most-used classes/functions with one-line descriptions>

## Import Guide
<How to import/use the library in each supported context>

## Reference Files
| File | Contents | Size |
|------|----------|------|
| references/api.md | ... | ...KB |
| ... | ... | ... |

## Usage Patterns
<3-5 compact code snippets showing common workflows>

## Anti-Patterns
<3-5 common mistakes with corrections>

Target: 4-8KB for SKILL.md itself.

5.2 Validate

Run these checks:

  • SKILL.md has valid YAML frontmatter with name and description
  • Total skill size is 25-40KB
  • No single reference file exceeds 15KB
  • All files referenced in the index actually exist
  • Public API coverage: at least 80% of Phase 1 public API items appear in output
  • No raw source code blocks longer than 20 lines (compress further if found)

5.3 Final Checkpoint

INTERACTIVE CHECKPOINT:

Phase 5 complete. Skill package ready.

.claude/skills/<lib-name>/
├── SKILL.md (<size>KB)
└── references/
    ├── <file>.md (<size>KB)
    ...
Total: <total>KB

Validation:
- [x/✗] YAML frontmatter valid
- [x/✗] Total size in 25-40KB range
- [x/✗] No file >15KB
- [x/✗] All references exist
- [x/✗] API coverage ≥80%
- [x/✗] No oversized code blocks

Let me know:
1. Approve — clean staging and finalize
2. Adjust — specify what to change

On approval, delete scripts/staging/distill-<lib-name>/.


Error Recovery

  • Phase 2 agent failure: Retry once, then extract the failing chunk inline (without Task agent).
  • Oversized output: Re-compress with stricter heuristics (drop method bodies, keep only signatures).
  • Undersized output (<25KB): Include more usage examples, expand anti-patterns, add edge case notes.
  • Missing public APIs: Re-read source files for the missing items and append to relevant reference file.
Weekly Installs
2
GitHub Stars
1
First Seen
Feb 26, 2026
Installed on
opencode2
claude-code2
github-copilot2
codex2
kimi-cli2
gemini-cli2