taxonomy-builder

SKILL.md

Taxonomy Builder (router, compatibility mode)

Build outline/taxonomy.yml from papers/core_set.csv.

P0 compatibility note:

  • The output contract stays the same (outline/taxonomy.yml, YAML list, >=2 levels, concrete descriptions).
  • Curated domain taxonomies now live in assets/domain_packs/*.yaml instead of Python prose.
  • scripts/run.py stays a deterministic scaffold/helper: detect domain pack -> load pack when available -> otherwise fall back to the generic builder.

Load Order

  1. references/overview.md
  2. references/taxonomy_principles.md
  3. If a domain pack applies, read its references/domain_pack_<domain>.md and assets/domain_packs/<domain>.yaml
  4. Otherwise read references/archetypes_generic.md
  5. Calibrate naming/description quality with references/examples_good.md and references/examples_bad.md

Current compatibility packs:

  • llm_agents
  • gen_image
  • embodied_ai

Inputs

  • papers/core_set.csv (required)
  • Optional: papers/papers_dedup.jsonl
  • Optional: DECISIONS.md, GOAL.md, queries.md

Outputs

  • outline/taxonomy.yml

Asset contract

  • assets/taxonomy_schema.json: machine-readable shape for domain packs / output expectations
  • assets/domain_packs/*.yaml: compatibility domain packs for supported domains

Script role

Use scripts/run.py only for deterministic help:

  • never overwrite non-placeholder user taxonomy
  • preserve current CLI flags / output path
  • load supported domain taxonomies from assets instead of hard-coded Python prose
  • keep the generic fallback builder for non-packed domains

When to refine manually

Refine the generated taxonomy before marking the unit DONE if:

  • top-level buckets feel like keyword clusters instead of chapter-level questions
  • leaf names are generic (Overview, Benchmarks, Open Problems, Misc)
  • descriptions lack scope cues or representative paper anchors
  • domain detection chose the wrong pack

Quick start

  • python .codex/skills/taxonomy-builder/scripts/run.py --help
  • python .codex/skills/taxonomy-builder/scripts/run.py --workspace <workspace_dir>

Execution notes

When running in compatibility mode, scripts/run.py currently reads:

  • papers/core_set.csv as the required corpus input
  • papers/papers_dedup.jsonl when present for extra title/abstract signals
  • GOAL.md, queries.md, and DECISIONS.md as optional domain/profile hints during pack selection

Script

Quick Start

  • python .codex/skills/taxonomy-builder/scripts/run.py --workspace <workspace_dir>

All Options

  • --workspace <dir>
  • --top-k <int>
  • --min-freq <int>
  • --unit-id <id>
  • --inputs <a;b;...>
  • --outputs <a;b;...>
  • --checkpoint <C*>

Examples

  • python .codex/skills/taxonomy-builder/scripts/run.py --workspace workspaces/<ws>

Troubleshooting

  • If the wrong domain pack is chosen, inspect GOAL.md, queries.md, and the pack detect rules before changing Python.
  • If outline/taxonomy.yml already contains a real non-placeholder taxonomy, the script intentionally returns without overwriting it.
  • If no pack matches, the script falls back to the generic builder.
Weekly Installs
28
GitHub Stars
304
First Seen
Jan 23, 2026
Installed on
claude-code23
gemini-cli23
codex22
cursor21
opencode21
github-copilot18