taxonomy-builder

Taxonomy Builder (router, compatibility mode)

Build outline/taxonomy.yml from papers/core_set.csv.

P0 compatibility note:

The output contract is unchanged: outline/taxonomy.yml remains a YAML list with at least two levels and concrete descriptions.
Curated domain taxonomies now live in assets/domain_packs/*.yaml instead of being hard-coded as prose in Python.
scripts/run.py remains a deterministic scaffold/helper: it detects a domain pack, loads the pack when one is available, and otherwise falls back to the generic builder.
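
The detect, load, fall-back flow can be sketched roughly as follows. This is an illustration only, not the code in scripts/run.py: the function names and the keyword hints are hypothetical, and the real detect rules live in the pack files. Only the three pack names come from this document.

```python
from pathlib import Path
from typing import Optional

# Pack names are the ones listed in this document.
# The keyword hints are illustrative assumptions, not the real detect rules.
PACK_HINTS = {
    "llm_agents": ("agent", "tool use"),
    "gen_image": ("diffusion", "text-to-image"),
    "embodied_ai": ("robot", "embodied"),
}


def detect_pack(corpus_text: str) -> Optional[str]:
    """Return the pack with the most keyword hits, or None when nothing matches."""
    text = corpus_text.lower()
    scores = {
        name: sum(text.count(hint) for hint in hints)
        for name, hints in PACK_HINTS.items()
    }
    # Ties resolve to the first pack in PACK_HINTS, so the result is deterministic.
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def choose_source(corpus_text: str, packs_dir: Path) -> str:
    """Report which source would be used: a domain pack file or the generic builder."""
    pack = detect_pack(corpus_text)
    if pack is not None and (packs_dir / f"{pack}.yaml").exists():
        return f"pack:{pack}"
    return "generic"
```

Keeping detection as a pure function of the input text is one way to keep the helper deterministic: the same corpus always selects the same pack.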

Load Order

references/overview.md
references/taxonomy_principles.md
If a domain pack applies, read its references/domain_pack_<domain>.md and assets/domain_packs/<domain>.yaml
Otherwise read references/archetypes_generic.md
Calibrate naming/description quality with references/examples_good.md and references/examples_bad.md

Current compatibility packs:

llm_agents
gen_image
embodied_ai

Inputs

papers/core_set.csv (required)
Optional: papers/papers_dedup.jsonl
Optional: DECISIONS.md, GOAL.md, queries.md

Outputs

outline/taxonomy.yml
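
A rough way to check a parsed taxonomy against the output contract (list, at least two levels, non-empty descriptions) is sketched below. The key names name, description, and children are assumptions made for illustration, as is the reading that every top-level node needs at least one child; assets/taxonomy_schema.json is the authoritative shape.

```python
from typing import Any, List


def depth(node: dict) -> int:
    """Number of levels at and below this node (a childless node has depth 1)."""
    children = node.get("children") or []
    return 1 + max((depth(child) for child in children), default=0)


def check_taxonomy(taxonomy: Any) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    if not isinstance(taxonomy, list) or not taxonomy:
        return ["taxonomy must be a non-empty list"]
    problems = []
    for node in taxonomy:
        # Assumption: ">=2 levels" means each top-level node has at least one child.
        if depth(node) < 2:
            problems.append(f"{node.get('name', '<unnamed>')}: needs at least two levels")
        # Walk the subtree and flag every node with an empty description.
        stack = [node]
        while stack:
            current = stack.pop()
            if not str(current.get("description", "")).strip():
                problems.append(f"{current.get('name', '<unnamed>')}: missing description")
            stack.extend(current.get("children") or [])
    return problems
```

The function takes already-parsed data (for example, the result of loading the YAML file with a YAML library), so it has no parser dependency of its own.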

Asset contract

assets/taxonomy_schema.json: machine-readable shape for domain packs / output expectations
assets/domain_packs/*.yaml: compatibility domain packs for supported domains

Script role

Use scripts/run.py only for deterministic help:

never overwrite non-placeholder user taxonomy
preserve current CLI flags / output path
load supported domain taxonomies from assets instead of hard-coded Python prose
keep the generic fallback builder for non-packed domains
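
The "never overwrite" rule could look something like the guard below. How the script actually recognizes a placeholder is not stated here, so the marker strings and both function names are hypothetical.

```python
from pathlib import Path

# Hypothetical markers; the real placeholder test in scripts/run.py may differ.
PLACEHOLDER_MARKERS = ("TODO", "(placeholder)")


def is_placeholder(text: str) -> bool:
    """Treat empty files and files containing a marker as placeholders."""
    stripped = text.strip()
    return not stripped or any(marker in stripped for marker in PLACEHOLDER_MARKERS)


def write_if_placeholder(path: Path, new_text: str) -> bool:
    """Write new_text unless path already holds real content. Return True if written."""
    if path.exists() and not is_placeholder(path.read_text(encoding="utf-8")):
        return False  # user-authored taxonomy: leave it untouched
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(new_text, encoding="utf-8")
    return True
```

Returning a boolean instead of raising lets a caller treat "already refined by the user" as a normal outcome rather than an error.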

When to refine manually

Refine the generated taxonomy before marking the unit DONE if:

top-level buckets feel like keyword clusters instead of chapter-level questions
leaf names are generic (Overview, Benchmarks, Open Problems, Misc)
descriptions lack scope cues or representative paper anchors
domain detection chose the wrong pack
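
The generic-leaf check above is easy to automate as a quick lint before manual review. The sketch below assumes each node is a mapping with a name key and an optional children list, which is an assumption about the YAML shape; the flagged names are the ones listed above.

```python
from typing import List, Sequence, Tuple

# Leaf names this document calls out as too generic.
GENERIC_NAMES = {"overview", "benchmarks", "open problems", "misc"}


def find_generic_leaves(nodes: Sequence[dict], path: Tuple[str, ...] = ()) -> List[str]:
    """Return 'Parent > Leaf' paths for leaves whose names are on the generic list."""
    hits = []
    for node in nodes:
        here = path + (node["name"],)
        children = node.get("children") or []
        if children:
            # Only leaves are checked; recurse through intermediate levels.
            hits.extend(find_generic_leaves(children, here))
        elif node["name"].strip().lower() in GENERIC_NAMES:
            hits.append(" > ".join(here))
    return hits
```

The other criteria (keyword-cluster buckets, missing scope cues, wrong pack) need human judgment and are not covered by this lint.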

Quick start

python .codex/skills/taxonomy-builder/scripts/run.py --help
python .codex/skills/taxonomy-builder/scripts/run.py --workspace <workspace_dir>

Execution notes

When running in compatibility mode, scripts/run.py currently reads:

papers/core_set.csv as the required corpus input
papers/papers_dedup.jsonl when present for extra title/abstract signals
GOAL.md, queries.md, and DECISIONS.md as optional domain/profile hints during pack selection
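
As an illustration of the required/optional split, a loader along these lines would fail fast on a missing core set and silently skip absent optional files. The function name and the returned dictionary layout are hypothetical; the CSV and JSONL column names are not specified in this document, so rows are returned as-is.

```python
import csv
import json
from pathlib import Path


def load_inputs(workspace: Path) -> dict:
    """Load the required core set plus whichever optional inputs exist."""
    core_path = workspace / "papers" / "core_set.csv"
    if not core_path.exists():
        raise FileNotFoundError(f"required input missing: {core_path}")
    with core_path.open(newline="", encoding="utf-8") as handle:
        core_rows = list(csv.DictReader(handle))

    # Optional: one JSON object per line with extra title/abstract signals.
    extra_records = []
    dedup_path = workspace / "papers" / "papers_dedup.jsonl"
    if dedup_path.exists():
        with dedup_path.open(encoding="utf-8") as handle:
            extra_records = [json.loads(line) for line in handle if line.strip()]

    # Optional: free-text hints that may inform pack selection.
    hints = {}
    for name in ("GOAL.md", "queries.md", "DECISIONS.md"):
        hint_path = workspace / name
        if hint_path.exists():
            hints[name] = hint_path.read_text(encoding="utf-8")

    return {"core": core_rows, "extra": extra_records, "hints": hints}
```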

Script

All Options

--workspace <dir>
--top-k <int>
--min-freq <int>
--unit-id <id>
--inputs <a;b;...>
--outputs <a;b;...>
--checkpoint <C*>
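
For orientation, an argparse parser accepting the flags listed above might look like the sketch below. It is not the parser in scripts/run.py: the types are inferred from the placeholders (<int>, <a;b;...>), the defaults are assumptions, and the meaning of each flag should be confirmed with --help.

```python
import argparse
from typing import List


def split_semicolons(value: str) -> List[str]:
    """Split an 'a;b;...' argument into a list, dropping empty parts."""
    return [part.strip() for part in value.split(";") if part.strip()]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented flag names (defaults are assumed)."""
    parser = argparse.ArgumentParser(prog="run.py")
    parser.add_argument("--workspace", metavar="<dir>")
    parser.add_argument("--top-k", type=int, metavar="<int>")
    parser.add_argument("--min-freq", type=int, metavar="<int>")
    parser.add_argument("--unit-id", metavar="<id>")
    parser.add_argument("--inputs", type=split_semicolons, default=[], metavar="<a;b;...>")
    parser.add_argument("--outputs", type=split_semicolons, default=[], metavar="<a;b;...>")
    parser.add_argument("--checkpoint", metavar="<C*>")
    return parser
```

With this sketch, `build_parser().parse_args(["--workspace", "ws", "--inputs", "a;b"])` yields `inputs == ["a", "b"]`.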

Examples

python .codex/skills/taxonomy-builder/scripts/run.py --workspace workspaces/<ws>

Troubleshooting

If the wrong domain pack is chosen, inspect GOAL.md, queries.md, and the pack's detect rules before changing any Python code.
If outline/taxonomy.yml already contains a real (non-placeholder) taxonomy, the script intentionally exits without overwriting it.
If no pack matches, the script falls back to the generic builder.

Repository

willoscar/resea…e-skills

First Seen

Jan 23, 2026
