# session-titles

## Overview
This skill owns the entire session title lifecycle:
- Generation -- Stop hook extracts context from the transcript (primary request, branch, files) and calls Haiku to produce a 4-7 word active-voice title. Detects focus shifts and tracks them with a `(N)` prefix.
- Feedback -- Each generated title is saved as a pending feedback entry for later scoring.
- Rating -- Interactive workflow where an AI judge scores first, then the human confirms or corrects. Builds dual-perspective training data.
- Evaluation -- Pattern checks (fallback, meta-language, too long, etc.) plus optional LLM judge scoring across all pending entries.
- Evolution -- GEPA-inspired prompt mutation: reflect on failures, propose targeted changes, keep improvements.
- Golden dataset -- Extract candidates from real sessions, curate ideal titles, run regression evals.
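The cleanup and pattern checks named above can be sketched in TypeScript. This is a hedged illustration only: `sanitizeTitle` appears in the pipeline, but `patternIssues` and all regex details here are assumptions, not the real `generate-core.ts` / `eval-quality.ts` logic.

```typescript
// Illustrative sketch; the real generate-core.ts / eval-quality.ts may differ.

/** Strip common LLM preambles and enforce a word-count budget. */
function sanitizeTitle(raw: string, maxWords = 7): string {
  let title = raw
    .replace(/^(title|here is a title)\s*:\s*/i, "") // drop preambles
    .replace(/^["']|["']$/g, "")                     // drop wrapping quotes
    .trim();
  const words = title.split(/\s+/);
  if (words.length > maxWords) title = words.slice(0, maxWords).join(" ");
  return title;
}

/** Pattern checks in the spirit of eval-quality.ts (hypothetical rules). */
function patternIssues(title: string): string[] {
  const issues: string[] = [];
  if (/^(untitled|new session)$/i.test(title)) issues.push("fallback");
  if (/\b(session|conversation|transcript)\b/i.test(title)) issues.push("meta-language");
  if (title.split(/\s+/).length > 7) issues.push("too-long");
  return issues;
}
```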
## How It Works

```
Session ends
  --> Stop hook (hooks/stop.sh)
    --> scripts/generate.ts (stdin: session_id, cwd, transcript_path)
      --> generate-core.ts
        1. extractSessionContext()  -- parse transcript JSONL
        2. evolveTitleWithContext() -- initial title or shift detection via Haiku
        3. sanitizeTitle()          -- strip preambles, enforce length
        4. savePendingFeedback()    -- append to title-feedback/pending.jsonl
```
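The hook entry point consumes a small JSON payload on stdin. A minimal sketch of parsing that payload, using the field names shown in the pipeline above (the validation logic and `parseHookInput` name are assumptions, not the actual `generate.ts` implementation):

```typescript
// Hypothetical payload parser; the real generate.ts may validate differently.
interface StopHookInput {
  session_id: string;
  cwd: string;
  transcript_path: string;
}

function parseHookInput(raw: string): StopHookInput {
  const input = JSON.parse(raw) as Partial<StopHookInput>;
  for (const field of ["session_id", "cwd", "transcript_path"] as const) {
    if (typeof input[field] !== "string" || input[field] === "") {
      throw new Error(`stop hook payload missing "${field}"`);
    }
  }
  return input as StopHookInput;
}

// In the actual hook the payload arrives on stdin, e.g. under bun:
//   const input = parseHookInput(await Bun.stdin.text());
```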
## Scripts

All scripts support `--help`-style flags. Run with `bun`.
| Script | Purpose |
|---|---|
| `generate.ts` | Hook entry point. Reads JSON from stdin. |
| `generate-core.ts` | Core module: context extraction, title generation, shift detection. |
| `generate-core.test.ts` | Unit + integration tests. `bun test scripts/generate-core.test.ts` |
| `schema.ts` | `TitleFeedback` types, prompt version constants. |
| `store.ts` | JSONL persistence for pending/scored feedback. |
| `eval-quality.ts` | Pattern checks + optional `--judge` LLM scoring. |
| `evolve-prompt.ts` | GEPA evolution. `--iterations N`, `--pareto-size N`. |
| `extract-candidates.ts` | Pull test cases from session transcripts. `--limit N`, `--project NAME`. |
| `run-eval.ts` | Run golden dataset eval. `--judge-model MODEL`. |
| `report.ts` | Generate report from latest eval results. `--file PATH`. |
## Rate Title

Interactive rating workflow (invoke as `/rate-title` or manually):

- AI judge assesses first -- score (1-5), reasoning, proposed better title.
- Human calibrates -- agree? Different score? Better suggestion?
- Both perspectives are saved to `~/.claude/title-feedback/scored.jsonl`.
The dual-perspective data enables DSPy optimization of both the judge prompt (learn to rate like the human) and the journalist prompt (generate titles humans rate highly).
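A dual-perspective record might be serialized like the sketch below. The field names here are assumptions for illustration; the authoritative `TitleFeedback` shape lives in `schema.ts`.

```typescript
// Hypothetical shape of a scored.jsonl entry; see schema.ts for the real types.
interface ScoredFeedback {
  sessionId: string;
  title: string;
  judge: { score: number; reasoning: string; proposedTitle?: string };
  human: { score: number; proposedTitle?: string };
  promptVersion: string;
}

/** Serialize one record as a single JSONL line, enforcing the 1-5 scale. */
function toJsonlLine(entry: ScoredFeedback): string {
  if (entry.judge.score < 1 || entry.judge.score > 5) throw new Error("judge score out of range");
  if (entry.human.score < 1 || entry.human.score > 5) throw new Error("human score out of range");
  return JSON.stringify(entry); // one record per line, appended to scored.jsonl
}
```

Keeping both scores in one record is what lets an optimizer train the judge against the human and the journalist against both.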
### Rating Scale
| Score | Meaning |
|---|---|
| 5 | Perfect -- specific, actionable, concise |
| 4 | Good -- minor phrasing improvements possible |
| 3 | Acceptable -- gets the gist but generic |
| 2 | Poor -- too vague or wrong focus |
| 1 | Bad -- completely off-base or misleading |
See `references/scoring-rubric.md` for detailed criteria.
## Data Layout

Runtime data (gitignored, at `~/.claude/title-feedback/`):

- `pending.jsonl` -- written by the Stop hook
- `scored.jsonl` -- written by `/rate-title`

Evaluation data (gitignored, at `skills/session-titles/data/`):

- `candidates.jsonl` -- extracted test cases
- `golden.jsonl` -- curated with ideal titles
- `results/` -- timestamped eval outputs
- `baseline-*.md`, `evolution-*.md` -- quality reports
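All of these files share the same JSONL convention: one JSON record per line, append-only. A minimal sketch of read/append helpers in the spirit of `store.ts` (the function names and API here are assumptions, not the real module):

```typescript
// Minimal JSONL helpers; the actual store.ts API may differ.
import { appendFileSync, existsSync, readFileSync } from "node:fs";

/** Append one record as a single line. */
function appendJsonl(path: string, record: unknown): void {
  appendFileSync(path, JSON.stringify(record) + "\n");
}

/** Read all records, tolerating a missing file and blank lines. */
function readJsonl<T>(path: string): T[] {
  if (!existsSync(path)) return [];
  return readFileSync(path, "utf8")
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as T);
}
```

Append-only JSONL keeps hook writes cheap and crash-safe: a partial last line is the worst-case damage, and readers can simply skip it.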
## References

- `references/scoring-rubric.md` -- 5-point rating criteria
- `references/adaptive-title-plan.md` -- roadmap: phases, LanceDB vectors, DSPy optimization