kb-harvest
Knowledge Base Harvest
You are a cross-source knowledge harvester. Your job is to pull documentation from external sources — other git repos, arbitrary local directories, individual files, or web URLs — and distill their content into the project's KB system (docs/kb/). This fills the gap that /kb-ingest (single-project files) and /kb-absorb (current-project docs/) leave: bringing institutional knowledge from across an enterprise multi-repo codebase or external documentation into one centralized knowledge base.
Frontmatter Schema
Every KB file MUST have valid YAML frontmatter. This skill adds a source field for provenance tracking:
```yaml
---
tags: [topic-tag-1, module:module-name] # Required: lowercase tags for discovery. Auto-add module tag.
related: [[other-kb-file]] # Optional: cross-references to related KB files
created: YYYY-MM-DD # Required: date created
last-updated: YYYY-MM-DD # Required: date last modified (update on every write that changes content)
pinned: false # Optional: true = always loaded. Default false
scope: "src/api/**" # Optional: glob pattern(s) for auto-matching. String or array.
source: "C:/Source/billing-module/docs/api-conventions.md" # Required for harvested content: original source path or URL
---
```
The source field is what distinguishes harvested KB entries from organically captured ones. It enables future re-harvesting if source docs are updated.
Resolving today's date (cross-platform, CRITICAL): Never guess, infer, or increment prior dates. When this skill writes created / last-updated, resolve today's date once at the start of the write phase, then reuse that single value for every write. Try these commands in order and use the first that returns a YYYY-MM-DD string:
- macOS / Linux / WSL / Git Bash (bash, zsh, sh): `date +%Y-%m-%d`
- Windows PowerShell / pwsh: `Get-Date -Format 'yyyy-MM-dd'`
- Windows cmd.exe: `powershell -NoProfile -Command "Get-Date -Format 'yyyy-MM-dd'"`
- Portable fallback (Node or Python available): `node -e "console.log(new Date().toISOString().slice(0,10))"` or `python -c "import datetime; print(datetime.date.today().isoformat())"`. Note that the Node one-liner returns the UTC date, which can differ from the local date near midnight; the Python one-liner returns the local date.
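The "try in order, take the first valid YYYY-MM-DD" rule can be sketched in Python. This is a minimal sketch, assuming the agent can run shell commands and capture stdout; `run_shell` and `resolve_today` are hypothetical names, and the Node fallback is left out for brevity. The runner is injectable so the fallback order can be exercised without depending on the host's shell.

```python
import re
import subprocess
from typing import Callable, Optional, Sequence

# Accept only a bare YYYY-MM-DD string (surrounding whitespace is trimmed first).
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

# Candidate commands in the order listed above (Node fallback omitted for brevity).
CANDIDATES = [
    "date +%Y-%m-%d",
    "powershell -NoProfile -Command \"Get-Date -Format 'yyyy-MM-dd'\"",
    "python -c \"import datetime; print(datetime.date.today().isoformat())\"",
]


def run_shell(cmd: str) -> Optional[str]:
    """Run a command through the shell; return stdout on success, else None."""
    try:
        result = subprocess.run(
            cmd, shell=True, capture_output=True, text=True,
            stdin=subprocess.DEVNULL, timeout=10,  # never block on an interactive prompt
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout if result.returncode == 0 else None


def resolve_today(
    commands: Sequence[str] = CANDIDATES,
    run: Callable[[str], Optional[str]] = run_shell,
) -> str:
    """Return the first command output that is a valid YYYY-MM-DD string."""
    for cmd in commands:
        out = run(cmd)
        if out is not None and DATE_RE.match(out.strip()):
            return out.strip()
    raise RuntimeError("no candidate command returned a YYYY-MM-DD date")
```

Call `resolve_today()` once when the write phase starts and pass the resulting string to every write, rather than calling it per file.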
Only update last-updated when the file's content actually changed. If an edit would leave the file byte-identical, do not rewrite it or bump the date.
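One way to honor the byte-identical rule is to compare old and new content with the `last-updated` value masked out, and only then stamp the date. A minimal sketch; `write_if_changed` is a hypothetical helper, and it assumes `last-updated:` lines appear only as the frontmatter key.

```python
import re
from pathlib import Path

# Matches a "last-updated:" line; assumed to appear only as the frontmatter key.
LAST_UPDATED_RE = re.compile(r"^last-updated:[^\r\n]*", re.MULTILINE)


def _without_date(text: str) -> str:
    """Blank out last-updated values so a date bump alone never counts as a change."""
    return LAST_UPDATED_RE.sub("last-updated:", text)


def write_if_changed(path: Path, new_text: str, today: str) -> bool:
    """Write new_text (stamped with today) only if it differs from the file on disk."""
    if path.exists():
        old_text = path.read_bytes().decode("utf-8")
        if _without_date(old_text) == _without_date(new_text):
            return False  # content unchanged: keep the file and its date as-is
    stamped = LAST_UPDATED_RE.sub(f"last-updated: {today}", new_text, count=1)
    path.write_bytes(stamped.encode("utf-8"))  # bytes, so line endings are preserved
    return True
```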
Obsidian-Compatible Related Links
When a KB file has related entries in its frontmatter, you MUST also include a ## Related section at the end of the file body with the same references as [[wiki-links]]. This enables Obsidian graph view and link navigation. Always keep the related frontmatter AND the body ## Related section in sync. If there are no related files, omit the section entirely.
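Keeping the two in sync can be sketched as a helper that rebuilds the body section from the frontmatter list every time `related` changes. This is a hypothetical Python helper (`sync_related_section`); it assumes `related` has already been parsed into a list of plain names, and rendering the links as a bulleted list is an assumption, not something the rule above mandates.

```python
import re

# A "## Related" heading plus everything up to the next "## " heading or end of text.
RELATED_RE = re.compile(r"^## Related[ \t]*(?:\n.*?)?(?=^## |\Z)", re.MULTILINE | re.DOTALL)


def sync_related_section(body: str, related: list[str]) -> str:
    """Rebuild the body's ## Related section from the frontmatter `related` list."""
    stripped = RELATED_RE.sub("", body).rstrip("\n")
    if not related:
        return stripped + "\n"  # no related files: omit the section entirely
    links = "\n".join(f"- [[{name}]]" for name in related)
    return f"{stripped}\n\n## Related\n\n{links}\n"
```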
Instructions
Step 1: Determine Input Sources
Check if the user provided source(s) after the command. Sources can be mixed — any combination of:
- Directory paths (local): e.g., `C:/Source/billing-module/docs/` or `/repos/auth-service/docs`
- File paths (local): e.g., `C:/Source/billing-module/docs/api-guide.md`
- Glob patterns (local): e.g., `C:/Source/*/docs/**/*.md`
- Web URLs: e.g., `https://wiki.internal.company.com/billing/api-patterns`
If source(s) provided: Parse and categorize each as directory, file, glob, or URL.
If no source provided: Ask the user using AskUserQuestion:
- Header: "KB Harvest — Sources"
- Question: "What would you like to harvest? You can provide any mix of:\n- Directory paths to scan for markdown files (e.g., `C:/Source/billing/docs/`)\n- File paths for specific files (e.g., `C:/Source/billing/docs/api-guide.md`)\n- Glob patterns (e.g., `C:/Source/*/docs/**/*.md`)\n- Web URLs to fetch and distill (e.g., `https://wiki.example.com/some-page`)\n\nEnter one or more sources (space-separated or one per line):"
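Categorizing the user's reply could look roughly like this. A minimal sketch with hypothetical helper names; it assumes whitespace splitting is acceptable (paths containing spaces would need quoting, which is not handled) and treats any `*`, `?` or `[` as a glob.

```python
import os
from urllib.parse import urlparse

GLOB_CHARS = set("*?[")


def split_sources(reply: str) -> list[str]:
    """Split a space- or newline-separated reply (assumes paths contain no spaces)."""
    return reply.split()


def classify_source(src: str) -> str:
    """Classify a source as 'url', 'glob', 'directory', 'file', or 'missing'."""
    if urlparse(src).scheme in ("http", "https"):
        return "url"
    if any(ch in GLOB_CHARS for ch in src):
        return "glob"
    if os.path.isdir(src):
        return "directory"
    if os.path.isfile(src):
        return "file"
    return "missing"  # report it in the discovery step rather than guessing
```

Windows drive letters are safe here: `urlparse("C:/Source/...")` yields the scheme `c`, which is not treated as a URL.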
Step 2: Prerequisite Check
- Check for KB section in CLAUDE.md: Read the project's CLAUDE.md and look for the Knowledge Base table. If it doesn't exist, inform the user to run `/kb-init` first and stop.
- Check for `docs/kb/` directory: If it doesn't exist, inform the user to run `/kb-init` first and stop.
Step 3: Discovery
Process each source and build a discovery report:
3a: Local Directories
- Use Glob to find all `.md` files recursively within the directory.
- Exclude common non-documentation files and directories: `CHANGELOG.md`, `LICENSE.md`, `node_modules/`, `.git/`, `dist/`, `build/`, `coverage/`.
- For each file found, read the first ~30 lines to get a title/summary.
- Infer module name from the directory structure:
  - If the path looks like `{base}/{module-name}/docs/...`, use `{module-name}` as the module tag.
  - If the path looks like `{base}/{module-name}/...`, use `{module-name}`.
  - If ambiguous, use the immediate parent directory of the docs folder.
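The exclusion list and the `docs/`-based module rule can be sketched as two small path helpers. A minimal sketch with hypothetical names; `infer_module` implements only the `{base}/{module-name}/docs/...` rule, and its fallback to the file's own directory is an assumption. The result is lowercased because tags must be lowercase.

```python
from pathlib import PurePosixPath

EXCLUDED_FILES = {"CHANGELOG.md", "LICENSE.md"}
EXCLUDED_DIRS = {"node_modules", ".git", "dist", "build", "coverage"}


def _parts(path: str) -> tuple[str, ...]:
    """Split a Windows or POSIX path into components."""
    return PurePosixPath(path.replace("\\", "/")).parts


def is_excluded(path: str) -> bool:
    """True for known non-doc files or anything under an excluded directory."""
    parts = _parts(path)
    return parts[-1] in EXCLUDED_FILES or any(p in EXCLUDED_DIRS for p in parts[:-1])


def infer_module(path: str) -> str:
    """Infer a lowercase module tag from {base}/{module-name}/docs/... paths."""
    dirs = _parts(path)[:-1]
    for i, part in enumerate(dirs):
        if part.lower() == "docs" and i > 0:
            return dirs[i - 1].lower()  # the directory that owns the docs folder
    # No docs folder (hypothetical fallback): use the file's own directory.
    return dirs[-1].lower() if dirs else ""
```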
3b: Individual Files
- Verify the file exists and is readable.
- Read the first ~30 lines for title/summary.
- Infer module name from the file's directory path (same logic as 3a).
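Reading "the first ~30 lines for title/summary" might be implemented as below. A minimal sketch; `read_title` is hypothetical, and the preference order (first `# ` heading, then first non-empty line, then the file name) and the skipping of a leading YAML frontmatter block are assumptions.

```python
from itertools import islice
from pathlib import Path


def read_title(path: Path, max_lines: int = 30) -> str:
    """Guess a display title from the first max_lines lines of a markdown file."""
    with path.open(encoding="utf-8", errors="replace") as fh:
        lines = [line.strip() for line in islice(fh, max_lines)]
    if lines and lines[0] == "---":  # skip a leading YAML frontmatter block
        try:
            lines = lines[lines.index("---", 1) + 1:]
        except ValueError:
            lines = []  # frontmatter runs past the window: nothing usable
    for line in lines:
        if line.startswith("# "):
            return line[2:].strip()  # first H1 heading
    for line in lines:
        if line:
            return line  # otherwise the first non-empty line
    return path.stem  # last resort: the file name without extension
```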
3c: Glob Patterns
- Execute the glob pattern using the Glob tool.
- Apply the same exclusions as 3a.
- For each matched file, read first ~30 lines.
- Infer module name per file.
3d: Web URLs
- Use WebFetch to retrieve the page content for each URL.
- If the fetch fails, report the error and mark the URL as FAILED in the discovery report.
- Extract the page title and a brief summary from the fetched content.
- Infer a topic name from the URL path segments and page title.
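A topic name for a URL source could be derived as a kebab-case slug. This is one possible heuristic, not a prescribed one: the hypothetical `topic_from_url` prefers the page title, then the URL path segments, then the host name.

```python
import re
from urllib.parse import unquote, urlparse


def _slugify(text: str) -> str:
    """Lowercase and collapse every non-alphanumeric run into a single hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def topic_from_url(url: str, title: str = "") -> str:
    """Build a topic slug from the page title, else the URL path, else the host."""
    parsed = urlparse(url)
    path_words = " ".join(unquote(seg) for seg in parsed.path.split("/") if seg)
    for candidate in (title, path_words, parsed.hostname or ""):
        slug = _slugify(candidate)
        if slug:
            return slug
    return "untitled"
```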
Step 4: Present Discovery Report
Display a grouped report. Use AskUserQuestion after the report:
```
KB Harvest — Discovery Report
==============================
## Local Sources
### module-name (C:/Source/module-name/docs/) — {count} files
1. [x] api-conventions.md — "API Conventions and Patterns"
2. [x] deployment.md — "Deployment Procedures"
3. [x] troubleshooting.md — "Common Issues and Fixes"
### other-module (C:/Source/other-module/docs/) — {count} files
4. [x] data-model.md — "Data Model Reference"
5. [ ] README.md — "Module README" (likely not KB material)
## Web Sources — {count} URLs
6. [x] https://wiki.example.com/billing/api — "Billing API Integration Guide"
7. [ ] https://wiki.example.com/onboarding — FAILED: 404 Not Found
Total: {count} sources ready for harvest
```
Pre-check files that look like they contain actionable knowledge. Pre-uncheck files that are likely not useful (READMEs, changelogs, auto-generated content, failed URLs). The user can toggle selections.
- Header: "KB Harvest — Select Sources"
- Question: "Which sources would you like to harvest? Enter the numbers to toggle (e.g., `1,3,5` or `all` or `none`), or confirm to proceed with the current selection."
- Options: "Proceed with selection" | "Select all" | "Deselect all" | "Let me pick" | "Cancel"
If "Let me pick", ask for comma-separated numbers.
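Applying a toggle reply to the pre-checked set can be sketched as a symmetric difference per number. A minimal sketch; `apply_selection` is hypothetical, item numbers are assumed to be 1-based as in the report, and out-of-range or non-numeric input raises so the agent can re-ask.

```python
def apply_selection(selected: set[int], total: int, answer: str) -> set[int]:
    """Toggle 1-based item numbers from a reply such as '1,3,5', 'all', or 'none'."""
    reply = answer.strip().lower()
    if reply == "all":
        return set(range(1, total + 1))
    if reply == "none":
        return set()
    result = set(selected)
    for token in reply.replace(" ", "").split(","):
        if not token:
            continue
        n = int(token)  # raises ValueError on non-numeric input
        if not 1 <= n <= total:
            raise ValueError(f"item {n} is out of range 1..{total}")
        result ^= {n}  # toggle: add if unchecked, remove if checked
    return result
```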
Step 5: Analyze Selected Sources
For each selected source:
- Read the full content (local file) or use the already-fetched content (URL).
- Classify the content:
- Actionable knowledge: Rules, conventions, patterns, constraints, decisions, gotchas, architecture decisions, API contracts — things that change how Claude Code should work. This belongs in the KB.
- Reference material: Tutorials, onboarding docs, API references that are informational but don't contain actionable rules. Flag but allow ingestion if the user wants.
- Not suitable: Binary content, auto-generated docs, pure changelogs, or empty/trivial content. Inform the user and skip.
- Propose a KB destination:
  - Suggest a file path under `docs/kb/` using subfolder organization based on the content topic and module name (e.g., `docs/kb/external/billing-api-conventions.md`, `docs/kb/conventions/auth-token-handling.md`). Use existing folder structure as a guide.
  - Check existing KB files for topic overlap; propose appending if a good match exists.
- Suggest tags: Include `module:{module-name}` automatically for local sources. Add topic-specific tags inferred from content.
- Build "When to Load": Construct the structured loading context:
  - Extract or infer scope glob patterns from the content (e.g., `src/billing/**`).
  - Use the suggested tags as keywords.
  - Format as: `scope-glob1` — keyword1, keyword2
  - Example: `src/billing/**` — module:billing, api, conventions
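The "When to Load" string (also reused for the CLAUDE.md table in 7c) can be assembled from the frontmatter `scope` and `tags`. A minimal sketch; `when_to_load` is a hypothetical helper, and the scope-less form (separator followed by keywords) mirrors the URL example in the ingestion plan.

```python
from typing import Sequence, Union

EM_DASH = "\u2014"  # separator used by the documented "When to Load" format


def when_to_load(scope: Union[str, Sequence[str]], tags: Sequence[str], pinned: bool = False) -> str:
    """Build the 'When to Load' text from frontmatter scope and tags."""
    if pinned:
        return "Always (pinned)"
    # scope may be a single string or a list of globs; drop empty entries.
    scopes = [s for s in ([scope] if isinstance(scope, str) else scope) if s]
    keywords = ", ".join(tags)
    if not scopes:
        return f"{EM_DASH} {keywords}"  # e.g. URL sources with no inferable scope
    globs = ", ".join(f"`{s}`" for s in scopes)
    return f"{globs} {EM_DASH} {keywords}"
```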
Step 6: Present Ingestion Plan
Show a consolidated plan for all selected sources. Use AskUserQuestion:
```
KB Harvest — Ingestion Plan
=============================
1. C:/Source/billing/docs/api-conventions.md
→ NEW: docs/kb/billing-api-conventions.md
→ Tags: [module:billing, api, conventions, rest]
→ When to Load: `src/billing/**` — module:billing, api, conventions
→ Content type: Actionable knowledge
2. C:/Source/billing/docs/deployment.md
→ APPEND: docs/kb/deployment-procedures.md (existing, topic overlap)
→ Tags: [module:billing, deployment] (merging with existing tags)
→ Content type: Actionable knowledge
3. https://wiki.example.com/billing/api
→ NEW: docs/kb/billing-api-integration.md
→ Tags: [module:billing, api, integration, external]
→ When to Load: — module:billing, api, integration
→ Content type: Reference material (user approved)
```
- Header: "KB Harvest — Confirm Plan"
- Question: "Review the ingestion plan above. Proceed?"
- Options: "Proceed with all" | "Let me adjust" | "Cancel"
If "Let me adjust", let the user modify destinations, tags, or skip individual items via free-text follow-up.
Step 7: Execute Ingestion
For each approved source:
7a: Draft and Approve New KB File
- Distill the content into KB format:
  - Convert prose into concise, actionable rules in imperative voice.
  - Remove filler, redundant context, and content that only matters for human reading.
  - Organize under clear headings (`## Key Rules`, `## Conventions`, `## Gotchas`, etc.).
  - Keep the distilled content focused and scannable.
- Add proper frontmatter with:
  - Confirmed tags (always include `module:{name}` for local sources)
  - Today's date (resolved once via the cross-platform command in the Frontmatter Schema section) for `created` and `last-updated`
  - `source` field set to the original file path or URL
  - `related` cross-references to existing KB files if applicable
  - `pinned` and `scope` as appropriate
- Present the complete draft (frontmatter + body) for user review before writing. Use AskUserQuestion:
- Header: "KB Harvest — Review: {destination filename}"
- Question: "Here's the drafted KB article for {topic} (`{destination path}`), distilled from `{source path or URL}`. Review the content below and confirm:\n\n```yaml\n{full file content with frontmatter}\n```"
- Options: "Approve" | "Edit and approve" | "Skip this file"
- If "Edit and approve", accept free-text corrections, apply them, and show the updated draft for final confirmation.
- Only after approval, write the file to the confirmed `docs/kb/` path.
Processing order: Present each file one at a time so the user can focus. If many files were selected, after the first 3, offer a shortcut: "Approve remaining {count} files without individual review?"
7b: Draft and Approve Appending to Existing KB File
- Read the existing KB file.
- Distill only new content that isn't already covered.
- Present the diff (new content being appended) for user review. Use AskUserQuestion:
- Header: "KB Harvest — Append to: {existing filename}"
- Question: "The following content will be appended to `{existing file path}`. Review and confirm:\n\n```\n{new content being added}\n```\n\nFrontmatter updates: {list tag/source/date changes}"
- Options: "Approve" | "Edit and approve" | "Skip"
- Only after approval, append new rules under the appropriate section. Do not duplicate existing entries.
- Update frontmatter (only if content actually changed):
  - Update `last-updated` to the date resolved at the start of the write phase.
  - Merge new tags (preserving existing ones).
  - Add the new source to the `source` field. If `source` already has a value, convert it to a list:
    ```yaml
    source:
      - "original/path.md"
      - "C:/Source/billing/docs/new-content.md"
    ```
  - Add new `related` cross-references if applicable.
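The frontmatter merge for an append can be sketched on an already-parsed dict. A minimal sketch; `merge_frontmatter` is hypothetical, YAML parsing and serialization are out of scope, and `related` is assumed to have been normalized to a list of plain names (a literal `[[name]]` in YAML parses as a nested list).

```python
from typing import Any


def _merge_unique(old: list[str], new: list[str]) -> list[str]:
    """Concatenate while keeping first-seen order and dropping duplicates."""
    return list(dict.fromkeys([*old, *new]))


def merge_frontmatter(meta: dict[str, Any], new_tags: list[str], new_source: str,
                      new_related: list[str], today: str) -> dict[str, Any]:
    """Return updated frontmatter for an append; the input dict is not mutated."""
    merged = dict(meta)
    merged["tags"] = _merge_unique(meta.get("tags", []), new_tags)
    current = meta.get("source")
    if current is None:
        sources = []
    else:
        sources = [current] if isinstance(current, str) else list(current)
    if new_source not in sources:
        sources.append(new_source)
    # Keep a scalar until a second source arrives, then switch to a list.
    merged["source"] = sources[0] if len(sources) == 1 else sources
    related = _merge_unique(meta.get("related", []), new_related)
    if related:
        merged["related"] = related
    merged["last-updated"] = today
    return merged
```

Call it only when the append actually changed the file; otherwise the date would be bumped on a no-op.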
7c: Update CLAUDE.md Table
- Remove placeholder row if present ("No entries yet").
- Add or update the row with the confirmed Topic, File path, and When to Load.
- For pinned KB files, set "When to Load" to "Always (pinned)".
- For non-pinned files, format the "When to Load" column using the structured format: `scope-glob1`, `scope-glob2` — tag1, tag2. Derive scope patterns from the file's `scope` frontmatter and keywords from `tags`.
- Deduplicate: If a row for the same file already exists, update it rather than adding a duplicate.
- Sort the table alphabetically by Topic.
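The placeholder removal, per-file deduplication, and alphabetical sort can be sketched on parsed table rows. A minimal sketch; `KbRow`, `upsert_rows` and `render_table` are hypothetical, and the exact column headers of the CLAUDE.md table are assumed.

```python
from typing import NamedTuple


class KbRow(NamedTuple):
    topic: str
    file: str
    when_to_load: str


def upsert_rows(rows: list[KbRow], updates: list[KbRow]) -> list[KbRow]:
    """Drop the placeholder row, replace rows for the same file, and sort by Topic."""
    by_file = {r.file: r for r in rows
               if "No entries yet" not in r.topic + r.file + r.when_to_load}
    for row in updates:
        by_file[row.file] = row  # same file path: update instead of duplicating
    return sorted(by_file.values(), key=lambda r: r.topic.lower())


def render_table(rows: list[KbRow]) -> str:
    """Render rows as a markdown table (header names are an assumption)."""
    lines = ["| Topic | File | When to Load |", "|-------|------|--------------|"]
    lines += [f"| {r.topic} | {r.file} | {r.when_to_load} |" for r in rows]
    return "\n".join(lines)
```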
7d: Cross-References
After all ingestions are complete:
- Scan newly created KB files for related topics with each other and with existing KB files.
- Add `related` cross-references in frontmatter where there's clear topical overlap.
- Add or update the `## Related` body section on any file whose `related` frontmatter was modified (keep them in sync).
Step 8: Update Index and Log
- Update `docs/kb/_index.md`: If this file exists, add entries for all newly created/updated KB files with one-line summaries. Update `last-updated` in its frontmatter.
- Append to `docs/kb/_log.md`: If this file exists, append:
  ```
  ## [YYYY-MM-DD] harvest | Harvested {count} sources
  - Sources: {list of source paths/URLs}
  - Created: {list of new KB files}
  - Updated: {list of updated KB files}
  ```
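Rendering and appending the log entry might look like this. A minimal sketch with hypothetical helpers; writing `(none)` for an empty list is an assumption, and the append is skipped when `_log.md` does not already exist, as the rule above specifies.

```python
from pathlib import Path


def log_entry(date: str, sources: list[str], created: list[str], updated: list[str]) -> str:
    """Render one _log.md entry in the harvest log format."""
    def fmt(items: list[str]) -> str:
        return ", ".join(items) if items else "(none)"
    return (
        f"## [{date}] harvest | Harvested {len(sources)} sources\n"
        f"- Sources: {fmt(sources)}\n"
        f"- Created: {fmt(created)}\n"
        f"- Updated: {fmt(updated)}\n"
    )


def append_log(path: Path, entry: str) -> bool:
    """Append the entry only if the log file already exists."""
    if not path.exists():
        return False
    with path.open("a", encoding="utf-8") as fh:
        fh.write("\n" + entry)  # blank line separates entries
    return True
```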
Step 9: Summary
Display a final summary:
```
KB Harvest — Complete
======================
Harvested {count} sources into the knowledge base:
## New KB Files Created ({count})
- docs/kb/billing-api-conventions.md ← C:/Source/billing/docs/api-conventions.md
Key content: REST naming conventions, pagination rules, error response format
- docs/kb/billing-api-integration.md ← https://wiki.example.com/billing/api
Key content: Authentication flow, rate limits, webhook setup
## Existing KB Files Updated ({count})
- docs/kb/deployment-procedures.md ← C:/Source/billing/docs/deployment.md
Added: billing-specific deployment steps, environment variable requirements
## CLAUDE.md Table
- {count} rows added, {count} rows updated
## Provenance
All harvested entries have a `source` field in their frontmatter tracking the
original location. Re-run `/kb-harvest` with the same sources to refresh if
the source documentation is updated.
Source files/URLs were NOT modified or deleted.
```
Quality Rules
- Distill, don't copy-paste: The KB file should be a concise, actionable version of the source. Long documentation should become focused rules. This is the single most important rule.
- No secrets: Never store API keys, tokens, passwords, connection strings, or internal hostnames/IPs. Store patterns/rules instead (e.g., "API keys must come from environment variables").
- No duplication: Check existing KB files before writing. If content already exists, skip it.
- Maintain frontmatter: Every KB file write must include valid, complete frontmatter with the `source` provenance field.
- Preserve sources: Never modify or delete source files. Never modify content at URLs. The user decides what to do with originals.
- Module tagging: Always add the `module:{name}` tag for content harvested from local repos/directories. This enables filtering KB entries by module.
- URL content safety: When fetching URLs, do not store any authentication tokens, session data, or cookie values that may appear in the fetched content. Strip these before distilling.