kb CLI and Knowledge Base Pattern

Build and maintain a self-compiling Obsidian markdown knowledge base using the kb CLI. The LLM reads raw sources, writes cross-linked wiki articles, files Q&A results back into the corpus, and runs lint-and-heal passes. The CLI also supports codebase ingestion with deep inspection commands for code quality, architecture health, and symbol relationships.

Each topic lives in its own top-level folder (e.g. ai-harness/) with raw/, wiki/, outputs/, bases/ subtrees plus a topic-level log.md and CLAUDE.md. All topics share a single Obsidian vault at the repo root. Read references/architecture.md for the full rationale and the four-phase pipeline (ingest → compile → query → lint).

The topic's CLAUDE.md (symlinked to AGENTS.md) is the schema document — it tells the LLM the scope, conventions, current articles, and research gaps for that topic. Co-evolve it as the topic matures.

Prerequisites

Verify the kb binary is available:
```
kb version
```

For search and index commands, verify QMD is installed:

qmd --version
# If missing: npm install -g @tobilu/qmd
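
A minimal sketch of the same prerequisite check in Python, assuming both tools are resolved through PATH. The helper name `missing_tools` is hypothetical and not part of kb.

```python
import shutil
from typing import Iterable, List


def missing_tools(tools: Iterable[str] = ("kb", "qmd")) -> List[str]:
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    # qmd is only needed for the index and search commands.
    for name in missing_tools():
        print(f"missing: {name}")
```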

Supported source languages for codebase analysis: TypeScript (.ts), TSX (.tsx), JavaScript (.js), JSX (.jsx), Go (.go).

Pattern Overview

Based on Andrej Karpathy's LLM Wiki pattern, the KB treats the LLM as a compiler that reads raw source documents and produces a structured, cross-linked markdown wiki. The four-phase loop:

Ingest — Scrape/curate sources via kb CLI → raw/ (immutable staging)
Compile — LLM reads raw/, writes wiki/concepts/ articles (3000-4000 words, dense wikilinks)
Query — Q&A against wiki → file answers to outputs/queries/, promote strong answers to wiki
Lint — Automated structural checks + LLM-driven semantic healing

Read references/architecture.md for the full rationale, context-window vs RAG tradeoffs, and multi-topic vault design.

Related Skills

This skill orchestrates several companion skills for the LLM-driven phases:

obsidian-markdown — author wiki articles with valid Obsidian Flavored Markdown (wikilinks, callouts, embeds, properties).
obsidian-bases — create .base files under <topic>/bases/ for dashboard views, filters, and formulas.
obsidian-cli — interact with the running Obsidian vault from the command line (open notes, search, refresh indexes).

kb CLI Quick Reference

Topic management

kb topic new <slug> <title> <domain>     # scaffold a new topic
kb topic list                             # list all topics in the vault
kb topic info <slug>                      # topic metadata (counts, last log entry)

Ingestion (auto-generates frontmatter, auto-appends to log.md)

kb ingest url <url> --topic <slug>        # scrape a web URL via Firecrawl
kb ingest file <path> --topic <slug>      # convert local file (PDF, DOCX, EPUB, HTML, images w/OCR, etc.)
kb ingest youtube <url> --topic <slug>    # extract YouTube transcript
kb ingest bookmarks <path> --topic <slug> # ingest a bookmark-cluster markdown file
kb ingest codebase <path> --topic <slug>  # analyze a codebase into raw/codebase/

Codebase inspection

kb inspect smells [--type <smell-type>] --format json
kb inspect dead-code --format json
kb inspect complexity [--top N] --format json
kb inspect blast-radius [--min N] [--top N] --format json
kb inspect coupling [--unstable] --format json
kb inspect circular-deps --format json
kb inspect symbol <name> --format json
kb inspect file <path> --format json
kb inspect backlinks <name-or-path> --format json
kb inspect deps <name-or-path> --format json

Structural linting

kb lint [<slug>] [--save]                 # dead links, orphans, missing sources, format violations, stale content

Indexing and search (requires QMD)

kb index --topic <slug>                   # create or update QMD collection
kb search "<query>" --topic <slug>        # hybrid BM25 + vector search
kb search "<query>" --lex --topic <slug>  # keyword-only search
kb search "<query>" --vec --topic <slug>  # vector-only search

After running kb ingest or kb lint --save, the CLI auto-appends entries to <topic>/log.md. Manual log entries are still needed for compile, query, promote, and split operations (Procedure 5).

Command Dispatch

Map the user's intent to the correct command:

| Intent | Command |
| --- | --- |
| Scaffold a new topic | `kb topic new <slug> <title> <domain>` |
| List all topics | `kb topic list` |
| Scrape a web URL | `kb ingest url <url> --topic <slug>` |
| Ingest a local file (PDF, DOCX, etc.) | `kb ingest file <path> --topic <slug>` |
| Extract a YouTube transcript | `kb ingest youtube <url> --topic <slug>` |
| Ingest bookmark clusters | `kb ingest bookmarks <path> --topic <slug>` |
| Analyze a codebase | `kb ingest codebase <path> --topic <slug> --progress never` |
| Find code smells | `kb inspect smells --format json` |
| Find dead exports and orphan files | `kb inspect dead-code --format json` |
| Rank functions by complexity | `kb inspect complexity --format json` |
| Find high-impact symbols (blast radius) | `kb inspect blast-radius --min 5 --format json` |
| Find unstable files (coupling) | `kb inspect coupling --unstable --format json` |
| Find circular imports | `kb inspect circular-deps --format json` |
| Look up a specific symbol | `kb inspect symbol <name> --format json` |
| Look up a specific file | `kb inspect file <path> --format json` |
| Find what depends on X (incoming refs) | `kb inspect backlinks <name-or-path> --format json` |
| Find what X depends on (outgoing deps) | `kb inspect deps <name-or-path> --format json` |
| Run structural lint | `kb lint <slug> --save` |
| Index vault for search | `kb index --topic <slug>` |
| Search the knowledge base | `kb search "<query>" --topic <slug> --format json` |

Codebase Analysis Workflow

For codebase-specific analysis, the kb ingest codebase command must run before any inspect command.

Workflow A -- Code Analysis (no QMD required):

kb ingest codebase <path> --topic <slug> --> kb inspect <subcommand>

Workflow B -- Full Pipeline (requires QMD):

kb ingest codebase <path> --topic <slug> --> kb index --> kb search <query>

By default the vault root is <path>/.kb/vault/ and the topic lives at <path>/.kb/vault/<topic-slug>/. Later commands auto-discover the vault by walking up from the current working directory.
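
The walk-up discovery can be pictured with a short Python sketch. It assumes the vault root is a `.kb/vault` directory somewhere at or above the starting directory; `find_vault` is a hypothetical helper and may differ from what kb actually does.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_vault(start: Path) -> Optional[Path]:
    """Walk up from `start` and return the first `.kb/vault` directory found."""
    current = start.resolve()
    for directory in (current, *current.parents):
        candidate = directory / ".kb" / "vault"
        if candidate.is_dir():
            return candidate
    return None


# Demo: build a throwaway layout and discover the vault from a nested directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / ".kb" / "vault" / "demo-topic").mkdir(parents=True)
    nested = root / "src" / "pkg"
    nested.mkdir(parents=True)
    demo_found = find_vault(nested) == root / ".kb" / "vault"

print(demo_found)
```

When discovery could be ambiguous, pass --vault explicitly instead of relying on the working directory.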

Ingest a Codebase

kb ingest codebase <path> --topic <slug> --progress never

Always use --progress never in agent contexts to prevent TTY progress bars from corrupting stdout.

Parse the JSON output from stdout to extract key values:

topicSlug -- the topic identifier for later commands
vaultPath -- absolute path to the vault root
topicPath -- absolute path to the topic directory
filesScanned, filesParsed, symbolsExtracted -- summary statistics
diagnostics -- check for warnings or errors

Stderr carries structured stage logs. Do not treat stderr content as failure evidence.
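
A minimal sketch of reading the ingest summary in Python. The field names come from the list above; the helper names, the assumption that kb exits non-zero on failure, and the treatment of `diagnostics` as a list are illustrative guesses, so check references/cli-ingest-codebase.md for the real schema.

```python
import json
import subprocess


def parse_ingest_summary(stdout_text: str) -> dict:
    """Extract the documented fields from the JSON printed on stdout."""
    payload = json.loads(stdout_text)
    keys = ("topicSlug", "vaultPath", "topicPath",
            "filesScanned", "filesParsed", "symbolsExtracted")
    summary = {key: payload.get(key) for key in keys}
    # Assumption: diagnostics is a list; default to empty when absent.
    summary["diagnostics"] = payload.get("diagnostics") or []
    return summary


def run_ingest(path: str, topic: str) -> dict:
    """Run `kb ingest codebase` and parse stdout only; stderr is stage logging."""
    result = subprocess.run(
        ["kb", "ingest", "codebase", path, "--topic", topic, "--progress", "never"],
        capture_output=True,
        text=True,
        check=True,  # assumption: failure is signalled by the exit code
    )
    return parse_ingest_summary(result.stdout)
```

Reuse `summary["topicSlug"]` for the --topic flag of later inspect, index, and search calls.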

Key flags:

--output <dir> -- override vault root location
--topic <slug> -- override the topic slug
--include <pattern> -- re-include paths that would otherwise be ignored (repeatable)
--exclude <pattern> -- exclude additional paths from scanning (repeatable)
--semantic -- enable semantic analysis when adapters support it

Read references/cli-ingest-codebase.md for the full flag table and output schema.

Inspect the Vault

Run inspect subcommands to analyze code quality and architecture.

Shared flags for all inspect subcommands:

--format json -- always use JSON for programmatic parsing
--vault <path> -- explicit vault root (omit to auto-discover from cwd)
--topic <slug> -- explicit topic slug (omit if only one topic exists)
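
One way to keep the shared flags consistent across calls is a small argv builder. This is only a sketch; `inspect_argv` is a hypothetical helper that uses nothing beyond the flags listed above.

```python
from typing import List, Optional


def inspect_argv(subcommand: str, *args: str,
                 vault: Optional[str] = None,
                 topic: Optional[str] = None) -> List[str]:
    """Build a `kb inspect` command line that always requests JSON output."""
    argv = ["kb", "inspect", subcommand, *args, "--format", "json"]
    if vault:
        argv += ["--vault", vault]  # omit to auto-discover from cwd
    if topic:
        argv += ["--topic", topic]  # omit if only one topic exists
    return argv


print(inspect_argv("symbol", "parseConfig", topic="demo"))
```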

Tabular Subcommands

These return a list of rows sorted by the primary metric:

smells -- List symbols and files with detected code smells.

kb inspect smells --format json
kb inspect smells --type high-complexity --format json

dead-code -- List dead exports and orphan files.
```
kb inspect dead-code --format json
```

complexity -- Rank functions/methods by cyclomatic complexity. Default top 20.

kb inspect complexity --format json
kb inspect complexity --top 50 --format json

blast-radius -- Rank symbols by transitive dependent count.

kb inspect blast-radius --format json
kb inspect blast-radius --min 10 --top 20 --format json
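
To make "transitive dependent count" concrete, here is a sketch that walks a reversed dependency graph breadth-first. The graph shape (symbol mapped to the symbols it depends on) and the function name are assumptions for illustration, not kb's implementation.

```python
from collections import deque
from typing import Dict, Iterable, Set


def transitive_dependents(deps: Dict[str, Iterable[str]], target: str) -> Set[str]:
    """Return every symbol that depends on `target`, directly or indirectly."""
    # Invert the edges: for each symbol, who depends on it?
    reverse: Dict[str, Set[str]] = {}
    for source, targets in deps.items():
        for dep in targets:
            reverse.setdefault(dep, set()).add(source)

    seen: Set[str] = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for dependent in reverse.get(node, ()):
            if dependent != target and dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


graph = {"a": ["b"], "b": ["c"], "d": ["c"], "c": []}
print(len(transitive_dependents(graph, "c")))
```

The blast radius of a symbol is then the size of the returned set, which is what --min and --top would filter and rank on under this reading.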

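coupling -- Rank files by instability, Ce / (Ca + Ce), where Ce counts outgoing (efferent) dependencies and Ca counts incoming (afferent) dependents. Values near 1 mark files that depend on much and are depended on by little.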

kb inspect coupling --format json
kb inspect coupling --unstable --format json
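
The instability ratio is simple enough to check by hand. In this sketch Ca is the number of incoming (afferent) dependents and Ce the number of outgoing (efferent) dependencies; returning 0.0 for a file with no couplings at all is an assumption, since the source does not say how that case is reported.

```python
def instability(ca: int, ce: int) -> float:
    """Instability I = Ce / (Ca + Ce); 0.0 is fully stable, 1.0 fully unstable."""
    total = ca + ce
    if total == 0:
        return 0.0  # assumption: isolated files are treated as stable
    return ce / total


# A file imported by 3 others that itself imports 1 file.
print(instability(ca=3, ce=1))  # 0.25
```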

circular-deps -- List files participating in circular import chains.
```
kb inspect circular-deps --format json
```
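
As an illustration of what "participating in a circular import chain" means, the sketch below flags every file that can reach itself by following imports. The input shape and the function name are hypothetical, and kb may use a different algorithm (for large graphs a strongly-connected-components pass would scale better).

```python
from typing import Dict, Iterable, List


def files_in_cycles(imports: Dict[str, Iterable[str]]) -> List[str]:
    """Return the files that sit on at least one import cycle."""

    def reaches_self(start: str) -> bool:
        stack = list(imports.get(start, ()))
        seen = set()
        while stack:
            node = stack.pop()
            if node == start:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(imports.get(node, ()))
        return False

    return sorted(name for name in imports if reaches_self(name))


demo_imports = {
    "a.ts": ["b.ts"],
    "b.ts": ["c.ts"],
    "c.ts": ["a.ts"],
    "d.ts": ["a.ts"],  # imports into the cycle but is not part of it
}
print(files_in_cycles(demo_imports))
```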

Detail Lookup Subcommands

These return field-value pairs for a single matched entity:

symbol <name> -- Case-insensitive substring match. Returns detail fields for a single match, or a summary table for multiple matches.
```
kb inspect symbol parseConfig --format json
```
file <path> -- Exact source path lookup. Use the source-relative path as stored in vault frontmatter.
```
kb inspect file src/config.ts --format json
```

Relation Subcommands

These return relation edges (target_path, type, confidence):

backlinks <name-or-path> -- Incoming references. Accepts a symbol name or file path.
```
kb inspect backlinks parseConfig --format json
```
deps <name-or-path> -- Outgoing dependencies. Accepts a symbol name or file path.
```
kb inspect deps src/config.ts --format json
```

Read references/cli-inspect.md for all column schemas and flag details.

Index the Vault

Index the vault content into QMD for search. This step requires QMD on PATH.

kb index --topic <slug>

The command is idempotent: it checks whether the collection already exists and chooses add (create) or update (refresh) automatically.

Key flags:

--embed (default true) -- run embedding after syncing files
--force-embed -- force re-embedding all documents
--context <text> -- attach human context to improve search relevance
--name <name> -- override the derived collection name

Read references/cli-search-index.md for the full output schema.

Search the Vault

Search indexed vault content with QMD. Requires a prior kb index run.

kb search "<query>" --topic <slug> --format json

Search modes:

Hybrid (default) -- combines lexical and vector search
Lexical (--lex) -- BM25 keyword search only
Vector (--vec) -- embedding-based semantic search

The --lex and --vec flags are mutually exclusive. Omit both for hybrid mode.

Key flags:

--limit N (default 10) -- maximum results
--min-score N -- minimum relevance threshold
--full -- return full document content instead of snippets
--all -- return all matches above the minimum score
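
Because --lex and --vec must never be combined, a wrapper can model the mode as a single value so the invalid combination cannot be expressed. A sketch with a hypothetical `search_argv` helper, using only flags documented above:

```python
from typing import List, Optional

_MODE_FLAGS = {"hybrid": [], "lex": ["--lex"], "vec": ["--vec"]}


def search_argv(query: str, topic: str, mode: str = "hybrid",
                limit: Optional[int] = None) -> List[str]:
    """Build a `kb search` command line with exactly one search mode."""
    if mode not in _MODE_FLAGS:
        raise ValueError(f"mode must be one of {sorted(_MODE_FLAGS)}, got {mode!r}")
    argv = ["kb", "search", query, "--topic", topic,
            *_MODE_FLAGS[mode], "--format", "json"]
    if limit is not None:
        argv += ["--limit", str(limit)]
    return argv


print(search_argv("kv cache eviction", "demo", mode="lex", limit=5))
```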

Read references/cli-search-index.md for full details.

KB Maintenance Procedures

Procedure 1: Compile a wiki article

1. Read references/compilation-guide.md to anchor on length, style, wikilink density, and sourcing rules.
2. Identify candidate sources via kb search "<topic phrase>" --topic <slug> or read <topic>/wiki/index/Source Index.md.
3. Load the candidate raw sources fully into context.
4. Load <topic>/wiki/index/Concept Index.md for orientation on existing articles and wikilink targets (including in other topics).
5. Surface takeaways BEFORE drafting. Present to the user: 3-5 key takeaways from the sources, the entities/concepts this article will introduce or update, and anything that contradicts existing wiki articles. Ask: "Anything specific to emphasize or de-emphasize?" Wait for the response. Skip this step only if the user has explicitly asked for autonomous compilation.
6. Write the article to <topic>/wiki/concepts/<Article Title>.md following the obsidian-markdown skill for wikilink, callout, and frontmatter syntax. Use the frontmatter schema from references/frontmatter-schemas.md. Target 3000-4000 words with a Sources section, wikilinks to related articles, and code or diagram blocks where applicable.
7. Backlink audit -- do not skip. Grep every existing article in <topic>/wiki/concepts/ for mentions of the new article's title, aliases, or core entities, running the search once per term. For each match, add a [[New Article]] wikilink at the first mention (and one later occurrence). This is the step most commonly skipped -- a compounding wiki depends on bidirectional links.
```
grep -rilF "<new article title or key term>" <topic>/wiki/concepts/
```
8. Update the topic's indexes (Procedure 2).
9. Update <topic>/CLAUDE.md current-articles list.
10. Re-index the topic's collection: kb index --topic <slug>.
11. Append an entry to <topic>/log.md (Procedure 5) -- e.g., ## [YYYY-MM-DD] compile | <Article Title> (<word_count> words, <N> sources).
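
The backlink audit can also be narrowed to articles that mention the new title but do not link to it yet. The sketch below is one possible approach, not part of kb: `backlink_candidates` is a hypothetical helper that does a case-insensitive text match and treats `[[Title]]`, `[[Title|...]]`, and `[[Title#...]]` as existing links.

```python
import tempfile
from pathlib import Path
from typing import List


def backlink_candidates(concepts_dir: Path, title: str) -> List[Path]:
    """Articles that mention `title` but contain no wikilink to it."""
    needle = title.lower()
    link_forms = tuple(f"[[{needle}{end}" for end in ("]]", "|", "#"))
    hits = []
    for article in sorted(concepts_dir.glob("*.md")):
        if article.stem == title:
            continue  # skip the new article itself
        text = article.read_text(encoding="utf-8").lower()
        if needle in text and not any(form in text for form in link_forms):
            hits.append(article)
    return hits


# Demo with throwaway files standing in for <topic>/wiki/concepts/.
with tempfile.TemporaryDirectory() as tmp:
    concepts = Path(tmp)
    (concepts / "KV Cache.md").write_text("# KV Cache\n", encoding="utf-8")
    (concepts / "Attention.md").write_text("Uses a KV cache at decode time.\n", encoding="utf-8")
    (concepts / "Batching.md").write_text("See [[KV Cache]] for memory.\n", encoding="utf-8")
    demo_names = [p.name for p in backlink_candidates(concepts, "KV Cache")]

print(demo_names)
```

Run it once per alias or core entity, then add the wikilinks by hand so each edit can be reviewed.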

When updating an existing article (rather than writing new), use the Current / Proposed / Reason / Source diff format and contradiction-sweep workflow described in references/compilation-guide.md.

Procedure 2: Maintain topic indexes

After adding, renaming, or removing any wiki article:

<topic>/wiki/index/Dashboard.md -- update article count, total word count, featured sections, and any Obsidian Base embeds (use the obsidian-bases skill to author .base files and embed them).
<topic>/wiki/index/Concept Index.md -- insert/update the article row alphabetically with its one-line summary.
<topic>/wiki/index/Source Index.md -- for each new article, append rows for every source it cites, with a wikilink back to the article.
Optionally refresh the live view in Obsidian with the obsidian-cli skill (obsidian open <path>, obsidian search <query>).
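
The Dashboard's article count and total word count can be recomputed rather than tracked by hand. A rough sketch, assuming every article is a .md file directly under wiki/concepts/ and that whitespace-separated tokens (frontmatter included) are an acceptable word count; `wiki_stats` is a hypothetical helper.

```python
import tempfile
from pathlib import Path
from typing import Dict


def wiki_stats(concepts_dir: Path) -> Dict[str, int]:
    """Count articles and whitespace-separated words under a concepts directory."""
    articles = sorted(concepts_dir.glob("*.md"))
    words = sum(len(path.read_text(encoding="utf-8").split()) for path in articles)
    return {"articles": len(articles), "words": words}


# Demo with two tiny stand-in articles.
with tempfile.TemporaryDirectory() as tmp:
    concepts = Path(tmp)
    (concepts / "A.md").write_text("one two three", encoding="utf-8")
    (concepts / "B.md").write_text("four five", encoding="utf-8")
    demo_stats = wiki_stats(concepts)

print(demo_stats)
```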

Procedure 3: Query the wiki and file back the answer

A query has two phases: Phase A produces the answer by reading the wiki (never from general knowledge); Phase B files the answer back so the exploration compounds.

Precondition: Identify which topic(s) the question belongs to. If the question spans topics, load each topic's Concept Index.

Phase A -- Answer from the wiki

Read the topic's Concept Index first (<topic>/wiki/index/Concept Index.md). Scan the full index to identify candidate articles. Do NOT answer from general knowledge -- the wiki is the source of truth, even when the answer seems obvious. A contradiction between the wiki and general knowledge is itself valuable signal.
Locate relevant articles. At small scale (<30 articles), the index is enough. At larger scale, supplement with kb search "<phrase>" --topic <slug>. Also grep the topic for keywords: grep -rl "<keyword>" <topic>/wiki/concepts/.
Read the identified articles in full. Follow one level of [[wikilinks]] when targets look relevant to the question. Stop at one hop -- deeper traversal wastes context.
(Optional) Pull in raw sources if an article's claim is ambiguous and its sources: frontmatter points at a specific raw file worth verifying.
Synthesize the answer with these properties:
- Grounded in the wiki articles you just read -- every factual claim traces back to a [[Wiki Article]] citation.
- Notes agreements and disagreements between articles when they exist.
- Flags gaps explicitly: "The wiki has no article on X" or "[[Article Y]] does not yet cover Z".
- Suggests follow-up ingest targets or open questions.
Match format to question type:
- Factual → prose with inline [[wikilink]] citations.
- Comparison → table with rows per alternative, citations in cells.
- How-it-works → numbered steps with citations.
- What-do-we-know-about-X → structured summary with "Known", "Open questions", "Gaps".
- Visual → ASCII/Mermaid diagram, Marp deck (see references/tooling-tips.md), or matplotlib chart.

Phase B -- File back the answer

Save the answer to <topic>/outputs/queries/<YYYY-MM-DD> <Question Slug>.md with frontmatter: type: output, stage: query, informed_by: ["[[Article 1]]", "[[Article 2]]"]. See references/frontmatter-schemas.md for the full schema.
In the body, list the wiki articles that informed the answer as wikilinks (matching the informed_by: frontmatter) and call out new insights that should be absorbed back into those articles on the next compile pass.
When a filed-back insight contradicts or extends an article's claims, recompile the affected articles (Procedure 1).
Promote to wiki when the synthesis is durable. If the answer is a first-class reference (a comparison table, a trade-off analysis, a new concept synthesized from multiple articles), copy it to <topic>/wiki/concepts/<Title>.md following Procedure 1 standards and update the indexes (Procedure 2). Karpathy's pattern treats strong query answers as wiki citizens, not secondary artifacts.
Append to <topic>/log.md (Procedure 5) -- e.g., ## [YYYY-MM-DD] query | <Question Slug> plus a second line ## [YYYY-MM-DD] promote | <Title> if promoted.
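
The filename and minimal frontmatter for a filed-back answer can be generated mechanically. This sketch covers only the three fields named above; references/frontmatter-schemas.md may require more, and `query_output` is a hypothetical helper.

```python
from datetime import date
from typing import List, Tuple


def query_output(question_slug: str, informed_by: List[str], day: date) -> Tuple[str, str]:
    """Return (filename, frontmatter) for a file under <topic>/outputs/queries/."""
    filename = f"{day.isoformat()} {question_slug}.md"
    links = ", ".join(f'"[[{title}]]"' for title in informed_by)
    frontmatter = "\n".join([
        "---",
        "type: output",
        "stage: query",
        f"informed_by: [{links}]",
        "---",
    ])
    return filename, frontmatter


name, header = query_output(
    "flash-attention-vs-paged-attention",
    ["FlashAttention", "PagedAttention"],
    date(2026, 4, 4),
)
print(name)
print(header)
```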

Anti-patterns to avoid:

Answering from memory -- always read the wiki pages. The wiki may contradict what you think you know.
No citations -- every factual claim must trace back to a [[wikilink]].
Skipping the save -- good query answers compound the wiki's value. Always file to outputs/queries/; promote when durable.
Silent gaps -- surface missing coverage explicitly so the next ingest pass can fill it.

Procedure 4: Lint and heal

Run structural lint via the kb CLI:

kb lint <slug> --save

This checks dead wikilinks, orphan articles, missing source references, format violations, and stale content, saving a dated report to <topic>/outputs/reports/. For each issue, propose the fix with a diff before applying -- do not batch-apply changes:

Dead wikilink -- either create the missing article (Procedure 1) or rewrite the wikilink to point at an existing article.
Orphan article -- add incoming wikilinks from at least one related article, or remove the article if it is outside the topic's scope.
Missing source file -- an article's sources: frontmatter references a file absent from raw/. Either re-ingest (kb ingest url/file) or correct the reference.
Stale content -- article's updated: date is older than its source's scraped: date. Recompile with current sources.
Format violation -- fix missing frontmatter fields, H1 title, lead paragraph, or Sources section.
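
The stale-content rule reduces to a date comparison. A sketch, assuming both frontmatter values are ISO dates (YYYY-MM-DD) and that an article is stale as soon as any one of its sources was scraped after the article was last updated; `is_stale` is a hypothetical helper, not the kb lint implementation.

```python
from datetime import date
from typing import Iterable


def is_stale(updated: str, scraped_dates: Iterable[str]) -> bool:
    """True when any source was scraped after the article's `updated:` date."""
    updated_day = date.fromisoformat(updated)
    return any(date.fromisoformat(scraped) > updated_day for scraped in scraped_dates)


print(is_stale("2026-04-01", ["2026-03-01", "2026-04-05"]))  # True
```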

For deeper LLM-driven self-healing checks (inconsistencies across articles, missing coverage, wikilink audits, filed-back query absorption), read references/lint-procedure.md.

After the heal pass, append ## [YYYY-MM-DD] lint | <N> issues found, <M> fixed to <topic>/log.md.

Procedure 5: Append to log.md

The kb CLI auto-appends log entries for ingest and lint --save operations. Manual entries are needed for compile, query, promote, and split operations.

Format -- each entry is a single H2 heading with a consistent prefix so the log stays grep-able:

## [YYYY-MM-DD] <op> | <short description>

Where <op> is one of compile, query, promote, or split (ingest and lint are handled by kb).

Examples:

## [2026-04-04] compile | Transformer Architecture (3847 words, 6 sources)
## [2026-04-04] query | 2026-04-04 flash-attention-vs-paged-attention.md
## [2026-04-04] promote | FlashAttention vs PagedAttention (from query)
## [2026-04-05] split | "Inference Optimization" → KV Cache, Speculative Decoding

Optionally add a body paragraph under each entry with more context (key findings, source URLs, decisions made). Keep entries terse -- the log is for skimming, not prose.

Quick recent-activity check -- the consistent prefix lets unix tools query the log:

grep "^## \[" <topic>/log.md | tail -10                  # last 10 events
grep "^## \[.*compile" <topic>/log.md | wc -l            # total compiles
grep "^## \[2026-04" <topic>/log.md                      # April 2026 events
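
The entry format above is regular enough to generate and parse in code. A sketch with hypothetical helpers: `format_log_entry` only accepts the four manual ops, while `parse_log` reads any entry, including the ingest and lint lines that kb writes itself.

```python
import re
from datetime import date
from typing import List, Tuple

MANUAL_OPS = {"compile", "query", "promote", "split"}
ENTRY_RE = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\w+) \| (.+)$")


def format_log_entry(day: date, op: str, description: str) -> str:
    """Render one H2 log heading for a manually logged operation."""
    if op not in MANUAL_OPS:
        raise ValueError(f"op must be one of {sorted(MANUAL_OPS)}, got {op!r}")
    return f"## [{day.isoformat()}] {op} | {description}"


def parse_log(text: str) -> List[Tuple[str, str, str]]:
    """Return (date, op, description) for every entry heading in log.md text."""
    matches = (ENTRY_RE.match(line) for line in text.splitlines())
    return [match.groups() for match in matches if match]


entry = format_log_entry(date(2026, 4, 4), "compile",
                         "Transformer Architecture (3847 words, 6 sources)")
print(entry)
```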

Keep log.md at the topic root (not inside wiki/ or outputs/) so it sits alongside CLAUDE.md as a first-class topic artifact.

Output Format Selection

All inspect and search commands support --format:

json -- always use for programmatic parsing
table -- human-readable aligned columns (default)
tsv -- tab-separated for piping to Unix tools

The ingest codebase and index commands always output JSON to stdout.

Read references/output-formats.md for format examples and empty result handling.

Error Handling

CLI Errors

| Error | Recovery |
| --- | --- |
| `unable to find a vault from <path>` | Run `kb ingest codebase <path> --topic <slug>` first |
| `QMD is not available` | Run `npm install -g @tobilu/qmd` |
| `no topics were found` | Run `kb ingest codebase` or `kb topic new` to populate the vault |
| `multiple topics were found` | Re-run with `--topic <slug>` |
| `no symbols matched "<query>"` | Use `inspect smells` or `inspect complexity` to discover valid names |
| `no file matched "<path>"` | Use exact source-relative path from vault frontmatter (e.g. `src/config.ts` not `./src/config.ts`) |
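
An agent can turn the CLI error table into a lookup. This sketch assumes the error text appears verbatim somewhere in the command's output, which the source does not guarantee; `recovery_hint` is a hypothetical helper and the hints are copied from the table.

```python
from typing import Optional

RECOVERY = [
    ("unable to find a vault", "Run kb ingest codebase <path> --topic <slug> first"),
    ("QMD is not available", "Run npm install -g @tobilu/qmd"),
    ("no topics were found", "Run kb ingest codebase or kb topic new to populate the vault"),
    ("multiple topics were found", "Re-run with --topic <slug>"),
    ("no symbols matched", "Use inspect smells or inspect complexity to discover valid names"),
    ("no file matched", "Use the exact source-relative path from vault frontmatter"),
]


def recovery_hint(message: str) -> Optional[str]:
    """Return the recovery step for the first known error found in `message`."""
    lowered = message.lower()
    for needle, hint in RECOVERY:
        if needle.lower() in lowered:
            return hint
    return None


print(recovery_hint("error: multiple topics were found"))
```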

KB Workflow Errors

| Error | Recovery |
| --- | --- |
| `kb` not found | Install the `kb` binary and ensure it is on PATH. Verify with `kb version` |
| Topic not found | Run `kb topic list` to see available topics, or scaffold with `kb topic new` |
| Article exceeds 4000 words | Extract a sub-topic into its own article and wikilink to it |
| Cross-topic wikilink ambiguity | Disambiguate with full path: `[[other-topic/wiki/concepts/Article Name\|Display Name]]` |
| `log.md` missing in existing topic | Create manually and backfill from git: `git log --format='## [%ad] <op> \| %s' --date=short <topic>/` |

Read references/error-handling.md for the full error catalog with causes and recovery steps.

Constraints

MUST DO

Run kb ingest codebase before any inspect command on that topic
Use --format json when parsing output programmatically
Use --progress never when running kb ingest codebase in a non-interactive context
Parse stdout only for command output; treat stderr as diagnostics
Use the topicSlug from ingest output for subsequent --topic flags
Read references/compilation-guide.md before writing wiki articles
Run backlink audits after every article compile (Procedure 1, step 7)
File query answers to outputs/queries/ (Procedure 3)
Append manual log entries for compile, query, promote, and split operations

MUST NOT DO

Pass both --lex and --vec to search
Pass --force-embed with --embed=false to index
Treat stderr content as failure evidence for kb ingest codebase
Assume vault location without running ingest or checking for .kb/vault/
Use relative paths like ./src/config.ts for inspect file -- use src/config.ts instead
Answer wiki queries from general knowledge -- the wiki is the source of truth
Skip the backlink audit when compiling articles
Batch-apply lint fixes without proposing diffs first