llm-wiki-en
LLM Wiki
Turn your LLM into a Wiki maintainer. The LLM incrementally builds and maintains a persistent, interconnected Markdown knowledge base. Knowledge is compiled once and continuously updated, rather than re-derived each time.
Inspired by:
- Karpathy - LLM Wiki — Incremental knowledge base architecture
- Compound Engineering Plugin — Knowledge compounding: every problem-solving experience should make the next one easier
When to Use
The AI should proactively identify and use this skill when:
- User wants to build a knowledge base - "Help me organize these materials", "I want to create a wiki"
- User provides new learning materials - Shares an article, paper, book chapter that needs organizing
- User wants to query existing knowledge - "What did I previously read about this concept?"
- User wants to maintain wiki health - "Check the wiki for contradictions"
- User solved a problem - "Done", "Fixed it", "That worked"
- User is doing long-term research - Weeks/months-long research topics requiring knowledge accumulation
When NOT to use:
- One-off Q&A that doesn't need persistent knowledge
Entry Point
Step 1: Determine User Intent
Based on what the user says, determine which operation to execute:
| User Intent | Operation |
|---|---|
| "Create a knowledge base", "Initialize wiki" | init |
| "Help me process this article", "I have new materials", "Check out this link" | ingest |
| "Done", "Fixed it", "Problem solved" | compound |
| "What is X?", "Summarize Y for me" | query |
| "Check the wiki", "Clean up the knowledge base" | lint |
| Intent unclear | Ask user to choose |
When intent is unclear, ask:
Which operation would you like to run?
1. init - Initialize knowledge base
2. ingest - Ingest new materials
3. compound - Document problem-solving experience
4. query - Query existing knowledge
5. lint - Health check
Step 2: Check if Initialized
Before any operation (except init itself), check whether ~/llm-wiki/ has been initialized:
- Exists → Proceed with the target operation
- Does not exist → Auto-execute init first without asking, then proceed with the target operation
# Detection method
ls ~/llm-wiki/wiki/index.md 2>/dev/null
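The detection-plus-auto-init flow above can be sketched as a shell guard (a minimal sketch: the `WIKI_ROOT` variable and the stub index contents are illustrative, not part of the skill):

```shell
# Auto-init guard: if the index is missing, create the directory
# skeleton and a stub index before proceeding with the operation.
WIKI_ROOT="$HOME/llm-wiki"
if [ ! -f "$WIKI_ROOT/wiki/index.md" ]; then
  for d in articles papers books notes assets; do
    mkdir -p "$WIKI_ROOT/raw/$d"
  done
  for d in entities concepts topics sources solutions; do
    mkdir -p "$WIKI_ROOT/wiki/$d"
  done
  printf '# Wiki Index\n' > "$WIKI_ROOT/wiki/index.md"
  echo "Auto-initialized knowledge base ~/llm-wiki/"
fi
```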
Operation: init
When to Execute
- User explicitly requests initialization
- Auto-executed when `~/llm-wiki/` doesn't exist before other operations
Workflow
1. Create Directory Structure
mkdir -p ~/llm-wiki/raw/{articles,papers,books,notes,assets}
mkdir -p ~/llm-wiki/wiki/{entities,concepts,topics,sources,solutions}
Adjust raw/ subdirectories based on knowledge base topics mentioned in prior conversation.
2. Create index.md
# Wiki Index
## Overview
- [[overview]] - Overall summary
## Sources
<!-- Ingested raw material summaries, sorted by date descending -->
## Entities
<!-- People, organizations, products, etc., sorted by name -->
## Concepts
<!-- Theories, methods, terminology, etc., sorted by name -->
## Topics
<!-- Comprehensive analyses, comparisons, etc., sorted by name -->
## Solutions
<!-- Problem-solving experiences and insights (generated by compound) -->
3. Create log.md
# Wiki Log
<!-- Operation records appended here chronologically, format: ## [YYYY-MM-DD] operation | description -->
4. Create overview.md
---
type: overview
created: YYYY-MM-DD
---
# Knowledge Base Overview
> This Wiki is automatically maintained by the LLM. You handle topic selection and questions; the LLM handles summarization, cross-referencing, archiving, and maintenance.
## Current Status
- Source count: 0
- Total pages: 0 (including index, log, overview)
- Last updated: -
## Key Findings
<!-- As knowledge accumulates, the most important findings will be summarized here -->
5. Output Confirmation
Wiki knowledge base initialized! ~/llm-wiki/
Next steps:
- Put materials in the raw/ directory, I'll organize them (ingest)
- Give me a link or text, I'll save and process it (ingest)
- Tell me when you've solved a problem, I'll document it (compound)
- Ask me about existing knowledge in the Wiki anytime (query)
- Let me check the Wiki's health (lint)
If auto-initialized (not user-initiated), simplify output to one line: Auto-initialized knowledge base ~/llm-wiki/, then proceed with the target operation.
Operation: ingest
Process new raw materials and integrate knowledge into the Wiki. A single new material may affect 10-15 Wiki pages.
Workflow
1. Determine Materials to Process
By priority:
- User specified specific material (link, text, file path) → Process only that material
- User says "process new materials" → Scan `raw/` for unprocessed files
- User says "process all new materials" → Batch process
Determining processed/unprocessed: Compare raw/ files against the `source` frontmatter field in wiki/sources/ summary pages. A file with a corresponding summary page counts as processed.
Found 3 unprocessed materials:
1. raw/articles/attention-paper.pdf
2. raw/notes/meeting-2026-04-05.md
3. raw/papers/bert-paper.pdf
Process all, or select specific ones? (Default: all)
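The processed/unprocessed comparison above can be sketched as a shell helper (illustrative only: `list_unprocessed` is a hypothetical name, and exact-line frontmatter matching is an assumption):

```shell
# List raw/ files that have no wiki/sources/ page whose frontmatter
# `source:` field points at them, i.e. files not yet ingested.
list_unprocessed() {
  root="$1"
  find "$root/raw" -type f | while read -r f; do
    rel="${f#$root/}"
    grep -rqx "source: $rel" "$root/wiki/sources" 2>/dev/null \
      || echo "unprocessed: $rel"
  done
}
```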
2. Save Raw Materials (URL/text only)
- URL → Fetch and save to `raw/articles/`
- Text → Save to `raw/notes/`
- Existing file → Read directly
3. Read and Extract
Read the raw material, identifying core arguments, key entities, important concepts, data/facts, and relationships to other sources.
4. Discuss with User (Recommended, skip for batch processing)
Core points of this material:
1. ...
2. ...
Key entities/concepts involved: A, B, C
Which aspects would you like to focus on?
5. Create Source Summary Page
Create in wiki/sources/, filename: YYYY-MM-DD-short-name.md
---
type: source
date: YYYY-MM-DD
source: raw/path/to/file
tags: [tag1, tag2]
---
# Source: Title
## Key Points
- Point 1
## Key Quotes
> Original quote
## Relationships to Other Sources
- Corroborates [[other-source]] on X
- Contradicts [[contradicting-source]] on Y
## Derived Concepts
- [[concept-a]]
6. Update Entity and Concept Pages
For each entity and concept mentioned in the material:
- Existing page → Append new information, cite source
- New page → Create using template
Entity/concept page template:
---
type: entity # or concept
created: YYYY-MM-DD
updated: YYYY-MM-DD
sources: [source-a, source-b]
---
# Name
## Definition
Brief description.
## Key Information
- Info point 1 (Source: [[source-a]])
## Relations
- Related concepts: [[concept-x]]
## Open Questions
- Unanswered questions
Note:
- When new information contradicts existing content, keep both versions with clear annotations
- Every factual claim must cite its source
7. Update Topic Pages (if needed)
8. Update index.md, overview.md
9. Append to log.md
## [YYYY-MM-DD] ingest | Material Title
- **Source**: raw/path/to/file
- **New pages**: page-a, page-b
- **Updated pages**: page-c, page-d
- **Impact scope**: N pages
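The header line of the entry format above can be produced with a small helper (a sketch: `log_entry` is a hypothetical name; the detail bullets would be appended the same way):

```shell
# Append a log header in the `## [YYYY-MM-DD] operation | description`
# format to the given log file.
log_entry() {
  printf '## [%s] %s | %s\n' "$(date +%F)" "$2" "$3" >> "$1"
}
```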
10. Output Summary
Processing complete.
New:
- Source summary: [[source-name]]
- Entities: [[entity-a]], [[entity-b]]
- Concepts: [[concept-c]]
Updated:
- [[concept-d]] - Added details about X
Warning - Contradictions found:
- Description of Y in [[concept-d]] is inconsistent with [[source-old]]
Operation: compound
Document problem-solving experiences into wiki/solutions/. Knowledge compounding: invest time researching once, document it, solve it in minutes next time.
When to Execute
- User says "Done", "Fixed it", "Problem solved"
- Just completed a valuable debugging, exploration, or analysis process
- Discovered a pattern, trick, or best practice worth recording
Not worth recording: Typos, obvious minor fixes, one-off non-reproducible issues. Just tell the user why.
Dual Tracks
Bug Track (Problem Resolution): For fixing bugs, resolving errors.
---
type: solution
track: bug
date: YYYY-MM-DD
tags: [tag1, tag2]
---
# Problem Title
## Problem
1-2 sentence description.
## Symptoms
- Observable abnormal behavior
## Investigation
1. ❌ Attempt A → Reason for failure
2. ✅ Final solution
## Root Cause
Explanation of the cause.
## Solution
\`\`\`
// Before
...
// After
...
\`\`\`
## Prevention
How to avoid recurrence.
## Relations
- [[concept-a]]
Knowledge Track (Insights): For summarizing patterns, best practices, workflow tips.
---
type: solution
track: knowledge
date: YYYY-MM-DD
tags: [tag1, tag2]
---
# Insight Title
## Background
Context in which this experience was gained.
## Guidance
Specific practices, patterns, or recommendations.
## Why It Matters
Impact of following or not following this practice.
## When to Apply
Conditions under which this experience applies.
## Relations
- [[concept-a]]
Workflow
- Extract information from context — Problem description, investigation process, root cause, solution, key code
- Choose track — Solved a specific problem → Bug Track; Summarized experience/pattern → Knowledge Track
- Check for overlap — Search `wiki/solutions/` for similar documents. High overlap → Update existing; low or none → Create new
- Write document — `wiki/solutions/YYYY-MM-DD-short-name.md`
- Update index.md, overview.md, log.md
- Output summary
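The overlap check in the workflow above can be approximated with a tag grep (a rough sketch: real overlap detection would also compare titles and content, and `overlap` is a hypothetical helper):

```shell
# List existing solution pages whose frontmatter tags line contains
# the given tag: candidates for updating instead of creating anew.
overlap() {
  grep -rl "tags: \[.*$2.*\]" "$1" 2>/dev/null
}
```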
Operation: query
Answer questions based on Wiki content. Good answers are archived back into the Wiki.
Core Principle
Good answers should be archived back into the Wiki. Multi-source synthesis, comparison tables, new discoveries → Save as new topic pages.
Workflow
- Read index.md to understand the full picture
- Locate relevant pages — Find the 2-5 most relevant pages (including sources, solutions)
- Synthesize answer — Cite with `[[wikilink]]`, annotate sources
- Archive valuable answers — Save to `wiki/topics/` as new pages
- Suggest further exploration — Information gaps, materials that could be added
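The locate step can be approximated with a ranked keyword grep (illustrative only: the real step is semantic reading, and `find_pages` is a hypothetical helper):

```shell
# Rank wiki pages by case-insensitive match count for a query term,
# keeping the top five as candidate pages to read in full.
find_pages() {
  grep -ric "$2" "$1" | grep -v ':0$' | sort -t: -k2,2 -rn | head -5
}
```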
Operation: lint
Detect contradictions, orphan pages, missing concepts, and other issues to maintain long-term Wiki health.
When to Execute
- User says "check the wiki", "clean up the knowledge base"
- Periodically when Wiki accumulates 20+ pages
- After adding a batch of important materials
6 Checks
| Check | Method |
|---|---|
| Contradiction detection | Compare descriptions of the same topic across different pages |
| Outdated information | Page updated date is much earlier than related sources |
| Orphan pages | Pages with 0 inbound [[wikilink]] |
| Missing pages | Referenced by [[wikilink]] but not yet created |
| Missing cross-references | Pages sharing 2+ sources but not linked to each other |
| Data gaps | "Open questions" on concept pages, unexplored directions in overview |
Workflow
- Read full picture (`ls -R wiki/` + `wiki/index.md`)
- Check each item
- Generate report (statistics + issues listed by priority)
- Ask user whether to auto-fix (create missing pages, add cross-references, etc.)
- Execute fixes, update log.md
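Two of the mechanical checks (missing pages and orphan pages) can be sketched in shell (a sketch under assumptions: only plain `[[wikilink]]` syntax is handled, aliased links are ignored, and `lint_links` is a hypothetical name):

```shell
# Two lint checks over plain [[wikilink]] references:
# pages that are linked but missing on disk, and orphan pages
# with zero inbound links (index/log/overview are exempt).
lint_links() {
  wiki="$1"
  linked=$(grep -rho '\[\[[^]]*\]\]' "$wiki" | tr -d '[]' | sort -u)
  for name in $linked; do
    find "$wiki" -name "$name.md" | grep -q . || echo "missing: $name"
  done
  find "$wiki" -name '*.md' | while read -r f; do
    base=$(basename "$f" .md)
    case "$base" in index|log|overview) continue ;; esac
    echo "$linked" | grep -qx "$base" || echo "orphan: $base"
  done
}
```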
Two-Layer Architecture
~/llm-wiki/
├── raw/ # Raw materials (immutable)
│ ├── articles/
│ ├── papers/
│ ├── books/
│ ├── notes/
│ └── assets/
└── wiki/ # LLM-maintained Wiki
├── index.md # Table of contents
├── log.md # Operation log
├── overview.md # Overall overview
├── entities/ # Entity pages
├── concepts/ # Concept pages
├── topics/ # Topic pages
├── sources/ # Source summaries
└── solutions/ # Solution documents
File Naming Convention
| Type | Path Format |
|---|---|
| Source summary | wiki/sources/YYYY-MM-DD-short-name.md |
| Entity page | wiki/entities/name.md |
| Concept page | wiki/concepts/concept-name.md |
| Topic page | wiki/topics/topic-name.md |
| Solution document | wiki/solutions/YYYY-MM-DD-short-name.md |
All filenames use lowercase English with hyphens.
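The convention can be enforced with a small slug helper (a sketch: `slugify` is a hypothetical name, and transliterating non-English titles is out of scope):

```shell
# Reduce a title to the lowercase-with-hyphens filename form:
# lowercase everything, collapse runs of non-alphanumerics into a
# single hyphen, and trim leading/trailing hyphens.
slugify() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//'
}
```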
Writing Standards
- Start each page with YAML frontmatter (`type`, `date`, `tags`, `sources`)
- Use `[[wikilink]]` for inter-page links
- Cite sources for every factual claim
- When new and old information contradict, keep both versions annotated
- Keep pages concise, focused on one topic
Obsidian Integration
The Wiki directory can be opened directly in Obsidian, enabling Graph View and plugins such as Web Clipper, Dataview, and Marp.
Examples
ingest
User: I put the Attention Is All You Need paper in raw/papers/
Agent: I'll process this paper.
[Reads raw/papers/attention-is-all-you-need.pdf]
[Extracts key information]
Core points of this paper:
1. Proposed the Transformer architecture, entirely based on attention mechanisms, abandoning RNN/CNN
2. Introduced Multi-Head Self-Attention and Positional Encoding
3. Achieved SOTA on machine translation tasks
Entities involved: Google Brain, Ashish Vaswani and 7 other authors
Concepts involved: Self-Attention, Multi-Head Attention, Positional Encoding, Layer Normalization
Which aspects would you like to focus on?
User: Focus on the design rationale behind Self-Attention
Agent: [Creates wiki/sources/2026-04-06-attention-paper.md]
[Creates wiki/concepts/self-attention.md - Focuses on design rationale]
[Creates wiki/entities/transformer.md]
[Creates wiki/entities/google-brain.md]
[Creates wiki/concepts/multi-head-attention.md]
[Creates wiki/concepts/positional-encoding.md]
[Updates wiki/overview.md]
[Updates wiki/index.md]
[Appends to wiki/log.md]
Processing complete. 6 new pages created. The Self-Attention concept page focuses on the design motivation behind Scaled Dot-Product.
compound
Bug Track:
User: Fixed it, it was a database connection pool configuration issue
Agent: This is worth documenting. Let me organize it.
[Reviews conversation context]
[Checks wiki/solutions/ for overlap]
[Creates wiki/solutions/2026-04-06-db-connection-pool-exhausted.md]
Experience documented. Bug Track.
Summary:
Problem: Database connection pool exhaustion under high concurrency, request timeouts
Root cause: Default pool size of 5 was insufficient, no idle timeout recycling configured
Solution: Adjusted pool size to 20, added idle timeout configuration
Linked to existing [[database-connections]] concept page in the Wiki.
Knowledge Track:
User: I noticed that debounce works better than throttle for search input handling
Agent: Great insight, let me document that.
[Creates wiki/solutions/2026-04-06-debounce-vs-throttle-search.md]
[Checks for related concept pages]
Experience documented. Knowledge Track.
Core recommendation: Prefer debounce (waits until the user stops typing) over throttle (fires at fixed intervals) for search input, because search needs the complete keyword.
There are no debounce or throttle concept pages in the Wiki yet. Should I create them?
query
User: What's the difference between Self-Attention and regular Attention?
Agent: [Reads wiki/index.md]
[Locates wiki/concepts/self-attention.md, wiki/concepts/attention.md]
[Synthesizes answer]
## Self-Attention vs Regular Attention
**Regular Attention** (Source: [[source-a]]):
- Query and Key/Value come from different sequences
- Typical scenario: target language attending to source language in translation
**Self-Attention** (Source: [[2026-04-06-attention-paper]]):
- Query, Key, and Value all come from the same sequence
- Each position can attend to all other positions in the sequence
| Dimension | Regular Attention | Self-Attention |
|-----------|-------------------|----------------|
| Q/K/V Source | Q ≠ KV | Q = K = V |
| Captures | Cross-sequence | Within-sequence |
| Complexity | O(n×m) | O(n²) |
This comparison has been saved as [[self-attention-vs-attention]].
lint
User: Check the wiki
Agent: [Reads wiki/index.md]
[Scans all wiki/ pages]
[Checks each item]
#### Orphan Pages (1)
1. [[layer-normalization]] - 0 inbound links
#### Missing Pages (2)
1. [[residual-connection]] - Referenced 4 times
2. [[softmax]] - Referenced 2 times
### Suggested Actions
1. [High] Create [[residual-connection]] (referenced 4 times)
2. [High] Create [[softmax]] (referenced 2 times)
3. [Medium] Add to [[positional-encoding]]: different architectures use different schemes
4. [Low] Add inbound links to [[layer-normalization]]
Should I auto-fix these?
User: Create 1 and 2 first, I'll look at 3 and 4 later
Agent: Alright, I'll create residual-connection and softmax.
[Creates wiki/concepts/residual-connection.md]
[Creates wiki/concepts/softmax.md]
[Updates wiki/index.md]
[Appends to wiki/log.md]
Created:
- [[residual-connection]] - Based on description from [[attention-paper]]
- [[softmax]] - As a key computation step in Self-Attention
Leaving contradictions and orphan pages for now. You can ask me to handle them anytime.
Use Cases
Active Learning (ingest):
- Academic research, reading notes, competitive analysis, course notes
Experience Accumulation (compound):
- Engineering practices (bug fixes, best practices), team knowledge base, workflow optimization, personal growth