wiki-ingest
# wiki-ingest: Source Ingestion
Read the source. Write the wiki. Cross-reference everything. A single source typically touches 8-15 wiki pages.
Syntax standard: Write all Obsidian Markdown in proper Obsidian Flavored Markdown: wikilinks as `[[Note Name]]`, callouts as `> [!type] Title`, embeds as `![[file]]`, and properties as YAML frontmatter. If the `kepano/obsidian-skills` plugin is installed, prefer its canonical `obsidian-markdown` skill as the Obsidian syntax reference. Otherwise, follow the guidance in this skill.
## Delta Tracking
Before ingesting any file, check `.raw/.manifest.json` to avoid re-processing unchanged sources.
```shell
# Check if manifest exists
[ -f .raw/.manifest.json ] && echo "exists" || echo "no manifest yet"
```
Manifest format (create if missing):
```json
{
  "sources": {
    ".raw/articles/article-slug-2026-04-08.md": {
      "hash": "abc123",
      "ingested_at": "2026-04-08",
      "pages_created": ["wiki/sources/article-slug.md", "wiki/entities/Person.md"],
      "pages_updated": ["wiki/index.md"]
    }
  }
}
```
Before ingesting a file:
- Compute a hash: `md5sum [file] | cut -d' ' -f1` (or `sha256sum` on Linux).
- Check whether the path exists in `.manifest.json` with the same hash.
- If the hash matches, skip it and report: "Already ingested (unchanged). Use `force` to re-ingest."
- If the path is missing or the hash differs, proceed with the ingest.
After ingesting a file:
- Record `{hash, ingested_at, pages_created, pages_updated}` in `.manifest.json`.
- Write the updated manifest back.
Skip delta checking if the user says "force ingest" or "re-ingest".
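The delta check above can be sketched in POSIX shell. This is a minimal sketch under stated assumptions: the function names `file_hash`, `manifest_hash`, and `needs_ingest` are hypothetical; `manifest_hash` is a rough text lookup that relies on the pretty-printed manifest layout shown above rather than real JSON parsing (a JSON tool such as `jq` would be more robust); and on macOS, where `md5sum` may be absent, it falls back to `md5 -q`. Writing the new manifest entry after ingest is left out.

```shell
# Hypothetical helpers; names are illustrative, not part of any tool.

# Print the MD5 of a file: GNU md5sum where available, macOS `md5 -q` otherwise.
file_hash() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum "$1" | cut -d' ' -f1
  else
    md5 -q "$1"
  fi
}

# Print the recorded hash for PATH from MANIFEST.
# Assumption: the pretty-printed layout shown above (path key on its own line,
# followed by its "hash" line). This is a text lookup, not a JSON parser.
manifest_hash() {
  local manifest path
  manifest=$1
  path=$2
  awk -v key="\"$path\": {" '
    index($0, key) { found = 1; next }
    found && /"hash"/ {
      sub(/.*"hash":[ \t]*"/, "")   # drop everything up to the value
      sub(/".*/, "")                 # drop the closing quote onward
      print
      exit
    }
  ' "$manifest"
}

# Succeed (status 0) if FILE should be ingested: forced, no manifest yet,
# path not recorded, or hash changed. Fail (status 1) if unchanged.
needs_ingest() {
  local manifest file mode
  manifest=$1
  file=$2
  mode=${3:-}
  [ "$mode" = "force" ] && return 0
  [ -f "$manifest" ] || return 0
  [ "$(file_hash "$file")" != "$(manifest_hash "$manifest" "$file")" ]
}
```

For example, `needs_ingest .raw/.manifest.json .raw/articles/foo.md || echo "Already ingested (unchanged). Use force to re-ingest."` skips unchanged files, and passing `force` as the third argument bypasses the check.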
## URL Ingestion
Trigger: user passes a URL starting with https://.
Steps:
1. Fetch the page using WebFetch.
2. Clean (optional): if `defuddle` is available (`which defuddle 2>/dev/null`), run `defuddle [url]` to strip ads, navigation, and clutter. This typically saves 40-60% of tokens. Fall back to the raw WebFetch output if it is not installed.
3. Derive a slug from the URL path (last segment, lowercased, spaces→hyphens, query string stripped).
4. Save to `.raw/articles/[slug]-[YYYY-MM-DD].md` with a frontmatter header:
   ```yaml
   ---
   source_url: [url]
   fetched: [YYYY-MM-DD]
   ---
   ```
5. Proceed with Single Source Ingest starting at step 2 (the file is now in `.raw/`).
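The slug and save steps can be sketched in shell. Fetching and cleaning happen before this point; `url_slug` and `save_article` are hypothetical helpers, and `save_article` assumes the fetched text has already been written to a local file.

```shell
# Hypothetical helpers; names are illustrative, not part of any tool.

# Derive a slug: drop ?query and #fragment, ignore a trailing slash, take the
# last path segment, lowercase it, and turn spaces into hyphens.
url_slug() {
  local url seg
  url=${1%%[?#]*}
  url=${url%/}
  seg=${url##*/}
  printf '%s\n' "$seg" | tr '[:upper:]' '[:lower:]' | tr ' ' '-'
}

# Write BODY_FILE (already fetched text) to .raw/articles/[slug]-[YYYY-MM-DD].md
# behind the frontmatter header, then print the new path.
save_article() {
  local url body today out
  url=$1
  body=$2
  today=$(date +%Y-%m-%d)
  out=".raw/articles/$(url_slug "$url")-$today.md"
  mkdir -p .raw/articles
  {
    printf '%s\n' '---' "source_url: $url" "fetched: $today" '---' ''
    cat "$body"
  } > "$out"
  printf '%s\n' "$out"
}
```

For example, `url_slug 'https://example.com/blog/My Post?utm=1'` yields `my-post`.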
## Image / Vision Ingestion
Trigger: user passes an image file path (.png, .jpg, .jpeg, .gif, .webp, .svg, .avif).
Steps:
1. Read the image file using the Read tool. Claude can process images natively.
2. Describe the image contents: extract all text (OCR) and identify key concepts, entities, diagrams, and data visible in the image.
3. Save the description to `.raw/images/[slug]-[YYYY-MM-DD].md`:
   ```markdown
   ---
   source_type: image
   original_file: [original path]
   fetched: YYYY-MM-DD
   ---
   # Image: [slug]

   [Full description of image contents, transcribed text, entities visible, etc.]
   ```
4. Copy the image to `_attachments/images/[slug].[ext]` if it is not already in the vault.
5. Proceed with Single Source Ingest on the saved description file.
Use cases: whiteboard photos, screenshots, diagrams, infographics, document scans.
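The file handling in the image steps (not the description itself, which Claude writes) might look like the sketch below. `stage_image` is a hypothetical helper, and its slug rule is an assumption borrowed from URL slugs: the file name without its extension, lowercased, with spaces turned into hyphens.

```shell
# Hypothetical helper; the slug rule is an assumption, not specified above.
stage_image() {
  local src name ext slug
  src=$1
  name=$(basename "$src")
  ext=$(printf '%s' "${name##*.}" | tr '[:upper:]' '[:lower:]')
  case $ext in
    png|jpg|jpeg|gif|webp|svg|avif) ;;
    *) echo "not a supported image: $src" >&2; return 1 ;;
  esac
  slug=$(printf '%s' "${name%.*}" | tr '[:upper:]' '[:lower:]' | tr ' ' '-')
  mkdir -p _attachments/images .raw/images
  # Simplification: "already in the vault" means "destination already exists".
  [ -e "_attachments/images/$slug.$ext" ] || cp "$src" "_attachments/images/$slug.$ext"
  # Print where the description file should be written.
  printf '.raw/images/%s-%s.md\n' "$slug" "$(date +%Y-%m-%d)"
}
```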
## Single Source Ingest
Trigger: user drops a file into .raw/ or pastes content.
Steps:
1. Read the source completely. Do not skim.
2. Discuss key takeaways with the user. Ask: "What should I emphasize? How granular?" Skip this if the user says "just ingest it."
3. Create a source summary in `wiki/sources/`. Use the source frontmatter schema from `references/frontmatter.md`.
4. Create or update entity pages for every person, org, product, and repo mentioned. One page per entity.
5. Create or update concept pages for significant ideas and frameworks.
6. Update the relevant domain page(s) and their `_index.md` sub-indexes.
7. Update `wiki/overview.md` if the big picture changed.
8. Update `wiki/index.md`. Add entries for all new pages.
9. Update `wiki/hot.md` with this ingest's context.
10. Add an entry to `wiki/log.md` (new entries go at the TOP, not the bottom):
    ```markdown
    ## [YYYY-MM-DD] ingest | Source Title
    - Source: `.raw/articles/filename.md`
    - Summary: [[Source Title]]
    - Pages created: [[Page 1]], [[Page 2]]
    - Pages updated: [[Page 3]], [[Page 4]]
    - Key insight: One sentence on what is new.
    ```
11. Check for contradictions. If new info conflicts with existing pages, add `> [!contradiction]` callouts on both pages.
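Because the log is newest-first, the entry is written before the existing log rather than appended after it. A minimal sketch with hypothetical helpers `ingest_log_entry` (prints the entry format from the log step) and `prepend_log`; it assumes `wiki/log.md` has no header that must stay above the entries.

```shell
# Hypothetical helpers; names are illustrative, not part of any tool.

# Print a log entry. Args: date, source title, raw path,
# created links, updated links, one-sentence insight.
ingest_log_entry() {
  cat <<EOF
## [$1] ingest | $2
- Source: \`$3\`
- Summary: [[$2]]
- Pages created: $4
- Pages updated: $5
- Key insight: $6
EOF
}

# Put ENTRY_FILE's contents at the top of wiki/log.md (newest first).
prepend_log() {
  local entry log tmp
  entry=$1
  log=wiki/log.md
  tmp="$log.tmp"
  mkdir -p wiki
  cat "$entry" > "$tmp"
  if [ -f "$log" ]; then
    printf '\n' >> "$tmp"   # blank line between the new and older entries
    cat "$log" >> "$tmp"
  fi
  mv "$tmp" "$log"
}
```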
## Batch Ingest
Trigger: user drops multiple files or says "ingest all of these."
Steps:
1. List all files to process. Confirm with the user before starting.
2. Process each source following the single ingest flow. Defer cross-referencing between sources until step 3.
3. After all sources: do a cross-reference pass. Look for connections between the newly ingested sources.
4. Update the index, hot cache, and log once at the end (not per source).
5. Report: "Processed N sources. Created X pages, updated Y pages. Here are the key connections I found."
Batch ingest is less interactive. For 30+ sources, expect significant processing time. Check in with the user after every 10 sources.
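The shape of the batch loop can be sketched as follows, assuming a placeholder `ingest_one` that stands in for the single-source flow (both function names are hypothetical). The sketch only prints the check-in prompt; a real run would pause there for the user's answer.

```shell
# Placeholder for the single-source flow (hypothetical).
ingest_one() { echo "ingested: $1"; }

batch_ingest() {
  local total n f
  total=$#
  n=0
  for f in "$@"; do
    ingest_one "$f"
    n=$((n + 1))
    # Check in after every 10 sources, except when the batch is already done.
    if [ $((n % 10)) -eq 0 ] && [ "$n" -lt "$total" ]; then
      echo "CHECKPOINT: $n of $total sources processed. Continue?"
    fi
  done
  # Cross-referencing and index/hot/log updates happen once, after the loop.
  echo "cross-reference pass over $n new sources"
  echo "update wiki/index.md, wiki/hot.md, wiki/log.md once"
  echo "Processed $n sources."
}
```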
## Context Window Discipline
Token budget matters. Follow these rules during ingest:
- Read `wiki/hot.md` first. If it contains the relevant context, don't re-read full pages.
- Read `wiki/index.md` to find existing pages before creating new ones.
- Read only 3-5 existing pages per ingest. If you need 10+, you are reading too broadly.
- Use PATCH for surgical edits. Never re-read an entire file just to update one field.
- Keep wiki pages short (100-300 lines). If a page grows beyond 300 lines, split it.
- Use search (`/search/simple/`) to find specific content without reading full pages.
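To spot pages that have outgrown the 300-line limit, a small helper can count lines per page. `oversized_pages` is a hypothetical name; it assumes pages live under `wiki/` and that file names contain no newlines (spaces are fine).

```shell
# Hypothetical helper: print "LINES<TAB>PATH" for each page over the limit.
oversized_pages() {
  local limit
  limit=${1:-300}
  find wiki -type f -name '*.md' 2>/dev/null | while IFS= read -r f; do
    n=$(wc -l < "$f")
    # $((n)) normalizes the padded count that BSD wc prints.
    if [ $((n)) -gt "$limit" ]; then
      printf '%s\t%s\n' $((n)) "$f"
    fi
  done
}
```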
## Contradictions
> [!note] Custom callout dependency
> The `[!contradiction]` callout type used below is a custom callout defined in `.obsidian/snippets/vault-colors.css` (auto-installed by `/wiki` scaffold). It renders with reddish-brown styling and an alert-triangle icon when the snippet is enabled. If the snippet is missing, Obsidian falls back to default callout styling, so the page still works without the visual flourish. See [[skills/wiki/references/css-snippets.md]] for the four custom callouts (`contradiction`, `gap`, `key-insight`, `stale`).
When new info contradicts an existing wiki page:
On the existing page, add:
```markdown
> [!contradiction] Conflict with [[New Source]]
> [[Existing Page]] claims X. [[New Source]] says Y.
> Needs resolution. Check dates, context, and primary sources.
```
On the new source summary, reference it:
```markdown
> [!contradiction] Contradicts [[Existing Page]]
> This source says Y, but existing wiki says X. See [[Existing Page]] for details.
```
Do not silently overwrite old claims. Flag and let the user decide.
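Flagging without overwriting can be as simple as appending the callout, which leaves the old claim in place. This is a sketch: `flag_contradiction` is a hypothetical helper, and putting the callout at the end of the page is an assumption, since this skill does not say where on the page it belongs. Run it once for the existing page and once for the new source summary.

```shell
# Hypothetical helper. Args: page file, callout title, callout body
# (may span several lines; each line gets the "> " callout prefix).
flag_contradiction() {
  local page title body
  page=$1
  title=$2
  body=$3
  {
    printf '\n> [!contradiction] %s\n' "$title"
    printf '%s\n' "$body" | sed 's/^/> /'
  } >> "$page"
}
```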
## What Not to Do
- Do not modify anything in `.raw/`. These are immutable source documents.
- Do not create duplicate pages. Always check the index and search before creating.
- Do not skip the log entry. Every ingest must be recorded.
- Do not skip the hot cache update. It is what keeps future sessions fast.
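The duplicate check can be backed by a case-insensitive file-name search, which also catches near-duplicates such as `openai.md` versus `OpenAI.md`. `page_exists` is a hypothetical helper; it relies on `find -iname` and `-quit`, which GNU and BSD `find` support but POSIX does not, and titles containing glob characters such as `*` or `[` would need escaping.

```shell
# Hypothetical helper: succeed if any page under wiki/ is named TITLE.md,
# ignoring case. Stops at the first match.
page_exists() {
  [ -n "$(find wiki -type f -iname "$1.md" -print -quit 2>/dev/null)" ]
}
```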