Content Memory Pipeline

Purpose: Take various content sources → convert to markdown → chunk them → refer to them for future use.

The pipeline is the core value. Folder layout, workspace sync, and integration with other context (e.g. Vesta 7) are secondary; those pieces will be part of context anyway.

Architecture

memory/ (project root): Holds content and chunks. There are no converted/ or chunked/ subfolders; chunks go directly into topic folders, and converted markdown lives in <folder>/markdown/.
workspace/ (optional): Source content for sync_and_chunk; copied to memory.
Convert to markdown/: PDF/DOCX → .md written in <folder>/markdown/ for each folder.
Chunk: Reads from source/<domain>/ or memory/<domain>/, writes to memory/<domain>/<topic>/.

When to Activate

"Add content to memory", "refresh memory", "ingest for agent"
"Sync workspace to memory", "convert and chunk"

Step 1: Convert to Markdown

Convert non-.md files to markdown in a markdown/ subfolder per folder:

python scripts/convert_to_markdown.py --source <path> [--memory <domain>]
python scripts/convert_to_markdown.py --from source/CBE/domain_journeys_approach

Writes .md in <folder>/markdown/ (e.g. CB Domain/foo.pdf → CB Domain/markdown/foo.md)
.md files are skipped
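
The path rule above can be sketched as a small helper. This is an illustration of the mapping only, not the code in convert_to_markdown.py; the function name markdown_target is hypothetical.

```python
from pathlib import Path
from typing import Optional


def markdown_target(source: Path) -> Optional[Path]:
    """Return where the converted .md would land, or None when the file is skipped."""
    if source.suffix.lower() == ".md":
        return None  # already markdown, so nothing to convert
    # Sibling markdown/ subfolder, same base name, .md extension
    return source.parent / "markdown" / (source.stem + ".md")


print(markdown_target(Path("CB Domain/foo.pdf")).as_posix())
# prints: CB Domain/markdown/foo.md
```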

Step 2: Chunk to Memory

Chunk markdown and write directly into memory topic folders:

python scripts/chunk_markdown.py --memory <domain> [--incremental]

Reads from: source/<domain>/**/*.md or memory/<domain>/**/*.md.
Writes to: memory/<domain>/<topic>/ (no chunked/ subfolder)
--incremental: Only chunk new or modified files
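
The source does not say how --incremental decides that a file is new or modified. One common approach, shown here purely as an assumption, is to keep a manifest of content hashes and re-chunk only the files whose hash changed. The manifest file and the changed_files helper are hypothetical names.

```python
import hashlib
import json
from pathlib import Path


def changed_files(md_files, manifest_path):
    """Return files whose SHA-256 differs from the manifest, then update the manifest."""
    manifest_path = Path(manifest_path)
    seen = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    changed = []
    for md_file in map(Path, md_files):
        digest = hashlib.sha256(md_file.read_bytes()).hexdigest()
        if seen.get(str(md_file)) != digest:
            changed.append(md_file)  # new or modified since the last run
            seen[str(md_file)] = digest
    manifest_path.write_text(json.dumps(seen, indent=2))
    return changed
```

Hashing content rather than comparing modification times avoids re-chunking files that were only touched or re-copied, at the cost of reading every file on each run.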

Step 3: Sync Workspace (Convert + Copy + Chunk)

One command for workspace content:

python scripts/sync_and_chunk.py --workspace <topic> --memory <domain> [--incremental]

Converts non-.md to <folder>/markdown/ (in workspace)
Copies workspace/<topic> → memory/<domain>/<topic>
Chunks to memory/<domain>/<topic>/
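
The copy step in the middle of this sequence maps workspace/<topic> onto memory/<domain>/<topic>. A minimal sketch of that mapping follows, assuming a plain recursive copy that overwrites existing files; the real sync_and_chunk.py may filter or merge differently, and copy_topic is a hypothetical name.

```python
import shutil
from pathlib import Path


def copy_topic(workspace_root: Path, memory_root: Path, topic: str, domain: str) -> Path:
    """Copy workspace/<topic> into memory/<domain>/<topic> and return the destination."""
    source = workspace_root / topic
    destination = memory_root / domain / topic
    # dirs_exist_ok (Python 3.8+) lets repeated syncs reuse the same destination
    shutil.copytree(source, destination, dirs_exist_ok=True)
    return destination
```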

Chunking Strategy

Slide decks: One chunk per slide
Other docs (>200 lines): Split at # or ## boundaries
Small files (<200 lines): Single chunk

Each chunk includes source attribution.
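
The size and heading rules can be sketched as follows. This is an illustration, not the logic in chunk_markdown.py. Two points are assumptions: a file of exactly 200 lines is treated as a single chunk (the source leaves that case open), and slide decks are not handled because the source does not show how a slide boundary is marked.

```python
import re
from typing import List

HEADING = re.compile(r"^#{1,2} ")  # "# " or "## " at the start of a line
SINGLE_CHUNK_MAX_LINES = 200       # threshold taken from the strategy above


def chunk_markdown(text: str) -> List[str]:
    """Return the whole text for small files, else split at # and ## headings."""
    lines = text.splitlines()
    if len(lines) <= SINGLE_CHUNK_MAX_LINES:
        return [text]
    chunks, current = [], []
    for line in lines:
        if HEADING.match(line) and current:
            chunks.append("\n".join(current))  # close the previous section
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks
```

This sketch does not track fenced code blocks, so a line such as "# comment" inside a fence would also start a new chunk.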

Key Behaviors

Take content sources – PDF, PPTX, DOCX, etc. (or workspace content).
Convert to markdown – Non-.md files → .md in a markdown/ subfolder next to the source.
Chunk – Split markdown into referable chunks (by slide, by heading, or whole file).
Refer for future use – Chunks live where agents/context can find them; source attribution in each chunk.
Incremental – Use --incremental to skip unchanged files.

Project-Specific Transformers

Location	Scope
`memory/<name>/transformers/`	Memory-specific
`.content-memory/transformers/`	Workspace-level

Each .py exports EXTENSIONS and convert(path: Path) -> str.
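
A transformer module might look like the sketch below, which turns a CSV file into a markdown table. The file name csv_table.py is hypothetical, and the shape of EXTENSIONS (a tuple of lowercase suffixes with the leading dot) is an assumption; only the two exported names and the convert signature come from the line above.

```python
"""Hypothetical transformer, e.g. memory/<name>/transformers/csv_table.py."""
import csv
from pathlib import Path

EXTENSIONS = (".csv",)  # assumed shape: suffixes this transformer handles


def convert(path: Path) -> str:
    """Render the CSV at `path` as a markdown table, first row as the header."""
    with path.open(newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines) + "\n"
```

Cells that contain a pipe character are not escaped here; a production transformer would need to handle that.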

Scripts

Script	Purpose
`convert_to_markdown.py`	Convert to markdown/
`chunk_markdown.py`	Chunk to memory
`sync_and_chunk.py`	Convert + copy + chunk (workspace)

Run from workspace root. Set CONTENT_MEMORY_ROOT if needed.
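
The source does not describe how the scripts read CONTENT_MEMORY_ROOT. A plausible reading, sketched here as an assumption, is that the variable overrides the project root and the current working directory is the fallback. The resolve_root helper is a hypothetical name.

```python
import os
from pathlib import Path


def resolve_root(env=None) -> Path:
    """Return CONTENT_MEMORY_ROOT when it is set, else the current working directory."""
    env = os.environ if env is None else env
    override = env.get("CONTENT_MEMORY_ROOT")
    # Assumed precedence: an explicit override wins over the working directory
    return Path(override).expanduser() if override else Path.cwd()
```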

Troubleshooting

Issue	Fix
No markdown	Run convert; then chunk. Or sync workspace to memory first.
Missing markitdown	`pip install "markitdown[all]"`
