addon-postgres-document-pipeline-schema
SKILL.md
Add-on: Postgres Schema (Document Pipeline)
Use this skill to add relational tables that make the pipeline queryable, auditable, and reprocessable.
Inputs
Collect:
USE_ALEMBIC:yes(default when paired witharchitect-python-uv-fastapi-sqlalchemy).DOC_ID_KIND:uuid(default) orulid(only if user requests).
Required Tables (Core)
Create tables aligned with the system summary:
documentsjobsdocument_pagesdocument_chunksextracted_clausesvalidationsreports
Optional but recommended:
range_rules(versioned rule/policy set; “Range”)agent_runs(record extraction/validation model version + prompts)review_items/review_decisions(human review workflow)
Column Conventions
Prefer:
- stable ids (
uuidrecommended) created_at,updated_attimestampsstatusenums or constrained stringsmetadata_jsonbfor parser/cleanup provenance
documents (minimum)
idfilenamemime_typechecksumstorage_key(raw PDF key)statuspage_countcreated_at
document_pages (minimum)
iddocument_idpage_number(1-based)raw_markdownclean_markdownmetadata_jsonb(e.g., header/footer removal decisions)
document_chunks (minimum)
iddocument_idpage_numberchunk_index(0-based within page)heading(nullable)textchar_start,char_end(spans within the chosen page artifact)metadata_jsonb(parser, cleaned flag, section path)
jobs (minimum)
iddocument_idjob_type(e.g.,preprocess)status(queued|running|succeeded|failed)error(nullable)started_at,finished_at
Indices / Constraints (Audit-Friendly)
- Unique:
(document_id, page_number)ondocument_pages - Index:
(document_id, page_number, chunk_index)ondocument_chunks - Index:
(document_id, status)onjobs - Foreign keys everywhere; decide cascade behavior explicitly and justify it
Migration Workflow (Alembic)
When paired with architect-python-uv-fastapi-sqlalchemy:
- add SQLAlchemy models
- generate an Alembic migration
- ensure local compose startup can run migrations automatically (or via a documented command)
Guardrails
- Never store only a single giant JSON blob; store artifacts in S3/MinIO and ingest rows for querying.
- Preserve raw vs clean text for each page.
- Schema must support reprocessing and deterministic reproducibility (store parser versions + cleanup params in JSONB).
Decision Justification Rule
- Every non-trivial decision must include a concrete justification.
- Capture the alternatives considered and why they were rejected.
- State tradeoffs and residual risks for the chosen option.
- If justification is missing, treat the task as incomplete and surface it as a blocker.
Weekly Installs
1
Repository
ajrlewis/ai-skillsFirst Seen
4 days ago
Security Audits
Installed on
amp1
cline1
opencode1
cursor1
kimi-cli1
codex1