addon-clause-extraction-citations
Add-on: Clause Extraction (Citations + Confidence)
Use this skill after chunking to extract structured clause records suitable for validation.
The extraction stage operates on cleaned chunks, but must remain auditable back to the original document via page/chunk/span citations.
Compatibility
- Works best when paired with addon-direct-llm-sdk (preferred) or addon-langchain-llm.
- Requires addon-jsonl-chunking-citations (or an equivalent chunk table with spans).
Inputs
Collect:
- CLAUSE_TYPES: default set: compensation, bonus, commission, benefits, termination, notice, severance, non_compete, non_solicit, confidentiality, ip_assignment, governing_law, dispute_resolution, work_location, remote_work, hours, pto
- CONFIDENCE_REVIEW_THRESHOLD: default 0.7.
- LLM_MODEL: user-provided; otherwise choose a stable, cost-appropriate model and justify.
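The inputs above could be gathered into a single validated object before extraction starts. The following is a minimal sketch, not part of the skill itself: the class name ExtractionConfig and its field names are hypothetical, and only the defaults are taken from the list above. LLM_MODEL deliberately has no default, so the model choice (and its justification) stays explicit.

```python
from dataclasses import dataclass

# Default clause types, copied from the CLAUSE_TYPES input above.
DEFAULT_CLAUSE_TYPES = (
    "compensation", "bonus", "commission", "benefits", "termination",
    "notice", "severance", "non_compete", "non_solicit", "confidentiality",
    "ip_assignment", "governing_law", "dispute_resolution", "work_location",
    "remote_work", "hours", "pto",
)


@dataclass(frozen=True)
class ExtractionConfig:
    """Hypothetical container for the three inputs (illustration only)."""

    llm_model: str  # LLM_MODEL: required, no silent default
    clause_types: tuple[str, ...] = DEFAULT_CLAUSE_TYPES
    confidence_review_threshold: float = 0.7

    def __post_init__(self) -> None:
        # Fail early on values that would make review flagging meaningless.
        if not self.llm_model:
            raise ValueError("llm_model is required")
        if not self.clause_types:
            raise ValueError("clause_types must not be empty")
        if not 0.0 <= self.confidence_review_threshold <= 1.0:
            raise ValueError("confidence_review_threshold must be within [0, 1]")
```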
Output Contract (extracted_clauses)
Each extracted clause must include:
- document_id
- chunk_id (FK to document_chunks)
- clause_type
- normalized_text (agent-produced normalization)
- source_quote (verbatim quote from the chunk whenever possible)
- citation_jsonb:
  - page_number
  - chunk_id
  - char_start, char_end
  - optional raw_storage_key for PDF
- confidence (0..1)
- review_needed boolean (set when confidence < threshold or ambiguities detected)
- agent_run_id / provenance fields (model, prompt version)
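One way to picture the contract is as a typed record that is serialized into a row. The sketch below assumes Python dataclasses. The class names, the ambiguous flag, and the to_row helper are hypothetical; the field names follow the list above, and the model and prompt_version fields stand in for the unspecified provenance fields.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class Citation:
    page_number: int
    chunk_id: str
    char_start: int
    char_end: int
    raw_storage_key: Optional[str] = None  # optional, e.g. for a stored PDF


@dataclass(frozen=True)
class ExtractedClause:
    document_id: str
    chunk_id: str
    clause_type: str
    normalized_text: str
    source_quote: str
    citation: Citation
    confidence: float
    agent_run_id: str
    model: str            # provenance (assumed field name)
    prompt_version: str   # provenance (assumed field name)
    ambiguous: bool = False  # hypothetical flag set by the extractor

    def review_needed(self, threshold: float = 0.7) -> bool:
        # Flag when confidence is below the threshold or ambiguity was detected.
        return self.ambiguous or self.confidence < threshold

    def to_row(self, threshold: float = 0.7) -> dict:
        """Validate the record and flatten it into a dict shaped like a table row."""
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be within [0, 1]")
        if self.citation.chunk_id != self.chunk_id:
            raise ValueError("citation must point at the clause's own chunk")
        if self.citation.char_start >= self.citation.char_end:
            raise ValueError("citation span must be non-empty")
        row = asdict(self)
        row.pop("ambiguous")
        row["citation_jsonb"] = json.dumps(row.pop("citation"), sort_keys=True)
        row["review_needed"] = self.review_needed(threshold)
        return row
```

Computing review_needed at serialization time, rather than trusting a value supplied by the model, keeps the flag consistent with whatever threshold is configured for the run.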
Extraction Workflow
- Iterate document_chunks for a document in deterministic order (page asc, chunk_index asc).
- For each chunk:
  - run clause extraction (LLM-assisted allowed)
  - enforce a strict JSON schema output (reject + retry or mark review-needed if schema fails)
- Deduplicate clauses:
  - exact same source_quote + clause_type within a document should not create duplicates
- Store results in extracted_clauses with citations pointing to chunk spans.
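The workflow above can be sketched as a small loop. This is an illustration under stated assumptions, not the skill's implementation: the chunk dict keys (chunk_id, page_number, chunk_index, text), the extract callable that returns raw JSON text, the hand-written key check standing in for a real JSON Schema validator, and the max_attempts retry count are all hypothetical.

```python
import json
from typing import Callable

REQUIRED_KEYS = {"clause_type", "normalized_text", "source_quote", "confidence"}


def parse_clauses(raw: str) -> list[dict]:
    """Strictly validate extractor output; raise ValueError on any deviation."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of clause objects")
    for item in data:
        if not isinstance(item, dict) or set(item) != REQUIRED_KEYS:
            raise ValueError("clause object has missing or unexpected keys")
        for key in ("clause_type", "normalized_text", "source_quote"):
            if not isinstance(item[key], str):
                raise ValueError(f"{key} must be a string")
        conf = item["confidence"]
        if isinstance(conf, bool) or not isinstance(conf, (int, float)):
            raise ValueError("confidence must be a number")
        if not 0 <= conf <= 1:
            raise ValueError("confidence must be within [0, 1]")
    return data


def extract_document(chunks: list[dict], extract: Callable[[str], str],
                     max_attempts: int = 2) -> tuple[list[dict], list[str]]:
    """Return (deduplicated clauses, chunk_ids whose output never validated)."""
    # Deterministic order: page asc, chunk_index asc.
    ordered = sorted(chunks, key=lambda c: (c["page_number"], c["chunk_index"]))
    seen: set[tuple[str, str]] = set()
    clauses: list[dict] = []
    failed: list[str] = []
    for chunk in ordered:
        parsed = None
        for _ in range(max_attempts):  # reject + retry on schema failure
            try:
                parsed = parse_clauses(extract(chunk["text"]))
                break
            except ValueError:
                continue
        if parsed is None:
            failed.append(chunk["chunk_id"])  # surface for review, do not guess
            continue
        for clause in parsed:
            key = (clause["clause_type"], clause["source_quote"])
            if key in seen:  # same quote + type already stored for this document
                continue
            seen.add(key)
            clauses.append({**clause, "chunk_id": chunk["chunk_id"],
                            "page_number": chunk["page_number"]})
    return clauses, failed
```

Because chunks are visited in a fixed order, the first occurrence of a duplicated quote is always the one that is kept, which makes reruns over the same document reproducible.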
Guardrails
- Never “invent” clauses: every clause must be anchored to a source_quote from a specific chunk.
- If the agent cannot quote the source confidently, mark as review_needed rather than guessing.
- Persist prompts/model versions for auditability.
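The anchoring guardrail can be checked mechanically: a quote that does not appear verbatim in its chunk cannot be cited. The sketch below assumes exact substring matching and assumes that char_start and char_end are offsets that may be shifted by a chunk_offset into document coordinates. Both function names and the chunk_offset parameter are hypothetical.

```python
from typing import Optional


def anchor_quote(chunk_text: str, source_quote: str,
                 chunk_offset: int = 0) -> Optional[tuple[int, int]]:
    """Return (char_start, char_end) for a verbatim quote, or None if absent."""
    if not source_quote.strip():
        return None
    start = chunk_text.find(source_quote)  # first verbatim occurrence
    if start == -1:
        return None
    return chunk_offset + start, chunk_offset + start + len(source_quote)


def apply_guardrail(clause: dict, chunk_text: str, chunk_offset: int = 0) -> dict:
    """Attach a span when the quote is verbatim; otherwise flag for review."""
    span = anchor_quote(chunk_text, clause["source_quote"], chunk_offset)
    checked = dict(clause)
    if span is None:
        # The quote cannot be located, so the clause is not trusted as-is.
        checked["char_start"] = None
        checked["char_end"] = None
        checked["review_needed"] = True
    else:
        checked["char_start"], checked["char_end"] = span
        checked["review_needed"] = bool(clause.get("review_needed", False))
    return checked
```

Exact matching is deliberately strict: a quote that differs from the chunk only in whitespace is sent to review rather than silently accepted. A whitespace-normalized comparison would be a reasonable relaxation if that produces too many false flags.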
Decision Justification Rule
- Every non-trivial decision must include a concrete justification.
- Capture the alternatives considered and why they were rejected.
- State tradeoffs and residual risks for the chosen option.
- If justification is missing, treat the task as incomplete and surface it as a blocker.