# Prism ADE — Agentic Document Extraction

## Purpose
Project-specific skill for the Prism codebase. Provides architecture context, coding patterns, and guidelines for working on the multi-agent document extraction pipeline.
## When to Use

- Adding or modifying agents in `prism/agents/`
- Working with the LangGraph orchestrator
- Implementing OpenAI Responses API calls
- Modifying extraction schemas or Structured Outputs
- Working on OCR, grounding, or validation logic
- Debugging the extraction pipeline
- Adding API endpoints or Playground features
## Architecture Overview

```
START → ingest → parse → extract → ground → validate → (retry?) → END
                                                ↓ (fields_to_retry)
                                         extract (re-loop, max 3)
```
### Agents

| Agent | File | Input | Output |
|---|---|---|---|
| IngestNode | `orchestrator.py` | `file_path` | images, OCR, `digital_text` |
| ParseAgent | `parse_agent.py` | images/text | `parsed_markdown` (per-page) |
| ExtractAgent | `extract_agent.py` | images/text + schema | `extracted_data` + `_sources` |
| GroundingAgent | `grounding_agent.py` | extracted + OCR words | `grounding_map` (bboxes) |
| ValidatorAgent | `validator_agent.py` | extracted + grounding | `final_output` + confidence |
| SchemaAgent | `schema_agent.py` | images/text | `suggested_schema` |
### Core Modules

| Module | File | Role |
|---|---|---|
| Config | `core/config.py` | Env-based settings via dataclass |
| Types | `core/types.py` | Pydantic models (BBox, OCRWord, Confidence) |
| State | `core/state.py` | `DocumentState` TypedDict for LangGraph |
| Ingestion | `core/ingestion.py` | PDF routing, image conversion, preprocessing |
| OCR | `core/ocr.py` | Tesseract word-level bounding boxes |
| Grounding | `core/grounding.py` | Fuzzy matching via thefuzz |
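The fuzzy-matching step can be pictured as a sliding window over OCR words. The project matches via thefuzz; the sketch below substitutes stdlib `difflib` (ratio in [0, 1]) so it runs with no extra dependencies, and the word-dict shape and helper name are assumptions, not Prism's actual API.

```python
from difflib import SequenceMatcher

def best_window_match(value, ocr_words, threshold=0.8):
    """Slide a window of OCR words (sized by the value's word count) and
    return (score, window) for the best fuzzy match, or (0.0, []) if no
    window clears the threshold. Each word is assumed to look like
    {"text": str, "bbox": (x0, y0, x1, y1)}."""
    n = max(1, len(value.split()))
    best_score, best_window = 0.0, []
    for i in range(len(ocr_words) - n + 1):
        window = ocr_words[i:i + n]
        candidate = " ".join(w["text"] for w in window)
        score = SequenceMatcher(None, value.lower(), candidate.lower()).ratio()
        if score > best_score:
            best_score, best_window = score, window
    return (best_score, best_window) if best_score >= threshold else (0.0, [])
```

The winning window's bboxes can then be merged into the `grounding_map` entry for that field.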
## Key Patterns

### 1. OpenAI Responses API (Required)
All agents MUST use the Responses API — NOT Chat Completions.
```python
from openai import OpenAI

client = OpenAI(api_key=settings.openai_api_key)

# Text-only call
resp = client.responses.create(
    model=settings.openai_model,
    instructions=SYSTEM_PROMPT,                    # system prompt
    input=[{"role": "user", "content": prompt}],   # user message
    max_output_tokens=4096,
)
result = resp.output_text  # response text

# Vision call (with images)
resp = client.responses.create(
    model=settings.openai_model,
    instructions=SYSTEM_PROMPT,
    input=[{"role": "user", "content": [
        {"type": "input_image", "image_url": f"data:image/png;base64,{b64}", "detail": "high"},
        {"type": "input_text", "text": prompt},
    ]}],
    max_output_tokens=4096,
)
```
Critical differences from Chat Completions:

- `instructions=` replaces `messages=[{"role": "system", ...}]`
- `input=` replaces `messages=[{"role": "user", ...}]`
- Content types: `input_image` (not `image_url`), `input_text` (not `text`)
- Response: `resp.output_text` (not `resp.choices[0].message.content`)
- `temperature` is NOT supported by all models (e.g., gpt-5-mini)
### 2. Structured Outputs (extract_agent.py)

Use the `json_schema` format with `strict: true` — NOT `json_object`.
```python
text_format = {
    "format": {
        "type": "json_schema",
        "name": "extraction",
        "strict": True,
        "schema": strict_schema,  # from _prepare_strict_schema()
    }
}
resp = client.responses.create(
    model=settings.openai_model,
    instructions=SYSTEM_PROMPT,
    input=[{"role": "user", "content": user_content}],
    text=text_format,
    max_output_tokens=4096,
)
```
Strict mode requirements:

- `additionalProperties: false` on ALL objects
- ALL properties listed in `required`
- Nullable types: `["string", "null"]` (not just `"string"`)
- `_sources` mirror object for provenance tracking
### 3. Text-Only Fallback

When Poppler is unavailable, digital PDFs use text-only paths:

- `ingest_node()`: catches `pdf_to_images` failure, sets `document_images = []`
- All agents check `if not images:` and fall back to `digital_text` or `ocr_full_text`
- `extract_digital_text()` uses `--- Page N ---` markers per page
- `parse_agent` splits on `re.split(r"--- Page \d+ ---\n?", text)` for per-page processing
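The per-page split the fallback relies on can be sketched directly; the marker format and regex come from the source, while the helper name is illustrative.

```python
import re

# Marker format emitted by extract_digital_text()
PAGE_MARKER = re.compile(r"--- Page \d+ ---\n?")

def split_pages(digital_text: str) -> list:
    """Split `--- Page N ---`-delimited text into one string per page,
    dropping the empty chunk before the first marker."""
    chunks = PAGE_MARKER.split(digital_text)
    return [c.strip() for c in chunks if c.strip()]
```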
### 4. Anti-Hallucination: Abstain Over Guess
The ValidatorAgent implements multi-layer verification:
- HIGH confidence: grounding score >= 90 AND source text found in OCR
- MEDIUM confidence: score >= threshold OR value found in ground truth text
- LOW confidence: score >= 50 — schedules re-extraction
- UNVERIFIED: score < 50 — abstains (returns `null`)

Ground truth fallback: `ground_truth_text = ocr_full_text or digital_text`
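The tiering above maps to a small classifier. This is a sketch, assuming the validator exposes a numeric grounding score and a found-in-ground-truth flag; the function name is illustrative, and the `medium_threshold` default of 70 is a placeholder since the source only says "threshold".

```python
def classify_confidence(score, found_in_text, medium_threshold=70):
    """Map a grounding score plus a ground-truth text hit to a tier.
    UNVERIFIED values should be returned as null (abstain over guess)."""
    if score >= 90 and found_in_text:
        return "HIGH"
    if score >= medium_threshold or found_in_text:
        return "MEDIUM"
    if score >= 50:
        return "LOW"         # schedules re-extraction
    return "UNVERIFIED"      # abstain: value becomes null
```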
### 5. LangGraph State Flow

```python
# State is a TypedDict — each agent returns a partial dict to merge
def my_agent(state: DocumentState) -> dict:
    # Read from state
    images = state.get("document_images", [])
    # Return updates (merged into state)
    return {"my_output": result}
```
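The retry loop in the architecture diagram is typically wired as a conditional edge after `validate`. Here is a sketch of such a routing function, assuming the validator writes `fields_to_retry` (named in the diagram) and a `retry_count` counter into state; the state keys beyond `fields_to_retry` and the function name are guesses, not Prism's actual code.

```python
MAX_RETRIES = 3  # from the diagram: extract re-loops at most 3 times

def route_after_validate(state: dict) -> str:
    """Return the name of the next node: re-run extraction while there
    are low-confidence fields and retries remain, otherwise finish."""
    fields = state.get("fields_to_retry") or []
    if fields and state.get("retry_count", 0) < MAX_RETRIES:
        return "extract"
    return "end"
```

In the graph builder this would be passed to `add_conditional_edges("validate", route_after_validate, ...)` to map `"extract"`/`"end"` onto the real nodes.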
## API Endpoints

| Method | Path | Purpose |
|---|---|---|
| POST | `/v1/extract` | Full extraction pipeline |
| POST | `/v1/parse` | Parse only (markdown + OCR) |
| POST | `/v1/schema/suggest` | Auto-generate JSON schema |
| GET | `/v1/runs/{run_id}` | Retrieve past run |
## Common Commands

```bash
# Run server
uvicorn prism.api.app:app --reload --port 8000

# Run playground
streamlit run prism/playground/app.py

# Run tests
pytest tests/

# Quick extraction test
python -c "
from prism.agents.orchestrator import create_extraction_app
import json
schema = {'type': 'object', 'properties': {...}, 'required': [...]}
app = create_extraction_app()
result = app.invoke({'file_path': 'samples/your.pdf', 'json_schema': schema})
print(json.dumps(result.get('final_output', {}), indent=2))
"
```
## Constraints
- Approved libraries: LangGraph, pytesseract, pypdf2, thefuzz, Pillow, pandas, numpy, scipy, OpenAI (Responses API)
- Cannot use: LandingAI ADE, PaddleOCR, EasyOCR, Kraken, PyMuPDF
- GPT-4o Vision for reading/understanding; Tesseract for spatial bounding boxes only
- No `temperature=0` — unsupported by some models
## File Structure

```
prism/
  agents/
    orchestrator.py      # LangGraph graph builder + ingest_node
    parse_agent.py       # GPT-4o Vision → markdown
    extract_agent.py     # Structured Outputs → JSON
    grounding_agent.py   # thefuzz → bbox mapping
    validator_agent.py   # confidence scoring + retry logic
    schema_agent.py      # auto-generate JSON schema
  core/
    config.py            # Settings dataclass + .env
    types.py             # Pydantic models
    state.py             # DocumentState TypedDict
    ingestion.py         # PDF/image loading + preprocessing
    ocr.py               # Tesseract OCR
    grounding.py         # Fuzzy matching engine
  api/
    app.py               # FastAPI endpoints
  playground/
    app.py               # Streamlit UI
```