architect-employment-checker-system
SKILL.md
Architect: Employment Checker System (Structured Document Analysis)
Use this skill to scaffold the full multi-service system described in the Employment Checker System Summary:
- deterministic, page-aware PDF preprocessing
- structured clause extraction + Range/rule validation
- auditable outputs with exact page/chunk/span citations
- Postgres as system-of-record; object storage for artifacts
- background workers for all heavy document processing
This is not a semantic-search-first RAG architecture. Vector search is optional and should be excluded unless explicitly requested.
Compatibility
- Pair with architect-python-uv-fastapi-sqlalchemy for the Python API base.
- Pair with architect-nextjs-bun-app or architect-nextjs-vercel-app for the Next.js UI base.
- Strongly recommended add-ons:
  - addon-decision-justification-ledger
  - addon-deterministic-eval-suite
  - addon-human-pr-review-gate
  - addon-observability-python-structlog
  - addon-observability-nextjs-pino
Inputs
Collect:
- PROJECT_NAME: repo/folder name (kebab-case).
- PY_MODULE: Python import module for the API (snake_case).
- WEB_APP: Next.js app folder name (default web).
- API_SERVICE: API folder name (default api).
- PYTHON_VERSION: default 3.12.
- QUEUE_IMPL: celery (required for this architecture).
- S3_BUCKET: default documents.
Default local dev secrets (override if user provides):
- MINIO_ROOT_USER=minio
- MINIO_ROOT_PASSWORD=minio123
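The inputs and their defaults can be captured in one validated structure before scaffolding starts. The following is a minimal sketch, not part of the skill itself: the `ScaffoldInputs` class, its field names, and the two regular expressions are assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass

# Assumed naming rules: lowercase kebab-case and lowercase snake_case.
KEBAB_CASE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")
SNAKE_CASE = re.compile(r"^[a-z_][a-z0-9_]*$")


@dataclass(frozen=True)
class ScaffoldInputs:
    """Hypothetical container for the inputs listed above."""

    project_name: str
    py_module: str
    web_app: str = "web"
    api_service: str = "api"
    python_version: str = "3.12"
    queue_impl: str = "celery"
    s3_bucket: str = "documents"

    def __post_init__(self) -> None:
        if not KEBAB_CASE.match(self.project_name):
            raise ValueError(f"PROJECT_NAME must be kebab-case: {self.project_name!r}")
        if not SNAKE_CASE.match(self.py_module):
            raise ValueError(f"PY_MODULE must be snake_case: {self.py_module!r}")
        if self.queue_impl != "celery":
            raise ValueError("QUEUE_IMPL must be 'celery' for this architecture")


# Only the two inputs without defaults need to be supplied.
inputs = ScaffoldInputs(project_name="employment-checker", py_module="employment_checker")
```

Rejecting bad values at this point keeps naming mistakes out of generated folder names and import paths.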
Target Repo Layout (Monorepo)
Prefer a monorepo to keep compose wiring, API contracts, and review UI aligned:
apps/{{WEB_APP}}/ # Next.js
services/{{API_SERVICE}}/ # FastAPI + shared worker code
docker-compose.yml
If the user explicitly wants separate repos, keep the same service boundaries and contracts.
Service Contract (Non-Negotiables)
- POST /documents/upload:
  - accepts a PDF upload
  - stores the original PDF in S3/MinIO
  - creates documents and jobs rows
  - enqueues preprocess/extract/validate via the queue
  - returns 202 Accepted with job_id (+ document_id if available)
- No heavy PDF work inside request handlers.
- Every derived artifact must be traceable back to the original PDF via:
  - document_id
  - page_number
  - chunk_id and chunk_index
  - span markers (char_start/char_end) or equivalent
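The traceability fields above can be modelled as a single citation record. This is a sketch under stated assumptions: the `Citation` class and `resolve_span` helper are hypothetical names, the chunk id format is invented for the example, and the offsets are assumed to be half-open character positions relative to the chunk text (the skill does not say what they are relative to).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """Hypothetical record tying a derived value back to the source PDF."""

    document_id: str
    page_number: int
    chunk_id: str
    chunk_index: int
    char_start: int  # inclusive offset into the chunk text (assumption)
    char_end: int    # exclusive offset into the chunk text (assumption)


def resolve_span(citation: Citation, chunk_text: str) -> str:
    """Return the cited text, or raise if the span does not fit the chunk."""
    if not (0 <= citation.char_start < citation.char_end <= len(chunk_text)):
        raise ValueError("citation span falls outside the chunk text")
    return chunk_text[citation.char_start:citation.char_end]


chunk_text = "The notice period is four weeks."
citation = Citation(
    document_id="doc-1",
    page_number=3,
    chunk_id="doc-1:p3:c0",  # illustrative id format only
    chunk_index=0,
    char_start=4,
    char_end=17,
)
cited = resolve_span(citation, chunk_text)  # "notice period"
```

Validating the span whenever a citation is resolved catches stale offsets, for example after a chunk has been regenerated.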
Compose Baseline (Local Dev)
Include these services:
- postgres
- redis
- minio
- api (FastAPI)
- worker (Celery/RQ)
- frontend (Next.js)
Also include a bucket init step (one-shot container or API startup hook) to ensure S3_BUCKET exists.
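The bucket init step reduces to a check-then-create call that is safe to repeat. In this sketch the `ensure_bucket` helper and the `InMemoryS3` stand-in are hypothetical. The stand-in mimics only the `list_buckets` and `create_bucket` calls of a boto3 S3 client so that the example runs without MinIO. In a real setup the client would more likely be a boto3 S3 client pointed at the MinIO endpoint.

```python
from typing import Any


class InMemoryS3:
    """Hypothetical stand-in exposing the two S3 client calls used below."""

    def __init__(self) -> None:
        self._buckets: list[str] = []

    def list_buckets(self) -> dict[str, Any]:
        return {"Buckets": [{"Name": name} for name in self._buckets]}

    def create_bucket(self, Bucket: str) -> None:
        self._buckets.append(Bucket)


def ensure_bucket(client: Any, bucket: str) -> bool:
    """Create `bucket` if it is missing. Returns True when it was created."""
    existing = {entry["Name"] for entry in client.list_buckets().get("Buckets", [])}
    if bucket in existing:
        return False
    client.create_bucket(Bucket=bucket)
    return True


client = InMemoryS3()
first = ensure_bucket(client, "documents")   # bucket is missing, so it is created
second = ensure_bucket(client, "documents")  # already present, nothing to do
```

Because the second call is a no-op, the same function can serve a one-shot container or an API startup hook.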
Orchestration Workflow (High Level)
- Scaffold base API (architect-python-uv-fastapi-sqlalchemy).
- Add object storage integration (addon-object-storage-minio-s3).
- Add async jobs + worker (addon-async-jobs-celery-redis).
- Add Postgres schema for the pipeline (addon-postgres-document-pipeline-schema).
- Add preprocessing + page artifacts (addon-pdf-preprocess-page-artifacts).
- Add header/footer cleanup (addon-header-footer-cleanup).
- Add JSONL chunk artifacts (addon-jsonl-chunking-citations).
- Add clause extraction (addon-clause-extraction-citations).
- Add Range/rule validation (addon-range-rules-validation).
- Add report synthesis (addon-report-synthesis-audit).
- Scaffold UI surfaces (ui-employment-checker-console).
Guardrails
- Store both raw and clean page content. Never destructively overwrite the raw artifact.
- Store canonical artifacts (PDF, markdown, JSONL, reports) in S3/MinIO; store queryable rows in Postgres.
- Make all worker stages idempotent by document_id + stage (safe re-runs).
- Persist job state transitions and error payloads for auditability.
- Prefer deterministic parsing and explicit page boundaries; avoid regex-only page splitting when possible.
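The idempotency and job-state guardrails can be illustrated with a small in-memory ledger keyed by document_id + stage. This is a sketch only: the `StageLedger` class and the state names `started`, `succeeded`, and `failed` are assumptions, and a real implementation would presumably keep this state in Postgres rather than in process memory.

```python
from typing import Callable

STAGES = ("preprocess", "extract", "validate")


class StageLedger:
    """In-memory stand-in for persisted job state (hypothetical shape)."""

    def __init__(self) -> None:
        self.completed: set[tuple[str, str]] = set()
        # Each entry is (document_id, stage, state); failures carry the error text.
        self.transitions: list[tuple[str, str, str]] = []

    def run(self, document_id: str, stage: str, work: Callable[[], None]) -> bool:
        """Run `work` at most once per (document_id, stage). True if it ran."""
        key = (document_id, stage)
        if key in self.completed:
            return False  # safe re-run: the stage already succeeded
        self.transitions.append((document_id, stage, "started"))
        try:
            work()
        except Exception as exc:
            self.transitions.append((document_id, stage, f"failed: {exc}"))
            raise
        self.completed.add(key)
        self.transitions.append((document_id, stage, "succeeded"))
        return True


ledger = StageLedger()
calls: list[str] = []
for _ in range(2):  # the second pass simulates a retry of the whole pipeline
    for stage in STAGES:
        ledger.run("doc-1", stage, lambda s=stage: calls.append(s))
```

After both passes each stage has executed once, and the ledger holds one started and one succeeded entry per stage. A failed stage is not marked completed, so a later retry runs it again.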
Decision Justification Rule
- Every non-trivial decision must include a concrete justification.
- Capture the alternatives considered and why they were rejected.
- State tradeoffs and residual risks for the chosen option.
- If justification is missing, treat the task as incomplete and surface it as a blocker.
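One way to make this rule checkable is to record each decision in a structure that reports which required parts are missing. The `DecisionRecord` class and its field names below are hypothetical, and the example values are illustrative rather than taken from a real ledger.

```python
from dataclasses import dataclass, field


@dataclass
class DecisionRecord:
    """Hypothetical ledger entry for one non-trivial decision."""

    decision: str
    justification: str = ""
    # Maps each rejected alternative to the reason it was rejected.
    alternatives_rejected: dict[str, str] = field(default_factory=dict)
    tradeoffs: list[str] = field(default_factory=list)
    residual_risks: list[str] = field(default_factory=list)

    def blockers(self) -> list[str]:
        """Return the names of the required parts that are still missing."""
        missing = []
        if not self.justification.strip():
            missing.append("justification")
        if not self.alternatives_rejected:
            missing.append("alternatives_rejected")
        if not self.tradeoffs:
            missing.append("tradeoffs")
        if not self.residual_risks:
            missing.append("residual_risks")
        return missing


incomplete = DecisionRecord(decision="Exclude vector search")
complete = DecisionRecord(
    decision="Exclude vector search",
    justification="Outputs must cite exact page/chunk/span locations.",
    alternatives_rejected={"semantic-search-first RAG": "not requested for this system"},
    tradeoffs=["no fuzzy retrieval across documents"],
    residual_risks=["clauses with unusual wording may be missed"],
)
```

A task whose record returns a non-empty `blockers()` list would be treated as incomplete under the rule above.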
Repository: ajrlewis/ai-skills