architect-python-uv-batch
Architect: Python + uv Batch
Use this skill for non-API Python projects that run as CLI or scheduled jobs, especially data/PDF processing pipelines.
Docker is required by default for this runnable base architect skill. Only allow NO_DOCKER=yes when the user explicitly asks for a local-only exception.
Inputs
Collect these values:
PROJECT_NAME: kebab-case repo/folder name.MODULE_NAME: snake_case import package.PYTHON_VERSION: default3.12.USE_RAG:yesorno.EMBED_PROVIDER:openaiorsentence-transformers(default localsentence-transformers).NO_DOCKER: defaultno. Set toyesonly when user explicitly opts out of containerization.
Preflight Checks
Run before scaffolding:
command -v uv >/dev/null && uv --version || echo "uv-missing"
python3 --version
command -v docker >/dev/null && docker --version || echo "docker-missing"
Execution modes:
production-default:uv+dockeravailable; emit and validate container artifacts.local-no-docker: user explicitly setsNO_DOCKER=yes.offline-smoke:uvmissing and/or no network; scaffold file structure and stdlib smoke path, then record constraints inTEST_NOTES.md.
Production-default contract:
- Must create
Dockerfile,.dockerignore, anddocker-compose.yml. - Must include CI image build check.
- Must run at least one containerized smoke command in validation.
Scaffold Workflow
- Initialize project (
production-defaultorlocal-no-docker):
uv init --package {{PROJECT_NAME}}
cd {{PROJECT_NAME}}
If offline-smoke mode is required, create equivalent structure manually and keep commands runnable via:
PYTHONPATH=src python3 -m {{MODULE_NAME}}.cli ingest-pdf
python3 -m unittest discover -s tests -v
- Add core dependencies:
uv add pydantic-settings typer rich pypdf pandas orjson
uv add -d pytest ruff mypy
- If
USE_RAG=yes, add RAG dependencies:
uv add langchain-text-splitters chromadb
- For local embeddings:
uv add sentence-transformers
- For OpenAI embeddings:
uv add openai
- Create structure:
src/{{MODULE_NAME}}/
cli.py
core/config.py
pipelines/pdf_ingest.py
pipelines/normalize.py
rag/chunk.py
rag/embed.py
rag/store.py
tests/
test_ingest.py
data/inbox/
data/processed/
If NO_DOCKER=no, also create:
Dockerfile.dockerignoredocker-compose.yml
- Wire command entrypoint in
pyproject.toml:
[project.scripts]
{{PROJECT_NAME}} = "{{MODULE_NAME}}.cli:main"
Required Defaults
src/{{MODULE_NAME}}/core/config.py
from pydantic_settings import BaseSettings, SettingsConfigDict
class Settings(BaseSettings):
model_config = SettingsConfigDict(env_file=".env", env_file_encoding="utf-8")
environment: str = "development"
inbox_dir: str = "data/inbox"
processed_dir: str = "data/processed"
embed_provider: str = "sentence-transformers"
embed_model: str = "sentence-transformers/all-MiniLM-L6-v2"
openai_api_key: str | None = None
chroma_path: str = ".chroma"
settings = Settings()
src/{{MODULE_NAME}}/cli.py
import typer
from {{MODULE_NAME}}.pipelines.pdf_ingest import run_pdf_ingest
app = typer.Typer(no_args_is_help=True)
@app.command("ingest-pdf")
def ingest_pdf() -> None:
run_pdf_ingest()
def main() -> None:
app()
if __name__ == "__main__":
main()
src/{{MODULE_NAME}}/pipelines/pdf_ingest.py
import hashlib
from pathlib import Path
from pypdf import PdfReader
from {{MODULE_NAME}}.core.config import settings
def run_pdf_ingest() -> None:
inbox = Path(settings.inbox_dir)
out = Path(settings.processed_dir)
out.mkdir(parents=True, exist_ok=True)
for pdf_path in inbox.glob("*.pdf"):
text = "\n".join(page.extract_text() or "" for page in PdfReader(str(pdf_path)).pages)
suffix = hashlib.sha1(str(pdf_path).encode("utf-8")).hexdigest()[:8]
target = out / f"{pdf_path.name}.{suffix}.txt"
target.write_text(text, encoding="utf-8")
tests/test_ingest.py (import-safe for unittest discover)
from pathlib import Path
import sys
ROOT = Path(__file__).resolve().parents[1]
SRC = ROOT / "src"
if str(SRC) not in sys.path:
sys.path.insert(0, str(SRC))
from {{MODULE_NAME}}.pipelines.pdf_ingest import run_pdf_ingest
Dockerfile (NO_DOCKER=no)
FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim AS build
WORKDIR /app
COPY pyproject.toml uv.lock ./
COPY src ./src
RUN uv sync --frozen --no-dev
COPY data ./data
FROM python:3.12-slim AS run
WORKDIR /workspace
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
RUN useradd --create-home --shell /bin/bash app
COPY /app/.venv /app/.venv
COPY /app /workspace
ENV PATH="/app/.venv/bin:$PATH"
ENV PYTHONPATH="/workspace/src"
RUN chown -R app:app /workspace
USER app
CMD ["python", "-m", "{{MODULE_NAME}}.cli", "ingest-pdf"]
.dockerignore (NO_DOCKER=no)
.git
.venv
__pycache__
.pytest_cache
.mypy_cache
.ruff_cache
data/processed
docker-compose.yml (NO_DOCKER=no)
services:
app:
build: .
environment:
INBOX_DIR: /workspace/data/inbox
PROCESSED_DIR: /workspace/data/processed
volumes:
- ./data:/workspace/data
command: ["python", "-m", "{{MODULE_NAME}}.cli", "ingest-pdf"]
Do not add a ports: block by default for batch workers. They are one-shot or background jobs, not HTTP services; use volumes and explicit commands instead.
CI + Quality
Create .github/workflows/ci.yml:
name: ci
on:
push:
pull_request:
jobs:
quality:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.12"
- uses: astral-sh/setup-uv@v5
- run: uv sync --frozen --dev
- run: uv run ruff check .
- run: uv run ruff check . --select D
- run: uv run mypy src
- run: uv run pytest -q
- uses: docker/setup-buildx-action@v3
- uses: docker/build-push-action@v6
with:
context: .
push: false
tags: {{PROJECT_NAME}}:ci
cache-from: type=gha
cache-to: type=gha,mode=max
Guardrails
-
Documentation contract for generated code:
- Python: write module docstrings and docstrings for public classes, methods, and functions.
- Next.js/TypeScript: write JSDoc for exported components, hooks, utilities, and route handlers.
- Add concise rationale comments only for non-obvious logic, invariants, or safety constraints.
- Apply this contract even when using template snippets below; expand templates as needed.
-
Keep batch jobs idempotent; output paths must be safe to re-run.
-
Make output names collision-safe across formats and paths (include extension and a stable suffix/hash).
-
Never commit secrets; use
.envand environment variables. -
Keep processing logic pure where possible, IO at pipeline edges.
-
If RAG is optional, keep
rag/*modules decoupled from non-RAG pipelines. -
Treat
NO_DOCKER=yesas an explicit exception, not a default path. -
Ensure tests can run via plain
python3 -m unittest discover -s tests -vwithout requiringPYTHONPATHfor imports. -
Ensure
uv.lockis committed before Docker build; the Dockerfile copies it explicitly for deterministicuv sync --frozeninstalls. -
Ensure any runtime output directories under
/workspaceare writable by the non-root app user.
Validation Checklist
- Confirm generated code includes required docstrings/JSDoc and rationale comments for non-obvious logic.
uv run ruff check .
uv run ruff check . --select D
uv run mypy src
uv run pytest -q
uv run {{PROJECT_NAME}} ingest-pdf
test -f uv.lock
docker build -t {{PROJECT_NAME}}:local .
docker compose run --rm app
local-no-docker (NO_DOCKER=yes):
uv run ruff check .
uv run ruff check . --select D
uv run mypy src
uv run pytest -q
uv run {{PROJECT_NAME}} ingest-pdf
Fallback (offline-smoke):
PYTHONPATH=src python3 -m {{MODULE_NAME}}.cli ingest-pdf
python3 -m unittest discover -s tests -v
Decision Justification Rule
- Every non-trivial decision must include a concrete justification.
- Capture the alternatives considered and why they were rejected.
- State tradeoffs and residual risks for the chosen option.
- If justification is missing, treat the task as incomplete and surface it as a blocker.