# Python Services & CLI
## Modern Tooling
| Tool | Replaces | Purpose |
|---|---|---|
| uv | pip, virtualenv, pyenv, pipx | Package/dependency management |
| ruff | flake8, black, isort | Linting + formatting |
| ty | mypy, pyright | Type checking (Astral, faster) |
- `uv init --package myproject` for distributable packages, `uv init` for apps
- `uv add <pkg>`, `uv add --group dev <pkg>`; never edit `pyproject.toml` deps manually
- `uv run <cmd>` instead of activating venvs -- runs the command inside the project venv (syncing it first) without explicit activation
- `uv lock --upgrade-package <pkg>` to upgrade a single package without touching others
- `uv tree --outdated` to preview what would be upgraded before committing
- `uv.lock` goes in version control
- Use `[dependency-groups]` (PEP 735) for dev/test/docs, not `[project.optional-dependencies]`
- PEP 723 inline metadata for standalone scripts with deps
- `ruff check --fix . && ruff format .` for lint+format in one command
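For the PEP 723 bullet, here is a minimal sketch of a standalone script with inline metadata. The file name `fetch.py`, the `httpx` dependency, and the `main()` helper are illustrative assumptions. `uv run fetch.py <url>` reads the `# /// script` block and runs the script in an environment that has those dependencies, and `uv add --script fetch.py httpx` can write the block for you.

```python
# /// script
# requires-python = ">=3.10"
# dependencies = [
#   "httpx>=0.27",
# ]
# ///
"""Print the HTTP status code of a URL. Run with: uv run fetch.py https://example.com"""
import sys


def main(url: str) -> int:
    # Imported here only so this sketch can be imported without httpx installed;
    # a real script would normally import at the top of the file.
    import httpx

    response = httpx.get(url, timeout=10.0)  # every network call gets a timeout
    return response.status_code


if __name__ == "__main__":
    print(main(sys.argv[1]))
```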
Standard project layout:

```
src/mypackage/
    __init__.py
    main.py
    services/
    models/
tests/
    conftest.py
    test_main.py
pyproject.toml
```
See cli-tools.md for Click patterns, argparse, and CLI project layout.
## Parallelism

| Workload | Approach |
|---|---|
| Many concurrent I/O calls | `asyncio` (`gather`, `create_task`) |
| CPU-bound computation | `multiprocessing.Pool` or `concurrent.futures.ProcessPoolExecutor` |
| Mixed I/O + CPU | `asyncio`, offloading blocking sync calls with `asyncio.to_thread()` and CPU-heavy steps with `loop.run_in_executor(process_pool, fn)` |
| Simple scripts, few connections | Stay synchronous |
### Sync vs Async Decision
Use async (asyncio) when:
- I/O-bound work has multiple concurrent operations (HTTP calls, database queries, file I/O happening in parallel)
- WebSocket servers or long-lived connections require it
- The framework requires it (FastAPI async endpoints, aiohttp)
Stay synchronous when:
- Work is CPU-bound (computation, data transformation) -- async adds nothing, use multiprocessing instead
- Building simple scripts and CLI tools with sequential I/O
- All I/O is sequential anyway (one DB query, process result, one API call)
- The team lacks async debugging experience (asyncio stack traces are harder to read)
Rule of thumb: if the code is not waiting on multiple I/O operations concurrently, sync is simpler and correct. Do not add async complexity for a single sequential pipeline.
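Key rule: Stay fully sync or fully async within a call path. A blocking call inside a coroutine stalls the whole event loop, and `asyncio.run()` cannot be called from code that is already running inside an event loop.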
asyncio patterns:
- `asyncio.gather(*tasks)` for concurrent I/O -- use `return_exceptions=True` for partial failure tolerance
- `asyncio.TaskGroup` (3.11+) for structured concurrency -- automatic cancellation of sibling tasks on failure; prefer over `gather` when all tasks must succeed
- `asyncio.Semaphore(n)` to cap how many calls are in flight at once (e.g. to respect external API limits)
- `asyncio.wait_for(coro, timeout=N)` for timeouts (or `async with asyncio.timeout(N)` on 3.11+)
- `asyncio.Queue` for producer-consumer
- `asyncio.Lock` when coroutines share mutable state across `await` points
- Never block the event loop: `asyncio.to_thread(sync_fn)` for sync libs, `aiohttp`/`httpx.AsyncClient` for HTTP
- Handle `CancelledError` -- always re-raise after cleanup
- Async generators (`async for`) for streaming/pagination
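A minimal sketch combining several of these patterns: a `Semaphore` caps in-flight calls, `wait_for` adds a per-call timeout, and `gather(..., return_exceptions=True)` keeps partial results. `fetch` and `fetch_all` are hypothetical; `fetch` stands in for a real HTTP or database call.

```python
from __future__ import annotations

import asyncio


async def fetch(item_id: int) -> str:
    """Hypothetical I/O call standing in for an HTTP request or DB query."""
    await asyncio.sleep(0.01)
    if item_id == 3:
        raise ConnectionError(f"item {item_id} unavailable")
    return f"item-{item_id}"


async def fetch_all(
    ids: list[int], max_concurrency: int = 5, timeout: float = 1.0
) -> list[str | BaseException]:
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(item_id: int) -> str:
        async with sem:  # at most max_concurrency fetches in flight at once
            return await asyncio.wait_for(fetch(item_id), timeout=timeout)

    # return_exceptions=True: a failed item becomes an exception object in the
    # result list instead of aborting the whole gather
    return await asyncio.gather(*(bounded(i) for i in ids), return_exceptions=True)


if __name__ == "__main__":
    for result in asyncio.run(fetch_all(list(range(6)), max_concurrency=2)):
        print(result)
```

With `asyncio.TaskGroup` instead of `gather`, the first failure would cancel the remaining fetches, which is the better fit when every item must succeed.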
multiprocessing for CPU-bound:
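```python
from concurrent.futures import ProcessPoolExecutor

if __name__ == "__main__":  # required under the "spawn" start method (Windows, macOS)
    with ProcessPoolExecutor(max_workers=4) as pool:
        # cpu_task must be a top-level (picklable) function; items is any iterable of inputs
        results = list(pool.map(cpu_task, items))
```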
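For the mixed I/O + CPU case, here is a sketch of handing CPU-heavy work from async code to a process pool with `loop.run_in_executor`. `checksum` and `handle` are hypothetical names. In a long-running service you would typically create the pool once (for example at startup) rather than per call.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


def checksum(n: int) -> int:
    """Hypothetical CPU-bound step; top-level so worker processes can unpickle it."""
    return sum(i * i for i in range(n))


async def handle(sizes: list[int]) -> list[int]:
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor(max_workers=2) as pool:
        # Each call runs in a worker process; the event loop stays free for I/O meanwhile.
        futures = [loop.run_in_executor(pool, checksum, n) for n in sizes]
        return list(await asyncio.gather(*futures))


if __name__ == "__main__":
    print(asyncio.run(handle([100_000, 200_000])))
```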
See fastapi.md for project structure, lifespan, config, DI, async DB, and repository pattern.
## Background Jobs

- Return a job ID immediately and process the work asynchronously; the client polls `/jobs/{id}` for status
- Celery: `@app.task(bind=True, max_retries=3, autoretry_for=(ConnectionError,), retry_backoff=True)` retries with exponential backoff automatically; for manual control, catch the error and `raise self.retry(exc=exc, countdown=2 ** self.request.retries * 60)`
- Alternatives: Dramatiq (modern Celery alternative), RQ (simple, Redis-based), cloud-native (SQS+Lambda, Cloud Tasks)
- Idempotency is mandatory -- tasks may retry. Use idempotency keys for external calls, check-before-write, upsert patterns
- Dead letter queue for permanently failed tasks after max retries
- Task workflows: `chain(a.s(), b.s())` for sequential, `group(...)` for parallel, `chord(group, callback)` for fan-out/fan-in
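A minimal, framework-free sketch of the job-ID and idempotency bullets. `JobStore`, `submit`, and `run_task` are hypothetical. In production the check-before-write must be atomic (for example a unique constraint on the idempotency key plus an upsert), which a plain dict shared across worker processes does not provide.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class JobStore:
    """In-memory stand-in for a jobs table; a real system would use a database or Redis."""
    status: dict[str, str] = field(default_factory=dict)   # job_id -> status
    results: dict[str, int] = field(default_factory=dict)  # idempotency_key -> result


def submit(store: JobStore) -> str:
    """Enqueue work and return the job ID to the client right away."""
    job_id = str(uuid.uuid4())
    store.status[job_id] = "queued"
    return job_id


def run_task(store: JobStore, job_id: str, idempotency_key: str, amount: int) -> int:
    """Safe to retry: a repeated run with the same key returns the stored result."""
    if idempotency_key in store.results:  # check-before-write
        store.status[job_id] = "done"
        return store.results[idempotency_key]
    result = amount * 2  # placeholder for the real side effect (charge, email, upload)
    store.results[idempotency_key] = result
    store.status[job_id] = "done"
    return result
```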
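## Resilience

Retries with tenacity:

```python
import logging

from tenacity import (before_sleep_log, retry, retry_if_exception_type,
                      stop_after_attempt, stop_after_delay, wait_exponential_jitter)

logger = logging.getLogger(__name__)


@retry(
    retry=retry_if_exception_type((ConnectionError, TimeoutError)),  # transient errors only
    stop=stop_after_attempt(5) | stop_after_delay(60),  # whichever limit is hit first
    wait=wait_exponential_jitter(initial=1, max=30),  # exponential backoff with jitter, capped at 30s
    before_sleep=before_sleep_log(logger, logging.WARNING),  # log each retry
)
def call_api(url: str) -> dict: ...
```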
- Retry only transient errors: network failures and 429/502/503/504. Never retry other 4xx responses, auth errors, or validation errors
- Every network call needs a timeout
- A custom `@fail_safe(default=[])` decorator for non-critical paths -- return a cached/default value on failure instead of raising
- `functools.lru_cache(maxsize=N)` for pure-function memoization; `functools.cache` (unbounded) for small domains
- Stack project-defined decorators: `@traced @with_timeout(30) @retry(...)` -- keep infrastructure concerns separate from business logic
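`fail_safe` is not a standard-library or well-known library decorator, so here is one possible implementation, assuming "fail safe" means "log the error and return a default". It returns only the default; serving a cached last-good value would need an extra cache lookup. `fetch_recommendations` is a hypothetical example.

```python
import copy
import functools
import logging
from typing import Any, Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def fail_safe(default: T) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """On any Exception, log it and return a copy of `default` instead of raising."""

    def decorator(fn: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> T:
            try:
                return fn(*args, **kwargs)
            except Exception:
                logger.warning("non-critical call %s failed; using default", fn.__name__, exc_info=True)
                # copy so callers mutating the fallback (e.g. a list) cannot corrupt it
                return copy.copy(default)

        return wrapper

    return decorator


@fail_safe(default=[])
def fetch_recommendations(user_id: int) -> list[str]:
    # Hypothetical non-critical dependency that is currently failing
    raise ConnectionError("recommendation service unavailable")
```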
Connection pooling is mandatory for production: reuse one `httpx.AsyncClient()` across requests instead of creating one per request, configure SQLAlchemy `pool_size`/`max_overflow`, and use `aiohttp.TCPConnector(limit=N)`.
### Production Resilience

- Fail-fast config validation: use a Pydantic `BaseSettings` model (provided by the `pydantic-settings` package in Pydantic v2) with `model_validator` to parse and validate all environment variables at startup. If anything is invalid, crash before serving traffic. Never discover a missing secret on the first request that needs it.
- Health endpoints: expose `/health` (shallow liveness -- returns 200 if the process responds) and `/ready` (deep readiness -- verifies that the database, Redis, and other critical dependencies are reachable). Load balancers route traffic based on `/ready`; orchestrators restart based on `/health`.
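The fail-fast idea itself does not depend on Pydantic. As a dependency-free sketch of the same pattern (swapping `pydantic-settings` for a frozen dataclass), `Settings.from_env` collects every problem and raises before the app serves traffic. The variable names `DATABASE_URL`, `PORT`, and `DEBUG` and the `ConfigError` class are illustrative assumptions.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


class ConfigError(RuntimeError):
    """Raised at startup when the environment is invalid."""


@dataclass(frozen=True)
class Settings:
    database_url: str
    port: int
    debug: bool

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> Settings:
        errors: list[str] = []
        database_url = env.get("DATABASE_URL", "")
        if not database_url:
            errors.append("DATABASE_URL is required")
        port_raw = env.get("PORT", "8000")
        try:
            port = int(port_raw)
        except ValueError:
            errors.append(f"PORT must be an integer, got {port_raw!r}")
            port = 0
        debug = env.get("DEBUG", "false").lower() in {"1", "true", "yes"}
        if errors:
            # Report every problem at once, then refuse to start.
            raise ConfigError("; ".join(errors))
        return cls(database_url=database_url, port=port, debug=debug)


if __name__ == "__main__":
    settings = Settings.from_env()  # crash here, before the server starts accepting traffic
```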
## Observability

- structlog for JSON structured logging. Configure once at startup with the `JSONRenderer`, `TimeStamper`, and `merge_contextvars` processors
- Correlation IDs -- accept the incoming `X-Correlation-ID` header or generate one at ingress, bind it to `contextvars`, and propagate it to downstream calls
- Log levels: DEBUG=diagnostics, INFO=operations, WARNING=anomalies handled, ERROR=failures needing attention. Never log expected behavior at ERROR
- Prometheus metrics -- track latency (Histogram), traffic (Counter), errors (Counter), saturation (Gauge). Keep label cardinality bounded (no user IDs)
- OpenTelemetry for distributed tracing across services
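A stdlib-only sketch of the correlation-ID flow: read or generate the ID at ingress, store it in a `ContextVar`, stamp it onto log records, and forward it downstream. With structlog you would typically call `structlog.contextvars.bind_contextvars(correlation_id=...)` instead of the logging filter used here. `bind_correlation_id`, `outgoing_headers`, and `configure_logging` are hypothetical helper names.

```python
from __future__ import annotations

import contextvars
import logging
import uuid
from collections.abc import Mapping

# One value per request/task; asyncio copies the current context into each new task.
correlation_id: contextvars.ContextVar[str] = contextvars.ContextVar("correlation_id", default="-")


class CorrelationIdFilter(logging.Filter):
    """Copies the current correlation ID onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        return True


def bind_correlation_id(headers: Mapping[str, str]) -> str:
    """At ingress: reuse the caller's X-Correlation-ID or generate a new one."""
    # Assumes exact-case header keys; real frameworks expose case-insensitive header maps.
    cid = headers.get("X-Correlation-ID") or str(uuid.uuid4())
    correlation_id.set(cid)
    return cid


def outgoing_headers() -> dict[str, str]:
    """Headers to attach to downstream calls so the ID propagates."""
    return {"X-Correlation-ID": correlation_id.get()}


def configure_logging() -> None:
    handler = logging.StreamHandler()
    handler.addFilter(CorrelationIdFilter())
    handler.setFormatter(logging.Formatter("%(levelname)s [%(correlation_id)s] %(message)s"))
    logging.getLogger().addHandler(handler)
    logging.getLogger().setLevel(logging.INFO)


if __name__ == "__main__":
    configure_logging()
    bind_correlation_id({})
    logging.getLogger(__name__).info("request received")
    print(outgoing_headers())
```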
## Discipline

- Simplicity first -- keep every change as simple as possible, with minimal code impact
- Only touch what's necessary -- avoid introducing unrelated changes
- No hacky workarounds -- if a fix feels wrong, step back and implement the clean solution
- Before adding a new abstraction, verify the pattern it captures appears in 3+ places. If not, inline it.
- Verify: run the checks in the Verify section below and pass them all with zero warnings before declaring done
- Coverage target: 80%+ (`uv run pytest --cov --cov-report=html`)
## Testing Patterns

- pytest flags: `--lf` (last failed), `-x` (stop on first failure), `-k "pattern"` (filter), `--pdb` (debugger on failure)
- Fixtures: use `conftest.py` for shared fixtures. Scope wisely: `@pytest.fixture(scope="session")` for expensive setup (DB connections), `scope="function"` (default) for test isolation
- `tmp_path`: built-in fixture for temp files -- no manual cleanup needed
- Parametrize with IDs: `@pytest.mark.parametrize("input,expected", [...], ids=["empty", "single", "overflow"])` for readable test names
- Mock discipline: always pass `autospec=True` to `mock.patch` (or use `create_autospec`) so mocks catch API drift. Use `assert_awaited_once()` for async mocks.
- Test markers: register them in `pyproject.toml` under `[tool.pytest.ini_options]` with `markers = ["slow", "integration"]`. Run fast tests with `-m "not slow"`.
- Protocol duck typing: use `class Renderable(Protocol)` for structural typing at service boundaries -- enables testing with plain objects instead of mocks
- Context managers: use `@contextmanager` for connection/transaction lifecycle, with cleanup in a `try`/`finally` around the `yield` (class-based context managers do this in `__exit__`).
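A small stdlib sketch of the last two bullets: a plain class satisfies a `Protocol` structurally (no mock needed), and a `@contextmanager` puts commit, rollback, and close around its `yield`. `Renderable` comes from the text above; `publish`, `FakeItem`, and `connection` are hypothetical, and `sqlite3` stands in for a real database driver.

```python
from __future__ import annotations

import sqlite3
from collections.abc import Iterator
from contextlib import contextmanager
from typing import Protocol


class Renderable(Protocol):
    def render(self) -> str: ...


def publish(item: Renderable) -> str:
    # Accepts any object with a matching render() method; no inheritance required.
    return f"<div>{item.render()}</div>"


class FakeItem:
    """Plain test double that structurally satisfies Renderable."""

    def render(self) -> str:
        return "hello"


@contextmanager
def connection(path: str) -> Iterator[sqlite3.Connection]:
    conn = sqlite3.connect(path)
    try:
        yield conn
        conn.commit()  # reached only if the with-block raised nothing
    except Exception:
        conn.rollback()  # discard partial writes, then let the error propagate
        raise
    finally:
        conn.close()  # always runs, on success or failure
```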
## Error Handling

- Validate inputs at boundaries before expensive ops. Report all errors at once when possible
- Use specific exceptions: `ValueError`, `TypeError`, `KeyError`, not bare `Exception`
- `raise ServiceError("upload failed") from e` -- always chain to preserve the debug trail
- Convert external data to domain types (enums, Pydantic models) at system boundaries
- Batch processing: `BatchResult(succeeded={}, failed={})` -- don't let one item abort the batch
- Pydantic `BaseModel` with `field_validator` for complex input validation
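A sketch of exception chaining plus per-item failure collection. `ServiceError`, `BatchResult`, `upload`, and the bucket URL are hypothetical names matching the bullets above, not a library API.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable
from dataclasses import dataclass, field


class ServiceError(Exception):
    """Hypothetical domain-level error."""


@dataclass
class BatchResult:
    succeeded: dict[str, str] = field(default_factory=dict)
    failed: dict[str, Exception] = field(default_factory=dict)


def upload(key: str) -> str:
    """Hypothetical operation that fails for some inputs."""
    try:
        if not key:
            raise ValueError("empty key")
        return f"s3://example-bucket/{key}"
    except ValueError as e:
        # `from e` keeps the original ValueError as __cause__ in the traceback
        raise ServiceError(f"upload failed for {key!r}") from e


def process_batch(keys: Iterable[str], op: Callable[[str], str] = upload) -> BatchResult:
    result = BatchResult()
    for key in keys:
        try:
            result.succeeded[key] = op(key)
        except ServiceError as e:  # specific domain error, not bare Exception
            result.failed[key] = e
    return result
```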
## Migrations

- Separate schema and data migrations -- data backfills go in their own migration file
- Renames/removals use expand-contract: add new column → dual-write → backfill → switch reads → drop old (see the `postgresql` skill for the full pattern)
- Never edit a migration that has already run in a shared environment
- Alembic: use `--autogenerate` as a starting point, and always review the generated migration script (and the SQL from `alembic upgrade head --sql`) before committing
- Test migrations against production-sized data -- a migration that takes 2ms on dev can lock a table for minutes in production
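For the production-sized-data point, here is a sketch of a batched backfill, the data half of an expand-contract rename. Each batch is its own short transaction, so no single statement holds locks on the whole table. It uses `sqlite3` so it runs standalone, and the `users`/`fullname`/`display_name` names are hypothetical. Inside an Alembic migration you would run equivalent statements through `op.get_bind()`, keeping in mind that Alembic typically runs each migration inside a transaction, which affects per-batch commits.

```python
import sqlite3


def backfill_display_name(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Copy users.fullname into users.display_name in id-ordered batches."""
    updated = 0
    last_id = 0
    while True:
        # Upper bound of the next batch (keyset pagination on the primary key).
        (upper,) = conn.execute(
            "SELECT MAX(id) FROM (SELECT id FROM users WHERE id > ? ORDER BY id LIMIT ?)",
            (last_id, batch_size),
        ).fetchone()
        if upper is None:
            return updated
        cur = conn.execute(
            "UPDATE users SET display_name = fullname "
            "WHERE id > ? AND id <= ? AND display_name IS NULL",
            (last_id, upper),
        )
        conn.commit()  # short transactions release locks between batches
        updated += cur.rowcount
        last_id = upper
```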
## API Design

- Contract-first: define Pydantic `BaseModel` request/response schemas and the FastAPI `response_model` before writing endpoint logic. The schema is the contract -- implementation follows. FastAPI generates the OpenAPI docs from these models automatically.
- Hyrum's Law awareness: every observable response field, ordering, or timing becomes a dependency for callers. Declare an explicit `response_model` so only the fields you list are serialized, and use `model_config = ConfigDict(extra="forbid")` on request models to reject unknown input fields -- never return raw dicts or ORM objects from endpoints.
- Addition over modification: add new optional fields (`field: str | None = None`) rather than changing or removing existing ones. Removing a field from a Pydantic response model breaks callers silently. Deprecate first (`Field(deprecated=True)`), remove in a later version.
- Consistent error structure: all exceptions should produce the same envelope: `{"error": {"code": "...", "message": "...", "details": ...}}`. Register `@app.exception_handler` for `RequestValidationError`, `HTTPException` (Starlette's, so framework-raised errors such as unknown-route 404s are covered too), and application-specific exceptions to normalize them into one format. Callers build error handling once.
- Boundary validation via Pydantic: validate at the endpoint/handler level with Pydantic models and FastAPI's automatic request parsing. Internal services and repositories trust that input was validated at entry -- no redundant validation scattered through business logic.
- Third-party responses are untrusted data: validate the shape and content of external API responses before using them in logic, rendering, or decision-making. A compromised or misbehaving service can return unexpected types, malicious content, or missing fields. Parse through a Pydantic model before use.
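A sketch, assuming Pydantic v2, of parsing a third-party payload through a model before any logic touches it. `ExchangeRate`, `UpstreamError`, `parse_rate`, and the payload shape are hypothetical. `extra="ignore"` (Pydantic's default, spelled out here) is chosen deliberately for this inbound model so harmless new upstream fields don't break parsing, while the fields you rely on are still type- and range-checked.

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError


class ExchangeRate(BaseModel):
    """The subset of a hypothetical third-party rates response that we rely on."""

    model_config = ConfigDict(extra="ignore")  # tolerate new upstream fields, keep only ours
    currency: str = Field(min_length=3, max_length=3)
    rate: float = Field(gt=0)


class UpstreamError(Exception):
    """Raised when a dependency returns data we cannot trust."""


def parse_rate(payload: object) -> ExchangeRate:
    try:
        return ExchangeRate.model_validate(payload)
    except ValidationError as e:
        # Chain the ValidationError so the exact failing field stays in the traceback.
        raise UpstreamError("rates API returned an unexpected payload") from e
```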
## Verify

- `uv run pytest` passes with zero failures
- `uv run ruff check .` passes with zero warnings
- Coverage target: 80%+ (`uv run pytest --cov`)

## Related skills