api-design-ops

SKILL.md

API Design Ops

Comprehensive API design patterns covering REST (advanced), gRPC, and GraphQL. This skill provides decision frameworks, design patterns, and implementation guidance for building production APIs.

API Style Decision Tree

What kind of API do you need?
|
+-- Internal microservice-to-microservice?
|   +-- High throughput, low latency needed? --> gRPC
|   +-- Streaming (real-time data, logs)? --> gRPC (bidirectional streaming)
|   +-- Simple request/response, team comfort? --> REST
|
+-- Public-facing API?
|   +-- Third-party developers consuming it? --> REST (widest compatibility)
|   +-- Mobile app with varied data needs? --> GraphQL
|   +-- Browser-only, simple CRUD? --> REST
|
+-- Frontend for your own app?
|   +-- Multiple clients with different data shapes? --> GraphQL
|   +-- Single client, straightforward data? --> REST
|   +-- Real-time updates needed? --> GraphQL subscriptions or SSE
|
+-- IoT / embedded / constrained devices?
|   +-- Binary efficiency matters? --> gRPC
|   +-- HTTP-only environments? --> REST

Quick Comparison

Concern REST gRPC GraphQL
Transport HTTP/1.1+ HTTP/2 HTTP (any)
Serialization JSON (text) Protobuf (binary) JSON (text)
Schema OpenAPI (optional) .proto (required) SDL (required)
Browser support Native Via gRPC-Web/Connect Native
Caching HTTP caching built-in Custom Custom (normalized)
Learning curve Low Medium Medium-High
Code generation Optional Required Optional but recommended
Streaming SSE, WebSocket Native (4 patterns) Subscriptions
Over-fetching Common problem No (typed) Solved by design
File uploads Multipart native Chunked streaming Multipart spec (awkward)

REST Resource Design Quick Reference

Resource Naming

GET    /users                  # Collection
GET    /users/{id}             # Singleton
GET    /users/{id}/orders      # Sub-collection
POST   /users                  # Create
PUT    /users/{id}             # Full replace
PATCH  /users/{id}             # Partial update
DELETE /users/{id}             # Remove

# Naming rules:
# - Plural nouns for collections: /users NOT /user
# - Kebab-case for multi-word: /line-items NOT /lineItems
# - No verbs in URLs: POST /orders NOT POST /create-order
# - Max 3 levels deep: /users/{id}/orders (not /users/{id}/orders/{oid}/items/{iid}/details)

HTTP Methods and Status Codes

Method Success Empty Invalid Not Found Conflict
GET 200 200 (empty array) 400 404 -
POST 201 + Location - 400/422 - 409
PUT 200 - 400/422 404 409
PATCH 200 - 400/422 404 409
DELETE 204 204 (already gone) 400 404 409

HATEOAS (When Worth It)

Use when: public APIs where discoverability matters, long-lived APIs, APIs that evolve frequently. Skip when: internal microservices, mobile backends, tight coupling is acceptable.

{
  "id": "order-123",
  "status": "shipped",
  "_links": {
    "self": { "href": "/orders/order-123" },
    "track": { "href": "/orders/order-123/tracking" },
    "cancel": { "href": "/orders/order-123", "method": "DELETE" }
  }
}

Pagination Decision Tree

What's your data like?
|
+-- Stable data, UI needs "jump to page 5"?
|   --> Offset pagination: ?page=5&per_page=20
|   Tradeoff: Slow on large offsets (OFFSET 10000), inconsistent with inserts
|
+-- Large dataset, forward-only traversal?
|   --> Cursor pagination: ?after=eyJpZCI6MTIzfQ&limit=20
|   Tradeoff: No random page access, but consistent and fast
|
+-- Real-time feed, ordered by timestamp or ID?
|   --> Keyset pagination: ?created_after=2024-01-01T00:00:00Z&limit=20
|   Tradeoff: Requires a unique, sequential column; no page jumping

Response Envelope

{
  "data": [...],
  "pagination": {
    "total": 1432,
    "limit": 20,
    "has_more": true,
    "next_cursor": "eyJpZCI6MTQzMn0="
  }
}

Error Response Format (RFC 7807)

All APIs should use Problem Details (RFC 7807 / RFC 9457):

{
  "type": "https://api.example.com/errors/insufficient-funds",
  "title": "Insufficient Funds",
  "status": 422,
  "detail": "Account xxxx-1234 has a balance of $10.00, but the transfer requires $25.00.",
  "instance": "/transfers/txn-abc-123",
  "balance": 1000,
  "required": 2500
}

Field Reference

Field Required Description
type Yes URI identifying the error type (stable, documentable)
title Yes Human-readable summary (same for all instances of this type)
status Yes HTTP status code
detail Yes Human-readable explanation specific to this occurrence
instance No URI identifying the specific occurrence
(extensions) No Additional machine-readable fields

Validation Errors

{
  "type": "https://api.example.com/errors/validation",
  "title": "Validation Failed",
  "status": 422,
  "detail": "The request body contains 2 validation errors.",
  "errors": [
    { "field": "email", "message": "Must be a valid email address", "code": "invalid_format" },
    { "field": "age", "message": "Must be at least 18", "code": "out_of_range", "min": 18 }
  ]
}

Versioning Strategies

Strategy Example Pros Cons
URL path /v2/users Obvious, cacheable, easy routing URL pollution, hard to sunset
Accept header Accept: application/vnd.api.v2+json Clean URLs, content negotiation Hidden, harder to test
Query param /users?version=2 Easy to add Pollutes query string, caching issues
Date-based API-Version: 2024-01-15 Granular evolution (Stripe style) Complex implementation

Recommendation

  • Public APIs: URL path versioning (/v1/) - simplicity wins
  • Internal APIs: Header or no versioning (deploy in lockstep)
  • Evolving APIs: Date-based (Stripe model) if you have the engineering investment

Breaking Change Rules

A breaking change is anything that can cause existing clients to fail:

  • Removing a field from a response
  • Renaming a field
  • Changing a field's type
  • Adding a required field to a request
  • Changing URL structure
  • Changing error formats
  • Removing an endpoint

Non-breaking (safe):

  • Adding optional fields to requests
  • Adding fields to responses
  • Adding new endpoints
  • Adding new enum values (if client handles unknown values)

Rate Limiting Design

Algorithms

Algorithm Behavior Use When
Token bucket Allows bursts, refills at steady rate General API rate limiting
Sliding window Smooth distribution, no burst Strict fairness needed
Fixed window Simple, potential burst at boundary Low-stakes limiting
Leaky bucket Constant output rate Queue processing

Response Headers

X-RateLimit-Limit: 1000          # Max requests per window
X-RateLimit-Remaining: 743       # Requests left in current window
X-RateLimit-Reset: 1672531200    # Unix timestamp when window resets
Retry-After: 30                  # Seconds to wait (on 429)

429 Response Body

{
  "type": "https://api.example.com/errors/rate-limit-exceeded",
  "title": "Rate Limit Exceeded",
  "status": 429,
  "detail": "You have exceeded 1000 requests per hour. Try again in 30 seconds.",
  "retry_after": 30
}

Idempotency

Which Methods Need Idempotency Keys?

Method Idempotent by spec? Needs key?
GET Yes No
PUT Yes No (full replacement is naturally idempotent)
DELETE Yes No
PATCH No Recommended for critical operations
POST No Yes (always for payments, orders, transfers)

Implementation

POST /payments
Idempotency-Key: 550e8400-e29b-41d4-a716-446655440000
Content-Type: application/json

{ "amount": 2500, "currency": "usd", "customer": "cust_123" }

Server-side:

  1. Receive request with Idempotency-Key header
  2. Check if key exists in store (Redis, DB)
  3. If exists: return stored response (same status code + body)
  4. If not: process request, store response keyed by idempotency key
  5. Keys expire after 24-48 hours

Authentication Overview

Method Use When Security Level
API Key Server-to-server, internal, simple Low-Medium
JWT (Bearer) Stateless auth, microservices Medium-High
OAuth2 + PKCE Third-party access, user delegation High
mTLS Service mesh, zero-trust infra Very High

Decision Guide

Who is authenticating?
|
+-- Your own frontend? --> JWT (short-lived access + refresh token)
+-- Third-party developer? --> OAuth2 (client credentials for server, PKCE for SPA)
+-- Another internal service? --> mTLS or JWT with service accounts
+-- Quick prototype? --> API key (but plan migration)

Gotchas Table

Gotcha Problem Prevention
Breaking changes in "non-breaking" release Client crashes Additive-only policy, contract tests
N+1 in REST APIs 100 users = 101 queries Compound documents, ?include=, or GraphQL
Over-fetching Mobile gets 50 fields, needs 3 Sparse fieldsets ?fields=id,name or GraphQL
Under-fetching 3 requests to build one view Composite endpoints or BFF pattern
CORS misconfiguration Frontend can't reach API Explicit allowed origins, never * with credentials
Missing Content-Type 415 or silent parsing failure Validate Content-Type on every mutation endpoint
Large payloads without pagination OOM, timeouts Always paginate collections, set max page size
Inconsistent date formats Parsing hell ISO 8601 everywhere: 2024-01-15T10:30:00Z
No request IDs Impossible to debug Generate X-Request-ID, propagate through services
Enum evolution New value breaks old client Document that enums may grow, clients must handle unknown
Missing idempotency Duplicate charges, orders Idempotency keys on all POST endpoints with side effects
Unbounded query complexity GraphQL DoS Depth limiting, cost analysis, persisted queries

Reference Files

File Contents
references/rest-advanced.md Resource modeling, PATCH strategies, caching, webhooks, bulk ops
references/grpc.md Protobuf, service definitions, Go/Rust, streaming, error handling
references/graphql.md Schema design, resolvers, DataLoader, federation, performance
references/api-security.md JWT, OAuth2, CORS, rate limiting, OWASP API Top 10
Weekly Installs
2
GitHub Stars
9
First Seen
5 days ago
Installed on
amp2
cline2
openclaw2
opencode2
cursor2
kimi-cli2