data-validation

SKILL.md

Data Validation - Input Sanitisation & Schema Patterns

Validation patterns ensuring all data entering the system is validated at boundaries: user input via Zod (frontend), API requests via Pydantic (backend). No unvalidated data crosses a trust boundary.

Description

Defines Zod and Pydantic validation patterns for all data entering the system at trust boundaries. Covers form validation, API request schemas, type-safe contracts, Australian-specific validators (ABN, phone, postcode), and schema composition strategies.

When to Apply

Positive Triggers

  • Creating or modifying form inputs with user data
  • Defining API request/response schemas (Pydantic models)
  • Adding Zod schemas for frontend validation
  • Reviewing code for missing input validation
  • Building new API endpoints that accept POST/PUT/PATCH data
  • User mentions: "validation", "Zod", "Pydantic", "schema", "sanitise", "input"

Negative Triggers

  • Implementing authentication logic (use auth patterns directly)
  • Designing error response formats (use error-taxonomy instead)
  • Working on database model definitions (that is ORM schema, not input validation)

Core Directives

Validate at Boundaries, Trust Internally

[User Input] --Zod--> [Frontend] --fetch--> [API] --Pydantic--> [Service Layer]
     ^                                         ^
     |                                         |
  VALIDATE HERE                          VALIDATE HERE
  • Frontend boundary: Every form field validated with Zod before submission
  • API boundary: Every request body validated with Pydantic before processing
  • Internal code: Trust validated data — no redundant re-validation inside services

Naming Conventions

Layer Convention Example
Frontend Zod schema camelCase + Schema suffix loginFormSchema
Frontend inferred type PascalCase via z.infer type LoginForm = z.infer<typeof loginFormSchema>
Backend Pydantic model PascalCase + purpose suffix DocumentCreateRequest
Shared contract Same field names both sides email, password, title

Frontend Patterns (Zod + react-hook-form)

Basic Form Schema

The project already uses this pattern in apps/web/components/auth/login-form.tsx:

import * as z from 'zod';
import { zodResolver } from '@hookform/resolvers/zod';
import { useForm } from 'react-hook-form';

// 1. Define schema
const loginFormSchema = z.object({
  email: z.string().email('Please enter a valid email address'),
  password: z.string().min(6, 'Password must be at least 6 characters'),
});

// 2. Infer type (never define manually)
type LoginForm = z.infer<typeof loginFormSchema>;

// 3. Use with react-hook-form
const form = useForm<LoginForm>({
  resolver: zodResolver(loginFormSchema),
  defaultValues: { email: '', password: '' },
});

Schema Composition

Build complex schemas from reusable parts:

// Base schemas (reusable)
const emailSchema = z.string().email('Please enter a valid email address');
const passwordSchema = z.string().min(6, 'Password must be at least 6 characters');
const abnSchema = z.string().regex(/^\d{11}$/, 'ABN must be 11 digits');

// Composed schemas
const registerFormSchema = z
  .object({
    email: emailSchema,
    password: passwordSchema,
    confirmPassword: z.string(),
  })
  .refine((data) => data.password === data.confirmPassword, {
    message: 'Passwords do not match',
    path: ['confirmPassword'],
  });

API Request Validation

Validate data before sending to the backend:

const documentCreateSchema = z.object({
  title: z.string().min(1, 'Title is required').max(255),
  content: z.string().min(1, 'Content is required'),
  metadata: z.record(z.unknown()).optional(),
});

async function createDocument(input: unknown): Promise<Document> {
  // Validate before sending — fail fast on the client
  const validated = documentCreateSchema.parse(input);
  return apiClient.post('/api/documents', validated);
}

Australian-Specific Validators

// Australian Business Number (ABN) with checksum
const abnSchema = z.string().refine(
  (val) => {
    if (!/^\d{11}$/.test(val)) return false;
    const weights = [10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19];
    const digits = val.split('').map(Number);
    digits[0] -= 1;
    const sum = digits.reduce((acc, d, i) => acc + d * weights[i], 0);
    return sum % 89 === 0;
  },
  { message: 'Invalid ABN' }
);

// Australian phone number
const auPhoneSchema = z.string().regex(
  /^(\+61|0)[2-478]\d{8}$/,
  'Please enter a valid Australian phone number'
);

// Australian postcode
const postcodeSchema = z.string().regex(/^\d{4}$/, 'Postcode must be 4 digits');

// Date in DD/MM/YYYY format
const auDateSchema = z.string().regex(
  /^\d{2}\/\d{2}\/\d{4}$/,
  'Date must be in DD/MM/YYYY format'
);

Backend Patterns (Pydantic)

Request Model Convention

The project defines request models in route files or apps/backend/src/api/schemas/:

from pydantic import BaseModel, Field, field_validator
from typing import Optional


class DocumentCreateRequest(BaseModel):
    """Validated input for document creation."""

    title: str = Field(
        ...,
        min_length=1,
        max_length=255,
        description="Document title"
    )
    content: str = Field(
        ...,
        min_length=1,
        description="Document content"
    )
    metadata: Optional[dict] = Field(
        None,
        description="Additional metadata"
    )

    @field_validator("title")
    @classmethod
    def strip_title(cls, v: str) -> str:
        return v.strip()

Constrained Types

Use Pydantic's built-in constraints instead of custom validators where possible:

from pydantic import BaseModel, Field, EmailStr
from typing import Annotated


# Constrained types (prefer over custom validators)
PositiveInt = Annotated[int, Field(gt=0)]
PageSize = Annotated[int, Field(ge=1, le=100)]
ShortString = Annotated[str, Field(min_length=1, max_length=255)]


class PaginationParams(BaseModel):
    """Reusable pagination parameters."""

    page: PositiveInt = 1
    page_size: PageSize = 20


class SearchRequest(BaseModel):
    """Validated search input."""

    query: ShortString
    pagination: PaginationParams = PaginationParams()

Route Integration

from fastapi import APIRouter

router = APIRouter(prefix="/api/documents", tags=["documents"])


@router.post("/", status_code=201)
async def create_document(
    request: DocumentCreateRequest,  # Pydantic validates automatically
    user: User = Depends(get_current_user),
) -> DocumentResponse:
    """Create a new document. Input is validated by Pydantic."""
    # request is already validated — trust it here
    return await document_service.create(request, user)

Schema Location Convention

Schema Type Location When to Use
Route-specific Inline in route file Simple schemas used by one endpoint
Shared across routes apps/backend/src/api/schemas/ Schemas used by multiple endpoints
Domain models apps/backend/src/models/ Core business models (not request schemas)

Validation Checklist

When adding a new feature, verify:

  • Every form has a Zod schema with zodResolver
  • Form types are inferred via z.infer, not manually defined
  • Every POST/PUT/PATCH endpoint has a Pydantic request model
  • Error messages are user-friendly, not technical
  • Australian formats validated where applicable (ABN, phone, postcode, date)
  • Schema field names match between frontend Zod and backend Pydantic

Anti-Patterns

Pattern Problem Correct Approach
Validation logic inside route handlers Mixes concerns, hard to reuse or test Define Zod/Pydantic schemas separately and reference them
No error messages on validation failure Users see raw Pydantic/Zod errors Provide human-readable messages in en-AU on every field
Type coercion without validation (Number(input)) NaN and unexpected values slip through Use Zod .coerce.number() or Pydantic field_validator
Optional fields without defaults Downstream code must null-check everywhere Set sensible defaults via Field(default=...) or .default()
Manually defining types instead of inferring Types drift out of sync with schemas Use z.infer<typeof schema> and Pydantic model exports

Checklist

  • Zod schemas defined for all frontend form and API inputs
  • Pydantic models defined for all backend request bodies
  • Error messages written in en-AU (e.g., "Please enter a valid email address")
  • Validation occurs at system boundaries (frontend forms, API entry points)
  • Schema field names match between frontend Zod and backend Pydantic

Response Format

[AGENT_ACTIVATED]: Data Validation
[PHASE]: {Schema Design | Implementation | Review}
[STATUS]: {in_progress | complete}

{validation analysis or schema output}

[NEXT_ACTION]: {what to do next}

Integration Points

Error Taxonomy

Validation failures should use DATA_VALIDATION_* error codes from the error-taxonomy skill:

# Backend: structured validation error
raise HTTPException(
    status_code=422,
    detail={
        "detail": "Email format is invalid",
        "error_code": "DATA_VALIDATION_INVALID_FORMAT",
        "field": "email",
    },
)

Council of Logic (Shannon Check)

  • Schemas should be minimal — only validate what the endpoint actually uses
  • Avoid duplicating the same schema across files — compose from base schemas
  • Prefer built-in constraints (min_length, gt) over custom validators

API Contract

When api-contract skill is installed, Zod and Pydantic schemas should mirror each other to form a typed contract between frontend and backend.

Australian Localisation (en-AU)

  • Date Format: DD/MM/YYYY (not MM/DD/YYYY)
  • Phone: +61 or 0X XXXX XXXX
  • ABN: 11-digit with checksum validation
  • Postcode: 4 digits (0200–9999)
  • Currency: AUD ($) — validate as positive decimal
  • Spelling: sanitisation, organisation, analyse, centre, colour
Weekly Installs
3
GitHub Stars
1
First Seen
Feb 28, 2026
Installed on
gemini-cli3
opencode3
codebuddy3
github-copilot3
codex3
kimi-cli3