data-validation
Data Validation - Input Sanitisation & Schema Patterns
Validation patterns ensuring all data entering the system is validated at boundaries: user input via Zod (frontend), API requests via Pydantic (backend). No unvalidated data crosses a trust boundary.
Description
Defines Zod and Pydantic validation patterns for all data entering the system at trust boundaries. Covers form validation, API request schemas, type-safe contracts, Australian-specific validators (ABN, phone, postcode), and schema composition strategies.
When to Apply
Positive Triggers
- Creating or modifying form inputs with user data
- Defining API request/response schemas (Pydantic models)
- Adding Zod schemas for frontend validation
- Reviewing code for missing input validation
- Building new API endpoints that accept POST/PUT/PATCH data
- User mentions: "validation", "Zod", "Pydantic", "schema", "sanitise", "input"
Negative Triggers
- Implementing authentication logic (use auth patterns directly)
- Designing error response formats (use
error-taxonomyinstead) - Working on database model definitions (that is ORM schema, not input validation)
Core Directives
Validate at Boundaries, Trust Internally
[User Input] --Zod--> [Frontend] --fetch--> [API] --Pydantic--> [Service Layer]
^ ^
| |
VALIDATE HERE VALIDATE HERE
- Frontend boundary: Every form field validated with Zod before submission
- API boundary: Every request body validated with Pydantic before processing
- Internal code: Trust validated data — no redundant re-validation inside services
Naming Conventions
| Layer | Convention | Example |
|---|---|---|
| Frontend Zod schema | camelCase + Schema suffix |
loginFormSchema |
| Frontend inferred type | PascalCase via z.infer |
type LoginForm = z.infer<typeof loginFormSchema> |
| Backend Pydantic model | PascalCase + purpose suffix |
DocumentCreateRequest |
| Shared contract | Same field names both sides | email, password, title |
Frontend Patterns (Zod + react-hook-form)
Basic Form Schema
The project already uses this pattern in apps/web/components/auth/login-form.tsx:
import * as z from 'zod';
import { zodResolver } from '@hookform/resolvers/zod';
import { useForm } from 'react-hook-form';
// 1. Define schema
const loginFormSchema = z.object({
email: z.string().email('Please enter a valid email address'),
password: z.string().min(6, 'Password must be at least 6 characters'),
});
// 2. Infer type (never define manually)
type LoginForm = z.infer<typeof loginFormSchema>;
// 3. Use with react-hook-form
const form = useForm<LoginForm>({
resolver: zodResolver(loginFormSchema),
defaultValues: { email: '', password: '' },
});
Schema Composition
Build complex schemas from reusable parts:
// Base schemas (reusable)
const emailSchema = z.string().email('Please enter a valid email address');
const passwordSchema = z.string().min(6, 'Password must be at least 6 characters');
const abnSchema = z.string().regex(/^\d{11}$/, 'ABN must be 11 digits');
// Composed schemas
const registerFormSchema = z
.object({
email: emailSchema,
password: passwordSchema,
confirmPassword: z.string(),
})
.refine((data) => data.password === data.confirmPassword, {
message: 'Passwords do not match',
path: ['confirmPassword'],
});
API Request Validation
Validate data before sending to the backend:
const documentCreateSchema = z.object({
title: z.string().min(1, 'Title is required').max(255),
content: z.string().min(1, 'Content is required'),
metadata: z.record(z.unknown()).optional(),
});
async function createDocument(input: unknown): Promise<Document> {
// Validate before sending — fail fast on the client
const validated = documentCreateSchema.parse(input);
return apiClient.post('/api/documents', validated);
}
Australian-Specific Validators
// Australian Business Number (ABN) with checksum
const abnSchema = z.string().refine(
(val) => {
if (!/^\d{11}$/.test(val)) return false;
const weights = [10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19];
const digits = val.split('').map(Number);
digits[0] -= 1;
const sum = digits.reduce((acc, d, i) => acc + d * weights[i], 0);
return sum % 89 === 0;
},
{ message: 'Invalid ABN' }
);
// Australian phone number
const auPhoneSchema = z.string().regex(
/^(\+61|0)[2-478]\d{8}$/,
'Please enter a valid Australian phone number'
);
// Australian postcode
const postcodeSchema = z.string().regex(/^\d{4}$/, 'Postcode must be 4 digits');
// Date in DD/MM/YYYY format
const auDateSchema = z.string().regex(
/^\d{2}\/\d{2}\/\d{4}$/,
'Date must be in DD/MM/YYYY format'
);
Backend Patterns (Pydantic)
Request Model Convention
The project defines request models in route files or apps/backend/src/api/schemas/:
from pydantic import BaseModel, Field, field_validator
from typing import Optional
class DocumentCreateRequest(BaseModel):
"""Validated input for document creation."""
title: str = Field(
...,
min_length=1,
max_length=255,
description="Document title"
)
content: str = Field(
...,
min_length=1,
description="Document content"
)
metadata: Optional[dict] = Field(
None,
description="Additional metadata"
)
@field_validator("title")
@classmethod
def strip_title(cls, v: str) -> str:
return v.strip()
Constrained Types
Use Pydantic's built-in constraints instead of custom validators where possible:
from pydantic import BaseModel, Field, EmailStr
from typing import Annotated
# Constrained types (prefer over custom validators)
PositiveInt = Annotated[int, Field(gt=0)]
PageSize = Annotated[int, Field(ge=1, le=100)]
ShortString = Annotated[str, Field(min_length=1, max_length=255)]
class PaginationParams(BaseModel):
"""Reusable pagination parameters."""
page: PositiveInt = 1
page_size: PageSize = 20
class SearchRequest(BaseModel):
"""Validated search input."""
query: ShortString
pagination: PaginationParams = PaginationParams()
Route Integration
from fastapi import APIRouter
router = APIRouter(prefix="/api/documents", tags=["documents"])
@router.post("/", status_code=201)
async def create_document(
request: DocumentCreateRequest, # Pydantic validates automatically
user: User = Depends(get_current_user),
) -> DocumentResponse:
"""Create a new document. Input is validated by Pydantic."""
# request is already validated — trust it here
return await document_service.create(request, user)
Schema Location Convention
| Schema Type | Location | When to Use |
|---|---|---|
| Route-specific | Inline in route file | Simple schemas used by one endpoint |
| Shared across routes | apps/backend/src/api/schemas/ |
Schemas used by multiple endpoints |
| Domain models | apps/backend/src/models/ |
Core business models (not request schemas) |
Validation Checklist
When adding a new feature, verify:
- Every form has a Zod schema with
zodResolver - Form types are inferred via
z.infer, not manually defined - Every POST/PUT/PATCH endpoint has a Pydantic request model
- Error messages are user-friendly, not technical
- Australian formats validated where applicable (ABN, phone, postcode, date)
- Schema field names match between frontend Zod and backend Pydantic
Anti-Patterns
| Pattern | Problem | Correct Approach |
|---|---|---|
| Validation logic inside route handlers | Mixes concerns, hard to reuse or test | Define Zod/Pydantic schemas separately and reference them |
| No error messages on validation failure | Users see raw Pydantic/Zod errors | Provide human-readable messages in en-AU on every field |
Type coercion without validation (Number(input)) |
NaN and unexpected values slip through |
Use Zod .coerce.number() or Pydantic field_validator |
| Optional fields without defaults | Downstream code must null-check everywhere | Set sensible defaults via Field(default=...) or .default() |
| Manually defining types instead of inferring | Types drift out of sync with schemas | Use z.infer<typeof schema> and Pydantic model exports |
Checklist
- Zod schemas defined for all frontend form and API inputs
- Pydantic models defined for all backend request bodies
- Error messages written in en-AU (e.g., "Please enter a valid email address")
- Validation occurs at system boundaries (frontend forms, API entry points)
- Schema field names match between frontend Zod and backend Pydantic
Response Format
[AGENT_ACTIVATED]: Data Validation
[PHASE]: {Schema Design | Implementation | Review}
[STATUS]: {in_progress | complete}
{validation analysis or schema output}
[NEXT_ACTION]: {what to do next}
Integration Points
Error Taxonomy
Validation failures should use DATA_VALIDATION_* error codes from the error-taxonomy skill:
# Backend: structured validation error
raise HTTPException(
status_code=422,
detail={
"detail": "Email format is invalid",
"error_code": "DATA_VALIDATION_INVALID_FORMAT",
"field": "email",
},
)
Council of Logic (Shannon Check)
- Schemas should be minimal — only validate what the endpoint actually uses
- Avoid duplicating the same schema across files — compose from base schemas
- Prefer built-in constraints (
min_length,gt) over custom validators
API Contract
When api-contract skill is installed, Zod and Pydantic schemas should mirror each other to form a typed contract between frontend and backend.
Australian Localisation (en-AU)
- Date Format: DD/MM/YYYY (not MM/DD/YYYY)
- Phone:
+61or0X XXXX XXXX - ABN: 11-digit with checksum validation
- Postcode: 4 digits (0200–9999)
- Currency: AUD ($) — validate as positive decimal
- Spelling: sanitisation, organisation, analyse, centre, colour