# ai-web-apps

AI-Enhanced Web Apps
Acknowledgement: Shared by Peter Bamuhigire, techguypeter.com, +256 784 464178.
## Use When

- Designing or building an AI-enhanced web app on Next.js + Vercel AI SDK with streaming chat, tool calling, RAG, multi-modal input, or MCP-exposed tools.
- Wiring an AI feature into a multi-tenant SaaS and needing the module gate, token ledger, and provider abstraction for that feature.
- Reviewing an existing AI web-app route for missing budget, quota, guardrail, or fallback controls.
## Do Not Use When

- The AI feature is purely backend or batch (no interactive web surface) — use `ai-llm-integration` or `python-saas-integration` instead.
- The task is a pure UX pattern question — use `ai-ux-patterns` or `ux-for-ai`.
- The task is full metering / billing strategy — use `ai-metering-billing` or `ai-saas-billing`.
- The task is prompt or threat design — use `ai-prompt-engineering`, `ai-security`, or `llm-security`.
## Required Inputs

- Context map, access patterns, and threat model from upstream skills (see Inputs table below).
- Product scope of the AI feature: user job, authority boundary, cost ceiling.
## Workflow

- Read this `SKILL.md`, then load only the deep-dive references needed for the concrete task.
- Apply the decision tables before writing code — architecture first, implementation second.
- Produce the four formal outputs (module gate, token ledger, provider abstraction, guardrails) as named artifacts, not as scattered edits.
## Quality Standards

- Every AI feature is OFF by default and gated at tenant + user + budget.
- Every model call writes a ledger row before the response reaches the user.
- Every structured output is validated by Zod; every rendered output is Markdown-sanitised.
- `maxTokens` is set on every call; fallback is limited to transient provider errors.
## Anti-Patterns

- Fetching API keys in feature code instead of a single provider factory.
- Gating an AI feature on a single flag and calling it done — three gates (entitlement, user, budget) are required.
- Rendering model output through `dangerouslySetInnerHTML` or unsanitised Markdown.
## Outputs

- Module gate contract (feature catalog, tenant flags, kill switch) — see `references/module-gate.md`.
- Token ledger + budget schema — see `references/token-ledger-and-budgets.md`.
- Provider abstraction contract (factory, fallback, request schema) — see `references/provider-abstraction.md`.
- Output validation + guardrail rules — see `references/output-guardrails.md`.
## Evidence Produced
| Category | Artifact | Format | Example |
|---|---|---|---|
| Correctness | AI web app integration test plan | Markdown doc covering Next.js + Vercel AI SDK + MCP tool flow tests | docs/ai/web-app-tests.md |
| Security | AI web app security note | Markdown doc covering provider-key handling, prompt-injection defense, and per-tenant isolation | docs/ai/web-app-security.md |
## References

- See the References section below.
This skill specialises the generic integration patterns in `ai-architecture-patterns` for a Next.js + Vercel AI SDK runtime. It is the AI-specialist seat in the architecture: it turns a feature concept plus upstream context/threat artifacts into four concrete deliverables the rest of the stack can depend on.
## Prerequisites

Load the following before this skill, in order:

- `world-class-engineering` — release gates and production bar.
- `system-architecture-design` — produces the context map and critical-flow table.
- `database-design-engineering` — produces the access-pattern list that shapes the ledger schema.
- `vibe-security-skill` — produces the threat model; `ai-security` and `llm-security` extend it for LLMs.
## When this skill applies
- Introducing an AI-powered route, server action, or streaming UI in a Next.js web app.
- Wiring an MCP server or MCP client into an existing web app.
- Adding a new model, provider, or fallback pair to an existing AI feature.
- Bolting a token ledger, budget guard, or kill switch onto a feature that shipped without one.
- Reviewing an AI web-app PR against the house contract (see Outputs).
## Inputs
| Artifact | Produced by | Required? | Why |
|---|---|---|---|
| Context map | system-architecture-design | required | names the module boundary the AI feature sits inside |
| Critical-flow table | system-architecture-design | required | sets latency + availability budget for the feature |
| Access-pattern list | database-design-engineering | required | shapes the ledger, quota, and corpus-retrieval queries |
| Threat model | vibe-security-skill / ai-security / llm-security | required | informs prompt-injection, PII, and abuse controls |
| Auth/authz matrix | vibe-security-skill | required | drives the role check inside the module gate |
| SLO set | observability-monitoring | optional | calibrates fallback thresholds and kill-switch triggers |
| Pricing / plan catalog | saas-subscription-mastery / subscription-billing | optional | determines `min_plan_tier` in the feature catalog |
## Outputs
| Artifact | Consumed by | Template |
|---|---|---|
| Module gate contract | ai-saas-billing, platform admin UI, feature routes | references/module-gate.md |
| Token ledger + budget schema | ai-metering-billing, observability-monitoring, finance reporting | references/token-ledger-and-budgets.md |
| Provider abstraction contract | every AI route, ai-feature-spec, ai-evaluation | references/provider-abstraction.md |
| Output validation + guardrail rules | every AI route, ai-evaluation, ai-security | references/output-guardrails.md |
| Streaming + UI implementation notes | frontend integrators | references/streaming-and-ui-patterns.md |
| MCP server/client contract | internal tool owners, ai-agents-tools | references/mcp-integration.md |
Each of the four formal outputs has a template; other outputs are implementation references.
## Non-negotiables

- AI features ship OFF. Enablement is an explicit tenant admin action recorded in audit.
- Every model call writes a ledger row with tenant, user, request, provider, model, tokens, cost, latency, status.
- Every call sets `maxTokens`; cost estimation uses `maxTokens`, not an optimistic expected completion.
- Tool outputs are validated before they influence writes, prompts, or UI.
- Model fallback fires only on transient provider errors (429, 5xx). Never on 4xx input errors.
- Rendered output goes through a Markdown sanitiser; `dangerouslySetInnerHTML` is banned.
- Secrets never reach the browser; `NEXT_PUBLIC_*` for an AI key is a release blocker.
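The `maxTokens` rule above can be sketched as a pre-call cost ceiling: the budget guard charges the worst case, and the ledger later records real usage. The function names and micro-dollar pricing scheme here are illustrative assumptions, not the schema from the references.

```typescript
// Illustrative sketch (names and pricing units are assumptions): the budget
// guard charges the worst case using maxTokens, never an expected completion.
function estimateCostCeilingMicros(
  promptTokens: number,
  maxTokens: number,
  inputMicrosPerToken: number,  // price in micro-dollars per prompt token
  outputMicrosPerToken: number, // price in micro-dollars per completion token
): number {
  // Worst case: the model emits the full maxTokens completion.
  return promptTokens * inputMicrosPerToken + maxTokens * outputMicrosPerToken;
}

function budgetGuard(
  spentThisMonthMicros: number,
  tenantCapMicros: number,
  estimateMicros: number,
): boolean {
  // Block before the call if the worst case would breach the tenant cap.
  return spentThisMonthMicros + estimateMicros <= tenantCapMicros;
}
```

The real usage written in `onFinish` is usually lower than this ceiling; the gap is the price of never letting a runaway completion breach the cap.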
## Decision rules

When to expose a tool via MCP vs inline tools:

```
Single app, single model, <5 tools       -> inline tools in streamText
Reused across apps or across models      -> MCP server (stdio transport)
Tools require out-of-process credentials -> MCP server, separate service account
Latency budget <150 ms per tool call     -> inline; MCP handshake adds overhead
```
Wrong choice cost: inline-when-should-be-MCP means every app duplicates the tool surface; MCP-when-should-be-inline pays handshake latency for no composition benefit.
### Provider / model selection (short form)
| Need | First choice | Fallback | Why not the other way |
|---|---|---|---|
| Latency-critical chat, short replies | Gemini Flash | GPT-3.5 | frontier model burns budget on a task that does not need it |
| High-quality reasoning, long tool chains | Claude Sonnet | GPT-4o | flash model hallucinates multi-step chains |
| Cheapest structured extraction | Gemini Flash | GPT-3.5 | frontier cost is a 10–30x multiplier with no quality gain |
| Strict schema adherence (generateObject) | GPT-4o | Claude Sonnet (tool-call) | flash models drift from the schema under pressure |
| Residency / on-prem constraint | regional deployment | fail closed | cross-region fallback is a compliance breach |
Full selection table: references/provider-abstraction.md.
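The fallback column above is governed by one predicate, stated in the non-negotiables: only transient provider errors trigger it. A minimal sketch (the function name is illustrative):

```typescript
// Fallback policy: only transient provider errors (429, 5xx) trigger a
// fallback model; 4xx input errors surface to the caller unchanged.
function shouldFallback(httpStatus: number): boolean {
  return httpStatus === 429 || (httpStatus >= 500 && httpStatus < 600);
}
```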
### Three-gate evaluation order
1. Module gate (entitlement + enablement + role + consent)
2. Quota gate (Redis counter, per user per feature per day)
3. Budget guard (ledger sum + estimated cost vs tenant cap)
4. Execute call (factory -> model with maxTokens)
5. Ledger write (onFinish, real tokens + real cost)
6. Guardrails (structural -> semantic -> render)
Skip step 1: free tenants consume paid features. Skip step 2: one user drains tenant cap. Skip step 3: fallback to cheaper model masks runaway spend. Skip step 5: cost attribution breaks; billing disputes become unresolvable. Skip step 6: XSS or bad-data writes reach production.
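Steps 1–3 can be sketched as a short-circuiting chain (types and names here are illustrative, not the skill's real contract): the first blocking gate refuses the request before any model call is made.

```typescript
// Each gate reports a refusal reason or passes; the first refusal
// short-circuits so the model is never reached on a blocked request.
type GateResult = { blocked: false } | { blocked: true; reason: string };

function runGates(gates: Array<() => GateResult>): GateResult {
  for (const gate of gates) {
    const result = gate();
    if (result.blocked) return result; // refuse before touching the model
  }
  return { blocked: false };
}
```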
### Sync route vs queued job
| Signal | Route (streaming) | Queue (background) |
|---|---|---|
| p95 total work | <25 s | >25 s |
| User must see partial output as it arrives | yes | no |
| Multi-step tool chain with web fetches | mixed | queue |
| Result is persisted regardless of UI | optional | queue |
| Retry on failure must be automatic | hard in route | natural in queue |
Wrong choice: running a 90-second multi-tool task in a route blows the 30 s Vercel function cap; running a fast chat through a queue destroys the streaming UX users expect.
### Structural vs semantic vs render guardrail

```
Zod schema missing          -> silent data corruption downstream
Business rule check missing -> valid-looking nonsense reaches users / DB
Markdown sanitiser missing  -> XSS via model output
```
All three run; skipping any layer imports that layer's failure mode.
## Architecture

```
User (React UI)
    |
    v  Server Actions / HTTP
Next.js App Router (middleware: CORS, IP rate limit, auth)
    |
    v
Module Gate -> Quota Gate -> Budget Guard
    |
    v  Provider Factory (allow-listed)
Vercel AI SDK -----------------------> OpenAI / Gemini / Anthropic
    |                     ^
    v                     |  ai-fallback on 429/5xx
MCP Client (optional) ---------------> MCP Server(s) -> internal APIs + vector stores
    |
    v  Guardrails (structural, semantic, render)
Ledger Write (onFinish) -> Streaming response back to client
```
The four dashed boxes in this diagram correspond to the four formal outputs. Everything else is glue.
## Core workflow
### 1. Define the AI interaction contract
Write these down before touching code:
- User job to be done and success signal.
- Input shape + size limits + validation (Zod schema).
- Output shape + fallback shape when the model fails (Zod schema).
- Allowed tools, authority boundary per tool, and whether tool results can trigger writes.
- Latency budget (p50, p95) and cost budget per request + per user per day.
- Consent requirement (e.g. data leaving tenant boundary).
This is the `ai-feature-spec` deliverable. This skill consumes it.
### 2. Declare the three gates
Implement the module gate (references/module-gate.md) and the quota + budget pair (references/token-ledger-and-budgets.md) before writing the route. The route wires them together; it does not invent them.
Order in the route:

```
moduleGate -> quotaGate -> budgetGuard -> factory -> streamText -> guardrails -> ledgerWrite
```
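That ordering can be sketched as a thin pipeline with every step injected; all names here are illustrative, and the real `callModel` would be the factory plus `streamText` with `maxTokens` set.

```typescript
// Hypothetical route wiring: gates run first and throw to refuse; the
// ledger row is written before the response is returned to the caller.
async function handleAiRequest(deps: {
  moduleGate: () => Promise<void>;     // throws if the feature is off
  quotaGate: () => Promise<void>;      // throws if the daily quota is spent
  budgetGuard: () => Promise<void>;    // throws if the estimate breaches the cap
  callModel: () => Promise<string>;    // factory -> model, maxTokens set
  guardrails: (raw: string) => string; // structural -> semantic -> render
  ledgerWrite: (out: string) => Promise<void>;
}): Promise<string> {
  await deps.moduleGate();
  await deps.quotaGate();
  await deps.budgetGuard();
  const raw = await deps.callModel();
  const safe = deps.guardrails(raw);
  await deps.ledgerWrite(safe); // ledger row before the user sees the result
  return safe;
}
```

Injecting the steps keeps the route testable: each gate can be exercised in isolation, and the ordering is asserted once instead of re-reviewed in every PR.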
### 3. Centralise the provider

All model access goes through the factory in `references/provider-abstraction.md`. Feature code never imports a provider SDK directly. The payoff:
- Cost attribution stays correct (factory is the single place that knows which model is live).
- Fallback stays policy-driven, not ad hoc per route.
- Swapping a provider is a one-file change.
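A minimal sketch of such a factory, assuming an allow-list shape and model names that are purely illustrative; the real factory would return a client (e.g. `openai(model)` from `@ai-sdk/openai`) rather than a plain descriptor.

```typescript
// Illustrative allow-list factory: the single place that knows which
// (provider, model) pairs are live. Model names here are assumptions.
const ALLOWED_MODELS: Record<string, ReadonlyArray<string>> = {
  openai: ["gpt-4o", "gpt-3.5-turbo"],
  google: ["gemini-flash"],
  anthropic: ["claude-sonnet"],
};

class UnsupportedModelError extends Error {}

function getSupportedModel(provider: string, model: string): { provider: string; model: string } {
  if (!ALLOWED_MODELS[provider]?.includes(model)) {
    // A typed error, not a runtime 500 (see Anti-patterns).
    throw new UnsupportedModelError(`${provider}/${model} is not allow-listed`);
  }
  return { provider, model };
}
```

Because the allow-list lives in one file, swapping a provider or retiring a model really is a one-file change, and cost attribution can trust the (provider, model) pair the factory returns.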
### 4. Stream safely

Streaming gives a great UX and a fragile validation point. Rules:

- Stream raw text to the UI for responsiveness.
- Do not persist or trigger side effects until `onFinish` has validated the final object.
- Validate each tool call's parameters before executing the tool, never after.
- Write the ledger row in `onFinish` with real usage; never use the estimate as the billed amount.
Implementation: references/streaming-and-ui-patterns.md.
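The first two rules can be sketched without the SDK (all names illustrative): chunks reach the UI immediately, but persistence waits until the assembled final text has passed validation, mirroring what the `onFinish` callback does in the real route.

```typescript
// Stream chunks to the UI as they arrive; defer side effects until the
// final assembled text has been validated.
async function streamWithDeferredPersist(
  chunks: AsyncIterable<string>,
  sendToUi: (chunk: string) => void,
  validate: (full: string) => boolean,
  persist: (full: string) => Promise<void>,
): Promise<string> {
  let full = "";
  for await (const chunk of chunks) {
    sendToUi(chunk); // raw text streams for responsiveness
    full += chunk;
  }
  if (validate(full)) {
    await persist(full); // side effects only after final validation
  }
  return full;
}
```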
### 5. Enforce guardrails in three layers

Structural (Zod on generateObject and tools), semantic (business rules, PII scrub, confidence thresholds), render (Markdown-only with a sanitiser). Full rules in `references/output-guardrails.md`.
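A dependency-free sketch of the three layers as a pipeline. In production the structural layer is a Zod schema and the render layer a real Markdown sanitiser such as rehype-sanitize; the hand-rolled checks, shapes, and thresholds here are illustrative stand-ins.

```typescript
// Three layers: structural (shape), semantic (business rule), render (sanitise).
type Draft = { summary: string; confidence: number };

function structuralGuard(raw: unknown): Draft {
  const d = raw as Partial<Draft>;
  if (typeof d?.summary !== "string" || typeof d?.confidence !== "number") {
    throw new Error("structural: shape mismatch"); // Zod's job in real code
  }
  return d as Draft;
}

function semanticGuard(d: Draft): Draft {
  // Illustrative business rule: refuse low-confidence output.
  if (d.confidence < 0.5) throw new Error("semantic: below confidence threshold");
  return d;
}

function renderGuard(text: string): string {
  // Stand-in for the Markdown sanitiser: drop raw HTML tags entirely.
  return text.replace(/<[^>]*>/g, "");
}

function runGuardrails(raw: unknown): string {
  return renderGuard(semanticGuard(structuralGuard(raw)).summary);
}
```

Skipping any layer imports exactly that layer's failure mode, which is why the pipeline composes all three rather than letting a route pick.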
### 6. Observability

Every call logs a single request_id that joins the prompt trace, tool calls, and the ledger row. Metrics at minimum:

- `ai.requests_total{feature, model, status}`
- `ai.tokens_total{feature, model, kind=prompt|completion}`
- `ai.cost_micros_total{tenant, feature}`
- `ai.guard_failures_total{guard, feature}`
- `ai.fallback_triggered_total{from_model, to_model}`
Alert on sustained fallback (>5% for 5 min), sustained guard failures (>1% for 10 min), and any tenant crossing 80% of monthly cap.
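For illustration only, a toy labelled-counter sketch for metric names like those above; a real deployment would use prom-client or OpenTelemetry, and only the label-keying idea is shown here.

```typescript
// Toy counter store: label pairs are sorted so that {a,b} and {b,a}
// increment the same time series.
const counters = new Map<string, number>();

function labelKey(metric: string, labels: Record<string, string>): string {
  return metric + JSON.stringify(Object.entries(labels).sort());
}

function inc(metric: string, labels: Record<string, string>, by = 1): void {
  const key = labelKey(metric, labels);
  counters.set(key, (counters.get(key) ?? 0) + by);
}

function read(metric: string, labels: Record<string, string>): number {
  return counters.get(labelKey(metric, labels)) ?? 0;
}
```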
### 7. Setup

```shell
npm install ai @ai-sdk/openai @ai-sdk/google @ai-sdk/anthropic ai-fallback zod
npm install @modelcontextprotocol/sdk   # if exposing or consuming MCP tools
npm install lru-cache                   # only for dev / single-instance rate limit
```
Production rate limits are Redis-backed (e.g. Upstash); the LRU snippet in references/streaming-and-ui-patterns.md is a dev fallback.
## Anti-patterns

- Reading `OPENAI_API_KEY` directly in a route handler. Fix: all keys resolve inside `lib/ai/factory.ts`; feature code calls `getSupportedModel(provider, model)` only. A missing key becomes a typed error, not a runtime 500.
- Shipping an AI feature with a single feature flag. Fix: three independent gates — `tenant_ai_features.enabled`, `roleMayUseFeature`, `checkBudget`. All three must pass; any one is sufficient to block.
- Estimating the pre-call budget cost from an expected completion length. Fix: estimate with `maxTokens` pre-call for the budget guard; record the real usage in the ledger `onFinish`. Estimating with an expected completion lets runaway responses blow through the cap.
- `toDataStreamResponse()` without `maxTokens` on `streamText`. Fix: `maxTokens` is always set; the budget guard uses the same value. Otherwise a jailbroken prompt can produce a 10 000-token reply on your dime.
- Rendering AI Markdown with `dangerouslySetInnerHTML` for "formatting". Fix: render through `react-markdown` + `rehype-sanitize`, disable raw HTML, allow-list link protocols. Direct HTML injection is a live XSS vector.
- Falling back to a cheaper model on any error. Fix: fallback only fires on transient provider errors (429, 5xx). A 400 from input validation must surface to the user so they can fix the request; masking it wastes money and hides bugs.
- Tool result used directly to authorise a write. Fix: model-driven flows propose; user-driven events dispose. A tool that "sends payment" surfaces a confirmation UI, and the write is triggered by the user clicking confirm, not by the model returning `status: "confirmed"`.
- MCP client opened per request and never closed. Fix: close in `onFinish` and on request abort; cache the tool list per worker. Otherwise stdio subprocesses accumulate and the worker OOMs.
- Quota key using server-local date. Fix: use the UTC day (`toISOString().slice(0, 10)`). Local-date keys cause midnight races for users in other timezones and let them double their quota on travel days.
- Prompt includes raw tool output from an untrusted source. Fix: tag fetched content as `untrusted` in the system prompt, strip to text-only, and never let untrusted content trigger privileged tools in the same turn. See `llm-security` for the trust-tiering pattern.
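The UTC quota-key fix is small enough to show whole; only the key layout is an assumption, the day component follows directly from `toISOString()`, which always reports UTC.

```typescript
// UTC day key: immune to server-local midnight races and travel-day
// timezone hops. Key layout (quota:tenant:user:feature:day) is illustrative.
function quotaKey(tenantId: string, userId: string, feature: string, now: Date): string {
  const utcDay = now.toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
  return `quota:${tenantId}:${userId}:${feature}:${utcDay}`;
}
```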
## Read next

- `skill-composition-standards` — the house style and contract gates this skill implements.
- `ai-architecture-patterns` — the generic integration patterns this skill specialises.
- `ai-security` — prompt-injection defence, PII handling, abuse controls.
- `ai-evaluation` — offline evals, regression tests, AI-as-judge for quality gates.
- `ai-metering-billing` — depth on ledger-to-invoice, unit economics, rev-share models.
- `llm-security` — OWASP Top 10 for LLMs, trust-tiering, instruction hierarchy.
- `ai-feature-spec` — how to write the feature contract this skill consumes.
- `ai-ux-patterns` — streaming UX, confidence indicators, progressive disclosure.
- `nextjs-app-router` — server/client components, middleware, RBAC three-tier.
## References

- `references/module-gate.md` — feature catalog, tenant flags, kill switch, audit, rollout choreography.
- `references/token-ledger-and-budgets.md` — ledger schema, budget guard, quota gate, cost estimation, join queries.
- `references/provider-abstraction.md` — factory, request schema, model selection table, fallback policy.
- `references/output-guardrails.md` — three-layer guardrails, tool-result validation, streaming validation, prompt-injection hooks.
- `references/streaming-and-ui-patterns.md` — streaming route, `useChat`, `streamUI` generators, multi-modal, middleware pipeline, light rate limits.
- `references/mcp-integration.md` — MCP server + client, authority tiers, lifecycle, observability.
Source base: Despoudis, T. — Build AI-Enhanced Web Apps (Packt, 2024), adapted to the repository contract model.