AI-Enhanced Web Apps

Acknowledgement: Shared by Peter Bamuhigire, techguypeter.com, +256 784 464178.

Use When

  • Designing or building an AI-enhanced web app on Next.js + Vercel AI SDK with streaming chat, tool calling, RAG, multi-modal input, or MCP-exposed tools.
  • Wiring an AI feature into a multi-tenant SaaS and needing the module gate, token ledger, and provider abstraction for that feature.
  • Reviewing an existing AI web-app route for missing budget, quota, guardrail, or fallback controls.

Do Not Use When

  • The AI feature is purely backend or batch (no interactive web surface) — use ai-llm-integration or python-saas-integration instead.
  • The task is a pure UX pattern question — use ai-ux-patterns or ux-for-ai.
  • The task is full metering / billing strategy — use ai-metering-billing or ai-saas-billing.
  • The task is prompt or threat design — use ai-prompt-engineering, ai-security, or llm-security.

Required Inputs

  • Context map, access patterns, and threat model from upstream skills (see Inputs table below).
  • Product scope of the AI feature: user job, authority boundary, cost ceiling.

Workflow

  • Read this SKILL.md, then load only the deep-dive references needed for the concrete task.
  • Apply the decision tables before writing code — architecture first, implementation second.
  • Produce the four formal outputs (module gate, token ledger, provider abstraction, guardrails) as named artifacts, not as scattered edits.

Quality Standards

  • Every AI feature is OFF by default and gated at tenant + user + budget.
  • Every model call writes a ledger row before the response reaches the user.
  • Every structured output is validated by Zod; every rendered output is Markdown-sanitised.
  • maxTokens is set on every call; fallback is limited to transient provider errors.

Anti-Patterns

  • Fetching API keys in feature code instead of a single provider factory.
  • Gating an AI feature on a single flag and calling it done — three gates (entitlement, user, budget) are required.
  • Rendering model output through dangerouslySetInnerHTML or unsanitised Markdown.

Outputs

  • Module gate contract (feature catalog, tenant flags, kill switch) — see references/module-gate.md.
  • Token ledger + budget schema — see references/token-ledger-and-budgets.md.
  • Provider abstraction contract (factory, fallback, request schema) — see references/provider-abstraction.md.
  • Output validation + guardrail rules — see references/output-guardrails.md.

Evidence Produced

| Category | Artifact | Format | Example |
| --- | --- | --- | --- |
| Correctness | AI web app integration test plan | Markdown doc covering Next.js + Vercel AI SDK + MCP tool flow tests | docs/ai/web-app-tests.md |
| Security | AI web app security note | Markdown doc covering provider-key handling, prompt-injection defence, and per-tenant isolation | docs/ai/web-app-security.md |

References

  • See the References section below.

This skill specialises the generic integration patterns in ai-architecture-patterns for a Next.js + Vercel AI SDK runtime. It is the AI-specialist seat in the architecture: it turns a feature concept plus upstream context/threat artifacts into four concrete deliverables the rest of the stack can depend on.

Prerequisites

Load the following before this skill, in order:

  1. world-class-engineering — release gates and production bar.
  2. system-architecture-design — produces the context map and critical-flow table.
  3. database-design-engineering — produces the access-pattern list that shapes the ledger schema.
  4. vibe-security-skill — produces the threat model; ai-security and llm-security extend it for LLMs.

When this skill applies

  • Introducing an AI-powered route, server action, or streaming UI in a Next.js web app.
  • Wiring an MCP server or MCP client into an existing web app.
  • Adding a new model, provider, or fallback pair to an existing AI feature.
  • Bolting a token ledger, budget guard, or kill switch onto a feature that shipped without one.
  • Reviewing an AI web-app PR against the house contract (see Outputs).

Inputs

| Artifact | Produced by | Required? | Why |
| --- | --- | --- | --- |
| Context map | system-architecture-design | required | names the module boundary the AI feature sits inside |
| Critical-flow table | system-architecture-design | required | sets latency + availability budget for the feature |
| Access-pattern list | database-design-engineering | required | shapes the ledger, quota, and corpus-retrieval queries |
| Threat model | vibe-security-skill / ai-security / llm-security | required | informs prompt-injection, PII, and abuse controls |
| Auth/authz matrix | vibe-security-skill | required | drives the role check inside the module gate |
| SLO set | observability-monitoring | optional | calibrates fallback thresholds and kill-switch triggers |
| Pricing / plan catalog | saas-subscription-mastery / subscription-billing | optional | determines min_plan_tier in the feature catalog |

Outputs

| Artifact | Consumed by | Template |
| --- | --- | --- |
| Module gate contract | ai-saas-billing, platform admin UI, feature routes | references/module-gate.md |
| Token ledger + budget schema | ai-metering-billing, observability-monitoring, finance reporting | references/token-ledger-and-budgets.md |
| Provider abstraction contract | every AI route, ai-feature-spec, ai-evaluation | references/provider-abstraction.md |
| Output validation + guardrail rules | every AI route, ai-evaluation, ai-security | references/output-guardrails.md |
| Streaming + UI implementation notes | frontend integrators | references/streaming-and-ui-patterns.md |
| MCP server/client contract | internal tool owners, ai-agents-tools | references/mcp-integration.md |

Each of the four formal outputs has a template; other outputs are implementation references.

Non-negotiables

  • AI features ship OFF. Enablement is an explicit tenant admin action recorded in audit.
  • Every model call writes a ledger row with tenant, user, request, provider, model, tokens, cost, latency, status.
  • Every call sets maxTokens; cost estimation uses maxTokens, not an optimistic expected completion.
  • Tool outputs are validated before they influence writes, prompts, or UI.
  • Model fallback fires only on transient provider errors (429, 5xx). Never on 4xx input errors.
  • Rendered output goes through a Markdown sanitiser; dangerouslySetInnerHTML is banned.
  • Secrets never reach the browser; NEXT_PUBLIC_* for an AI key is a release blocker.
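
As a concrete sketch of the ledger-row non-negotiable — field names here are illustrative; the authoritative schema lives in references/token-ledger-and-budgets.md:

```ts
// Illustrative ledger-row shape; the canonical schema is in
// references/token-ledger-and-budgets.md.
interface LedgerRow {
  tenantId: string;
  userId: string;
  requestId: string;   // joins the prompt trace, tool calls, and this row
  provider: 'openai' | 'google' | 'anthropic';
  model: string;
  promptTokens: number;
  completionTokens: number;
  costMicros: number;  // integer micro-units; floats drift in billing sums
  latencyMs: number;
  status: 'ok' | 'guard_failed' | 'provider_error' | 'fallback_used';
  createdAt: Date;
}
```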

Decision rules

When to expose a tool via MCP vs inline tools:

Single app, single model, <5 tools             -> inline tools in streamText
Reused across apps or across models             -> MCP server (stdio transport)
Tools require out-of-process credentials        -> MCP server, separate service account
Latency budget <150 ms per tool call            -> inline; MCP handshake adds overhead

Wrong choice cost: inline-when-should-be-MCP means every app duplicates the tool surface; MCP-when-should-be-inline pays handshake latency for no composition benefit.
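
A minimal inline-tool sketch for the inline branch, assuming AI SDK v4-style APIs (the orderStatus tool and lookupOrderStatus helper are hypothetical):

```ts
import { streamText, tool, type CoreMessage } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

// Hypothetical read-only internal lookup.
declare function lookupOrderStatus(orderId: string): Promise<{ status: string }>;

export function chatWithInlineTools(messages: CoreMessage[]) {
  return streamText({
    model: openai('gpt-4o'),
    maxTokens: 512, // non-negotiable: every call sets maxTokens
    messages,
    tools: {
      orderStatus: tool({
        description: 'Look up the status of an order for the current tenant',
        parameters: z.object({ orderId: z.string().uuid() }), // Zod-validated before execute runs
        execute: async ({ orderId }) => lookupOrderStatus(orderId),
      }),
    },
  });
}
```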

Provider / model selection (short form)

| Need | First choice | Fallback | Why not the other way |
| --- | --- | --- | --- |
| Latency-critical chat, short replies | Gemini Flash | GPT-3.5 | frontier model burns budget on a task that does not need it |
| High-quality reasoning, long tool chains | Claude Sonnet | GPT-4o | flash model hallucinates multi-step chains |
| Cheapest structured extraction | Gemini Flash | GPT-3.5 | frontier cost is a 10–30x multiplier with no quality gain |
| Strict schema adherence (generateObject) | GPT-4o | Claude Sonnet (tool-call) | flash models drift from the schema under pressure |
| Residency / on-prem constraint | regional deployment | fail closed | cross-region fallback is a compliance breach |

Full selection table: references/provider-abstraction.md.

Three-gate evaluation order

1. Module gate   (entitlement + enablement + role + consent)
2. Quota gate    (Redis counter, per user per feature per day)
3. Budget guard  (ledger sum + estimated cost vs tenant cap)
4. Execute call  (factory -> model with maxTokens)
5. Ledger write  (onFinish, real tokens + real cost)
6. Guardrails    (structural -> semantic -> render)

Skip step 1: free tenants consume paid features. Skip step 2: one user drains tenant cap. Skip step 3: fallback to cheaper model masks runaway spend. Skip step 5: cost attribution breaks; billing disputes become unresolvable. Skip step 6: XSS or bad-data writes reach production.

Sync route vs queued job

| Signal | Route (streaming) | Queue (background) |
| --- | --- | --- |
| p95 total work | <25 s | >25 s |
| User must see partial output as it arrives | yes | no |
| Multi-step tool chain with web fetches | mixed | queue |
| Result is persisted regardless of UI | optional | queue |
| Retry on failure must be automatic | hard in route | natural in queue |

Wrong choice: running a 90-second multi-tool task in a route blows the 30 s Vercel function cap; running a fast chat through a queue destroys the streaming UX users expect.
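
A dispatch sketch under those signals; maxDuration is the Next.js route segment config, while estimateWorkSeconds, enqueueAiJob, and streamInRoute are hypothetical helpers:

```ts
// app/api/ai/route.ts: fail fast instead of dying at the platform cap mid-stream.
export const maxDuration = 25; // seconds (Next.js route segment config)

interface AiRequest { feature: string; messages: unknown[] }

// Hypothetical helpers: work estimator, background enqueue, streaming handler.
declare function estimateWorkSeconds(req: AiRequest): number;
declare function enqueueAiJob(req: AiRequest): Promise<{ jobId: string }>;
declare function streamInRoute(req: AiRequest): Promise<Response>;

export async function dispatch(req: AiRequest) {
  // >25 s of expected work goes to a queue; the UI polls or subscribes for the result.
  if (estimateWorkSeconds(req) > 25) {
    const { jobId } = await enqueueAiJob(req);
    return Response.json({ jobId }, { status: 202 });
  }
  return streamInRoute(req); // fast chat keeps the streaming UX
}
```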

Structural vs semantic vs render guardrail

Zod schema missing          -> silent data corruption downstream
Business rule check missing -> valid-looking nonsense reaches users / DB
Markdown sanitiser missing  -> XSS via model output

All three run; skipping any layer imports that layer's failure mode.

Architecture

User (React UI)
  |
  v  Server Actions / HTTP
Next.js App Router (middleware: CORS, IP rate limit, auth)
  |
  v
Module Gate  ->  Quota Gate  ->  Budget Guard
  |
  v  Provider Factory (allow-listed)
Vercel AI SDK  ----------------------->  OpenAI / Gemini / Anthropic
  |                                       ^
  v                                       |  ai-fallback on 429/5xx
MCP Client (optional)  ---------------->  MCP Server(s) -> internal APIs + vector stores
  |
  v  Guardrails (structural, semantic, render)
Ledger Write (onFinish)  ->  Streaming response back to client

The module gate, provider factory, guardrail, and ledger stages in this diagram correspond to the four formal outputs. Everything else is glue.

Core workflow

1. Define the AI interaction contract

Write these down before touching code:

  • User job to be done and success signal.
  • Input shape + size limits + validation (Zod schema).
  • Output shape + fallback shape when the model fails (Zod schema).
  • Allowed tools, authority boundary per tool, and whether tool results can trigger writes.
  • Latency budget (p50, p95) and cost budget per request + per user per day.
  • Consent requirement (e.g. data leaving tenant boundary).

This is the ai-feature-spec deliverable. This skill consumes it.
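
A sketch of the schema half of that contract; the feature and field names are illustrative:

```ts
import { z } from 'zod';

// Input: size limits enforced at the boundary, before any gate or model call.
export const summariseInput = z.object({
  documentId: z.string().uuid(),
  maxWords: z.number().int().min(50).max(500).default(200),
});

// Output: what the model must produce for the feature to count as succeeded.
export const summariseOutput = z.object({
  summary: z.string().max(4000),
  confidence: z.number().min(0).max(1),
});

// Fallback shape: what the UI receives when the call or a guardrail fails.
export const summariseFallback = z.object({
  summary: z.literal(''),
  error: z.enum(['provider_error', 'guard_failed', 'budget_exceeded']),
});
```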

2. Declare the three gates

Implement the module gate (references/module-gate.md) and the quota + budget pair (references/token-ledger-and-budgets.md) before writing the route. The route wires them together; it does not invent them.

Order in the route:

moduleGate -> quotaGate -> budgetGuard -> factory -> streamText -> guardrails -> ledgerWrite
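
Wired into a route handler, that order looks roughly like this; the gate helpers, getSupportedModel, writeLedgerRow, and import paths are the contracts named above, sketched as hypothetical imports:

```ts
import { streamText } from 'ai';
import { moduleGate, quotaGate, budgetGuard } from '@/lib/ai/gates';
import { getSupportedModel } from '@/lib/ai/factory';
import { writeLedgerRow } from '@/lib/ai/ledger';
import { authenticate } from '@/lib/auth';

export async function POST(req: Request) {
  const ctx = await authenticate(req);                       // middleware has already run
  await moduleGate(ctx.tenantId, ctx.userId, 'ai_summary');  // entitlement + enablement + role + consent
  await quotaGate(ctx.userId, 'ai_summary');                 // Redis counter, per user per day
  await budgetGuard(ctx.tenantId, { maxTokens: 512 });       // ledger sum + estimate vs tenant cap

  const { messages } = await req.json();
  const result = streamText({
    model: getSupportedModel('openai', 'gpt-4o'),            // factory, never a direct SDK import
    maxTokens: 512,                                          // same value the budget guard used
    messages,
    // Guardrails run inside onFinish before any persistence (step 4 below).
    onFinish: ({ usage }) => writeLedgerRow(ctx, usage),     // real tokens, real cost
  });
  return result.toDataStreamResponse();
}
```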

3. Centralise the provider

All model access goes through the factory in references/provider-abstraction.md; feature code never imports a provider SDK directly. This buys three things:

  • Cost attribution stays correct (factory is the single place that knows which model is live).
  • Fallback stays policy-driven, not ad hoc per route.
  • Swapping a provider is a one-file change.
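
A minimal factory sketch, assuming env-resolved keys per provider and an explicit allow-list (model IDs are illustrative):

```ts
import { openai } from '@ai-sdk/openai';
import { google } from '@ai-sdk/google';
import { anthropic } from '@ai-sdk/anthropic';
import type { LanguageModel } from 'ai';

export class UnsupportedModelError extends Error {
  constructor(provider: string, model: string) {
    super(`Model not allow-listed: ${provider}/${model}`); // typed error, not a runtime 500
  }
}

// Providers resolve their API keys from env inside this file only.
const PROVIDERS: Record<string, (id: string) => LanguageModel> = { openai, google, anthropic };

// Illustrative allow-list; the live list is a policy decision.
const MODELS = new Set(['gpt-4o', 'gemini-1.5-flash', 'claude-3-5-sonnet-latest']);

export function getSupportedModel(provider: string, model: string): LanguageModel {
  const make = PROVIDERS[provider];
  if (!make || !MODELS.has(model)) throw new UnsupportedModelError(provider, model);
  return make(model);
}
```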

4. Stream safely

Streaming gives a great UX and a fragile validation point. Rules:

  • Stream raw text to the UI for responsiveness.
  • Do not persist or trigger side effects until onFinish has validated the final object.
  • Validate each tool call's parameters before executing the tool, never after.
  • Write the ledger row in onFinish with real usage; never use the estimate as the billed amount.

Implementation: references/streaming-and-ui-patterns.md.
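
An onFinish sketch under those rules (AI SDK v4-style; the extraction, business-rule, and persistence helpers are hypothetical):

```ts
import { streamText, type CoreMessage, type LanguageModel } from 'ai';
import { z } from 'zod';

declare const model: LanguageModel;
declare const messages: CoreMessage[];
declare const outputSchema: z.ZodTypeAny;
declare function extractObject(text: string): unknown;
declare function passesBusinessRules(data: unknown): boolean;
declare function persistResult(data: unknown): Promise<void>;
declare function writeLedgerRow(row: { promptTokens: number; completionTokens: number }): Promise<void>;

const result = streamText({
  model,
  maxTokens: 512,
  messages,
  onFinish: async ({ text, usage }) => {
    // Structural guard: parse the final text, never a partial chunk.
    const parsed = outputSchema.safeParse(extractObject(text));
    // Semantic guard: business rules run before any side effect.
    if (parsed.success && passesBusinessRules(parsed.data)) {
      await persistResult(parsed.data); // side effects only after both guards pass
    }
    // Ledger gets real usage; the pre-call estimate is never the billed amount.
    await writeLedgerRow({
      promptTokens: usage.promptTokens,
      completionTokens: usage.completionTokens,
    });
  },
});
```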

5. Enforce guardrails in three layers

Structural (Zod on generateObject and tools), semantic (business rules, PII scrub, confidence thresholds), render (Markdown-only with a sanitiser). Full rules in references/output-guardrails.md.
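
The render layer in one component: a sketch with react-markdown plus rehype-sanitize (raw HTML stays off because rehype-raw is never added):

```tsx
import ReactMarkdown from 'react-markdown';
import rehypeSanitize from 'rehype-sanitize';

// Markdown-only rendering of model output. react-markdown does not emit raw
// HTML unless rehype-raw is added; rehype-sanitize enforces an allow-list
// (including link protocols) on top, and dangerouslySetInnerHTML never appears.
export function AiMessage({ content }: { content: string }) {
  return <ReactMarkdown rehypePlugins={[rehypeSanitize]}>{content}</ReactMarkdown>;
}
```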

6. Observability

Every call logs a single request_id that joins the prompt trace, tool calls, and the ledger row. Metrics at minimum:

  • ai.requests_total{feature, model, status}
  • ai.tokens_total{feature, model, kind=prompt|completion}
  • ai.cost_micros_total{tenant, feature}
  • ai.guard_failures_total{guard, feature}
  • ai.fallback_triggered_total{from_model, to_model}

Alert on sustained fallback (>5% for 5 min), sustained guard failures (>1% for 10 min), and any tenant crossing 80% of monthly cap.
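
A sketch of the join point: one request_id on everything, with hypothetical counter and structured-log clients:

```ts
import { randomUUID } from 'node:crypto';

// Hypothetical clients (an OTel/StatsD counter wrapper, a pino-style logger).
declare const metrics: {
  increment(name: string, tags: Record<string, string>, value?: number): void;
};
declare const logger: { info(fields: object, msg: string): void };

export function recordAiCall(
  feature: string,
  model: string,
  tenantId: string,
  usage: { promptTokens: number; completionTokens: number },
  costMicros: number,
) {
  const requestId = randomUUID(); // in practice minted at request start and threaded through

  metrics.increment('ai.requests_total', { feature, model, status: 'ok' });
  metrics.increment('ai.tokens_total', { feature, model, kind: 'prompt' }, usage.promptTokens);
  metrics.increment('ai.tokens_total', { feature, model, kind: 'completion' }, usage.completionTokens);
  metrics.increment('ai.cost_micros_total', { tenant: tenantId, feature }, costMicros);

  logger.info({ requestId, feature, model, usage, costMicros }, 'ai.request.finished');
}
```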

7. Setup

npm install ai @ai-sdk/openai @ai-sdk/google @ai-sdk/anthropic ai-fallback zod
npm install @modelcontextprotocol/sdk     # if exposing or consuming MCP tools
npm install lru-cache                     # only for dev / single-instance rate limit

Production rate limits are Redis-backed (e.g. Upstash); the LRU snippet in references/streaming-and-ui-patterns.md is a dev fallback.

Anti-patterns

  • Reading OPENAI_API_KEY directly in a route handler. Fix: all keys resolve inside lib/ai/factory.ts; feature code calls getSupportedModel(provider, model) only. A missing key becomes a typed error, not a runtime 500.
  • Shipping an AI feature with a single feature flag. Fix: three independent gates — tenant_ai_features.enabled, roleMayUseFeature, checkBudget. All three must pass; any single failing gate blocks the request.
  • Estimating billing cost from the actual completion. Fix: estimate with maxTokens pre-call for the budget guard; record the real usage in the ledger onFinish. Estimating with expected completion lets runaway responses blow through the cap.
  • toDataStreamResponse() without maxTokens on streamText. Fix: maxTokens is always set; the budget guard uses the same value. Otherwise a jailbroken prompt can produce a 10 000-token reply on your dime.
  • Rendering AI Markdown with dangerouslySetInnerHTML for "formatting". Fix: render through react-markdown + rehype-sanitize, disable raw HTML, allow-list link protocols. Direct HTML injection is a live XSS vector.
  • Falling back to a cheaper model on any error. Fix: fallback only fires on transient provider errors (429, 5xx). A 400 from input validation must surface to the user so they can fix the request; masking it wastes money and hides bugs.
  • Tool result used directly to authorise a write. Fix: model-driven flows propose; user-driven events dispose. A tool that "sends payment" surfaces a confirmation UI, and the write is triggered by the user clicking confirm, not by the model returning status: "confirmed".
  • MCP client opened per request and never closed. Fix: close in onFinish and on request abort; cache the tool list per worker. Otherwise stdio subprocesses accumulate and the worker OOMs.
  • Quota key using server-local date. Fix: use the UTC day (toISOString().slice(0, 10)). Local-date keys cause midnight races for users in other timezones and let them double their quota on travel days.
  • Prompt includes raw tool output from an untrusted source. Fix: tag fetched content as untrusted in the system prompt, strip to text-only, and never let untrusted content trigger privileged tools in the same turn. See llm-security for the trust-tiering pattern.

Read next

  • skill-composition-standards — the house style and contract gates this skill implements.
  • ai-architecture-patterns — the generic integration patterns this skill specialises.
  • ai-security — prompt-injection defence, PII handling, abuse controls.
  • ai-evaluation — offline evals, regression tests, AI-as-judge for quality gates.
  • ai-metering-billing — depth on ledger-to-invoice, unit economics, rev-share models.
  • llm-security — OWASP Top 10 for LLMs, trust-tiering, instruction hierarchy.
  • ai-feature-spec — how to write the feature contract this skill consumes.
  • ai-ux-patterns — streaming UX, confidence indicators, progressive disclosure.
  • nextjs-app-router — server/client components, middleware, RBAC three-tier.

References

  • references/module-gate.md — feature catalog, tenant flags, kill switch, audit, rollout choreography.
  • references/token-ledger-and-budgets.md — ledger schema, budget guard, quota gate, cost estimation, join queries.
  • references/provider-abstraction.md — factory, request schema, model selection table, fallback policy.
  • references/output-guardrails.md — three-layer guardrails, tool-result validation, streaming validation, prompt-injection hooks.
  • references/streaming-and-ui-patterns.md — streaming route, useChat, streamUI generators, multi-modal, middleware pipeline, light rate limits.
  • references/mcp-integration.md — MCP server + client, authority tiers, lifecycle, observability.

Source base: Despoudis, T. — Build AI-Enhanced Web Apps (Packt, 2024), adapted to the repository contract model.
