Envoy AI Gateway Fundamentals

Envoy AI Gateway extends Envoy Gateway to provide a unified API gateway for generative AI services. It translates between client-facing APIs (e.g., OpenAI-compatible) and backend-specific APIs (OpenAI, Anthropic, AWS Bedrock, Azure OpenAI, GCP Vertex AI, Cohere, etc.).

Resource Hierarchy

GatewayClass (Gateway API)
  -> Gateway (Gateway API)
    -> AIGatewayRoute (AI Gateway CRD)
      -> rules with matches + backendRefs
        -> AIServiceBackend (AI Gateway CRD)
          -> Backend (Envoy Gateway) or InferencePool (Gateway API extension)

Core AI Gateway CRDs (aigateway.envoyproxy.io/v1alpha1)

  • AIGatewayRoute — Binds AI backends to a Gateway. Defines routing rules (header matches, e.g. x-ai-eg-model), backend refs, timeouts, and optional LLM cost capture. Generates an HTTPRoute and HTTPRouteFilters under the hood.
  • AIServiceBackend — Describes a single AI backend: its API schema and the Envoy Gateway Backend it attaches to. backendRef must be a Backend (gateway.envoyproxy.io); it cannot reference a Kubernetes Service directly. For in-cluster backends, use a Backend with FQDN endpoints pointing at the Service's DNS name.
  • BackendSecurityPolicy — Backend authentication: API key, AWS credentials, Azure credentials, GCP credentials, or Anthropic API key. Attaches to an AIServiceBackend or InferencePool. Only one BackendSecurityPolicy may target a given AIServiceBackend or InferencePool; multiple policies cause a reconciliation failure.
  • GatewayConfig — Gateway-scoped ExtProc configuration (resources, env vars). Referenced via the annotation aigateway.envoyproxy.io/gateway-config: <name> on the Gateway; must live in the same namespace as the Gateway.
  • MCPRoute — Model Context Protocol routing for MCP tools.
  • QuotaPolicy — Rate limiting and quota management.
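The AIServiceBackend → Backend link can be sketched as a pair of manifests. This is a minimal sketch, assuming a hypothetical resource name `openai` and the v1alpha1 field names used in the upstream basic example; verify against the CRD versions installed in your cluster:

```yaml
# Envoy Gateway Backend: the actual upstream endpoint (FQDN + port).
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: openai
spec:
  endpoints:
    - fqdn:
        hostname: api.openai.com
        port: 443
---
# AIServiceBackend: declares the backend's API schema and points at the Backend.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIServiceBackend
metadata:
  name: openai
spec:
  schema:
    name: OpenAI
  backendRef:
    name: openai                 # must reference a Backend, not a Service
    kind: Backend
    group: gateway.envoyproxy.io
```

Giving the AIServiceBackend and Backend the same name (as here) follows the naming convention noted later in this document.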

Envoy Gateway Resources Used by AI Gateway

  • Gateway, GatewayClass, HTTPRoute — standard Gateway API
  • Backend — external endpoints (FQDN, port) for AI providers
  • BackendTLSPolicy — TLS validation for Backend (use gateway.networking.k8s.io/v1 with Envoy Gateway v1.6+)
  • ClientTrafficPolicy — client-facing settings (buffer limits, timeouts). Required for AI: set connection.bufferLimit (e.g., 50Mi) because default 32KiB is too small for AI requests.
  • EnvoyProxy — customizes Envoy deployment
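The buffer-limit requirement above can be sketched as a ClientTrafficPolicy targeting the Gateway. Resource names here are placeholders; `spec.connection.bufferLimit` is the Envoy Gateway field:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: ClientTrafficPolicy
metadata:
  name: ai-client-buffer
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: ai-gateway        # placeholder Gateway name
  connection:
    bufferLimit: 50Mi         # default 32KiB is too small for AI request bodies
```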

API Schemas (schema.name in AIServiceBackend)

Supported values (from ai-gateway codebase):

  • OpenAI — OpenAI API, OpenAI-compatible backends
  • Cohere — Cohere API
  • AWSBedrock — AWS Bedrock
  • AzureOpenAI — Azure OpenAI
  • GCPVertexAI — GCP Vertex AI (Gemini)
  • GCPAnthropic — Anthropic on GCP Vertex AI
  • Anthropic — Native Anthropic API
  • AWSAnthropic — Anthropic on AWS Bedrock
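Each value goes in spec.schema.name of an AIServiceBackend. Some schemas also take a version; for example, Azure OpenAI requests carry an api-version. A fragment sketch, assuming the `version` field and an illustrative version string — confirm the exact value against your provider:

```yaml
spec:
  schema:
    name: AzureOpenAI
    version: 2024-10-21   # assumed to map to Azure's api-version parameter
```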

Routing Model

  • x-ai-eg-model header: The AI Gateway filter extracts the model from the request body and injects it into this header. Use it in AIGatewayRoute matches to route by model.
  • BackendRefs: Reference AIServiceBackend by name (default). Can also reference InferencePool (group: inference.networking.k8s.io, kind: InferencePool) for self-hosted models.
  • Priority: Use priority in backendRefs for failover (lower number = higher priority).
  • Weight: Use weight for traffic splitting across backends.

BackendSecurityPolicy Types

  • APIKey — OpenAI, generic API key in the Authorization header
  • AnthropicAPIKey — Anthropic (x-api-key header)
  • AzureAPIKey — Azure OpenAI (api-key header)
  • AzureCredentials — Azure OpenAI with OAuth/client secret
  • AWSCredentials — AWS Bedrock (IRSA, Pod Identity, or credentials file)
  • GCPCredentials — GCP Vertex AI (service account or workload identity)
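As a sketch, an APIKey policy targeting an AIServiceBackend might look like this. Resource names and the targetRefs attachment style are assumptions based on the upstream v1alpha1 examples; the key itself lives in a Kubernetes Secret:

```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: BackendSecurityPolicy
metadata:
  name: openai-apikey
spec:
  targetRefs:
    - group: aigateway.envoyproxy.io
      kind: AIServiceBackend
      name: openai            # hypothetical AIServiceBackend
  type: APIKey
  apiKey:
    secretRef:
      name: openai-apikey     # Secret holding the provider API key
```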

Two-Tier Gateway Pattern

  • Tier One Gateway: Central entry point; handles auth, top-level routing, global rate limiting.
  • Tier Two Gateway: Fine-grained control over self-hosted models; InferencePool with endpoint picker for LLM optimization.

Naming Conventions

  • Use kebab-case for resource names
  • AIServiceBackend and Backend often share the same name for clarity
  • BackendSecurityPolicy names typically indicate provider: my-backend-openai-apikey

Implementation Notes

  • AIGatewayRoute generates an HTTPRoute (same name) and HTTPRouteFilters (host rewrite, 404 fallback). The AI Gateway controller uses the Envoy Gateway Extension Server to fine-tune xDS.
  • ExtProc sidecar: AI Gateway injects an External Processor as a sidecar in the Envoy Proxy Pod. It reads a filter config Secret, performs request/response transformation, and injects provider credentials. Model name is extracted from the request body and set in x-ai-eg-model for routing.

Checklist

  • Understand AIGatewayRoute → AIServiceBackend → Backend chain (Backend required, not K8s Service)
  • Know which schema.name matches your provider
  • At most one BackendSecurityPolicy per AIServiceBackend or InferencePool
  • BackendSecurityPolicy required for cloud providers (OpenAI, Anthropic, AWS, Azure, GCP)
  • ClientTrafficPolicy with bufferLimit for AI workloads
  • BackendTLSPolicy for HTTPS backends (hostname validation)