aigw-route by missberg/envoy-skills

Create an AIGatewayRoute that attaches AI service backends to a Gateway. Each rule pairs matches (typically on the x-ai-eg-model header) with the backendRefs that receive the matching traffic. Clients do not need to set x-ai-eg-model themselves: the AI Gateway ExtProc extracts the model name from the request body and injects it into the header before routing. From each AIGatewayRoute, AI Gateway generates an HTTPRoute with the same name, plus HTTPRouteFilters for host rewrite and the 404 fallback.

Instructions

Step 1: Ensure Gateway has buffer limit for AI workloads

Envoy Gateway defaults to a 32 KiB connection buffer limit, which is too small for many AI request bodies. Raise it by attaching a ClientTrafficPolicy to your Gateway:

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: ClientTrafficPolicy
metadata:
  name: client-buffer-limit
  namespace: default  # TODO: Match your Gateway namespace
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: ${GatewayName}  # TODO: Replace with your Gateway name
  connection:
    bufferLimit: 50Mi

Step 2: Create the AIGatewayRoute

apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: ${RouteName}  # TODO: Replace with descriptive name (e.g., openai-route)
  namespace: default  # TODO: Match Gateway namespace
spec:
  parentRefs:
    - name: ${GatewayName}
      kind: Gateway
      group: gateway.networking.k8s.io
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: ${ModelHeader}  # TODO: e.g., gpt-4o-mini, claude-3-5-sonnet
      backendRefs:
        - name: ${BackendNames}  # TODO: AIServiceBackend name; add one list entry per backend
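
As an illustration of the template above, here is a filled-in sketch with two rules, each matching a different model and sending it to its own backend. The route, Gateway, and backend names are hypothetical placeholders; only the fields already shown in the template are used.

```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: multi-model-route          # hypothetical route name
  namespace: default
spec:
  parentRefs:
    - name: ai-gateway             # hypothetical Gateway name
      kind: Gateway
      group: gateway.networking.k8s.io
  rules:
    # Requests whose body names gpt-4o-mini go to the OpenAI backend.
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: gpt-4o-mini
      backendRefs:
        - name: openai-backend     # hypothetical AIServiceBackend
    # Requests whose body names claude-3-5-sonnet go to the Anthropic backend.
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: claude-3-5-sonnet
      backendRefs:
        - name: anthropic-backend  # hypothetical AIServiceBackend
```

Keeping one rule per model makes it straightforward to later add weights, priorities, or timeouts to a single model without touching the others.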

Step 3: Multiple backends (traffic splitting or failover)

Traffic splitting by weight:

rules:
  - matches:
      - headers:
          - type: Exact
            name: x-ai-eg-model
            value: gpt-4o
    backendRefs:
      - name: openai-backend
        weight: 80
      - name: azure-openai-backend
        weight: 20

Failover by priority (lower number = higher priority):

rules:
  - matches:
      - headers:
          - type: Exact
            name: x-ai-eg-model
            value: gpt-4o
    backendRefs:
      - name: primary-openai
        priority: 0
      - name: fallback-openai
        priority: 1
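
Priority-based failover generally only takes effect when failed requests are retried, so a retry policy is usually paired with it. The following is a minimal sketch, assuming Envoy Gateway's BackendTrafficPolicy retry settings and assuming the policy targets the HTTPRoute that AI Gateway generates with the same name as the AIGatewayRoute. The policy name, route name, and retry values are hypothetical; verify the field names against your Envoy Gateway version.

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: failover-retry             # hypothetical policy name
  namespace: default
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: openai-route           # assumption: same name as your AIGatewayRoute
  retry:
    numRetries: 3                  # illustrative value
    numAttemptsPerPriority: 1      # move to the next priority after one failed attempt
    perRetry:
      timeout: 30s                 # illustrative value
    retryOn:
      triggers:
        - connect-failure
        - retriable-status-codes
      httpStatusCodes:
        - 500
        - 503
```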

Step 4: Catch-all for all models

To route all models to a single backend:

rules:
  - backendRefs:
      - name: my-openai-backend
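
A catch-all rule can also sit alongside model-specific rules. The sketch below assumes rule precedence follows Gateway API HTTPRoute matching, where a rule with a header match is more specific than a rule with no matches; the backend names are hypothetical.

```yaml
rules:
  # Specific rule: gpt-4o goes to a dedicated backend.
  - matches:
      - headers:
          - type: Exact
            name: x-ai-eg-model
            value: gpt-4o
    backendRefs:
      - name: dedicated-openai-backend  # hypothetical AIServiceBackend
  # Catch-all rule: every other model goes to the default backend.
  - backendRefs:
      - name: my-openai-backend
```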

Step 5: Timeouts for streaming

For streaming responses (e.g., chat completions with stream: true), increase the request timeout so that long generations are not cut off mid-stream:

rules:
  - matches:
      - headers:
          - type: Exact
            name: x-ai-eg-model
            value: gpt-4o
    timeouts:
      request: 300s  # 5 minutes for long streaming
    backendRefs:
      - name: openai-backend

Step 6: Model name override

Override the model name sent to the backend:

backendRefs:
  - name: azure-openai-backend
    modelNameOverride: gpt-4o  # Azure deployment name
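
To show where the override sits in a full rule, here is a sketch that splits gpt-4o traffic between two backends and rewrites the model name only for the Azure one. The backend names and the deployment name are hypothetical.

```yaml
rules:
  - matches:
      - headers:
          - type: Exact
            name: x-ai-eg-model
            value: gpt-4o            # model name the client sends
    backendRefs:
      # OpenAI receives the model name unchanged.
      - name: openai-backend
        weight: 80
      # Azure receives the overridden name instead.
      - name: azure-openai-backend
        weight: 20
        modelNameOverride: my-gpt4o-deployment  # hypothetical Azure deployment name
```

Because the override is set per backendRef, clients keep requesting a single model name while each backend receives the identifier it expects.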

Step 7: InferencePool (self-hosted models)

For InferencePool backends from the Gateway API Inference Extension (the addon must be enabled), set group and kind explicitly on the backendRef:

backendRefs:
  - name: my-inference-pool
    group: inference.networking.k8s.io
    kind: InferencePool

Constraints: a rule can reference only one InferencePool, and a rule cannot mix InferencePool and AIServiceBackend backendRefs. Cross-namespace references require a ReferenceGrant in the target namespace.
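
The cross-namespace requirement can be sketched as follows. This assumes the backendRef accepts a namespace field and that the grant uses the standard Gateway API ReferenceGrant shape, with AIGatewayRoute as the source kind and AIServiceBackend as the target kind. All names and namespaces are hypothetical.

```yaml
# Fragment of the AIGatewayRoute in namespace team-a (hypothetical):
# reference a backend that lives in another namespace.
backendRefs:
  - name: shared-openai-backend    # hypothetical AIServiceBackend
    namespace: shared-backends     # hypothetical target namespace
---
# ReferenceGrant created in the target namespace, allowing routes
# in team-a to reference AIServiceBackends there.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-team-a-routes        # hypothetical grant name
  namespace: shared-backends
spec:
  from:
    - group: aigateway.envoyproxy.io
      kind: AIGatewayRoute
      namespace: team-a
  to:
    - group: aigateway.envoyproxy.io
      kind: AIServiceBackend
```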

Checklist

ClientTrafficPolicy with bufferLimit (50Mi) attached to Gateway
AIGatewayRoute parentRefs point to correct Gateway
backendRefs reference existing AIServiceBackend (or InferencePool) resources
Matches use x-ai-eg-model when routing by model
Timeouts configured for streaming if needed
ReferenceGrant exists in the target namespace for any cross-namespace refs