aigw-route

SKILL.md

Create an AIGatewayRoute to attach AI service backends to a Gateway. Each rule pairs matches (typically on the x-ai-eg-model header) with the backendRefs that receive the matching traffic. The AI Gateway ExtProc extracts the model name from the request body and injects it into x-ai-eg-model before routing, so clients do not need to set this header themselves. From the AIGatewayRoute, AI Gateway generates an HTTPRoute with the same name and HTTPRouteFilters (host rewrite, 404 fallback).

Instructions

Step 1: Ensure Gateway has buffer limit for AI workloads

Envoy Gateway defaults to a 32KiB connection buffer limit, which is too small for typical AI request bodies. Attach a ClientTrafficPolicy to your Gateway to raise it:

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: ClientTrafficPolicy
metadata:
  name: client-buffer-limit
  namespace: default  # TODO: Match your Gateway namespace
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: ${GatewayName}  # TODO: Replace with your Gateway name
  connection:
    bufferLimit: 50Mi
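
For reference, a minimal sketch of the kind of Gateway the targetRefs entry above points at. The name my-ai-gateway, the GatewayClass name envoy-gateway, and the listener port are illustrative assumptions rather than values from this skill; substitute whatever your cluster uses.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: my-ai-gateway              # hypothetical; this is the value for ${GatewayName}
  namespace: default               # must match the ClientTrafficPolicy namespace
spec:
  gatewayClassName: envoy-gateway  # assumption: the GatewayClass installed in your cluster
  listeners:
    - name: http
      protocol: HTTP
      port: 80                     # assumption: plain HTTP listener for illustration
```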

Step 2: Create the AIGatewayRoute

apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: ${RouteName}  # TODO: Replace with descriptive name (e.g., openai-route)
  namespace: default  # TODO: Match Gateway namespace
spec:
  parentRefs:
    - name: ${GatewayName}
      kind: Gateway
      group: gateway.networking.k8s.io
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: ${ModelHeader}  # TODO: e.g., gpt-4o-mini, claude-3-5-sonnet
      backendRefs:
        - name: ${BackendNames}  # TODO: AIServiceBackend name; add one backendRefs entry per backend
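
Each backendRefs entry must name an AIServiceBackend that already exists. The following is a rough sketch of what such a resource might look like, assuming the v1alpha1 schema with an OpenAI-compatible API and an Envoy Gateway Backend behind it. The names openai-backend are hypothetical, and field names should be checked against the CRD version installed in your cluster.

```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIServiceBackend
metadata:
  name: openai-backend           # hypothetical; referenced from AIGatewayRoute backendRefs
  namespace: default
spec:
  schema:
    name: OpenAI                 # API schema the upstream speaks
  backendRef:
    name: openai-backend         # hypothetical Envoy Gateway Backend pointing at the provider
    kind: Backend
    group: gateway.envoyproxy.io
```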

Step 3: Multiple backends (traffic splitting or failover)

Traffic splitting by weight:

rules:
  - matches:
      - headers:
          - type: Exact
            name: x-ai-eg-model
            value: gpt-4o
    backendRefs:
      - name: openai-backend
        weight: 80
      - name: azure-openai-backend
        weight: 20

Failover by priority (lower number = higher priority):

rules:
  - matches:
      - headers:
          - type: Exact
            name: x-ai-eg-model
            value: gpt-4o
    backendRefs:
      - name: primary-openai
        priority: 0
      - name: fallback-openai
        priority: 1
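
Priority only matters when a failed request is retried, so failover is usually paired with a retry policy. The sketch below assumes an Envoy Gateway BackendTrafficPolicy attached to the generated HTTPRoute (which shares the AIGatewayRoute's name). The policy name, retry counts, and status codes are illustrative assumptions; confirm the retry fields against your Envoy Gateway version before relying on them.

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: failover-retry           # hypothetical name
  namespace: default
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: ${RouteName}         # the generated HTTPRoute has the same name as the AIGatewayRoute
  retry:
    numRetries: 3                # assumption: tune for your workload
    numAttemptsPerPriority: 1    # move to the next priority after one failed attempt
    retryOn:
      httpStatusCodes:
        - 500
        - 503
      triggers:
        - connect-failure
        - retriable-status-codes
```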

Step 4: Catch-all for all models

To route every model to a single backend, omit matches from the rule:

rules:
  - backendRefs:
      - name: my-openai-backend
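
A catch-all can also sit alongside model-specific rules. The sketch below assumes that standard Gateway API match precedence applies to the generated HTTPRoute, so the rule with a header match wins for gpt-4o and everything else falls through to the rule without matches. The backend names are hypothetical.

```yaml
rules:
  # Model-specific rule: only requests whose extracted model is gpt-4o
  - matches:
      - headers:
          - type: Exact
            name: x-ai-eg-model
            value: gpt-4o
    backendRefs:
      - name: openai-backend       # hypothetical AIServiceBackend
  # Catch-all rule: no matches, so it accepts any other model
  - backendRefs:
      - name: my-openai-backend    # hypothetical AIServiceBackend
```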

Step 5: Timeouts for streaming

For streaming responses (e.g., chat completions with stream: true), increase the request timeout:

rules:
  - matches:
      - headers:
          - type: Exact
            name: x-ai-eg-model
            value: gpt-4o
    timeouts:
      request: 300s  # 5 minutes for long streaming
    backendRefs:
      - name: openai-backend
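
The timeouts block follows the Gateway API HTTPRouteTimeouts shape, which also offers backendRequest for a single attempt against the backend. A sketch with both set, where the specific durations are illustrative assumptions; backendRequest must not exceed request.

```yaml
rules:
  - matches:
      - headers:
          - type: Exact
            name: x-ai-eg-model
            value: gpt-4o
    timeouts:
      request: 300s          # total time allowed for the client request
      backendRequest: 120s   # assumption: per-attempt limit, leaves room for a retry
    backendRefs:
      - name: openai-backend
```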

Step 6: Model name override

Override the model name sent to the backend:

backendRefs:
  - name: azure-openai-backend
    modelNameOverride: gpt-4o  # Azure deployment name
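
The override is most useful when the name clients send differs from the name the backend expects. In the sketch below, clients request gpt-4o and the route rewrites it to an Azure deployment name; my-gpt4o-deployment is a hypothetical value.

```yaml
rules:
  - matches:
      - headers:
          - type: Exact
            name: x-ai-eg-model
            value: gpt-4o                         # model name the client sends
    backendRefs:
      - name: azure-openai-backend
        modelNameOverride: my-gpt4o-deployment    # hypothetical Azure deployment name sent upstream
```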

Step 7: InferencePool (self-hosted models)

For InferencePool backends (Gateway API Inference Extension; requires addon):

backendRefs:
  - name: my-inference-pool
    group: inference.networking.k8s.io
    kind: InferencePool

Constraints: Only one InferencePool per rule; cannot mix InferencePool with AIServiceBackend in the same rule. Cross-namespace references require ReferenceGrant in the target namespace.
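
A ReferenceGrant for the cross-namespace case might look like the sketch below. It assumes the route lives in default and the AIServiceBackend lives in a namespace named backends; both the grant name and the backends namespace are hypothetical. Adjust the to entry if the target is an InferencePool instead.

```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-aigw-routes        # hypothetical name
  namespace: backends            # hypothetical target namespace, where the backend lives
spec:
  from:
    - group: aigateway.envoyproxy.io
      kind: AIGatewayRoute
      namespace: default         # namespace of the referencing AIGatewayRoute
  to:
    - group: aigateway.envoyproxy.io
      kind: AIServiceBackend
```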

Checklist

  • ClientTrafficPolicy with bufferLimit (50Mi) attached to Gateway
  • AIGatewayRoute parentRefs point to correct Gateway
  • backendRefs reference existing AIServiceBackend (or InferencePool) resources
  • Matches use x-ai-eg-model when routing by model
  • Timeouts configured for streaming if needed
  • Cross-namespace refs require ReferenceGrant in target namespace