Groq Multi-Environment Setup

Overview

Configure Groq across environments with the right balance of cost, speed, and capability per tier. Groq's key differentiator is inference speed (roughly 100-300 tokens/second, depending on model), but rate limits differ dramatically by plan: the free tier allows 30 RPM / 14,400 RPD for llama-3.1-70b, while paid tiers raise these limits substantially.

Prerequisites

  • Groq API key(s) per environment from console.groq.com
  • Environment variable management (.env.local, GitHub Secrets, or cloud secret manager)
  • Understanding of Groq's model tiers and rate limits
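For local development, the first two prerequisites usually combine into a git-ignored .env.local file, while staging and production keys stay out of the repository entirely. A minimal sketch (the key value is a placeholder; variable names match the configs in this skill):

```shell
# .env.local — git-ignored; development only
GROQ_API_KEY=gsk_your_dev_key_here

# Staging/production keys should NOT live in files checked into the repo.
# Set them where the corresponding config reads them:
#   GROQ_API_KEY_STAGING  -> CI/CD secret (e.g. GitHub Secrets)
#   GROQ_API_KEY_PROD     -> cloud secret manager
```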

Environment Strategy

| Environment | Model | Rate Limit Risk | Config Source |
|---|---|---|---|
| Development | llama-3.1-8b-instant | Low (small model) | .env.local |
| Staging | llama-3.1-70b-versatile | Medium | CI/CD secrets |
| Production | llama-3.1-70b-versatile or llama-3.3-70b-specdec | Managed with retry | Secret manager |

Instructions

Step 1: Configuration Structure

config/
  groq/
    base.ts           # Shared Groq client setup
    development.ts    # Dev: fast small models, verbose logging
    staging.ts        # Staging: production models, test rate limits
    production.ts     # Prod: hardened retry, error handling
    index.ts          # Environment resolver

Step 2: Base Configuration with Groq SDK

// config/groq/base.ts
export const BASE_GROQ_CONFIG = {
  maxRetries: 3,
  timeout: 30000,  // request timeout in milliseconds (30 s)
};

Step 3: Environment-Specific Configs

// config/groq/development.ts
import { BASE_GROQ_CONFIG } from "./base";

export const devConfig = {
  ...BASE_GROQ_CONFIG,
  apiKey: process.env.GROQ_API_KEY,
  model: "llama-3.1-8b-instant",      // fastest, cheapest for dev iteration
  maxTokens: 1024,                     // cap on output tokens
  temperature: 0.7,
  logRequests: true,                   // verbose logging in dev
};

// config/groq/staging.ts
import { BASE_GROQ_CONFIG } from "./base";

export const stagingConfig = {
  ...BASE_GROQ_CONFIG,
  apiKey: process.env.GROQ_API_KEY_STAGING,
  model: "llama-3.1-70b-versatile",   // match production model
  maxTokens: 4096,
  temperature: 0.3,
  logRequests: false,
};

// config/groq/production.ts
import { BASE_GROQ_CONFIG } from "./base";

export const productionConfig = {
  ...BASE_GROQ_CONFIG,
  apiKey: process.env.GROQ_API_KEY_PROD,
  model: "llama-3.1-70b-versatile",   // or llama-3.3-70b-specdec for lower latency
  maxTokens: 4096,
  temperature: 0.3,
  maxRetries: 5,                       // more retries for production reliability
  logRequests: false,
};
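The three configs share one shape, so giving it an explicit type catches drift between environments at compile time. A sketch under that assumption (the GroqEnvConfig name and isValidConfig helper are illustrative, not part of the SDK):

```typescript
// config/groq/types.ts (illustrative) — one type shared by all env configs
export interface GroqEnvConfig {
  apiKey: string | undefined; // undefined until the env var is set
  model: string;
  maxTokens: number;
  temperature: number;
  maxRetries: number;
  timeout: number;
  logRequests: boolean;
}

// Runtime guard: fail fast on a missing key or nonsense values.
export function isValidConfig(cfg: GroqEnvConfig): boolean {
  return Boolean(cfg.apiKey) && cfg.maxTokens > 0 && cfg.timeout > 0;
}
```

Annotating each config (e.g. `export const devConfig: GroqEnvConfig = { ... }`) turns a missing or misspelled field into a compile error rather than a runtime surprise.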

Step 4: Environment Resolver with Groq Client

// config/groq/index.ts
import Groq from "groq-sdk";

import { devConfig } from "./development";
import { stagingConfig } from "./staging";
import { productionConfig } from "./production";

type Env = "development" | "staging" | "production";

function detectEnvironment(): Env {
  const env = process.env.NODE_ENV || "development";
  if (env === "production") return "production";
  if (env === "staging") return "staging";
  return "development";
}

const CONFIGS = {
  development: devConfig,
  staging: stagingConfig,
  production: productionConfig,
} as const;

let _client: Groq | null = null;

export function getGroqClient(): Groq {
  if (_client) return _client;

  const env = detectEnvironment();
  const config = CONFIGS[env];

  if (!config.apiKey) {
    throw new Error(`GROQ_API_KEY not configured for ${env} environment`);
  }

  _client = new Groq({
    apiKey: config.apiKey,
    maxRetries: config.maxRetries,
    timeout: config.timeout,
  });

  return _client;
}

export function getModelConfig() {
  return CONFIGS[detectEnvironment()];
}
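getGroqClient caches one client for the lifetime of the process, which tests may need to undo. The caching pattern in isolation, with a hypothetical reset hook (lazySingleton is an illustrative helper, not an SDK export):

```typescript
// Generic lazy singleton: build on first get(), reuse afterwards.
// reset() exists so tests can force a rebuild (hypothetical addition).
export function lazySingleton<T>(factory: () => T): { get: () => T; reset: () => void } {
  let instance: T | null = null;
  return {
    get: () => {
      if (instance === null) instance = factory();
      return instance;
    },
    reset: () => {
      instance = null;
    },
  };
}
```

Wrapping client construction as `const groqClient = lazySingleton(() => new Groq({ ... }))` would replace the module-level `_client` variable and make the cache resettable between test cases.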

Step 5: Usage with Rate Limit Handling

// lib/groq-service.ts
import { getGroqClient, getModelConfig } from "../config/groq";

export async function complete(prompt: string): Promise<string> {
  const groq = getGroqClient();
  const { model, maxTokens, temperature } = getModelConfig();

  try {
    const completion = await groq.chat.completions.create({
      model,
      messages: [{ role: "user", content: prompt }],
      max_tokens: maxTokens,
      temperature,
    });
    return completion.choices[0].message.content || "";
  } catch (err: any) {
    if (err.status === 429) {           // HTTP 429: Too Many Requests
      const retryAfter = parseInt(err.headers?.["retry-after"] ?? "10", 10);
      console.warn(`Groq rate limited. Retry after ${retryAfter}s`);
      throw new Error(`Rate limited on model ${model}. Retry after ${retryAfter}s`);
    }
    throw err;
  }
}
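The handler above surfaces a 429 to the caller; for background jobs you may prefer to retry in place. A minimal exponential-backoff wrapper, sketched here with illustrative names (the SDK's own maxRetries already covers transient failures, so this is for app-level retries on top):

```typescript
// Retry an async call on rate-limit errors with exponential backoff.
// withBackoff, isRetryable, and baseDelayMs are illustrative, not SDK options.
export async function withBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000,
  isRetryable: (err: any) => boolean = (err) => err?.status === 429,
): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (!isRetryable(err) || attempt === maxAttempts - 1) throw err;
      const delay = baseDelayMs * 2 ** attempt; // 1s, 2s, 4s, ...
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastErr; // unreachable when maxAttempts > 0; satisfies the type checker
}
```

Usage would look like `await withBackoff(() => complete(prompt))`.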

Error Handling

| Issue | Cause | Solution |
|---|---|---|
| 401 Unauthorized | Invalid API key for environment | Verify GROQ_API_KEY in the secret manager |
| 429 rate_limit_exceeded | Free-tier limit hit | Switch to a paid plan or implement request queuing |
| Model not found | Deprecated model ID | Check console.groq.com/docs/models for the current list |
| Slow responses in dev | Using a 70b model for iteration | Switch the dev config to llama-3.1-8b-instant |
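The table above can be folded into a small classifier so logs point straight at the fix. A sketch (classifyGroqError is an illustrative helper; it inspects only the status code and message):

```typescript
// Map a Groq API error to the remediation from the table above.
// classifyGroqError is illustrative, not part of the SDK.
export function classifyGroqError(err: { status?: number; message?: string }): string {
  if (err.status === 401) return "Invalid API key: verify the key for this environment in the secret manager";
  if (err.status === 429) return "Rate limited: switch to a paid plan or queue requests";
  if (err.status === 404 || /model/i.test(err.message ?? "")) {
    return "Model not found: check console.groq.com/docs/models for current IDs";
  }
  return "Unclassified error: inspect status and message";
}
```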

Examples

Check Which Config Is Active

import { getModelConfig } from "./config/groq";

const cfg = getModelConfig();
console.log(`Model: ${cfg.model}, max_tokens: ${cfg.maxTokens}`);

Verify Keys Per Environment

set -euo pipefail
# Sanity check: does this environment's key authenticate? Lists available
# model IDs; per-plan rate limits are reported in response headers, not here.
curl -s "https://api.groq.com/openai/v1/models" \
  -H "Authorization: Bearer $GROQ_API_KEY" | jq -r '.data[].id'
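The model listing only confirms the key works; the rate-limit picture lives in response headers. Groq returns x-ratelimit-* headers on API responses (header names follow Groq's documented convention; parseRateLimitHeaders itself is an illustrative helper):

```typescript
// Pull rate-limit state from a response's headers.
// Header names follow Groq's x-ratelimit-* convention; this helper is illustrative.
export interface RateLimitInfo {
  remainingRequests: number | null;
  remainingTokens: number | null;
}

export function parseRateLimitHeaders(headers: Record<string, string>): RateLimitInfo {
  const num = (key: string): number | null => {
    const raw = headers[key];
    return raw === undefined ? null : parseInt(raw, 10);
  };
  return {
    remainingRequests: num("x-ratelimit-remaining-requests"),
    remainingTokens: num("x-ratelimit-remaining-tokens"),
  };
}
```

Logging these values in staging is a cheap way to see how close each environment runs to its plan's limits before production traffic hits them.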

Next Steps

For deployment configuration, see groq-deploy-integration.

Output

  • Configuration files or code changes applied to the project
  • Validation report confirming correct implementation
  • Summary of changes made and their rationale