Groq Multi-Environment Setup

Overview

Configure Groq across environments with the right balance of cost, speed, and capability per tier. Groq's key differentiator is inference speed (roughly 100-300 tokens/second, depending on model), but rate limits differ dramatically by plan: the free tier allows 30 RPM / 14,400 RPD for llama-3.1-70b, while paid tiers raise these limits substantially.

Prerequisites

  • Groq API key(s) per environment from console.groq.com
  • Environment variable management (.env.local, GitHub Secrets, or cloud secret manager)
  • Understanding of Groq's model tiers and rate limits
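For local development, the first two prerequisites usually combine into a git-ignored .env.local file, while staging and production keys stay out of the repository entirely. A minimal sketch (the key value is a placeholder; variable names match the configs in this skill):

```shell
# .env.local — git-ignored; development only
GROQ_API_KEY=gsk_your_dev_key_here

# Staging/production keys should NOT live in files checked into the repo.
# Set them where the corresponding config reads them:
#   GROQ_API_KEY_STAGING  -> CI/CD secret (e.g. GitHub Secrets)
#   GROQ_API_KEY_PROD     -> cloud secret manager
```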

Environment Strategy

| Environment | Model | Rate Limit Risk | Config Source |
|---|---|---|---|
| Development | llama-3.1-8b-instant | Low (small model) | .env.local |
| Staging | llama-3.1-70b-versatile | Medium | CI/CD secrets |
| Production | llama-3.1-70b-versatile or llama-3.3-70b-specdec | Managed with retry | Secret manager |

Instructions

Step 1: Configuration Structure

config/
  groq/
    base.ts           # Shared Groq client setup
    development.ts    # Dev: fast small models, verbose logging
    staging.ts        # Staging: production models, test rate limits
    production.ts     # Prod: hardened retry, error handling
    index.ts          # Environment resolver

Step 2: Base Configuration with Groq SDK

// config/groq/base.ts
export const BASE_GROQ_CONFIG = {
  maxRetries: 3,
  timeout: 30000,  // request timeout in milliseconds (30 s)
};

Step 3: Environment-Specific Configs

// config/groq/development.ts
import { BASE_GROQ_CONFIG } from "./base";

export const devConfig = {
  ...BASE_GROQ_CONFIG,
  apiKey: process.env.GROQ_API_KEY,
  model: "llama-3.1-8b-instant",      // fastest, cheapest for dev iteration
  maxTokens: 1024,                     // cap on output tokens
  temperature: 0.7,
  logRequests: true,                   // verbose logging in dev
};

// config/groq/staging.ts
import { BASE_GROQ_CONFIG } from "./base";

export const stagingConfig = {
  ...BASE_GROQ_CONFIG,
  apiKey: process.env.GROQ_API_KEY_STAGING,
  model: "llama-3.1-70b-versatile",   // match production model
  maxTokens: 4096,
  temperature: 0.3,
  logRequests: false,
};

// config/groq/production.ts
import { BASE_GROQ_CONFIG } from "./base";

export const productionConfig = {
  ...BASE_GROQ_CONFIG,
  apiKey: process.env.GROQ_API_KEY_PROD,
  model: "llama-3.1-70b-versatile",   // or llama-3.3-70b-specdec for lower latency
  maxTokens: 4096,
  temperature: 0.3,
  maxRetries: 5,                       // more retries for production reliability
  logRequests: false,
};
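The three configs share one shape, so giving it an explicit type catches drift between environments at compile time. A sketch under that assumption (the GroqEnvConfig name and isValidConfig helper are illustrative, not part of the SDK):

```typescript
// config/groq/types.ts (illustrative) — one type shared by all env configs
export interface GroqEnvConfig {
  apiKey: string | undefined; // undefined until the env var is set
  model: string;
  maxTokens: number;
  temperature: number;
  maxRetries: number;
  timeout: number;
  logRequests: boolean;
}

// Runtime guard: fail fast on a missing key or nonsense values.
export function isValidConfig(cfg: GroqEnvConfig): boolean {
  return Boolean(cfg.apiKey) && cfg.maxTokens > 0 && cfg.timeout > 0;
}
```

Annotating each config (e.g. `export const devConfig: GroqEnvConfig = { ... }`) turns a missing or misspelled field into a compile error rather than a runtime surprise.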

Step 4: Environment Resolver with Groq Client

// config/groq/index.ts
import Groq from "groq-sdk";

import { devConfig } from "./development";
import { stagingConfig } from "./staging";
import { productionConfig } from "./production";

type Env = "development" | "staging" | "production";

function detectEnvironment(): Env {
  const env = process.env.NODE_ENV || "development";
  if (env === "production") return "production";
  if (env === "staging") return "staging";
  return "development";
}

const CONFIGS = {
  development: devConfig,
  staging: stagingConfig,
  production: productionConfig,
} as const;

let _client: Groq | null = null;

export function getGroqClient(): Groq {
  if (_client) return _client;

  const env = detectEnvironment();
  const config = CONFIGS[env];

  if (!config.apiKey) {
    throw new Error(`GROQ_API_KEY not configured for ${env} environment`);
  }

  _client = new Groq({
    apiKey: config.apiKey,
    maxRetries: config.maxRetries,
    timeout: config.timeout,
  });

  return _client;
}

export function getModelConfig() {
  return CONFIGS[detectEnvironment()];
}
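getGroqClient caches one client for the lifetime of the process, which tests may need to undo. The caching pattern in isolation, with a hypothetical reset hook (lazySingleton is an illustrative helper, not an SDK export):

```typescript
// Generic lazy singleton: build on first get(), reuse afterwards.
// reset() exists so tests can force a rebuild (hypothetical addition).
export function lazySingleton<T>(factory: () => T): { get: () => T; reset: () => void } {
  let instance: T | null = null;
  return {
    get: () => {
      if (instance === null) instance = factory();
      return instance;
    },
    reset: () => {
      instance = null;
    },
  };
}
```

Wrapping client construction as `const groqClient = lazySingleton(() => new Groq({ ... }))` would replace the module-level `_client` variable and make the cache resettable between test cases.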

Step 5: Usage with Rate Limit Handling

// lib/groq-service.ts
import { getGroqClient, getModelConfig } from "../config/groq";

export async function complete(prompt: string): Promise<string> {
  const groq = getGroqClient();
  const { model, maxTokens, temperature } = getModelConfig();

  try {
    const completion = await groq.chat.completions.create({
      model,
      messages: [{ role: "user", content: prompt }],
      max_tokens: maxTokens,
      temperature,
    });
    return completion.choices[0].message.content || "";
  } catch (err: any) {
    if (err.status === 429) {           // HTTP 429: Too Many Requests
      const retryAfter = parseInt(err.headers?.["retry-after"] ?? "10", 10);
      console.warn(`Groq rate limited. Retry after ${retryAfter}s`);
      throw new Error(`Rate limited on model ${model}. Retry after ${retryAfter}s`);
    }
    throw err;
  }
}
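The handler above surfaces a 429 to the caller; for background jobs you may prefer to retry in place. A minimal exponential-backoff wrapper, sketched here with illustrative names (the SDK's own maxRetries already covers transient failures, so this is for app-level retries on top):

```typescript
// Retry an async call on rate-limit errors with exponential backoff.
// withBackoff, isRetryable, and baseDelayMs are illustrative, not SDK options.
export async function withBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000,
  isRetryable: (err: any) => boolean = (err) => err?.status === 429,
): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (!isRetryable(err) || attempt === maxAttempts - 1) throw err;
      const delay = baseDelayMs * 2 ** attempt; // 1s, 2s, 4s, ...
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastErr; // unreachable when maxAttempts > 0; satisfies the type checker
}
```

Usage would look like `await withBackoff(() => complete(prompt))`.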

Error Handling

| Issue | Cause | Solution |
|---|---|---|
| 401 Unauthorized | Invalid API key for environment | Verify GROQ_API_KEY in the secret manager |
| 429 rate_limit_exceeded | Free-tier limit hit | Switch to a paid plan or implement request queuing |
| Model not found | Deprecated model ID | Check console.groq.com/docs/models for the current list |
| Slow responses in dev | Using a 70b model for iteration | Switch the dev config to llama-3.1-8b-instant |
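The table above can be folded into a small classifier so logs point straight at the fix. A sketch (classifyGroqError is an illustrative helper; it inspects only the status code and message):

```typescript
// Map a Groq API error to the remediation from the table above.
// classifyGroqError is illustrative, not part of the SDK.
export function classifyGroqError(err: { status?: number; message?: string }): string {
  if (err.status === 401) return "Invalid API key: verify the key for this environment in the secret manager";
  if (err.status === 429) return "Rate limited: switch to a paid plan or queue requests";
  if (err.status === 404 || /model/i.test(err.message ?? "")) {
    return "Model not found: check console.groq.com/docs/models for current IDs";
  }
  return "Unclassified error: inspect status and message";
}
```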

Examples

Check Which Config Is Active

import { getModelConfig } from "./config/groq";

const cfg = getModelConfig();
console.log(`Model: ${cfg.model}, max_tokens: ${cfg.maxTokens}`);

Verify Keys Per Environment

set -euo pipefail
# Sanity check: does this environment's key authenticate? Lists available
# model IDs; per-plan rate limits are reported in response headers, not here.
curl -s "https://api.groq.com/openai/v1/models" \
  -H "Authorization: Bearer $GROQ_API_KEY" | jq -r '.data[].id'
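The model listing only confirms the key works; the rate-limit picture lives in response headers. Groq returns x-ratelimit-* headers on API responses (header names follow Groq's documented convention; parseRateLimitHeaders itself is an illustrative helper):

```typescript
// Pull rate-limit state from a response's headers.
// Header names follow Groq's x-ratelimit-* convention; this helper is illustrative.
export interface RateLimitInfo {
  remainingRequests: number | null;
  remainingTokens: number | null;
}

export function parseRateLimitHeaders(headers: Record<string, string>): RateLimitInfo {
  const num = (key: string): number | null => {
    const raw = headers[key];
    return raw === undefined ? null : parseInt(raw, 10);
  };
  return {
    remainingRequests: num("x-ratelimit-remaining-requests"),
    remainingTokens: num("x-ratelimit-remaining-tokens"),
  };
}
```

Logging these values in staging is a cheap way to see how close each environment runs to its plan's limits before production traffic hits them.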

Next Steps

For deployment configuration, see groq-deploy-integration.

Output

  • Configuration files or code changes applied to the project
  • Validation report confirming correct implementation
  • Summary of changes made and their rationale