Production Deployment
Complete production deployment guide for ElevenLabs API integration, covering rate limiting patterns, comprehensive error handling strategies, monitoring setup, and testing frameworks.
Overview
This skill provides battle-tested patterns for deploying ElevenLabs API integration to production environments with:
- Rate Limiting: Concurrency-aware rate limiting respecting plan limits
- Error Handling: Comprehensive error recovery and retry strategies
- Monitoring: Real-time metrics, logging, and alerting
- Testing: Load testing, concurrency validation, and production readiness checks
Quick Start
1. Set Up Monitoring Infrastructure
bash scripts/setup-monitoring.sh --project-name "my-elevenlabs-app" \
--log-level "info" \
--metrics-port 9090
This script:
- Configures Winston logging with rotation
- Sets up Prometheus metrics endpoints
- Creates health check endpoints
- Initializes error tracking
2. Deploy Production Configuration
bash scripts/deploy-production.sh --environment "production" \
--api-key "$ELEVENLABS_API_KEY" \
--concurrency-limit 10 \
--region "us-east-1"
This script:
- Validates environment variables
- Applies rate limiting configuration
- Configures error handling middleware
- Sets up monitoring integrations
- Performs smoke tests
3. Test Rate Limiting
bash scripts/test-rate-limiting.sh --concurrency 20 \
--duration 60 \
--plan-tier "pro"
This script:
- Simulates concurrent requests
- Validates queue behavior
- Measures latency under load
- Generates performance report
ElevenLabs Concurrency Limits
Limits by Plan Tier
| Plan | Multilingual v2 | Turbo/Flash | STT | Music |
|---|---|---|---|---|
| Free | 2 | 4 | 8 | N/A |
| Starter | 3 | 6 | 12 | 2 |
| Creator | 5 | 10 | 20 | 2 |
| Pro | 10 | 20 | 40 | 2 |
| Scale | 15 | 30 | 60 | 3 |
| Business | 15 | 30 | 60 | 3 |
| Enterprise | Elevated | Elevated | Elevated | Highest |
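The Multilingual v2 column of the table above can drive limiter configuration directly. A minimal sketch (the `concurrency_for` helper is illustrative, not part of the templates):

```python
# TTS concurrency limits per plan tier, taken from the Multilingual v2
# column of the table above. Enterprise limits are negotiated, so they
# are intentionally omitted here.
TTS_CONCURRENCY = {
    "free": 2,
    "starter": 3,
    "creator": 5,
    "pro": 10,
    "scale": 15,
    "business": 15,
}

def concurrency_for(plan: str) -> int:
    """Return the concurrency limit to configure for a plan tier."""
    try:
        return TTS_CONCURRENCY[plan.lower()]
    except KeyError:
        raise ValueError(f"unknown plan tier: {plan}")

print(concurrency_for("pro"))  # -> 10
```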
Queue Management
When concurrency limits are exceeded:
- Excess requests are queued rather than rejected; lower-priority requests wait behind higher-priority ones
- Typical latency increase: ~50ms
- Response headers include current-concurrent-requests and maximum-concurrent-requests
Real-World Capacity
A concurrency limit of 5 can typically support ~100 simultaneous audio broadcasts depending on:
- Audio generation speed
- User behavior patterns
- Request distribution
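The ~100-broadcast figure falls out of a back-of-envelope calculation: each concurrency slot can serve many listeners as long as generating a clip is much faster than playing it back. A sketch with illustrative timings (the 1.5 s generation time and 30 s clip length are assumptions, not measured figures):

```python
def estimate_broadcasts(concurrency_limit: int,
                        generation_s: float,
                        playback_s: float) -> int:
    """Rough capacity estimate: each slot is busy for generation_s out of
    every playback_s seconds, so one slot can serve playback_s / generation_s
    listeners back-to-back."""
    return int(concurrency_limit * (playback_s / generation_s))

# 5 concurrent slots, ~1.5 s to generate each ~30 s clip
print(estimate_broadcasts(5, generation_s=1.5, playback_s=30))  # -> 100
```

Real capacity will be lower once request bursts, retries, and uneven user behavior are factored in.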
Rate Limiting Patterns
1. Token Bucket Algorithm
Best for: Variable rate limiting with burst capacity
// See templates/rate-limiter.js.template for full implementation
const limiter = new TokenBucketRateLimiter({
capacity: 10, // Max concurrent requests
refillRate: 2, // Tokens per second
queueSize: 100 // Max queued requests
});
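A minimal Python sketch of the same token-bucket idea (illustrative only; the template's implementation adds queuing and metrics):

```python
import time

class TokenBucket:
    """Token bucket sketch: starts full at `capacity` tokens and refills
    at `refill_rate` tokens per second, allowing bursts up to capacity."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def try_acquire(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should queue or back off, not drop

bucket = TokenBucket(capacity=10, refill_rate=2)
# 15 back-to-back attempts: the first 10 drain the bucket, the rest are refused.
print(sum(bucket.try_acquire() for _ in range(15)))  # -> 10
```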
2. Sliding Window with Priority Queue
Best for: Enforcing strict concurrency limits with prioritization
# See templates/rate-limiter.py.template for full implementation
limiter = SlidingWindowRateLimiter(
    max_concurrent=10,   # strict ceiling on in-flight requests
    window_size=60,      # seconds
    priority_levels=3    # number of priority tiers
)
3. Adaptive Rate Limiting
Best for: Self-adjusting to API response headers
Monitors current-concurrent-requests and maximum-concurrent-requests headers to dynamically adjust rate limits.
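A possible sketch of that adjustment logic (the `adjust_limit` helper and the 80% `headroom` fraction are illustrative assumptions, not a documented algorithm):

```python
def adjust_limit(headers: dict, local_limit: int, headroom: float = 0.8) -> int:
    """Derive a local concurrency ceiling from the API's response headers,
    staying below a fraction of the server-reported maximum."""
    current = int(headers.get("current-concurrent-requests", 0))
    maximum = int(headers.get("maximum-concurrent-requests", local_limit))
    target = max(1, int(maximum * headroom))
    if current < target:
        return target                     # headroom available: use the target
    return max(1, local_limit - 1)        # near the ceiling: back off by one

print(adjust_limit({"current-concurrent-requests": "3",
                    "maximum-concurrent-requests": "10"},
                   local_limit=10))  # -> 8
```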
Error Handling Strategies
Error Categories
1. Rate Limit Errors (429)
- Implement exponential backoff
- Queue requests for retry
- Monitor queue depth
2. Service Errors (500-599)
- Retry with exponential backoff
- Circuit breaker pattern
- Fallback to cached audio
3. Client Errors (400-499)
- Log for debugging
- Do not retry
- Return meaningful error to user
4. Network Errors
- Retry with linear backoff
- Timeout after 30 seconds
- Circuit breaker after 5 failures
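The category rules above (retry 429 and 5xx with exponential backoff and jitter, never retry other 4xx) can be sketched as follows; `call_with_retry` and the tuple-returning `request_fn` are illustrative conventions, not a real SDK interface:

```python
import random
import time

RETRYABLE = {429, 500, 502, 503, 504}

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2^attempt)]."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

def call_with_retry(request_fn, max_attempts: int = 5):
    """request_fn returns (status_code, body); retry only retryable statuses."""
    for attempt in range(max_attempts):
        status, body = request_fn()
        if status < 400:
            return body
        if status not in RETRYABLE:
            # Client errors (other than 429): log and surface, do not retry.
            raise RuntimeError(f"client error {status}")
        time.sleep(backoff_delay(attempt))
    raise RuntimeError("retries exhausted")

# Fake endpoint that fails twice with 503, then succeeds.
responses = iter([(503, None), (503, None), (200, "audio-bytes")])
print(call_with_retry(lambda: next(responses)))  # -> audio-bytes
```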
Circuit Breaker Pattern
// Automatically opens the circuit after threshold failures
const circuitBreaker = new CircuitBreaker({
    failureThreshold: 5,    // consecutive failures before opening
    resetTimeout: 60000,    // ms before attempting a half-open reset
    monitorInterval: 5000   // ms between health checks
});
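A minimal Python sketch of the same pattern (illustrative; the real templates add monitoring and a half-open trial state with configurable probes):

```python
import time

class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures, fails fast while
    open, and allows a trial call after `reset_timeout` seconds."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 60.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open")  # fail fast, no API call
            self.opened_at = None                   # half-open: allow one trial
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()   # trip the breaker
            raise
        self.failures = 0                           # success resets the counter
        return result

breaker = CircuitBreaker(failure_threshold=2, reset_timeout=60)

def boom():
    raise IOError("upstream down")

for _ in range(2):          # two consecutive failures trip the breaker
    try:
        breaker.call(boom)
    except IOError:
        pass

try:
    breaker.call(lambda: "ok")
except RuntimeError as e:
    print(e)  # -> circuit open
```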
Monitoring Setup
Key Metrics to Track
Request Metrics:
- elevenlabs_requests_total - Total requests by status
- elevenlabs_requests_duration_seconds - Request latency histogram
- elevenlabs_concurrent_requests - Current concurrent requests
- elevenlabs_queue_depth - Queued requests waiting
Error Metrics:
- elevenlabs_errors_total - Total errors by type
- elevenlabs_retries_total - Total retry attempts
- elevenlabs_circuit_breaker_state - Circuit breaker state
Business Metrics:
- elevenlabs_characters_generated - Total characters processed
- elevenlabs_audio_duration_seconds - Total audio duration
- elevenlabs_quota_used_percentage - Quota utilization
Logging Best Practices
Structure logs with:
- Request ID for tracing
- User ID for analysis
- Timestamp in ISO 8601
- Error stack traces
- Performance metrics
Log Levels:
- error - Failures requiring attention
- warn - Degraded performance, retries
- info - Request completion, key events
- debug - Detailed execution flow
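A structured-logging sketch using only the Python standard library; the field names follow the list above, and `JsonFormatter` is an illustrative helper, not the Winston setup used by the templates:

```python
import datetime
import json
import logging
import sys
import uuid

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line: timestamp (ISO 8601), level,
    message, request/user IDs for tracing, and stack traces on errors."""

    def format(self, record):
        entry = {
            "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "level": record.levelname.lower(),
            "message": record.getMessage(),
            "request_id": getattr(record, "request_id", None),
            "user_id": getattr(record, "user_id", None),
        }
        if record.exc_info:
            entry["stack"] = self.formatException(record.exc_info)
        return json.dumps(entry)

logger = logging.getLogger("elevenlabs")
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Attach tracing context via `extra`; fields land on the LogRecord.
logger.info("tts request completed",
            extra={"request_id": str(uuid.uuid4()), "user_id": "u-42"})
```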
Alerting Rules
Critical Alerts:
- Error rate > 5% over 5 minutes
- Circuit breaker open for > 1 minute
- Queue depth > 500 requests
Warning Alerts:
- Latency p95 > 2 seconds
- Quota usage > 90%
- Retry rate > 20%
Testing Frameworks
Load Testing
Simulate production traffic patterns:
# Gradual ramp-up test
bash scripts/test-rate-limiting.sh \
--pattern "ramp-up" \
--start-rps 1 \
--end-rps 10 \
--duration 300
Concurrency Validation
Verify concurrency limits are enforced:
# Burst test
bash scripts/test-rate-limiting.sh \
--pattern "burst" \
--concurrency 50 \
--iterations 100
Chaos Testing
Test error handling under adverse conditions:
# Simulate API failures
bash scripts/test-rate-limiting.sh \
--pattern "chaos" \
--failure-rate 0.1 \
--duration 120
Production Checklist
Pre-Deployment
- Environment variables configured
- Rate limiting configured for plan tier
- Error handling middleware implemented
- Monitoring and logging configured
- Health check endpoints created
- Load testing completed
- Chaos testing completed
Post-Deployment
- Smoke tests passed
- Metrics dashboard configured
- Alerts configured and tested
- On-call rotation established
- Runbooks documented
- Backup/fallback strategy tested
Scripts
setup-monitoring.sh
Configures comprehensive monitoring infrastructure:
- Winston logging with daily rotation
- Prometheus metrics exporter
- Health check endpoints
- Error tracking integration
- Custom metric collectors
Usage:
bash scripts/setup-monitoring.sh \
--project-name "my-app" \
--log-level "info" \
--metrics-port 9090 \
--health-port 8080
deploy-production.sh
Production deployment orchestration:
- Environment validation
- Dependency installation
- Configuration deployment
- Service health checks
- Smoke test execution
- Rollback on failure
Usage:
bash scripts/deploy-production.sh \
--environment "production" \
--api-key "$ELEVENLABS_API_KEY" \
--concurrency-limit 10 \
--skip-tests false
test-rate-limiting.sh
Comprehensive rate limiting test suite:
- Concurrency limit validation
- Queue behavior testing
- Latency measurement
- Error rate tracking
- Performance reporting
Usage:
bash scripts/test-rate-limiting.sh \
--concurrency 20 \
--duration 60 \
--plan-tier "pro" \
--pattern "ramp-up"
validate-config.sh
Production configuration validator:
- Environment variable checks
- API key validation
- Rate limit configuration
- Monitoring setup verification
- Security audit
Usage:
bash scripts/validate-config.sh \
--config-file "config/production.json" \
--strict true
rollback.sh
Automated rollback script:
- Reverts to previous deployment
- Restores configuration
- Validates health checks
- Notifies team
Usage:
bash scripts/rollback.sh \
--deployment-id "deploy-123" \
--reason "High error rate"
Templates
rate-limiter.js.template
Token bucket rate limiter with priority queue:
- Configurable capacity and refill rate
- Priority-based request queuing
- Automatic backpressure handling
- Prometheus metrics integration
rate-limiter.py.template
Sliding window rate limiter with async support:
- Strict concurrency enforcement
- Redis-backed for distributed systems
- Circuit breaker integration
- Comprehensive error handling
error-handler.js.template
Production-grade error handler:
- Error categorization and routing
- Exponential backoff retry logic
- Circuit breaker pattern
- Structured error logging
error-handler.py.template
Async error handler with context:
- Context-aware error handling
- Retry with jitter
- Error aggregation and reporting
- Integration with monitoring
monitoring-config.json.template
Complete monitoring configuration:
- Prometheus scrape configs
- Alert rules and thresholds
- Log aggregation settings
- Dashboard definitions
health-check.js.template
Comprehensive health check endpoint:
- API connectivity verification
- Rate limiter health
- Queue depth monitoring
- Dependency checks
Examples
Rate Limiting Example
Complete implementation showing:
- Token bucket rate limiter
- Priority queue management
- Backpressure handling
- Metrics collection
Location: examples/rate-limiting/
Error Handling Example
Production error handling patterns:
- Retry with exponential backoff
- Circuit breaker implementation
- Fallback strategies
- Error logging and alerting
Location: examples/error-handling/
Monitoring Example
Full monitoring stack setup:
- Prometheus metrics
- Grafana dashboards
- Winston logging
- Alert manager configuration
Location: examples/monitoring/
Best Practices
Rate Limiting
- Configure for your plan tier - Don't exceed concurrency limits
- Implement graceful degradation - Queue requests, don't drop
- Monitor queue depth - Alert on excessive queueing
- Use adaptive limiting - Adjust based on response headers
- Test under load - Validate behavior before production
Error Handling
- Categorize errors - Different strategies for different error types
- Implement retries carefully - Exponential backoff with jitter
- Use circuit breakers - Prevent cascade failures
- Log comprehensively - Include context for debugging
- Provide fallbacks - Cached audio, degraded experience
Monitoring
- Track key metrics - Request rate, latency, errors, concurrency
- Set meaningful alerts - Actionable, not noisy
- Use structured logging - JSON format for easy parsing
- Create dashboards - Real-time visibility
- Test alerts - Verify notification channels work
Testing
- Load test gradually - Ramp up to avoid overwhelming API
- Simulate realistic patterns - User behavior, not raw requests
- Test error scenarios - Chaos engineering
- Validate concurrency - Ensure limits are enforced
- Monitor during tests - Use production monitoring stack
Troubleshooting
High Error Rate
Symptoms: Error rate > 5%
Diagnosis:
- Check Prometheus metrics: rate(elevenlabs_errors_total[5m])
- Review error logs for patterns
- Verify API key is valid
- Check quota remaining
Resolution:
- If rate limiting: Reduce request rate or upgrade plan
- If service errors: Implement circuit breaker, contact support
- If client errors: Fix request validation
High Latency
Symptoms: p95 latency > 2 seconds
Diagnosis:
- Check concurrency: elevenlabs_concurrent_requests
- Check queue depth: elevenlabs_queue_depth
- Review response headers: current-concurrent-requests
Resolution:
- Increase concurrency limit (upgrade plan if needed)
- Optimize request payload size
- Implement request coalescing
- Use Turbo/Flash models for lower latency
Circuit Breaker Open
Symptoms: Requests failing immediately
Diagnosis:
- Check circuit breaker state metric
- Review error logs for failure pattern
- Check ElevenLabs status page
Resolution:
- Wait for automatic reset (default 60s)
- If persistent: Check API connectivity
- Manual reset if resolved: Restart service
Contributing
When updating this skill:
- Test scripts thoroughly in production-like environment
- Update templates with latest best practices
- Add examples for new patterns
- Update troubleshooting guide
- Validate with validate-skill.sh
Version: 1.0.0 | Last Updated: 2025-10-29 | Maintainer: ElevenLabs Plugin Team