Perplexity Reliability Patterns

Overview

Production reliability patterns for Perplexity Sonar API integrations. Perplexity performs live web searches per request, making response times variable and dependent on search complexity -- unlike static LLM inference.

Prerequisites

Perplexity API key configured
Caching layer (Redis recommended)
Understanding of search-augmented generation latency

Instructions

Step 1: Cache Identical Queries

Perplexity's web search is expensive per call. Cache results for repeated queries within a time window.

import hashlib, json

class PerplexityCache:
    def __init__(self, redis_client, ttl=600):  # 600: timeout: 10 minutes
        self.r = redis_client
        self.ttl = ttl

    def get_or_search(self, client, messages, model="sonar", **kwargs):
        key = self._cache_key(messages, model, **kwargs)
        cached = self.r.get(key)
        if cached:
            return json.loads(cached)
        result = client.chat.completions.create(
            model=model, messages=messages, **kwargs
        )
        self.r.setex(key, self.ttl, json.dumps(result.to_dict()))
        return result

    def _cache_key(self, messages, model, **kwargs):
        data = json.dumps({"m": messages, "model": model, **kwargs}, sort_keys=True)
        return f"pplx:{hashlib.sha256(data.encode()).hexdigest()}"

Step 2: Model Tier Fallback

If sonar-pro times out or errors, fall back to sonar for a faster but shallower response.

def resilient_search(client, messages, timeout=30):
    try:
        return client.chat.completions.create(
            model="sonar-pro", messages=messages, timeout=timeout
        )
    except Exception:
        return client.chat.completions.create(
            model="sonar", messages=messages, timeout=15
        )

Step 3: Streaming with Timeout Protection

Perplexity streams can stall on complex searches. Set per-chunk timeouts.

import time

def stream_with_timeout(client, messages, chunk_timeout=10):
    stream = client.chat.completions.create(
        model="sonar", messages=messages, stream=True
    )
    last_chunk = time.time()
    full_response = ""
    citations = []

    for chunk in stream:
        if time.time() - last_chunk > chunk_timeout:
            raise TimeoutError("Stream stalled")
        last_chunk = time.time()
        delta = chunk.choices[0].delta.content or ""
        full_response += delta
        if hasattr(chunk, 'citations'):
            citations = chunk.citations
        yield delta

    return full_response, citations

Step 4: Citation Validation

Verify cited URLs are accessible before presenting to users.

import aiohttp

async def validate_citations(citations: list[str]) -> list[dict]:
    validated = []
    async with aiohttp.ClientSession() as session:
        for url in citations[:5]:  # limit to top 5
            try:
                async with session.head(url, timeout=aiohttp.ClientTimeout(total=5)) as r:
                    validated.append({"url": url, "status": r.status, "valid": r.status < 400})  # HTTP 400 Bad Request
            except:
                validated.append({"url": url, "status": 0, "valid": False})
    return validated

Error Handling

Issue	Cause	Solution
Slow responses (>15s)	Complex search query	Use sonar instead of sonar-pro
Stream stalls	Search taking too long	Per-chunk timeout detection
Stale results	Cached data too old	Reduce TTL for time-sensitive queries
Broken citation links	Source pages moved	Validate URLs before displaying

Examples

Basic usage: Apply perplexity reliability patterns to a standard project setup with default configuration options.

Advanced scenario: Customize perplexity reliability patterns for production environments with multiple constraints and team-specific requirements.

Resources

Perplexity API Docs

Output

Configuration files or code changes applied to the project
Validation report confirming correct implementation
Summary of changes made and their rationale

perplexity-reliability-patterns