
LangChain SDK Patterns

Overview

Production-grade patterns every LangChain application should use: type-safe structured output, provider fallbacks, async batch processing, streaming, caching, and retry logic.

Pattern 1: Structured Output with Zod

import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { z } from "zod";

const ExtractedData = z.object({
  entities: z.array(z.object({
    name: z.string(),
    type: z.enum(["person", "org", "location"]),
    confidence: z.number().min(0).max(1),
  })),
  language: z.string(),
  summary: z.string(),
});

const model = new ChatOpenAI({ model: "gpt-4o-mini" });
const structuredModel = model.withStructuredOutput(ExtractedData);

const prompt = ChatPromptTemplate.fromTemplate(
  "Extract entities from this text:\n\n{text}"
);

const chain = prompt.pipe(structuredModel);

const result = await chain.invoke({
  text: "Satya Nadella announced Microsoft's new AI lab in Seattle.",
});
// result is fully typed: { entities: [...], language: "en", summary: "..." }

Pattern 2: Provider Fallbacks

import { ChatOpenAI } from "@langchain/openai";
import { ChatAnthropic } from "@langchain/anthropic";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";

const primary = new ChatOpenAI({
  model: "gpt-4o",
  maxRetries: 2,
  timeout: 10000,
});

const fallback = new ChatAnthropic({
  model: "claude-sonnet-4-20250514",
});

// Falls back to the next model when the primary errors (rate limit,
// timeout, 500) after its own retries are exhausted
const robustModel = primary.withFallbacks({
  fallbacks: [fallback],
});

// Works identically to a normal model
const prompt = ChatPromptTemplate.fromTemplate("Answer concisely: {question}");
const chain = prompt.pipe(robustModel).pipe(new StringOutputParser());

Pattern 3: Async Batch Processing

import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";

const chain = ChatPromptTemplate.fromTemplate("Summarize: {text}")
  .pipe(new ChatOpenAI({ model: "gpt-4o-mini" }))
  .pipe(new StringOutputParser());

const texts = ["Article 1...", "Article 2...", "Article 3..."];
const inputs = texts.map((text) => ({ text }));

// Process all inputs with controlled concurrency
const results = await chain.batch(inputs, {
  maxConcurrency: 5,   // max 5 parallel API calls
});
// results: string[] — one summary per input

Pattern 4: Streaming Token Chunks

import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";

const chain = ChatPromptTemplate.fromTemplate("{input}")
  .pipe(new ChatOpenAI({ model: "gpt-4o-mini", streaming: true }))
  .pipe(new StringOutputParser());

// Stream string chunks
const stream = await chain.stream({ input: "Tell me a story" });
for await (const chunk of stream) {
  process.stdout.write(chunk);  // each chunk is a string fragment
}

Pattern 5: Retry with Exponential Backoff

import { ChatOpenAI } from "@langchain/openai";

// Built-in retry handles transient failures
const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  maxRetries: 3,        // retries with exponential backoff
  timeout: 30000,       // 30s timeout per request
});

// Manual retry wrapper for custom logic
async function invokeWithRetry<T>(
  chain: { invoke: (input: any) => Promise<T> },
  input: any,
  maxRetries = 3,
): Promise<T> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await chain.invoke(input);
    } catch (error: any) {
      if (attempt === maxRetries - 1) throw error;
      const delay = Math.min(1000 * 2 ** attempt, 30000);
      console.warn(`Retry ${attempt + 1}/${maxRetries} after ${delay}ms`);
      await new Promise((r) => setTimeout(r, delay));
    }
  }
  throw new Error("Unreachable");
}
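The wrapper above can be exercised without touching a real API. A self-contained sketch with a stub "chain" that fails twice and then succeeds (the helper is redefined here with a hypothetical `baseDelayMs` parameter so the demo runs instantly; the real wrapper above hardcodes a 1s base delay):

```typescript
// Same retry logic as above, with the base delay made configurable.
async function invokeWithRetry<T>(
  chain: { invoke: (input: unknown) => Promise<T> },
  input: unknown,
  maxRetries = 3,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await chain.invoke(input);
    } catch (error) {
      if (attempt === maxRetries - 1) throw error;
      const delay = Math.min(baseDelayMs * 2 ** attempt, 30000);
      await new Promise((r) => setTimeout(r, delay));
    }
  }
  throw new Error("Unreachable");
}

// Stub chain: throws a transient error on the first two calls.
let calls = 0;
const flakyChain = {
  invoke: async (_input: unknown) => {
    calls++;
    if (calls < 3) throw new Error("503 Service Unavailable");
    return "summary";
  },
};

const result = await invokeWithRetry(flakyChain, { text: "..." }, 3, 0);
// result === "summary" after 3 attempts (calls === 3)
```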

Pattern 6: Caching (Python)

from langchain_openai import ChatOpenAI
from langchain_core.globals import set_llm_cache
from langchain_community.cache import SQLiteCache

# Enable persistent caching — identical inputs skip the API
set_llm_cache(SQLiteCache(database_path=".langchain_cache.db"))

llm = ChatOpenAI(model="gpt-4o-mini")

# First call: hits API (~500ms)
r1 = llm.invoke("What is 2+2?")

# Second identical call: cache hit (~0ms, no cost)
r2 = llm.invoke("What is 2+2?")

Pattern 7: RunnableLambda for Custom Logic

import { RunnableLambda } from "@langchain/core/runnables";
import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";

// Wrap any function as a Runnable to use in chains
const cleanInput = RunnableLambda.from((input: { text: string }) => ({
  text: input.text.trim().toLowerCase(),
}));

const addMetadata = RunnableLambda.from((result: string) => ({
  answer: result,
  timestamp: new Date().toISOString(),
  model: "gpt-4o-mini",
}));

const prompt = ChatPromptTemplate.fromTemplate("Answer: {text}");
const model = new ChatOpenAI({ model: "gpt-4o-mini" });

const chain = cleanInput
  .pipe(prompt)
  .pipe(model)
  .pipe(new StringOutputParser())
  .pipe(addMetadata);

Anti-Patterns to Avoid

| Anti-pattern | Why | Better |
| --- | --- | --- |
| Hardcoded API keys | Security risk | Use env vars + dotenv |
| No error handling | Silent failures | Use .withFallbacks() + try/catch |
| Sequential when parallel works | Slow | Use RunnableParallel or .batch() |
| Parsing raw LLM text | Fragile | Use .withStructuredOutput(zodSchema) |
| No timeout | Hanging requests | Set timeout on the model constructor |
| No streaming in UIs | Bad UX | Use .stream() for user-facing output |
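For the "sequential when parallel works" row: conceptually, .batch(inputs, { maxConcurrency }) behaves like a concurrency-limited Promise.all. A plain-TypeScript sketch of that behavior, with no LangChain imports (mapWithConcurrency is an illustrative helper, not a library API):

```typescript
// Run an async mapper over inputs with at most `limit` calls in flight —
// roughly the behavior maxConcurrency gives you in chain.batch().
async function mapWithConcurrency<In, Out>(
  inputs: In[],
  limit: number,
  fn: (input: In) => Promise<Out>,
): Promise<Out[]> {
  const results = new Array<Out>(inputs.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < inputs.length) {
      const i = next++;            // claim the next input index
      results[i] = await fn(inputs[i]);
    }
  }
  const workers = Array.from(
    { length: Math.min(limit, inputs.length) },
    () => worker(),
  );
  await Promise.all(workers);
  return results;                   // results preserve input order
}

const out = await mapWithConcurrency([1, 2, 3, 4], 2, async (n) => n * 10);
// out: [10, 20, 30, 40]
```

Note that results come back in input order even though completion order may differ, matching .batch() semantics.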

Error Handling

| Error | Cause | Fix |
| --- | --- | --- |
| ZodError | LLM output doesn't match schema | Improve the prompt or relax the schema with .optional() |
| RateLimitError | API quota exceeded | Add maxRetries, use .withFallbacks() |
| TimeoutError | Slow response | Increase timeout, try a smaller model |
| OutputParserException | Unparseable output | Switch to .withStructuredOutput() |
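A common companion to the table above is deciding which errors are worth retrying at all before reaching for fallbacks. A minimal sketch in plain TypeScript (the status/name fields are assumptions about the error shape; real SDK error objects vary by provider):

```typescript
// Treat rate limits, timeouts, and 5xx responses as transient (retryable);
// treat auth and validation errors as permanent (fail fast).
function isRetryable(error: { status?: number; name?: string }): boolean {
  if (error.name === "TimeoutError") return true;
  if (error.status === undefined) return false;
  return error.status === 429 || error.status >= 500;
}

isRetryable({ status: 429 });          // true  — rate limited, back off and retry
isRetryable({ status: 503 });          // true  — transient server error
isRetryable({ status: 401 });          // false — bad API key, retrying won't help
isRetryable({ name: "TimeoutError" }); // true
```

Failing fast on permanent errors keeps retry loops from burning time (and budget) on requests that can never succeed.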

Next Steps

Proceed to langchain-data-handling for data privacy patterns.
