ai-provider-google-gemini-sdk
Google Gemini SDK Patterns
Quick Guide: Use the `@google/genai` package (the unified SDK, NOT the deprecated `@google/generative-ai`) for all Gemini API interactions. All operations flow through a central `GoogleGenAI` client with service accessors: `ai.models` for generation, `ai.chats` for multi-turn, `ai.files` for uploads, `ai.caches` for context caching. Use `responseMimeType: "application/json"` with `responseJsonSchema` for structured output. Access response text via `response.text` (property, not method). Streaming uses `generateContentStream`, which returns an async iterable -- iterate with `for await`.
<critical_requirements>
CRITICAL: Before Using This Skill
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, `import type`, named constants)
(You MUST use @google/genai (the new unified SDK) -- NOT the deprecated @google/generative-ai package)
(You MUST access response text via response.text (a property) -- NOT response.text() (the old SDK used a method call))
(You MUST pass model as a string parameter in every API call -- there is no getGenerativeModel() step)
(You MUST use config for all generation parameters (temperature, safetySettings, tools, systemInstruction) -- NOT top-level properties)
(You MUST never hardcode API keys -- use environment variables via process.env.GEMINI_API_KEY or GOOGLE_API_KEY)
</critical_requirements>
Auto-detection: Gemini, gemini, GoogleGenAI, @google/genai, ai.models.generateContent, generateContentStream, ai.chats, ai.files, ai.caches, gemini-2.5-flash, gemini-2.5-pro, gemini-2.0-flash, gemini-3-flash, gemini-embedding, GEMINI_API_KEY, GOOGLE_API_KEY, FunctionCallingConfigMode, createUserContent, createPartFromUri, responseMimeType, responseJsonSchema
When to use:
- Building applications that call Google Gemini models directly (Gemini 2.x, 2.5, 3.x)
- Processing multimodal input: images, video, audio, PDFs
- Implementing function calling / tool use with custom functions or built-in tools (Google Search, code execution)
- Extracting structured JSON data from LLM responses using response schemas
- Streaming text generation for user-facing output
- Creating embeddings for RAG pipelines or semantic search (text and multimodal)
- Caching large context (documents, code) to reduce cost and latency across multiple requests
- Multi-turn chat sessions with automatic history management
Key patterns covered:
- Client initialization and environment-based configuration
- Text generation with `ai.models.generateContent()`
- Streaming with `ai.models.generateContentStream()` and `for await`
- Multimodal input (inline base64, file upload, URIs)
- Function calling with `FunctionDeclaration` and manual tool loops
- Structured output with `responseMimeType` + `responseJsonSchema` + Zod
- Chat sessions with `ai.chats.create()` and `sendMessage()`
- Embeddings with `ai.models.embedContent()` (text and multimodal)
- Context caching with `ai.caches.create()`
- Safety settings per-request via `config.safetySettings`
When NOT to use:
- Multi-provider applications requiring provider switching -- use a unified provider SDK
- React-specific chat UI hooks (`useChat`) -- use a framework-integrated AI SDK
- When you need features unique to another provider's API -- use that provider's SDK directly
Examples Index
- Core: Setup & Configuration -- Client init, text generation, system instructions, error handling
- Multimodal Input -- Inline images, file upload, video, audio, PDF, `createPartFromUri`
- Streaming -- `generateContentStream`, `sendMessageStream`, abort patterns
- Function Calling / Tools -- `FunctionDeclaration`, `FunctionCallingConfigMode`, manual tool loop, built-in tools
- Structured Output -- JSON mode, Zod schemas, `responseJsonSchema`, enum extraction
- Chat Sessions -- `ai.chats.create()`, multi-turn, streaming chat, history
- Advanced: Embeddings, Caching & Safety -- Embeddings, context caching, safety settings, token counting
- Quick API Reference -- Model IDs, method signatures, config parameters, safety enums
Philosophy
The @google/genai SDK is Google's unified client for the Gemini API and Vertex AI. It replaces the deprecated @google/generative-ai package with a cleaner, centralized architecture.
Core principles:
- Centralized client -- A single `GoogleGenAI` instance provides all API services via `ai.models`, `ai.chats`, `ai.files`, `ai.caches`. No scattered manager classes.
- Model-per-call -- Pass the model ID string in every API call rather than binding to a model instance. This simplifies multi-model usage.
- Config object pattern -- All generation parameters (`temperature`, `systemInstruction`, `tools`, `safetySettings`) go inside a `config` object, keeping the top-level call clean.
- Native multimodal -- Images, video, audio, and PDFs are first-class inputs via inline data or file upload. Gemini models handle all modalities natively.
- Response as property -- Access `response.text` as a property (not a method). Access `response.functionCalls` for tool calls.
When to use the Gemini SDK directly:
- You primarily use Google Gemini models
- You need multimodal input (images, video, audio, PDF) as a core feature
- You want built-in tools like Google Search and code execution
- You need context caching for large documents
- You want the simplest path to Gemini API features
When NOT to use:
- You need to switch between multiple providers -- use a unified SDK
- You want React-specific chat hooks -- use a framework-integrated AI SDK
- You need features unique to another provider's API -- use that provider's SDK directly
Core Patterns
Pattern 1: Client Setup
Initialize the GoogleGenAI client. It can auto-read GOOGLE_API_KEY from the environment.
```typescript
// lib/gemini.ts
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Alternatively, omit apiKey to auto-read GOOGLE_API_KEY from the environment:
// const ai = new GoogleGenAI({});

export { ai };
```
Why good: Minimal setup, env var auto-detected, named export
```typescript
// BAD: Using the old deprecated SDK
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI("hardcoded-key"); // WRONG: hardcoded key
const model = genAI.getGenerativeModel({ model: "gemini-2.0-flash" });
```
Why bad: Old deprecated package, hardcoded API key, model binding step no longer needed
See: examples/core.md for Vertex AI setup, environment variables, error handling
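For Vertex AI instead of the Gemini Developer API, the same client is constructed with `vertexai: true` plus a project and location; a minimal sketch (the project and location values are placeholders):

```typescript
import { GoogleGenAI } from "@google/genai";

// Vertex AI mode authenticates via Application Default Credentials,
// not an API key. Project and location values are placeholders.
const ai = new GoogleGenAI({
  vertexai: true,
  project: process.env.GOOGLE_CLOUD_PROJECT,
  location: "us-central1",
});
```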
Pattern 2: Text Generation
Pass `model` and `contents` directly -- no `getGenerativeModel()` step.
```typescript
const response = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: "Explain TypeScript generics briefly.",
  config: {
    systemInstruction: "You are a concise coding tutor.",
    temperature: 0.3,
  },
});

console.log(response.text);
```
Why good: Model specified per-call, system instruction in config, response.text as property
```typescript
// BAD: Old SDK patterns that don't work
const model = genAI.getGenerativeModel({ model: "gemini-2.0-flash" });
const result = await model.generateContent("Hello");
console.log(result.response.text()); // text() was a method in the old SDK
```
Why bad: `getGenerativeModel()` doesn't exist in the new SDK, and `text` is a property, not a method
See: examples/core.md for system instructions, temperature, thinking config
Pattern 3: Streaming
Use `generateContentStream` and iterate with `for await`.
```typescript
const response = await ai.models.generateContentStream({
  model: "gemini-2.5-flash",
  contents: "Write a short story about a robot.",
});

for await (const chunk of response) {
  process.stdout.write(chunk.text ?? "");
}
```
Why good: Async iteration, progressive output for better UX, null-safe text access
```typescript
// BAD: Not consuming the stream
const stream = await ai.models.generateContentStream({
  model: "gemini-2.5-flash",
  contents: "Hello",
});
// Stream object returned but never iterated -- response lost
```
Why bad: Stream must be consumed via for await, otherwise response is silently lost
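To cancel an in-flight stream, recent SDK versions accept an `AbortSignal` via `config.abortSignal`; a hedged sketch -- verify the parameter exists in your installed `@google/genai` version:

```typescript
// Abort a stream after a timeout. Assumes config.abortSignal is supported
// by your @google/genai version -- check the SDK changelog if unsure.
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 10_000);

const stream = await ai.models.generateContentStream({
  model: "gemini-2.5-flash",
  contents: "Write a long essay.",
  config: { abortSignal: controller.signal },
});

try {
  for await (const chunk of stream) {
    process.stdout.write(chunk.text ?? "");
  }
} finally {
  clearTimeout(timeout);
}
```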
See: examples/streaming.md for chat streaming, abort patterns
Pattern 4: Multimodal Input
Gemini natively handles images, video, audio, and PDFs. Use inline base64 or file upload.
```typescript
import * as fs from "node:fs";

const imageBase64 = fs.readFileSync("photo.jpg", { encoding: "base64" });

const response = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: [
    { text: "Describe this image in detail." },
    { inlineData: { mimeType: "image/jpeg", data: imageBase64 } },
  ],
});

console.log(response.text);
```
Why good: Inline base64 for small files, explicit MIME type, mixed text + media content
For larger files (over ~20 MB), use `ai.files.upload()` and reference via URI:
```typescript
import { createUserContent, createPartFromUri } from "@google/genai";

const uploadedFile = await ai.files.upload({
  file: "large-video.mp4",
  config: { mimeType: "video/mp4" },
});

const response = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: createUserContent([
    "Summarize this video.",
    createPartFromUri(uploadedFile.uri, uploadedFile.mimeType),
  ]),
});
```
See: examples/multimodal.md for audio, PDF, video timestamps, file lifecycle
Pattern 5: Function Calling / Tool Use
Define function declarations with `parametersJsonSchema`, then handle tool calls in a loop (full loop sketched below).
```typescript
import { GoogleGenAI, FunctionCallingConfigMode } from "@google/genai";

const getWeatherDeclaration = {
  name: "get_weather",
  description: "Get current weather for a city",
  parametersJsonSchema: {
    type: "object",
    properties: {
      location: { type: "string", description: "City name" },
    },
    required: ["location"],
  },
};

const response = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: "What is the weather in Tokyo?",
  config: {
    tools: [{ functionDeclarations: [getWeatherDeclaration] }],
    toolConfig: {
      functionCallingConfig: { mode: FunctionCallingConfigMode.AUTO },
    },
  },
});

if (response.functionCalls && response.functionCalls.length > 0) {
  const call = response.functionCalls[0];
  console.log(`Call: ${call.name}`, call.args);
}
```
Why good: Typed declarations, explicit mode, args directly accessible
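The snippet above stops after detecting the call. Completing the loop means executing the function locally and returning a `functionResponse` part so the model can produce a final answer; a minimal sketch, with `getWeather` as a hypothetical local implementation:

```typescript
// Hypothetical local implementation of the declared function
const getWeather = (location: string) => ({ tempC: 18, condition: "cloudy" });

if (response.functionCalls?.length) {
  const call = response.functionCalls[0];
  const result = getWeather(call.args?.location as string);

  // Replay the model's call plus our result so it can answer in prose
  const followUp = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: [
      { role: "user", parts: [{ text: "What is the weather in Tokyo?" }] },
      { role: "model", parts: [{ functionCall: call }] },
      {
        role: "user",
        parts: [{ functionResponse: { name: call.name, response: result } }],
      },
    ],
    config: { tools: [{ functionDeclarations: [getWeatherDeclaration] }] },
  });
  console.log(followUp.text);
}
```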
See: examples/tools.md for multi-turn tool loop, parallel calls, built-in Google Search
Pattern 6: Structured Output with Zod
Use `responseMimeType: "application/json"` with `responseJsonSchema` for guaranteed JSON output.
```typescript
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

const EventSchema = z.object({
  name: z.string(),
  date: z.string(),
  participants: z.array(z.string()),
});

const response = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: "Alice and Bob meet next Tuesday for lunch.",
  config: {
    responseMimeType: "application/json",
    responseJsonSchema: zodToJsonSchema(EventSchema),
  },
});

const event = EventSchema.parse(JSON.parse(response.text ?? "{}"));
```
Why good: Zod schema for type safety, zodToJsonSchema converts automatically, parse validates response
See: examples/structured-output.md for enum output, complex schemas, raw JSON schema
Pattern 7: Chat Sessions
Use `ai.chats.create()` for multi-turn with automatic history management.
```typescript
const chat = ai.chats.create({
  model: "gemini-2.5-flash",
  config: {
    systemInstruction: "You are a helpful coding assistant.",
  },
});

const r1 = await chat.sendMessage({ message: "What is TypeScript?" });
console.log(r1.text);

const r2 = await chat.sendMessage({ message: "How do generics work?" });
console.log(r2.text);
// Chat automatically maintains conversation history
```
Why good: Automatic history tracking, system instruction in config, simple sendMessage API
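Streaming works the same way inside a chat via `sendMessageStream()`, which returns an async iterable of chunks:

```typescript
// Streaming variant -- history is still tracked automatically
const stream = await chat.sendMessageStream({ message: "Explain type narrowing." });

for await (const chunk of stream) {
  process.stdout.write(chunk.text ?? "");
}
```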
See: examples/chat.md for pre-loaded history, streaming chat, function calling in chat
Pattern 8: Error Handling
Check `response.text` for null and inspect `finishReason` for safety blocks.
```typescript
const response = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: userPrompt,
});

// Check whether content was blocked by safety filters
const candidate = response.candidates?.[0];
if (!candidate || candidate.finishReason === "SAFETY") {
  console.warn("Response blocked by safety filters");
  console.warn("Safety ratings:", candidate?.safetyRatings);
  return null;
}

// Check for truncation
if (candidate.finishReason === "MAX_TOKENS") {
  console.warn("Response was truncated -- consider increasing maxOutputTokens");
}

console.log(response.text);
```
Why good: Checks safety filter blocks, handles truncation, inspects finish reason
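Transport-level failures (bad API key, rate limits, server errors) are thrown as exceptions rather than surfaced through `finishReason`; a minimal, SDK-version-agnostic sketch:

```typescript
// Network/HTTP errors (401, 429, 5xx) are thrown, not returned.
// The error shape varies across SDK versions, so inspect defensively.
try {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: userPrompt,
  });
  console.log(response.text);
} catch (error) {
  const message = error instanceof Error ? error.message : String(error);
  console.error("Gemini request failed:", message);
  // For 429s, retry with exponential backoff rather than failing outright
}
```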
See: examples/core.md for try/catch error handling, HTTP error codes
Performance Optimization
Model Selection for Cost/Speed
- General purpose -> `gemini-2.5-flash` (best price/performance balance)
- Complex reasoning -> `gemini-2.5-pro` (deep reasoning, 1M context)
- Budget / high-volume -> `gemini-2.5-flash-lite` (cheapest, fastest)
- Latest frontier -> `gemini-3-flash-preview` (most capable, preview)
- Embeddings (text) -> `gemini-embedding-001` (text only, stable)
- Embeddings (multimodal) -> `gemini-embedding-2-preview` (text, image, video, audio, PDF)
Key Optimization Patterns
- Use context caching for repeated queries on the same large document -- reduces cost by up to 75%
- Batch embeddings by passing an array to `ai.models.embedContent()` -- one API call for multiple inputs (see the sketch after this list)
- Set `temperature: 0` for deterministic output (enables server-side caching)
- Use `gemini-2.5-flash-lite` for high-volume, latency-sensitive workloads
- Reduce embedding dimensions via `config.outputDimensionality` for faster similarity search
- Use streaming for user-facing responses to improve perceived latency
<decision_framework>
Decision Framework
Which Model to Choose
```
What is your task?
+-- General text generation -> gemini-2.5-flash (best balance)
+-- Complex reasoning / code -> gemini-2.5-pro (deep thinking, 1M context)
+-- High-volume / budget -> gemini-2.5-flash-lite (cheapest)
+-- Cutting-edge / experimental -> gemini-3-flash-preview
+-- Text embeddings -> gemini-embedding-001
+-- Multimodal embeddings -> gemini-embedding-2-preview
```
File Input Method
```
How large is your file?
+-- Under 20 MB -> Inline base64 (inlineData)
+-- 20 MB - 2 GB -> ai.files.upload() + createPartFromUri()
+-- Need to reuse across requests -> ai.files.upload() (persists 48 hours)
+-- External URL (Gemini 3+) -> createPartFromUri() with HTTP URL
```
Structured Output vs Free Text
```
Do you need a specific output shape?
+-- YES -> Set responseMimeType: "application/json"
|   +-- Have a Zod schema? -> zodToJsonSchema() into responseJsonSchema
|   +-- Raw JSON Schema? -> Pass directly to responseJsonSchema
+-- NO -> Default text output (no config needed)
```
Chat vs Single-Turn
```
Is this a multi-turn conversation?
+-- YES -> ai.chats.create() with sendMessage()
|   +-- Need streaming? -> chat.sendMessageStream()
|   +-- Need function calling? -> Pass tools in chat config
+-- NO -> ai.models.generateContent() (single call)
```
When to Use Context Caching
```
Are you sending the same large prefix repeatedly?
+-- YES -> Is it 1,024+ tokens (Flash) or 4,096+ (Pro)?
|   +-- YES -> Use ai.caches.create() to cache the prefix
|   +-- NO -> Too small to cache, just send inline
+-- NO -> Standard generateContent() calls
```
</decision_framework>
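Context caching appears in both the optimization list and the decision tree above; a minimal sketch, assuming `largeDocumentText` holds a prefix above the minimum token threshold:

```typescript
// Cache a large, reused prefix (Flash minimum: 1,024 tokens)
const cache = await ai.caches.create({
  model: "gemini-2.5-flash",
  config: {
    contents: [{ role: "user", parts: [{ text: largeDocumentText }] }],
    systemInstruction: "You answer questions about the attached document.",
    ttl: "3600s", // optional cache lifetime
  },
});

// Later requests reference the cache instead of resending the prefix
const response = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: "Summarize section 2.",
  config: { cachedContent: cache.name },
});
console.log(response.text);
```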
<red_flags>
RED FLAGS
High Priority Issues:
- Using `@google/generative-ai` (the deprecated package) -- migrate to `@google/genai`
- Calling `response.text()` as a method -- it is a property (`response.text`) in the new SDK
- Using `getGenerativeModel()` -- the new SDK passes model as a string in each API call
- Hardcoding API keys instead of using environment variables
- Not checking `finishReason` for `"SAFETY"` or `"MAX_TOKENS"` -- safety blocks silently return no content
Medium Priority Issues:
- Putting `temperature`, `tools`, or `systemInstruction` at the top level instead of inside `config`
- Not setting `config.mimeType` when uploading files -- the API may reject or misinterpret the file
- Using `model.startChat()` (old SDK) -- use `ai.chats.create()` in the new SDK
- Not consuming `generateContentStream()` output -- the async iterable must be iterated
- Ignoring uploaded file lifecycle -- files expire after 48 hours
Common Mistakes:
- Confusing `responseMimeType` (for structured output) with `mimeType` (for file upload config)
- Using `responseSchema` (old SDK parameter name) instead of `responseJsonSchema` (new SDK)
- Mixing old SDK patterns (`model.generateContent(prompt)`) with new SDK (`ai.models.generateContent({ model, contents })`)
- Passing plain Zod schemas to `responseJsonSchema` -- you must convert with `zodToJsonSchema()` first
- Using `content` (singular) instead of `contents` (plural) in `generateContent()`
Gotchas & Edge Cases:
- `response.text` returns `null` (not empty string) when content is blocked by safety filters -- always null-check
- Uploaded files via `ai.files.upload()` expire after 48 hours -- there is no way to extend this; re-upload if needed
- Context caching has minimum token requirements: 1,024 tokens for Flash models, 4,096 for Pro models -- smaller content cannot be cached
- Inline data has a 100 MB payload limit (50 MB for PDFs) -- use file upload for larger content
- `ai.files.upload()` supports up to 2 GB per file, 20 GB per project
- Embedding dimensions default to 3,072 -- use `config.outputDimensionality` to reduce for cost/speed
- Gemini embedding models are NOT compatible with each other -- switching models requires re-embedding all data
- `FunctionCallingConfigMode.ANY` forces the model to always call a function -- use `AUTO` to let the model decide
- Chat history uses `role: "model"` (not `"assistant"`) for Gemini responses -- differs from other providers' convention
- Safety filter defaults changed: Gemini 2.5+ and 3.x models default to `OFF` (no blocking) unlike earlier models
- The `sendMessage()` method takes `{ message: string }` (an object with a `message` key), not a plain string
- When function calling returns multiple calls, process ALL of them before sending results back -- partial responses cause errors
- `zodToJsonSchema()` comes from the `zod-to-json-schema` package (separate from `zod`) -- requires `npm install zod-to-json-schema`
</red_flags>
<critical_reminders>
CRITICAL REMINDERS
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, `import type`, named constants)
(You MUST use @google/genai (the new unified SDK) -- NOT the deprecated @google/generative-ai package)
(You MUST access response text via response.text (a property) -- NOT response.text() (the old SDK used a method call))
(You MUST pass model as a string parameter in every API call -- there is no getGenerativeModel() step)
(You MUST use config for all generation parameters (temperature, safetySettings, tools, systemInstruction) -- NOT top-level properties)
(You MUST never hardcode API keys -- use environment variables via process.env.GEMINI_API_KEY or GOOGLE_API_KEY)
Failure to follow these rules will produce broken integrations due to old SDK patterns or silent safety filter blocks.
</critical_reminders>