gemini-3-pro-api
Gemini 3 Pro API Integration
Comprehensive guide for integrating Google's Gemini 3 Pro API/SDK into your applications. Covers setup, authentication, text generation, advanced reasoning with dynamic thinking, chat applications, streaming responses, and production deployment patterns.
Overview
Gemini 3 Pro (`gemini-3-pro-preview`) is Google's most intelligent model, designed for complex tasks that require advanced reasoning and broad world knowledge. This skill provides complete workflows for API integration using the Python or Node.js SDKs.
Key Capabilities
- Massive Context: 1M token input, 64k token output
- Dynamic Thinking: Adaptive reasoning with high/low modes
- Streaming: Real-time token delivery
- Chat: Multi-turn conversations with history
- Production-Ready: Error handling, retry logic, cost optimization
When to Use This Skill
- Setting up Gemini 3 Pro API access
- Building text generation applications
- Implementing chat applications with reasoning
- Configuring advanced thinking modes
- Deploying production Gemini applications
- Optimizing API usage and costs
Quick Start
Prerequisites
- API Key: Get from Google AI Studio
- Python 3.9+ or Node.js 18+
Python Quick Start
```bash
# Install SDK
pip install google-generativeai
```

```python
# Basic usage
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-3-pro-preview")

response = model.generate_content("Explain quantum computing")
print(response.text)
```
Node.js Quick Start
```bash
# Install SDK
npm install @google/generative-ai
```

```javascript
// Basic usage
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI("YOUR_API_KEY");
const model = genAI.getGenerativeModel({ model: "gemini-3-pro-preview" });

const result = await model.generateContent("Explain quantum computing");
console.log(result.response.text());
```
Core Workflows
Workflow 1: Quick Start Setup
Goal: Get from zero to first successful API call in < 5 minutes.
Steps:
1. Get API Key
   - Visit Google AI Studio
   - Create or select a project
   - Generate an API key
   - Copy the key and store it securely

2. Install SDK

   ```bash
   # Python
   pip install google-generativeai

   # Node.js
   npm install @google/generative-ai
   ```

3. Configure Authentication

   ```python
   # Python - using an environment variable (recommended)
   import os
   import google.generativeai as genai

   genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
   ```

   ```javascript
   // Node.js - using an environment variable (recommended)
   const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
   ```

4. Make First API Call

   ```python
   model = genai.GenerativeModel("gemini-3-pro-preview")
   response = model.generate_content("Write a haiku about coding")
   print(response.text)
   ```

5. Verify Success
   - Check that a response was received
   - Verify the text output
   - Note token usage (see the sketch below)
   - Confirm the API key is working
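To cover the last two checks, the response object exposes usage metadata directly; a minimal sketch (the attribute names match the monitoring examples later in this guide):

```python
# Inspect token usage on the first response
usage = response.usage_metadata
print(f"Input tokens:  {usage.prompt_token_count}")
print(f"Output tokens: {usage.candidates_token_count}")
print(f"Total tokens:  {usage.total_token_count}")
```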
Expected Outcome: Working API integration in under 5 minutes.
Workflow 2: Chat Application Development
Goal: Build a production-ready chat application with conversation history and streaming.
Steps:
1. Initialize Chat Model

   ```python
   model = genai.GenerativeModel(
       "gemini-3-pro-preview",
       generation_config={
           "thinking_level": "high",   # Dynamic reasoning
           "temperature": 1.0,         # Keep at 1.0 for best results
           "max_output_tokens": 8192
       }
   )
   ```

2. Start Chat Session

   ```python
   chat = model.start_chat(history=[])
   ```

3. Send Message with Streaming

   ```python
   response = chat.send_message(
       "Explain how neural networks learn",
       stream=True
   )

   # Stream tokens in real time
   for chunk in response:
       print(chunk.text, end="", flush=True)
   ```

4. Manage Conversation History

   ```python
   # History is maintained automatically; access it at any time
   print(f"Conversation turns: {len(chat.history)}")

   # Continue the conversation
   response = chat.send_message("Can you give an example?")
   ```

5. Handle Thought Signatures
   - SDKs handle these automatically in standard chat flows
   - No manual intervention needed for basic use
   - See references/thought-signatures.md for advanced cases

6. Implement Error Handling

   ```python
   from google.api_core import retry, exceptions

   @retry.Retry(predicate=retry.if_exception_type(
       exceptions.ResourceExhausted,
       exceptions.ServiceUnavailable
   ))
   def send_with_retry(chat, message):
       return chat.send_message(message)

   try:
       response = send_with_retry(chat, user_input)
   except exceptions.GoogleAPIError as e:
       print(f"API error: {e}")
   ```
Expected Outcome: Production-ready chat application with streaming, history, and error handling.
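Putting the steps together, a minimal REPL-style loop might look like the sketch below; `send_with_retry` is the helper from step 6, and the `quit` command is an illustrative choice, not an SDK feature:

```python
# Minimal console chat combining history, retries, and turn tracking
chat = model.start_chat(history=[])

while True:
    user_input = input("\nYou: ")
    if user_input.strip().lower() == "quit":  # illustrative exit command
        break
    response = send_with_retry(chat, user_input)  # retry helper from step 6
    print(f"Model: {response.text}")
    print(f"(turns so far: {len(chat.history)})")
```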
Workflow 3: Production Deployment
Goal: Deploy Gemini 3 Pro integration with monitoring, cost control, and reliability.
Steps:
1. Set Up Authentication (Production)

   ```python
   # Use environment variables (never hardcode keys)
   import os

   # Option 1: Environment variable
   api_key = os.getenv("GEMINI_API_KEY")

   # Option 2: Secrets manager (recommended for production)
   # Use Google Secret Manager, AWS Secrets Manager, etc.
   ```

2. Configure Production Settings

   ```python
   model = genai.GenerativeModel(
       "gemini-3-pro-preview",
       generation_config={
           "thinking_level": "high",   # or "low" for simple tasks
           "temperature": 1.0,         # CRITICAL: keep at 1.0
           "max_output_tokens": 4096,
           "top_p": 0.95,
           "top_k": 40
       },
       safety_settings={
           # Configure content filtering as needed
       }
   )
   ```

3. Implement Comprehensive Error Handling

   ```python
   import logging

   from google.api_core import exceptions, retry

   logging.basicConfig(level=logging.INFO)
   logger = logging.getLogger(__name__)

   def generate_with_fallback(prompt):
       @retry.Retry(
           predicate=retry.if_exception_type(
               exceptions.ResourceExhausted,
               exceptions.ServiceUnavailable,
               exceptions.DeadlineExceeded
           ),
           initial=1.0,
           maximum=10.0,
           multiplier=2.0,
           deadline=60.0
       )
       def _generate():
           return model.generate_content(prompt)

       try:
           return _generate()
       except exceptions.InvalidArgument as e:
           logger.error(f"Invalid argument: {e}")
           raise
       except exceptions.PermissionDenied as e:
           logger.error(f"Permission denied: {e}")
           raise
       except Exception as e:
           logger.error(f"Unexpected error: {e}")
           # Fall back to a simpler model or cached response
           return None
   ```

4. Monitor Usage and Costs

   ```python
   def log_usage(response):
       usage = response.usage_metadata
       logger.info(f"Tokens - Input: {usage.prompt_token_count}, "
                   f"Output: {usage.candidates_token_count}, "
                   f"Total: {usage.total_token_count}")

       # Estimate cost (for prompts ≤ 200k tokens)
       input_cost = (usage.prompt_token_count / 1_000_000) * 2.00
       output_cost = (usage.candidates_token_count / 1_000_000) * 12.00
       total_cost = input_cost + output_cost
       logger.info(f"Estimated cost: ${total_cost:.6f}")

   response = model.generate_content(prompt)
   log_usage(response)
   ```

5. Implement Rate Limiting

   ```python
   import time
   from collections import deque

   class RateLimiter:
       def __init__(self, max_requests_per_minute=60):
           self.max_rpm = max_requests_per_minute
           self.requests = deque()

       def wait_if_needed(self):
           now = time.time()
           # Drop requests older than one minute
           while self.requests and self.requests[0] < now - 60:
               self.requests.popleft()
           # If at the limit, sleep until the oldest request ages out
           if len(self.requests) >= self.max_rpm:
               sleep_time = 60 - (now - self.requests[0])
               if sleep_time > 0:
                   time.sleep(sleep_time)
           self.requests.append(now)

   limiter = RateLimiter(max_requests_per_minute=60)

   def generate_with_rate_limit(prompt):
       limiter.wait_if_needed()
       return model.generate_content(prompt)
   ```

6. Set Up Logging and Monitoring

   ```python
   import logging
   from datetime import datetime

   # Configure logging to both a file and the console
   logging.basicConfig(
       level=logging.INFO,
       format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
       handlers=[
           logging.FileHandler('gemini_api.log'),
           logging.StreamHandler()
       ]
   )
   logger = logging.getLogger(__name__)

   def monitored_generate(prompt):
       start_time = datetime.now()
       try:
           response = model.generate_content(prompt)
           duration = (datetime.now() - start_time).total_seconds()
           logger.info(f"Success - Duration: {duration}s, "
                       f"Tokens: {response.usage_metadata.total_token_count}")
           return response
       except Exception as e:
           duration = (datetime.now() - start_time).total_seconds()
           logger.error(f"Failed - Duration: {duration}s, Error: {e}")
           raise
   ```
Expected Outcome: Production-ready deployment with monitoring, cost control, error handling, and rate limiting.
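The pieces above compose into a single request path; a sketch that applies rate limiting, retries, and usage logging in order (all three helpers are defined in the steps above):

```python
def production_generate(prompt):
    # 1. Respect the self-imposed requests-per-minute budget
    limiter.wait_if_needed()
    # 2. Generate with retries and structured error logging
    response = generate_with_fallback(prompt)
    # 3. Record token usage and estimated cost on success
    if response is not None:
        log_usage(response)
    return response
```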
Thinking Levels
Dynamic Thinking System
Gemini 3 Pro introduces thinking_level to control reasoning depth:
thinking_level: "high" (default)
- Maximum reasoning depth
- Best quality for complex tasks
- Slower first-token response
- Higher cost
- Use for: Complex reasoning, coding, analysis, research
thinking_level: "low"
- Minimal reasoning overhead
- Faster response
- Lower cost
- Simpler output
- Use for: Simple questions, factual answers, quick queries
Configuration
```python
# Python
model = genai.GenerativeModel(
    "gemini-3-pro-preview",
    generation_config={
        "thinking_level": "high"  # or "low"
    }
)
```

```javascript
// Node.js
const model = genAI.getGenerativeModel({
  model: "gemini-3-pro-preview",
  generationConfig: {
    thinking_level: "high" // or "low"
  }
});
```
Critical Notes
⚠️ Temperature MUST stay at 1.0 - Changing temperature can cause looping or degraded performance on complex reasoning tasks.
⚠️ The `thinking_level` parameter cannot be combined with the legacy `thinking_budget` parameter.
See references/thinking-levels.md for a detailed guide.
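As a defensive pattern, a small builder can encode both warnings above when assembling a config. This is a hypothetical helper, not an SDK feature:

```python
def build_generation_config(thinking_level="high", **extra):
    """Build a generation_config that respects the notes above.

    Hypothetical helper - not part of the SDK.
    """
    if "thinking_budget" in extra:
        raise ValueError("thinking_level cannot be combined with the "
                         "legacy thinking_budget parameter")
    if extra.get("temperature", 1.0) != 1.0:
        raise ValueError("Keep temperature at 1.0 for Gemini 3 Pro")
    return {"thinking_level": thinking_level, "temperature": 1.0, **extra}

model = genai.GenerativeModel(
    "gemini-3-pro-preview",
    generation_config=build_generation_config("low", max_output_tokens=2048),
)
```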
Streaming Responses
Python Streaming
```python
response = model.generate_content(
    "Write a long article about AI",
    stream=True
)

for chunk in response:
    print(chunk.text, end="", flush=True)
```
Node.js Streaming
```javascript
const result = await model.generateContentStream("Write a long article about AI");

for await (const chunk of result.stream) {
  process.stdout.write(chunk.text());
}
```
Benefits
- Lower perceived latency
- Real-time user feedback
- Better UX for long responses
- Can process tokens as they arrive (see the sketch below)
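A sketch of that last point, using the same Python iterator shown above: handle each chunk as it arrives while also accumulating the full reply for later use.

```python
# Stream for immediate feedback while keeping the complete text
full_text = []
response = model.generate_content("Write a long article about AI", stream=True)
for chunk in response:
    print(chunk.text, end="", flush=True)  # real-time display
    full_text.append(chunk.text)           # accumulate for post-processing
article = "".join(full_text)
```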
See references/streaming.md for advanced patterns.
Cost Optimization
Pricing (Gemini 3 Pro)
| Context Size | Input | Output |
|---|---|---|
| ≤ 200k tokens | $2/1M | $12/1M |
| > 200k tokens | $4/1M | $18/1M |
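As a worked example of the table, a small helper that applies the correct tier based on prompt size (rates copied from the table above; treat the result as an estimate only):

```python
def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost in USD using the tiered rates above."""
    if input_tokens <= 200_000:
        input_rate, output_rate = 2.00, 12.00   # $/1M tokens, small-context tier
    else:
        input_rate, output_rate = 4.00, 18.00   # $/1M tokens, large-context tier
    return ((input_tokens / 1_000_000) * input_rate
            + (output_tokens / 1_000_000) * output_rate)

# 150k input + 2k output: 0.15 * $2 + 0.002 * $12 = $0.324
print(f"${estimate_cost(150_000, 2_000):.3f}")
```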
Optimization Strategies
- Keep prompts under 200k tokens (50% cheaper; see the sketch below)
- Use `thinking_level: "low"` for simple tasks (faster, lower cost)
- Implement context caching for reusable contexts (see the `gemini-3-advanced` skill)
- Monitor token usage and set budgets
- Use Gemini 1.5 Flash for simple tasks (20x cheaper)
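To act on the first strategy, check prompt size before sending; `count_tokens` is part of the google-generativeai SDK used throughout this guide:

```python
# Check prompt size up front to stay in the cheaper pricing tier
token_count = model.count_tokens(prompt).total_tokens
if token_count > 200_000:
    print(f"Warning: {token_count} input tokens - higher pricing tier applies")
response = model.generate_content(prompt)
```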
See references/best-practices.md for comprehensive cost optimization.
Model Selection
Gemini 3 Pro vs Other Models
| Model | Context | Output | Input Price | Best For |
|---|---|---|---|---|
| gemini-3-pro-preview | 1M | 64k | $2-4/1M | Complex reasoning, coding |
| gemini-1.5-pro | 1M | 8k | $7-14/1M | General use, multimodal |
| gemini-1.5-flash | 1M | 8k | $0.35-0.70/1M | Simple tasks, cost-sensitive |
When to Use Gemini 3 Pro
✅ Complex reasoning tasks
✅ Advanced coding problems
✅ Long-context analysis (up to 1M tokens)
✅ Large output requirements (up to 64k tokens)
✅ Tasks requiring dynamic thinking
When to Use Alternatives
- Gemini 1.5 Flash: Simple tasks, cost-sensitive applications
- Gemini 1.5 Pro: Multimodal tasks, general use
- Gemini 2.5 models: Experimental features, specific capabilities
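One pattern implied by the comparison table is routing requests by task complexity; a sketch below, where the `is_complex` flag stands in for whatever heuristic fits your workload:

```python
# Route simple tasks to the cheaper Flash model, complex ones to 3 Pro
pro_model = genai.GenerativeModel("gemini-3-pro-preview")
flash_model = genai.GenerativeModel("gemini-1.5-flash")

def route_request(prompt: str, is_complex: bool):
    # is_complex: caller-supplied judgment (placeholder for a real heuristic)
    model = pro_model if is_complex else flash_model
    return model.generate_content(prompt)
```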
Error Handling
Common Errors
| Error | Cause | Solution |
|---|---|---|
| `ResourceExhausted` | Rate limit exceeded | Implement retry with backoff |
| `InvalidArgument` | Invalid parameters | Validate input, check docs |
| `PermissionDenied` | Invalid API key | Check authentication |
| `DeadlineExceeded` | Request timeout | Reduce context, retry |
Production Error Handling
```python
import logging

from google.api_core import exceptions, retry

logger = logging.getLogger(__name__)

@retry.Retry(
    predicate=retry.if_exception_type(
        exceptions.ResourceExhausted,
        exceptions.ServiceUnavailable
    ),
    initial=1.0,
    maximum=60.0,
    multiplier=2.0
)
def safe_generate(prompt):
    try:
        return model.generate_content(prompt)
    except exceptions.InvalidArgument as e:
        logger.error(f"Invalid argument: {e}")
        raise
    except exceptions.PermissionDenied as e:
        logger.error(f"Permission denied - check API key: {e}")
        raise
    except Exception as e:
        logger.error(f"Unexpected error: {e}")
        raise
```
See references/error-handling.md for comprehensive patterns.
References
Setup & Configuration
- Setup Guide - Installation, authentication, configuration
- Best Practices - Optimization, cost control, tips
Features
- Text Generation - Detailed text generation patterns
- Chat Patterns - Chat conversation management
- Thinking Levels - Dynamic thinking system guide
- Streaming - Streaming response patterns
Production
- Error Handling - Error handling and retry strategies
Official Resources
Next Steps
After Basic Setup
- Explore chat applications - Build conversational interfaces
- Add multimodal capabilities - Use the `gemini-3-multimodal` skill
- Add image generation - Use the `gemini-3-image-generation` skill
- Add advanced features - Use the `gemini-3-advanced` skill (caching, tools, batch)
Common Integration Patterns
- Simple Chatbot: This skill only
- Multimodal Assistant: This skill + `gemini-3-multimodal`
- Creative Bot: This skill + `gemini-3-image-generation`
- Production App: All four Gemini 3 skills
Troubleshooting
Issue: API key not working
Solution: Verify API key in Google AI Studio, check environment variable
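A quick way to confirm the key and environment variable are wired up is to list available models (`genai.list_models()` is part of the same SDK); an invalid key raises a permission error instead:

```python
import os
import google.generativeai as genai

genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
# Prints model names if the key is valid; raises otherwise
for m in genai.list_models():
    print(m.name)
```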
Issue: Rate limit errors
Solution: Implement rate limiting, upgrade to paid tier, reduce request frequency
Issue: Slow responses
Solution: Use thinking_level: "low" for simple tasks, enable streaming, reduce context size
Issue: High costs
Solution: Keep prompts under 200k tokens, use appropriate thinking level, consider Gemini 1.5 Flash for simple tasks
Issue: Temperature warnings
Solution: Keep temperature at 1.0 (default) - do not modify for complex reasoning tasks
Summary
This skill provides everything needed to integrate Gemini 3 Pro API into your applications:
✅ Quick setup (< 5 minutes)
✅ Production-ready chat applications
✅ Dynamic thinking configuration
✅ Streaming responses
✅ Error handling and retry logic
✅ Cost optimization strategies
✅ Monitoring and logging patterns
For multimodal, image generation, and advanced features, see the companion skills.
Ready to build? Start with Workflow 1: Quick Start Setup above!