invoking-gemini

SKILL.md

Invoking Gemini

Delegate tasks to Google's Gemini models when they offer advantages over Claude.

When to Use Gemini

Structured outputs:

  • JSON Schema validation with property ordering guarantees
  • Pydantic model compliance
  • Strict schema adherence (enum values, required fields)

Cost optimization:

  • Parallel batch processing (Gemini 3 Flash is lightweight)
  • High-volume simple tasks
  • Budget-constrained operations

Google ecosystem:

  • Integration with Google services
  • Vertex AI workflows
  • Google-specific APIs

Multi-modal tasks:

  • Image analysis with JSON output
  • Video processing
  • Audio transcription with structure

Available Models

All Gemini 3 models are currently in preview. Use only these — no Gemini 2.x.

Text / Reasoning Models

gemini-3-flash-preview (Default / Recommended):

  • Gemini 3 Flash: Pro-level intelligence at Flash speed and pricing
  • 1M token context window, 64k output
  • Knowledge cutoff: Jan 2025
  • $0.50 input / $3.00 output per 1M tokens
  • Alias: flash

gemini-3.1-pro-preview:

  • Gemini 3.1 Pro: Best for complex tasks requiring broad world knowledge and advanced reasoning across modalities
  • 1M token context window, 64k output
  • Knowledge cutoff: Jan 2025
  • $2.00 / $12.00 per 1M tokens (<200k tokens); $4.00 / $18.00 (>200k tokens)
  • Alias: pro

gemini-3.1-flash-lite-preview:

  • Gemini 3.1 Flash-Lite: Workhorse model for cost-efficiency and high-volume tasks
  • 1M token context window, 64k output
  • Knowledge cutoff: Jan 2025
  • $0.25 (text, image, video), $0.50 (audio) input / $1.50 output per 1M tokens
  • Alias: lite

Image Generation Models

nano-banana-2 (Default image model):

  • Gemini 3.1 Flash Image — high-volume, high-efficiency image generation
  • API model: gemini-3.1-flash-image-preview
  • 128k input context, 32k output
  • $0.25 per 1M text input tokens / $0.067 per image output
  • Alias: image

nano-banana-pro:

  • Gemini 3 Pro Image — highest quality image generation with text rendering and multi-turn editing
  • API model: gemini-3-pro-image-preview
  • 65k input context, 32k output
  • $2.00 per 1M text input tokens / $0.134 per image output
  • Alias: image-pro

See references/models.md for full model details and pricing.

Setup

Prerequisites:

uv pip install requests pydantic
# google-generativeai only needed for direct API fallback:
# uv pip install google-generativeai

Credentials — Option A (recommended): Cloudflare AI Gateway

Requests are routed through Cloudflare AI Gateway, bypassing IP blocks and gaining caching, analytics, and rate limiting.

Create /mnt/project/proxy.env:

CF_ACCOUNT_ID=<your-cloudflare-account-id>
CF_GATEWAY_ID=<your-gateway-name>
CF_API_TOKEN=<your-cf-api-token>
# GOOGLE_API_KEY only needed if not using Cloudflare BYOK:
# GOOGLE_API_KEY=AIzaSy...
  • Get your Cloudflare Account ID: Cloudflare dashboard → right sidebar
  • Create a gateway: Cloudflare dashboard → AI Gateway → Create gateway
  • Generate an API token: https://dash.cloudflare.com/profile/api-tokens
  • Store your Gemini key in the gateway (BYOK): AI Gateway → your gateway → API Keys

Credentials — Option B: Direct Google API (fallback)

If no proxy.env is found, the client falls back to direct Google API access:

Basic Usage

Import the client:

import sys
sys.path.append('/mnt/skills/invoking-gemini/scripts')
from gemini_client import invoke_gemini

# Simple prompt
response = invoke_gemini(
    prompt="Explain quantum computing in 3 bullet points",
    model="gemini-3-flash-preview"
)
print(response)

Structured Output

Use Pydantic models for guaranteed JSON Schema compliance:

from pydantic import BaseModel, Field
from gemini_client import invoke_with_structured_output

class BookAnalysis(BaseModel):
    title: str
    genre: str = Field(description="Primary genre")
    key_themes: list[str] = Field(max_length=5)
    rating: int = Field(ge=1, le=5)

result = invoke_with_structured_output(
    prompt="Analyze the book '1984' by George Orwell",
    pydantic_model=BookAnalysis
)

# result is a BookAnalysis instance
print(result.title)  # "1984"
print(result.genre)  # "Dystopian Fiction"

Advantages over Claude:

  • Guaranteed property ordering in JSON
  • Strict enum enforcement
  • Native schema validation (no prompt engineering)
  • Lower cost for simple extractions

Parallel Invocation

Process multiple prompts concurrently:

from gemini_client import invoke_parallel

prompts = [
    "Summarize the plot of Hamlet",
    "Summarize the plot of Macbeth",
    "Summarize the plot of Othello"
]

results = invoke_parallel(
    prompts=prompts,
    model="gemini-3-flash-preview"
)

for prompt, result in zip(prompts, results):
    print(f"Q: {prompt[:30]}...")
    print(f"A: {result[:100]}...\n")

Use cases:

  • Batch classification tasks
  • Data labeling
  • Multiple independent analyses
  • A/B testing prompts

Error Handling

The client handles common errors:

from gemini_client import invoke_gemini

response = invoke_gemini(
    prompt="Your prompt here",
    model="gemini-3-flash-preview"
)

if response is None:
    print("Error: API call failed")
    # Check project knowledge file for valid google_api_key

Common issues:

  • Missing API key → Add GOOGLE_API_KEY.txt to project knowledge (see Setup above)
  • Invalid model → Raises ValueError
  • Rate limit → Automatically retries with backoff
  • Network error → Returns None after retries

Advanced Features

Custom Generation Config

response = invoke_gemini(
    prompt="Write a haiku",
    model="gemini-3-flash-preview",
    temperature=0.9,
    max_output_tokens=100,
    top_p=0.95
)

Multi-modal Input

# Image analysis with structured output
from pydantic import BaseModel

class ImageDescription(BaseModel):
    objects: list[str]
    scene: str
    colors: list[str]

result = invoke_with_structured_output(
    prompt="Describe this image",
    pydantic_model=ImageDescription,
    image_path="/mnt/user-data/uploads/photo.jpg"
)

See references/advanced.md for more patterns.

Image Generation

Generate images using Gemini's native image models:

from gemini_client import generate_image

# Basic generation
result = generate_image("A watercolor painting of a mountain lake at sunset")
print(result["path"])     # /mnt/user-data/outputs/gemini_image_1740000000.png
print(result["caption"])  # Optional text the model returns alongside the image

Model Selection

# Fast generation (default) — nano-banana-2 → gemini-3.1-flash-image-preview
result = generate_image("A red bicycle", model="nano-banana-2")

# High-fidelity — nano-banana-pro → gemini-3-pro-image-preview
result = generate_image("A red bicycle", model="image-pro")

Custom Output Path

result = generate_image(
    "A logo for a coffee shop called 'Bean There'",
    output_path="/mnt/user-data/outputs/coffee_logo.png"
)

Effective Prompt Patterns

  • Be specific about style: "A watercolor painting of..." vs "A picture of..."
  • Include composition details: "centered, wide angle, high contrast"
  • Specify text rendering: "A poster with the text 'SALE' in bold red letters"
  • Multi-turn editing: Generate once, then refine with follow-up prompts

Return Value

{
    "path": "/mnt/user-data/outputs/gemini_image_1740000000.png",
    "caption": "Optional descriptive text from the model"  # or None
}

Returns None on failure (credentials missing, API error, no image in response).

Comparison: Gemini vs Claude

Use Gemini when:

  • Structured output is primary goal
  • Cost is a constraint
  • Property ordering matters
  • Batch processing many simple tasks

Use Claude when:

  • Complex reasoning required
  • Long context needed (200K tokens)
  • Code generation quality matters
  • Nuanced instruction following

Use both:

  • Claude for planning/reasoning
  • Gemini for structured extraction
  • Parallel workflows with different strengths

Token Efficiency Pattern

Gemini 3 Flash is cost-effective for sub-tasks:

# Claude (you) plans the approach
# Gemini executes structured extractions

data_points = []
for file in uploaded_files:
    # Gemini extracts structured data
    result = invoke_with_structured_output(
        prompt=f"Extract contact info from {file}",
        pydantic_model=ContactInfo
    )
    data_points.append(result)

# Claude synthesizes results
# ... your analysis here ...

Limitations

Not suitable for:

  • Tasks requiring deep reasoning
  • Long context (>1M tokens)
  • Complex code generation
  • Subjective creative writing

Token limits:

  • gemini-3-flash-preview: ~1M input, 64k output
  • gemini-3.1-pro-preview: ~1M input, 64k output (2x pricing above 200k)
  • gemini-3.1-flash-lite-preview: ~1M input, 64k output

Rate limits:

  • Vary by API tier
  • Client handles automatic retry

Examples

See references/examples.md for:

  • Data extraction from documents
  • Batch classification
  • Multi-modal analysis
  • Hybrid Claude+Gemini workflows

Troubleshooting

"No credentials configured":

  • Create /mnt/project/proxy.env with CF_ACCOUNT_ID, CF_GATEWAY_ID, CF_API_TOKEN
  • Or add GOOGLE_API_KEY.txt for direct API access
  • See Setup section above for details

CF Gateway 401/403:

  • Verify your CF_API_TOKEN has AI Gateway permissions
  • Check that gateway authentication is enabled in the Cloudflare dashboard
  • If not using BYOK, add GOOGLE_API_KEY to proxy.env

CF Gateway 429 (rate limited):

  • The client automatically retries with exponential backoff
  • Check your gateway's rate limit settings in Cloudflare dashboard

Import errors:

uv pip install requests pydantic
# For direct API fallback only:
uv pip install google-generativeai

Schema validation failures:

  • Check Pydantic model definitions
  • Ensure prompt is clear about expected structure
  • Add examples to prompt if needed

Cost Comparison

All Gemini 3 models (preview, Jan 2025 cutoff):

Model Input / 1M tokens Output / 1M tokens
Gemini 3 Flash (gemini-3-flash-preview) $0.50 $3.00
Gemini 3.1 Pro (gemini-3.1-pro-preview) $2.00 (<200k) / $4.00 (>200k) $12.00 / $18.00
Gemini 3.1 Flash-Lite (gemini-3.1-flash-lite-preview) $0.25 text/image/video, $0.50 audio $1.50
Nano Banana 2 / 3.1 Flash Image (gemini-3.1-flash-image-preview) $0.25 text $0.067 per image
Nano Banana Pro / 3 Pro Image (gemini-3-pro-image-preview) $2.00 text $0.134 per image

Strategy: Use Flash-Lite for high-volume simple tasks, 3 Flash for balanced performance, 3.1 Pro for complex reasoning.

Weekly Installs
26
GitHub Stars
111
First Seen
Jan 25, 2026
Installed on
opencode25
claude-code25
codex25
gemini-cli25
github-copilot24
kimi-cli24