skills/smithery.ai/jony2176-cloud-paid-models-integration

jony2176-cloud-paid-models-integration

SKILL.md

Paid AI Models Integration

Complete guide for integrating commercial AI model APIs with Python applications.

Overview

This skill covers integration with major paid AI model providers:

  • Anthropic Claude - Advanced reasoning and coding
  • OpenAI - GPT-4o, GPT-4, GPT-3.5
  • Google Gemini - Multimodal capabilities
  • Groq - Fast inference (paid tier)
  • Amazon Bedrock - AWS-hosted models
  • Azure OpenAI - Enterprise OpenAI

Anthropic Claude Integration

Setup

# Install SDK
pip install anthropic

# Set API key
export ANTHROPIC_API_KEY='your-api-key'

Basic Usage

import anthropic

client = anthropic.Anthropic(
    api_key="your-api-key"
)

# Simple message
response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ]
)

print(response.content[0].text)

Advanced Features

from anthropic import Anthropic

client = Anthropic()

# Streaming response
with client.messages.stream(
    model="claude-3-7-sonnet-20250219",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a story"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

# With system prompt
response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=2048,
    system="You are a Python expert. Provide clear, well-documented code.",
    messages=[
        {"role": "user", "content": "Write a binary search function"}
    ]
)

Tool Use (Function Calling)

import anthropic

client = anthropic.Anthropic()

# Define tools
tools = [
    {
        "name": "get_weather",
        "description": "Get weather for a location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name"
                }
            },
            "required": ["location"]
        }
    }
]

# Request with tools
response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What's the weather in Paris?"}
    ]
)

# Process tool calls
for content in response.content:
    if content.type == "tool_use":
        print(f"Tool: {content.name}")
        print(f"Input: {content.input}")

        # Execute tool and send result back
        tool_result = get_weather(content.input["location"])

        # Continue conversation with tool result
        response = client.messages.create(
            model="claude-3-7-sonnet-20250219",
            max_tokens=1024,
            tools=tools,
            messages=[
                {"role": "user", "content": "What's the weather in Paris?"},
                {"role": "assistant", "content": response.content},
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "tool_result",
                            "tool_use_id": content.id,
                            "content": tool_result
                        }
                    ]
                }
            ]
        )

Vision (Image Analysis)

import base64
import anthropic

client = anthropic.Anthropic()

# Read image
with open("image.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/jpeg",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this image"
                }
            ]
        }
    ]
)

print(response.content[0].text)

Prompt Caching (Cost Optimization)

# Enable prompt caching for repeated content
response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are an expert in Python programming...",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "Explain decorators"}
    ]
)

# Check cache usage
print(f"Cache read tokens: {response.usage.cache_read_input_tokens}")
print(f"Cache creation tokens: {response.usage.cache_creation_input_tokens}")

OpenAI Integration

Setup

pip install openai

export OPENAI_API_KEY='your-api-key'

Basic Usage

from openai import OpenAI

client = OpenAI(api_key="your-api-key")

# Chat completion
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Explain async programming"}
    ],
    temperature=0.7,
    max_tokens=1000
)

print(response.choices[0].message.content)

Streaming

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Function Calling

functions = [
    {
        "name": "get_stock_price",
        "description": "Get current stock price",
        "parameters": {
            "type": "object",
            "properties": {
                "symbol": {
                    "type": "string",
                    "description": "Stock symbol (e.g., AAPL)"
                }
            },
            "required": ["symbol"]
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the price of Apple stock?"}],
    functions=functions,
    function_call="auto"
)

# Process function call
if response.choices[0].message.function_call:
    function_call = response.choices[0].message.function_call
    print(f"Function: {function_call.name}")
    print(f"Arguments: {function_call.arguments}")

Vision

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/image.jpg"
                    }
                }
            ]
        }
    ]
)

Embeddings

response = client.embeddings.create(
    model="text-embedding-3-large",
    input="Your text here"
)

embedding = response.data[0].embedding
print(f"Embedding dimensions: {len(embedding)}")

Assistants API

# Create assistant
assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a math tutor. Help with math problems.",
    model="gpt-4o",
    tools=[{"type": "code_interpreter"}]
)

# Create thread
thread = client.beta.threads.create()

# Add message
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Solve: 3x + 11 = 14"
)

# Run assistant
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id
)

# Wait for completion
import time
while run.status != "completed":
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(
        thread_id=thread.id,
        run_id=run.id
    )

# Get messages
messages = client.beta.threads.messages.list(thread_id=thread.id)
for msg in messages.data:
    print(f"{msg.role}: {msg.content[0].text.value}")

Google Gemini Integration

Setup

pip install google-generativeai

export GOOGLE_API_KEY='your-api-key'

Basic Usage

import google.generativeai as genai

genai.configure(api_key="your-api-key")

# Create model
model = genai.GenerativeModel('gemini-2.0-flash-exp')

# Generate content
response = model.generate_content("Explain machine learning")
print(response.text)

Streaming

response = model.generate_content(
    "Write a long story",
    stream=True
)

for chunk in response:
    print(chunk.text, end="", flush=True)

Multimodal (Vision + Text)

import PIL.Image

# Load image
image = PIL.Image.open('photo.jpg')

# Analyze image
model = genai.GenerativeModel('gemini-2.0-flash-exp')
response = model.generate_content([
    "What's in this image?",
    image
])

print(response.text)

Function Calling

def get_exchange_rate(currency_from: str, currency_to: str) -> dict:
    """Get exchange rate between currencies"""
    return {"rate": 1.12, "from": currency_from, "to": currency_to}

model = genai.GenerativeModel(
    'gemini-2.0-flash-exp',
    tools=[get_exchange_rate]
)

response = model.generate_content(
    "What's the exchange rate from USD to EUR?"
)

# Check for function calls
for part in response.parts:
    if fn := part.function_call:
        print(f"Calling: {fn.name}")
        print(f"Args: {fn.args}")

Groq Integration

Setup

pip install groq

export GROQ_API_KEY='your-api-key'

Basic Usage

from groq import Groq

client = Groq(api_key="your-api-key")

# Fast inference
response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Explain neural networks"}
    ],
    temperature=0.7,
    max_tokens=1000
)

print(response.choices[0].message.content)

Streaming

stream = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Amazon Bedrock Integration

Setup

pip install boto3

# Configure AWS credentials
aws configure

Basic Usage

import boto3
import json

bedrock = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-east-1'
)

# Claude on Bedrock
body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": "Explain cloud computing"}
    ]
})

response = bedrock.invoke_model(
    modelId='anthropic.claude-3-sonnet-20240229-v1:0',
    body=body
)

response_body = json.loads(response['body'].read())
print(response_body['content'][0]['text'])

Azure OpenAI Integration

Setup

pip install openai

export AZURE_OPENAI_API_KEY='your-api-key'
export AZURE_OPENAI_ENDPOINT='https://your-resource.openai.azure.com/'

Basic Usage

from openai import AzureOpenAI

client = AzureOpenAI(
    api_key="your-api-key",
    api_version="2024-02-15-preview",
    azure_endpoint="https://your-resource.openai.azure.com/"
)

response = client.chat.completions.create(
    model="gpt-4o",  # Your deployment name
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)

Unified Interface with Agno

Multi-Provider Support

from agno.agent import Agent
from agno.models.anthropic import Claude
from agno.models.openai import OpenAIChat
from agno.models.gemini import Gemini
from agno.models.groq import Groq

# Create agents with different providers
claude_agent = Agent(
    model=Claude(id="claude-3-7-sonnet-latest"),
    description="Claude-powered agent"
)

openai_agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    description="OpenAI-powered agent"
)

gemini_agent = Agent(
    model=Gemini(id="gemini-2.0-flash-exp"),
    description="Gemini-powered agent"
)

groq_agent = Agent(
    model=Groq(id="llama-3.3-70b-versatile"),
    description="Groq-powered agent"
)

# Use the same interface
for agent in [claude_agent, openai_agent, gemini_agent, groq_agent]:
    response = agent.run("Explain AI")
    print(f"{agent.description}: {response.content[:100]}...")

Production Patterns

Retry Logic with Exponential Backoff

import time
from functools import wraps
from anthropic import Anthropic, RateLimitError, APIError

def retry_with_backoff(max_retries=3, initial_delay=1):
    """Retry decorator with exponential backoff"""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            delay = initial_delay
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except RateLimitError as e:
                    if attempt == max_retries - 1:
                        raise
                    print(f"Rate limited, retrying in {delay}s...")
                    time.sleep(delay)
                    delay *= 2
                except APIError as e:
                    if attempt == max_retries - 1:
                        raise
                    print(f"API error, retrying in {delay}s...")
                    time.sleep(delay)
                    delay *= 2
        return wrapper
    return decorator

@retry_with_backoff(max_retries=3)
def call_claude(prompt: str):
    client = Anthropic()
    response = client.messages.create(
        model="claude-3-7-sonnet-20250219",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

Rate Limiting

import asyncio
from asyncio import Semaphore
from anthropic import AsyncAnthropic

class RateLimitedClient:
    """Rate-limited API client"""

    def __init__(self, max_concurrent=5, requests_per_minute=60):
        self.client = AsyncAnthropic()
        self.semaphore = Semaphore(max_concurrent)
        self.requests_per_minute = requests_per_minute
        self.request_times = []

    async def call(self, prompt: str):
        """Make rate-limited API call"""
        async with self.semaphore:
            # Wait if we've hit rate limit
            await self._wait_if_needed()

            response = await self.client.messages.create(
                model="claude-3-7-sonnet-20250219",
                max_tokens=1024,
                messages=[{"role": "user", "content": prompt}]
            )

            self.request_times.append(asyncio.get_event_loop().time())
            return response.content[0].text

    async def _wait_if_needed(self):
        """Wait if rate limit would be exceeded"""
        now = asyncio.get_event_loop().time()
        minute_ago = now - 60

        # Remove old requests
        self.request_times = [t for t in self.request_times if t > minute_ago]

        # Wait if needed
        if len(self.request_times) >= self.requests_per_minute:
            sleep_time = 60 - (now - self.request_times[0])
            if sleep_time > 0:
                await asyncio.sleep(sleep_time)

Cost Tracking

from dataclasses import dataclass
from typing import Dict

@dataclass
class TokenUsage:
    """Track token usage"""
    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_tokens: int = 0
    cache_creation_tokens: int = 0

class CostTracker:
    """Track API costs"""

    # Pricing per 1M tokens (adjust based on current pricing)
    PRICING = {
        "claude-3-7-sonnet-20250219": {
            "input": 3.00,
            "output": 15.00,
            "cache_write": 3.75,
            "cache_read": 0.30
        },
        "gpt-4o": {
            "input": 2.50,
            "output": 10.00
        }
    }

    def __init__(self):
        self.usage: Dict[str, TokenUsage] = {}

    def track(self, model: str, response):
        """Track usage from response"""
        if model not in self.usage:
            self.usage[model] = TokenUsage()

        usage = self.usage[model]

        # Handle different response formats
        if hasattr(response, 'usage'):
            # Anthropic format
            usage.input_tokens += response.usage.input_tokens
            usage.output_tokens += response.usage.output_tokens
            if hasattr(response.usage, 'cache_read_input_tokens'):
                usage.cache_read_tokens += response.usage.cache_read_input_tokens or 0
            if hasattr(response.usage, 'cache_creation_input_tokens'):
                usage.cache_creation_tokens += response.usage.cache_creation_input_tokens or 0
        elif hasattr(response, 'usage'):
            # OpenAI format
            usage.input_tokens += response.usage.prompt_tokens
            usage.output_tokens += response.usage.completion_tokens

    def calculate_cost(self, model: str) -> float:
        """Calculate total cost for model"""
        if model not in self.usage or model not in self.PRICING:
            return 0.0

        usage = self.usage[model]
        pricing = self.PRICING[model]

        cost = 0.0
        cost += (usage.input_tokens / 1_000_000) * pricing["input"]
        cost += (usage.output_tokens / 1_000_000) * pricing["output"]

        if "cache_write" in pricing:
            cost += (usage.cache_creation_tokens / 1_000_000) * pricing["cache_write"]
        if "cache_read" in pricing:
            cost += (usage.cache_read_tokens / 1_000_000) * pricing["cache_read"]

        return cost

    def report(self):
        """Generate cost report"""
        total = 0.0
        print("\n=== API Cost Report ===")
        for model, usage in self.usage.items():
            cost = self.calculate_cost(model)
            total += cost
            print(f"\n{model}:")
            print(f"  Input tokens: {usage.input_tokens:,}")
            print(f"  Output tokens: {usage.output_tokens:,}")
            if usage.cache_read_tokens:
                print(f"  Cache read: {usage.cache_read_tokens:,}")
            if usage.cache_creation_tokens:
                print(f"  Cache write: {usage.cache_creation_tokens:,}")
            print(f"  Cost: ${cost:.4f}")
        print(f"\nTotal cost: ${total:.4f}")
        return total

Environment Configuration

from pydantic_settings import BaseSettings
from typing import Optional

class AIConfig(BaseSettings):
    """AI API configuration"""

    # API Keys
    anthropic_api_key: Optional[str] = None
    openai_api_key: Optional[str] = None
    google_api_key: Optional[str] = None
    groq_api_key: Optional[str] = None

    # Default models
    default_model: str = "claude-3-7-sonnet-20250219"

    # Limits
    max_tokens: int = 4096
    temperature: float = 0.7
    timeout: int = 30

    class Config:
        env_file = ".env"
        case_sensitive = False

# Usage
config = AIConfig()

client = Anthropic(api_key=config.anthropic_api_key)

Best Practices

  1. Use environment variables for API keys - Never hardcode
  2. Implement retry logic - Handle transient failures
  3. Track token usage and costs - Monitor spending
  4. Use streaming for long responses - Better UX
  5. Cache prompts when possible - Reduce costs
  6. Set reasonable timeouts - Prevent hanging
  7. Handle rate limits gracefully - Exponential backoff
  8. Validate inputs - Use Pydantic
  9. Log API calls - Debug and audit
  10. Use prompt caching - Save on repeated content

Model Selection Guide

When to Use Claude

  • Complex reasoning tasks
  • Code generation and review
  • Long context understanding (200K tokens)
  • Safety-critical applications
  • Document analysis

When to Use GPT-4o

  • General purpose tasks
  • Function calling
  • Vision tasks
  • Fastest OpenAI model
  • Good balance of cost/performance

When to Use Gemini

  • Multimodal tasks (text + image + video)
  • Cost-effective for simple tasks
  • Long context (2M tokens with Gemini Pro)
  • Fast inference

When to Use Groq

  • Ultra-fast inference needed
  • Real-time applications
  • Cost-effective for simple tasks
  • Open source models

Resources

Weekly Installs
1
First Seen
Mar 3, 2026
Installed on
amp1
cline1
opencode1
cursor1
kimi-cli1
codex1