# xAI Grok Models Guide

Complete guide to selecting the right Grok model for your use case, with pricing and capability comparisons.

## Model Quick Reference

| Model | Best For | Input $/1M | Output $/1M | Context |
|---|---|---|---|---|
| `grok-4-1-fast` | Tool calling, agents | $0.20 | $0.50 | 2M |
| `grok-4` | Complex reasoning | $3.00 | $15.00 | 256K |
| `grok-3-fast` | General tasks | $0.20 | $0.50 | 131K |
| `grok-3-mini` | Lightweight tasks | $0.30 | $0.50 | 131K |
| `grok-2-vision` | Image analysis | $2.00 | $10.00 | 32K |

## Model Selection Decision Tree

```text
What's your primary need?
├─► Tool calling / Agent workflows
│   └─► grok-4-1-fast ($0.20/$0.50)
├─► Complex reasoning / Analysis
│   └─► grok-4 ($3.00/$15.00)
├─► General chat / Simple tasks
│   └─► grok-3-fast ($0.20/$0.50)
├─► High volume / Cost sensitive
│   └─► grok-3-mini ($0.30/$0.50)
└─► Image/Vision tasks
    └─► grok-2-vision ($2.00/$10.00)
```
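For programmatic routing, the tree above can be sketched as a small lookup helper (illustrative only; the `choose_model` name and the need labels are not part of any xAI SDK):

```python
# Map each primary need from the decision tree to a model ID.
MODEL_BY_NEED = {
    "tools": "grok-4-1-fast",     # tool calling / agent workflows
    "reasoning": "grok-4",        # complex reasoning / analysis
    "general": "grok-3-fast",     # general chat / simple tasks
    "bulk": "grok-3-mini",        # high volume / cost sensitive
    "vision": "grok-2-vision",    # image / vision tasks
}

def choose_model(need: str) -> str:
    """Return the recommended Grok model for a primary need."""
    return MODEL_BY_NEED.get(need, "grok-3-fast")  # general-purpose default
```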

## Detailed Model Profiles

### grok-4-1-fast (Recommended for Most Uses)

**Best for:** Tool calling, agentic workflows, real-time search

```python
# Best choice for X search and sentiment analysis.
# `client` is an OpenAI-compatible client pointed at the xAI API
# (see the API Usage Example below).
response = client.chat.completions.create(
    model="grok-4-1-fast",
    messages=[{"role": "user", "content": "Search X for AAPL sentiment"}]
)
```

**Features:**

- 2 million token context window
- Optimized for tool calling
- Fast response times
- Best price/performance ratio

**Variants:**

- `grok-4-1-fast-reasoning` - maximum intelligence
- `grok-4-1-fast-non-reasoning` - instant responses
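A minimal way to switch between the two variants per request (the `hard` flag and helper name are illustrative):

```python
def grok_4_1_variant(hard: bool) -> str:
    """Pick the reasoning variant for hard problems and the
    non-reasoning variant when latency matters most."""
    return "grok-4-1-fast-reasoning" if hard else "grok-4-1-fast-non-reasoning"
```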

### grok-4

**Best for:** Deep analysis, complex reasoning, research

```python
# Use for complex multi-step analysis
response = client.chat.completions.create(
    model="grok-4",
    messages=[{"role": "user", "content": "Analyze market trends..."}]
)
```

**Features:**

- Highest reasoning capability
- Best for complex tasks
- 256K context window

### grok-3-fast

**Best for:** General purpose, balanced performance

```python
# Good default choice for most tasks
response = client.chat.completions.create(
    model="grok-3-fast",
    messages=[{"role": "user", "content": "Summarize this..."}]
)
```

**Features:**

- Fast responses
- 131K context
- Good balance of speed/quality

### grok-3-mini

**Best for:** High-volume, cost-sensitive applications

```python
# Use for bulk processing
response = client.chat.completions.create(
    model="grok-3-mini",
    messages=[{"role": "user", "content": "Classify: ..."}]
)
```

**Features:**

- Lowest latency
- Most cost-effective
- Good for simple tasks

### grok-2-vision

**Best for:** Image analysis, charts, screenshots

```python
import base64

# Encode image as base64 for inline transmission
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="grok-2-vision",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Analyze this chart"},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_data}"}}
        ]
    }]
)
```

## Cost Optimization Strategies

### 1. Use the Right Model

```python
# For filtering/classification - use mini
filter_response = client.chat.completions.create(
    model="grok-3-mini",
    messages=[{"role": "user", "content": f"Is this relevant? Answer yes or no: {text}"}]
)
is_relevant = filter_response.choices[0].message.content.strip().lower().startswith("yes")

# For analysis - use fast
if is_relevant:
    analysis = client.chat.completions.create(
        model="grok-4-1-fast",
        messages=[{"role": "user", "content": f"Analyze: {text}"}]
    )
```

### 2. Leverage Caching

Cached input tokens are 75% cheaper:

- Regular: $0.20/1M
- Cached: $0.05/1M
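The savings scale with your cache hit rate; a quick sketch of the blended input price using the rates above (the function name is illustrative):

```python
def blended_input_cost(tokens: int, cache_hit_rate: float,
                       regular_per_m: float = 0.20,
                       cached_per_m: float = 0.05) -> float:
    """Dollar cost for `tokens` input tokens at a given cache hit rate."""
    cached = tokens * cache_hit_rate
    fresh = tokens - cached
    return (fresh * regular_per_m + cached * cached_per_m) / 1_000_000
```

At a 50% hit rate, 1M input tokens cost $0.125 instead of $0.20.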

### 3. Batch Similar Requests

```python
# Instead of 10 separate calls, batch them into one prompt
texts = ["text1", "text2", "text3"]
batch_prompt = "Analyze these texts:\n" + "\n".join(texts)

response = client.chat.completions.create(
    model="grok-3-fast",
    messages=[{"role": "user", "content": batch_prompt}]
)
```

## Tool Calling Costs

| Tool | Cost per 1,000 calls |
|---|---|
| X Search | $5.00 |
| Web Search | $5.00 |
| Code Execution | $5.00 |
| Document Search | $2.50 |
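Tool charges apply per invocation, on top of token costs. A small estimator using the rates above (the dictionary keys are illustrative names, not API identifiers):

```python
# Per-1,000-call rates from the table above.
TOOL_COST_PER_1K = {
    "x_search": 5.00,
    "web_search": 5.00,
    "code_execution": 5.00,
    "document_search": 2.50,
}

def tool_cost(tool: str, calls: int) -> float:
    """Dollar cost for `calls` invocations of a tool."""
    return TOOL_COST_PER_1K[tool] * calls / 1000
```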

## Context Window Comparison

| Model | Context | Pages of Text | Hours of Audio |
|---|---|---|---|
| `grok-4-1-fast` | 2M | ~6,000 | ~50 |
| `grok-4` | 256K | ~800 | ~6 |
| `grok-3-fast` | 131K | ~400 | ~3 |
| `grok-2-vision` | 32K | ~100 | ~1 |
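The page figures imply roughly 330 tokens per page (2M / ~6,000). Assuming that ratio, a rough converter:

```python
def approx_pages(context_tokens: int, tokens_per_page: int = 333) -> int:
    """Rough 'pages of text' estimate implied by the table above."""
    return context_tokens // tokens_per_page
```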

## Model Capabilities Matrix

| Capability | 4.1 Fast | 4 | 3 Fast | 3 Mini | 2 Vision |
|---|---|---|---|---|---|
| Tool Calling | ⭐⭐⭐ | ⭐⭐ | | | |
| Reasoning | ⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐⭐ | |
| Speed | ⭐⭐⭐ | | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ |
| Cost | ⭐⭐⭐ | | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ |
| Vision | | | | | ⭐⭐⭐ |
| X Search | ⭐⭐⭐ | ⭐⭐ | ⭐⭐ | | |

## Recommended Configurations

### Financial Sentiment Pipeline

```python
MODELS = {
    "filter": "grok-3-mini",      # Fast filtering
    "analyze": "grok-4-1-fast",   # Tool calling + analysis
    "deep": "grok-4",             # Complex reasoning (rare)
}
```
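A sketch of how an item might flow through those stages (the `models_for_item` helper is illustrative; actual calls would go through the client shown in the API Usage Example):

```python
# Stage-to-model mapping, as in the pipeline config above.
MODELS = {
    "filter": "grok-3-mini",
    "analyze": "grok-4-1-fast",
    "deep": "grok-4",
}

def models_for_item(needs_deep_analysis: bool) -> list[str]:
    """Resolve which models an item passes through, in order."""
    stages = ["filter", "analyze"] + (["deep"] if needs_deep_analysis else [])
    return [MODELS[s] for s in stages]
```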

### High-Volume Processing

```python
MODELS = {
    "bulk": "grok-3-mini",
    "quality_check": "grok-3-fast",
}
```

### Research & Analysis

```python
MODELS = {
    "search": "grok-4-1-fast",
    "analyze": "grok-4",
    "summarize": "grok-3-fast",
}
```

## API Usage Example

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("XAI_API_KEY"),
    base_url="https://api.x.ai/v1"
)

# List available models
models = client.models.list()
for model in models.data:
    print(model.id)

# Use a specific model
response = client.chat.completions.create(
    model="grok-4-1-fast",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=100
)
```

## Related Skills

- `xai-auth` - Authentication setup
- `xai-agent-tools` - Tool calling
- `xai-sentiment` - Sentiment analysis
