bedrock-inference
Amazon Bedrock Inference
Overview
Amazon Bedrock Runtime provides APIs for invoking foundation models, including Anthropic Claude (Opus, Sonnet, Haiku), Amazon Nova and Titan, and third-party models from Cohere, AI21 Labs, and Meta. It supports synchronous, streaming, and asynchronous inference.
Purpose: Production-grade model inference with unified API across all Bedrock models
Pattern: Task-based (independent operations for different inference modes)
Key Capabilities:
- Model Invocation - Direct model calls with native or Converse API
- Streaming - Real-time token streaming for low latency
- Async Invocation - Long-running tasks up to 24 hours
- Token Counting - Cost estimation before inference
- Guardrails - Runtime content filtering and safety
- Inference Profiles - Cross-region routing and cost optimization
Quality Targets:
- Latency: < 1s first token for streaming
- Throughput: Up to 4,000 tokens/sec
- Availability: 99.9% SLA with cross-region profiles
When to Use
Use bedrock-inference when:
- Invoking Claude, Nova, Titan, or other Bedrock models
- Building conversational AI applications
- Implementing streaming responses for better UX
- Running long-running async inference tasks
- Applying runtime guardrails for content safety
- Optimizing costs with inference profiles
- Counting tokens before model invocation
- Implementing multi-turn conversations
When NOT to Use:
- Building complex agents (use bedrock-agentcore)
- Knowledge base RAG (use bedrock-knowledge-bases)
- Model customization (use bedrock-fine-tuning)
Prerequisites
Required
- AWS account with Bedrock access
- Model access enabled in AWS Console
- IAM permissions for Bedrock Runtime
Recommended
- boto3 >= 1.34.0 (for the latest Converse API)
- Understanding of model-specific input formats
- CloudWatch for monitoring
Installation
pip install boto3 botocore
Enable Model Access
# Check available models
aws bedrock list-foundation-models --region us-east-1
# Request model access via Console:
# AWS Console → Bedrock → Model access → Manage model access
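The same check can be run from Python with the Bedrock control-plane client (note: the bedrock client, not bedrock-runtime); a small sketch, filtered to Anthropic models:
import boto3

bedrock_cp = boto3.client('bedrock', region_name='us-east-1')  # control plane, not bedrock-runtime
for summary in bedrock_cp.list_foundation_models(byProvider='anthropic')['modelSummaries']:
    print(summary['modelId'], summary.get('inferenceTypesSupported'))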
Model IDs and Inference Profiles
Claude Models (Anthropic)
| Model | Model ID | Inference Profile ID | Region | Context Window |
|---|---|---|---|---|
| Claude Opus 4.5 | anthropic.claude-opus-4-5-20251101-v1:0 | global.anthropic.claude-opus-4-5-20251101-v1:0 | Global | 200K |
| Claude Sonnet 4.5 | anthropic.claude-sonnet-4-5-20250929-v1:0 | us.anthropic.claude-sonnet-4-5-20250929-v1:0 | US | 200K |
| Claude Haiku 4.5 | anthropic.claude-haiku-4-5-20251001-v1:0 | us.anthropic.claude-haiku-4-5-20251001-v1:0 | US | 200K |
| Claude Sonnet 3.5 v2 | anthropic.claude-3-5-sonnet-20241022-v2:0 | us.anthropic.claude-3-5-sonnet-20241022-v2:0 | US | 200K |
| Claude Haiku 3.5 | anthropic.claude-3-5-haiku-20241022-v1:0 | us.anthropic.claude-3-5-haiku-20241022-v1:0 | US | 200K |
Amazon Nova Models
| Model | Model ID | Inference Profile ID | Region | Context Window |
|---|---|---|---|---|
| Nova Pro | amazon.nova-pro-v1:0 | us.amazon.nova-pro-v1:0 | US | 300K |
| Nova Lite | amazon.nova-lite-v1:0 | us.amazon.nova-lite-v1:0 | US | 300K |
| Nova Micro | amazon.nova-micro-v1:0 | us.amazon.nova-micro-v1:0 | US | 128K |
Amazon Titan Models
| Model | Model ID | Region | Context Window |
|---|---|---|---|
| Titan Text Premier | amazon.titan-text-premier-v1:0 | All | 32K |
| Titan Text Express | amazon.titan-text-express-v1 | All | 8K |
Inference Profile Prefixes
- us. - US-only routing (lower latency for US traffic)
- global. - Global cross-region routing (highest availability)
- apac. - Asia-Pacific routing (lower latency for APAC traffic)
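An inference profile ID drops in anywhere a model ID is accepted; a quick sketch (region and model choice are arbitrary):
import boto3

bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')
response = bedrock.converse(
    modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',  # 'us.' prefix = US cross-region routing
    messages=[{'role': 'user', 'content': [{'text': 'Hello'}]}],
    inferenceConfig={'maxTokens': 256}
)
print(response['output']['message']['content'][0]['text'])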
Quick Reference
Client Initialization
import boto3
from typing import Optional
def get_bedrock_client(region_name: str = 'us-east-1',
profile_name: Optional[str] = None):
"""Initialize Bedrock Runtime client"""
session = boto3.Session(
region_name=region_name,
profile_name=profile_name
)
return session.client('bedrock-runtime')
# Usage
bedrock = get_bedrock_client(region_name='us-west-2')
Operations
1. Invoke Model (Native API)
Direct model invocation using model-specific request format.
Basic Invocation:
import json
def invoke_claude(prompt: str, model_id: str = 'us.anthropic.claude-sonnet-4-5-20250929-v1:0'):
"""Invoke Claude with native API"""
bedrock = get_bedrock_client()
# Claude-specific request format
request_body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 2048,
"messages": [
{
"role": "user",
"content": prompt
}
],
"temperature": 0.7,
"top_p": 0.9
}
response = bedrock.invoke_model(
modelId=model_id,
body=json.dumps(request_body)
)
# Parse response
response_body = json.loads(response['body'].read())
return response_body['content'][0]['text']
# Usage
result = invoke_claude("Explain quantum computing in simple terms")
print(result)
With System Prompts:
request_body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 2048,
"system": "You are a helpful AI assistant specialized in technical documentation.",
"messages": [
{
"role": "user",
"content": "Write API documentation for a REST endpoint"
}
]
}
With Tool Use:
request_body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 4096,
"messages": [
{
"role": "user",
"content": "What's the weather in San Francisco?"
}
],
"tools": [
{
"name": "get_weather",
"description": "Get current weather for a location",
"input_schema": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City name"
}
},
"required": ["location"]
}
}
]
}
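When the model decides to call the tool, the native response sets stop_reason to "tool_use" and emits a tool_use content block; a minimal sketch of parsing it, continuing the request above:
response = get_bedrock_client().invoke_model(
    modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
    body=json.dumps(request_body)
)
response_body = json.loads(response['body'].read())

if response_body.get('stop_reason') == 'tool_use':
    for block in response_body['content']:
        if block['type'] == 'tool_use':
            print(f"Tool requested: {block['name']} with input {block['input']}")
            # Run get_weather(block['input']), then send a follow-up turn
            # containing a tool_result block that references block['id']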
2. Converse API (Unified Interface)
Model-agnostic API that works across all Bedrock models with consistent interface.
Basic Conversation:
def converse_with_model(
messages: list,
model_id: str = 'us.anthropic.claude-sonnet-4-5-20250929-v1:0',
system_prompts: Optional[list] = None,
max_tokens: int = 2048
):
"""Converse API for unified model interaction"""
bedrock = get_bedrock_client()
inference_config = {
'maxTokens': max_tokens,
'temperature': 0.7,
'topP': 0.9
}
request_params = {
'modelId': model_id,
'messages': messages,
'inferenceConfig': inference_config
}
if system_prompts:
request_params['system'] = system_prompts
response = bedrock.converse(**request_params)
return response
# Usage
messages = [
{
'role': 'user',
'content': [
{'text': 'What are the benefits of microservices architecture?'}
]
}
]
system_prompts = [
{'text': 'You are a software architecture expert.'}
]
response = converse_with_model(messages, system_prompts=system_prompts)
assistant_message = response['output']['message']
print(assistant_message['content'][0]['text'])
Multi-turn Conversation:
def multi_turn_conversation():
"""Multi-turn conversation with context"""
bedrock = get_bedrock_client()
messages = []
model_id = 'us.anthropic.claude-sonnet-4-5-20250929-v1:0'
# Turn 1
messages.append({
'role': 'user',
'content': [{'text': 'My name is Alice and I work in healthcare.'}]
})
response = bedrock.converse(
modelId=model_id,
messages=messages,
inferenceConfig={'maxTokens': 1024}
)
# Add assistant response to history
messages.append(response['output']['message'])
# Turn 2 (model remembers context)
messages.append({
'role': 'user',
'content': [{'text': 'What are some AI applications in my field?'}]
})
response = bedrock.converse(
modelId=model_id,
messages=messages,
inferenceConfig={'maxTokens': 1024}
)
return response['output']['message']['content'][0]['text']
With Tool Use (Converse API):
def converse_with_tools():
"""Converse API with tool use"""
bedrock = get_bedrock_client()
tools = [
{
'toolSpec': {
'name': 'get_stock_price',
'description': 'Get current stock price for a symbol',
'inputSchema': {
'json': {
'type': 'object',
'properties': {
'symbol': {
'type': 'string',
'description': 'Stock ticker symbol'
}
},
'required': ['symbol']
}
}
}
}
]
messages = [
{
'role': 'user',
'content': [{'text': "What's the price of AAPL stock?"}]
}
]
response = bedrock.converse(
modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
messages=messages,
toolConfig={'tools': tools},
inferenceConfig={'maxTokens': 2048}
)
    # Check if the model wants to use a tool (content may also contain text
    # blocks, so scan for the toolUse block rather than assuming index 0)
    if response['stopReason'] == 'tool_use':
        for block in response['output']['message']['content']:
            if 'toolUse' in block:
                tool_use = block['toolUse']
                print(f"Tool requested: {tool_use['name']}")
                print(f"Tool input: {tool_use['input']}")
        # Execute the tool, add its result to messages, and call converse again
        # (see the continuation sketch below)
return response
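The continuation referenced above follows a fixed shape: append the assistant's message, send the tool's output back in a toolResult block, and call converse again. A sketch reusing the variables from converse_with_tools (the stock price is a placeholder value):
    # Inside converse_with_tools, after executing the tool:
    messages.append(response['output']['message'])
    messages.append({
        'role': 'user',
        'content': [{
            'toolResult': {
                'toolUseId': tool_use['toolUseId'],
                'content': [{'json': {'symbol': 'AAPL', 'price': 123.45}}],  # placeholder result
                'status': 'success'
            }
        }]
    })
    final = bedrock.converse(
        modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
        messages=messages,
        toolConfig={'tools': tools},
        inferenceConfig={'maxTokens': 2048}
    )
    print(final['output']['message']['content'][0]['text'])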
3. Stream Response (Real-time Tokens)
Stream tokens as they're generated for lower perceived latency.
Streaming with Native API:
def stream_claude_response(prompt: str):
"""Stream response tokens in real-time"""
bedrock = get_bedrock_client()
request_body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 2048,
"messages": [
{
"role": "user",
"content": prompt
}
]
}
response = bedrock.invoke_model_with_response_stream(
modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
body=json.dumps(request_body)
)
# Process event stream
stream = response['body']
full_text = ""
for event in stream:
chunk = event.get('chunk')
if chunk:
chunk_obj = json.loads(chunk['bytes'].decode())
if chunk_obj['type'] == 'content_block_delta':
delta = chunk_obj['delta']
if delta['type'] == 'text_delta':
text = delta['text']
print(text, end='', flush=True)
full_text += text
elif chunk_obj['type'] == 'message_stop':
print() # New line at end
return full_text
# Usage
response = stream_claude_response("Write a short story about a robot")
Streaming with Converse API:
def stream_converse(messages: list, model_id: str):
"""Stream response using Converse API"""
bedrock = get_bedrock_client()
response = bedrock.converse_stream(
modelId=model_id,
messages=messages,
inferenceConfig={'maxTokens': 2048}
)
stream = response['stream']
full_text = ""
for event in stream:
if 'contentBlockDelta' in event:
delta = event['contentBlockDelta']['delta']
if 'text' in delta:
text = delta['text']
print(text, end='', flush=True)
full_text += text
elif 'messageStop' in event:
print()
break
return full_text
# Usage
messages = [{'role': 'user', 'content': [{'text': 'Explain neural networks'}]}]
stream_converse(messages, 'us.anthropic.claude-sonnet-4-5-20250929-v1:0')
Streaming with Error Handling:
def safe_streaming(prompt: str):
"""Streaming with comprehensive error handling"""
bedrock = get_bedrock_client()
request_body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 2048,
"messages": [{"role": "user", "content": prompt}]
}
try:
response = bedrock.invoke_model_with_response_stream(
modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
body=json.dumps(request_body)
)
full_text = ""
for event in response['body']:
chunk = event.get('chunk')
if chunk:
chunk_obj = json.loads(chunk['bytes'].decode())
if chunk_obj['type'] == 'content_block_delta':
text = chunk_obj['delta'].get('text', '')
print(text, end='', flush=True)
full_text += text
elif chunk_obj['type'] == 'error':
print(f"\nStreaming error: {chunk_obj['error']}")
break
return full_text
except Exception as e:
print(f"Stream failed: {e}")
raise
4. Count Tokens
Estimate token usage and costs before invoking models.
Converse Token Counting:
def count_tokens(messages: list, model_id: str):
    """Count input tokens for cost estimation"""
    bedrock = get_bedrock_client()
    # Optional system prompts are counted together with the messages
    system_prompts = [
        {'text': 'You are a helpful assistant.'}
    ]
    # The CountTokens API requires a recent boto3; verify the request shape
    # (the 'input' union below) against your SDK version
    response = bedrock.count_tokens(
        modelId=model_id,
        input={
            'converse': {
                'messages': messages,
                'system': system_prompts
            }
        }
    )
    input_tokens = response['inputTokens']
    print(f"Input tokens: {input_tokens}")
    return input_tokens
# Usage
messages = [
{'role': 'user', 'content': [{'text': 'This is a test message'}]}
]
tokens = count_tokens(messages, 'us.anthropic.claude-sonnet-4-5-20250929-v1:0')
Cost Estimation:
def estimate_cost(messages: list, model_id: str, estimated_output_tokens: int = 1000):
"""Estimate inference cost before invocation"""
    bedrock = get_bedrock_client()
    # Count input tokens (CountTokens API; requires a recent boto3 - verify
    # the request shape against your SDK version)
    token_response = bedrock.count_tokens(
        modelId=model_id,
        input={'converse': {'messages': messages}}
    )
    input_tokens = token_response['inputTokens']
    # Example pricing only - rates vary by model, region, and over time;
    # check the current Amazon Bedrock pricing page before relying on these numbers
pricing = {
'us.anthropic.claude-opus-4-5-20251101-v1:0': {
'input': 15.00 / 1_000_000, # $15 per 1M input tokens
'output': 75.00 / 1_000_000 # $75 per 1M output tokens
},
'us.anthropic.claude-sonnet-4-5-20250929-v1:0': {
'input': 3.00 / 1_000_000,
'output': 15.00 / 1_000_000
},
'us.anthropic.claude-haiku-4-5-20251001-v1:0': {
'input': 0.80 / 1_000_000,
'output': 4.00 / 1_000_000
}
}
if model_id in pricing:
input_cost = input_tokens * pricing[model_id]['input']
output_cost = estimated_output_tokens * pricing[model_id]['output']
total_cost = input_cost + output_cost
print(f"Input tokens: {input_tokens:,} (${input_cost:.6f})")
print(f"Estimated output: {estimated_output_tokens:,} (${output_cost:.6f})")
print(f"Estimated total: ${total_cost:.6f}")
return {
'input_tokens': input_tokens,
'estimated_output_tokens': estimated_output_tokens,
'input_cost': input_cost,
'output_cost': output_cost,
'total_cost': total_cost
}
else:
print("Pricing not available for this model")
return None
5. Async Invoke (Long-Running Tasks)
For inference tasks that take longer than 60 seconds (up to 24 hours).
Start Async Invocation:
def async_invoke_model(prompt: str, s3_output_uri: str):
"""Start async model invocation for long tasks"""
bedrock = get_bedrock_client()
request_body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 10000,
"messages": [
{
"role": "user",
"content": prompt
}
]
}
    # StartAsyncInvoke is the async entry point; only a subset of models
    # support it, so confirm eligibility in the Bedrock documentation
    response = bedrock.start_async_invoke(
        modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
        modelInput=request_body,  # passed as a JSON document (dict), not a serialized string
        outputDataConfig={
            's3OutputDataConfig': {
                's3Uri': s3_output_uri
            }
        }
    )
invocation_arn = response['invocationArn']
print(f"Async invocation started: {invocation_arn}")
return invocation_arn
# Usage
s3_output = 's3://my-bucket/bedrock-outputs/result.json'
arn = async_invoke_model("Write a 10,000 word technical guide", s3_output)
Check Async Status:
def check_async_status(invocation_arn: str):
"""Check status of async invocation"""
bedrock = get_bedrock_client()
response = bedrock.get_async_invoke(
invocationArn=invocation_arn
)
status = response['status']
print(f"Status: {status}")
if status == 'Completed':
output_uri = response['outputDataConfig']['s3OutputDataConfig']['s3Uri']
print(f"Output available at: {output_uri}")
# Download and parse result
# (Use boto3 S3 client to retrieve)
elif status == 'Failed':
print(f"Failure reason: {response.get('failureMessage', 'Unknown')}")
return response
# Usage
status = check_async_status(arn)
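Once the status is Completed, the result can be pulled from the S3 location. A minimal helper sketch (download_async_result is illustrative, and it assumes the URI points directly at a JSON object - actual output layouts can include a manifest or multiple files depending on the model):
import boto3

def download_async_result(output_uri: str) -> dict:
    """Fetch an async invocation result from S3 and parse it as JSON"""
    bucket, _, key = output_uri.replace('s3://', '', 1).partition('/')
    s3 = boto3.client('s3')
    obj = s3.get_object(Bucket=bucket, Key=key)
    return json.loads(obj['Body'].read())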
List Async Invocations:
def list_async_invocations(status_filter: Optional[str] = None):
"""List all async invocations"""
bedrock = get_bedrock_client()
params = {}
if status_filter:
params['statusEquals'] = status_filter # 'InProgress', 'Completed', 'Failed'
response = bedrock.list_async_invokes(**params)
for invocation in response.get('asyncInvokeSummaries', []):
print(f"ARN: {invocation['invocationArn']}")
print(f"Status: {invocation['status']}")
print(f"Submit time: {invocation['submitTime']}")
print("---")
return response
6. Apply Guardrail (Runtime Safety)
Apply content filtering and safety policies at runtime.
Invoke with Guardrail:
def invoke_with_guardrail(
prompt: str,
guardrail_id: str,
guardrail_version: str = 'DRAFT'
):
"""Invoke model with runtime guardrail"""
bedrock = get_bedrock_client()
request_body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 2048,
"messages": [
{
"role": "user",
"content": prompt
}
]
}
response = bedrock.invoke_model(
modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
body=json.dumps(request_body),
guardrailIdentifier=guardrail_id,
guardrailVersion=guardrail_version
)
    # Check whether the guardrail intervened: when a guardrail is applied,
    # the response body carries an 'amazon-bedrock-guardrailAction' field
    # ('INTERVENED' or 'NONE')
    response_body = json.loads(response['body'].read())
    if response_body.get('amazon-bedrock-guardrailAction') == 'INTERVENED':
        print("Content blocked by guardrail")
        return None
return response_body['content'][0]['text']
# Usage
result = invoke_with_guardrail(
"Tell me about quantum computing",
guardrail_id='abc123xyz',
guardrail_version='1'
)
Converse with Guardrail:
def converse_with_guardrail(messages: list, guardrail_config: dict):
"""Converse API with guardrail configuration"""
bedrock = get_bedrock_client()
response = bedrock.converse(
modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
messages=messages,
inferenceConfig={'maxTokens': 2048},
guardrailConfig=guardrail_config
)
    # Converse reports intervention via stopReason; with 'trace': 'enabled'
    # the response also includes the guardrail assessments
    if response.get('stopReason') == 'guardrail_intervened':
        print("Guardrail blocked content")
    if 'trace' in response and 'guardrail' in response['trace']:
        print(f"Guardrail trace: {response['trace']['guardrail']}")
return response
# Usage
guardrail_config = {
'guardrailIdentifier': 'abc123xyz',
'guardrailVersion': '1',
'trace': 'enabled'
}
messages = [{'role': 'user', 'content': [{'text': 'Test message'}]}]
converse_with_guardrail(messages, guardrail_config)
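Guardrails can also be evaluated on their own, without a model call, through the standalone ApplyGuardrail API - useful for screening user input or externally generated text. A minimal sketch (verify the content shape against your boto3 version):
def check_text_with_guardrail(text: str, guardrail_id: str, guardrail_version: str = '1'):
    """Run text through a guardrail without invoking a model"""
    bedrock = get_bedrock_client()
    response = bedrock.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version,
        source='INPUT',  # or 'OUTPUT' when screening model responses
        content=[{'text': {'text': text}}]
    )
    if response['action'] == 'GUARDRAIL_INTERVENED':
        print("Guardrail intervened")
    return response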
Error Handling Patterns
Comprehensive Error Handling
from botocore.exceptions import ClientError, BotoCoreError
import time
def robust_invoke(prompt: str, max_retries: int = 3):
"""Invoke model with retry logic and error handling"""
bedrock = get_bedrock_client()
request_body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 2048,
"messages": [{"role": "user", "content": prompt}]
}
for attempt in range(max_retries):
try:
response = bedrock.invoke_model(
modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
body=json.dumps(request_body)
)
response_body = json.loads(response['body'].read())
return response_body['content'][0]['text']
except ClientError as e:
error_code = e.response['Error']['Code']
if error_code == 'ThrottlingException':
wait_time = (2 ** attempt) + 1 # Exponential backoff
print(f"Throttled. Waiting {wait_time}s before retry {attempt + 1}/{max_retries}")
time.sleep(wait_time)
continue
elif error_code == 'ModelTimeoutException':
print("Model timeout - request took too long")
if attempt < max_retries - 1:
time.sleep(2)
continue
raise
elif error_code == 'ModelErrorException':
print("Model error - check input format")
raise
elif error_code == 'ValidationException':
print("Invalid parameters")
raise
elif error_code == 'AccessDeniedException':
print("Access denied - check IAM permissions and model access")
raise
elif error_code == 'ResourceNotFoundException':
print("Model not found - check model ID")
raise
else:
print(f"Unexpected error: {error_code}")
raise
except BotoCoreError as e:
print(f"Connection error: {e}")
if attempt < max_retries - 1:
time.sleep(2)
continue
raise
raise Exception(f"Failed after {max_retries} attempts")
Specific Error Scenarios
def handle_model_errors():
"""Common error scenarios and solutions"""
bedrock = get_bedrock_client()
try:
# Attempt invocation
response = bedrock.invoke_model(
modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
body=json.dumps({
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 2048,
"messages": [{"role": "user", "content": "test"}]
})
)
except ClientError as e:
error_code = e.response['Error']['Code']
if error_code == 'ModelNotReadyException':
# Model is still loading
print("Model not ready, wait 30 seconds and retry")
elif error_code == 'ServiceQuotaExceededException':
# Hit service quota
print("Exceeded quota - request increase or use different region")
elif error_code == 'ModelStreamErrorException':
# Error during streaming
print("Stream interrupted - restart stream")
Best Practices
1. Cost Optimization
def cost_optimized_inference(prompt: str, complexity: str = 'moderate'):
    """Choose model based on task complexity and cost"""
    # Simple tasks → Haiku (cheapest)
    # Moderate tasks → Sonnet (balanced)
    # Complex tasks → Opus (most capable)
    if complexity == 'simple':
        model_id = 'us.anthropic.claude-haiku-4-5-20251001-v1:0'
        print("Using Haiku for cost efficiency")
    elif complexity == 'complex':
        model_id = 'global.anthropic.claude-opus-4-5-20251101-v1:0'
        print("Using Opus for maximum accuracy")
    else:
        model_id = 'us.anthropic.claude-sonnet-4-5-20250929-v1:0'
        print("Using Sonnet for balanced performance")
    return invoke_claude(prompt, model_id)
2. Use Inference Profiles
def use_inference_profiles():
"""Leverage inference profiles for cost savings"""
# Cross-region profiles offer 30-50% cost savings
# with automatic region failover
profiles = {
'global_opus': 'global.anthropic.claude-opus-4-5-20251101-v1:0',
'us_sonnet': 'us.anthropic.claude-sonnet-4-5-20250929-v1:0',
'us_haiku': 'us.anthropic.claude-haiku-4-5-20251001-v1:0'
}
# Use global profile for high availability
# Use regional profile for lower latency
return profiles
3. Implement Caching
from functools import lru_cache
import hashlib
@lru_cache(maxsize=100)
def cached_inference(prompt: str, model_id: str):
"""Cache responses for identical prompts"""
return invoke_claude(prompt, model_id)
def cache_key(prompt: str) -> str:
"""Generate cache key for prompt"""
return hashlib.sha256(prompt.encode()).hexdigest()
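lru_cache only helps within one process; the cache_key helper above can index a shared store instead. A minimal in-memory sketch (swap the dict for Redis, DynamoDB, etc. - the names here are illustrative):
_response_cache: dict = {}

def cached_invoke(prompt: str, model_id: str) -> str:
    """Check the prompt hash before paying for another invocation"""
    key = f"{model_id}:{cache_key(prompt)}"
    if key not in _response_cache:
        _response_cache[key] = invoke_claude(prompt, model_id)
    return _response_cache[key]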
4. Monitor Token Usage
def track_token_usage(messages: list, model_id: str):
    """Track and log token usage"""
    bedrock = get_bedrock_client()
    # For a pre-invocation estimate, use the CountTokens sketch in section 4;
    # the Converse response itself reports exact usage after the call
    response = bedrock.converse(
        modelId=model_id,
        messages=messages,
        inferenceConfig={'maxTokens': 2048}
    )
    usage = response['usage']
    input_tokens = usage['inputTokens']
    output_tokens = usage['outputTokens']
    total_tokens = usage['totalTokens']
    # Log to CloudWatch or a database
    print(f"Input: {input_tokens}, Output: {output_tokens}, Total: {total_tokens}")
    return response
5. Use Streaming for Better UX
def stream_for_user_experience(prompt: str):
"""Always use streaming for interactive applications"""
# Streaming reduces perceived latency
# Users see tokens immediately instead of waiting
return stream_claude_response(prompt)
6. Async for Long Tasks
def use_async_for_batch(prompts: list, s3_bucket: str):
"""Use async invocation for batch processing"""
invocation_arns = []
for idx, prompt in enumerate(prompts):
s3_uri = f's3://{s3_bucket}/outputs/result-{idx}.json'
arn = async_invoke_model(prompt, s3_uri)
invocation_arns.append(arn)
return invocation_arns
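A natural follow-up is to poll the returned ARNs until every job finishes; a small sketch reusing check_async_status from above (the polling interval is arbitrary):
import time

def wait_for_async_batch(invocation_arns: list, poll_seconds: int = 30):
    """Poll async invocations until none remain in progress"""
    pending = set(invocation_arns)
    while pending:
        for arn in list(pending):
            if check_async_status(arn)['status'] in ('Completed', 'Failed'):
                pending.discard(arn)
        if pending:
            time.sleep(poll_seconds)
    return invocation_arns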
IAM Permissions
Minimum Runtime Permissions
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:*::foundation-model/anthropic.claude-*",
        "arn:aws:bedrock:*::foundation-model/amazon.nova-*",
        "arn:aws:bedrock:*::foundation-model/amazon.titan-*",
        "arn:aws:bedrock:*:*:inference-profile/*"
      ]
    }
  ]
}
The Converse and ConverseStream APIs are authorized through the same bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream actions, so no separate statement is needed for them; when invoking through inference profiles, allow the profile ARNs in addition to the underlying foundation models.
With Async Invocation
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream",
"bedrock:InvokeModelAsync",
"bedrock:GetAsyncInvoke",
"bedrock:ListAsyncInvokes"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject"
],
"Resource": "arn:aws:s3:::my-bedrock-bucket/*"
}
]
}
Progressive Disclosure
Quick Start (This File)
- Client initialization
- Model IDs and inference profiles
- Basic invocation (native and Converse API)
- Streaming responses
- Token counting
- Async invocation
- Guardrail application
- Error handling patterns
- Best practices
Detailed References
- Advanced Invocation Patterns: Batch processing, parallel requests, custom retry logic, response parsing
- Multimodal Support: Image inputs, document parsing, vision capabilities for Claude and Nova
- Tool Use and Function Calling: Complete tool use patterns, multi-turn tool conversations, error handling
- Performance Optimization: Latency optimization, throughput tuning, cost reduction strategies
- Monitoring and Observability: CloudWatch integration, custom metrics, cost tracking, usage analytics
Related Skills
- bedrock-agentcore: Build production AI agents with managed infrastructure
- bedrock-guardrails: Configure content filters and safety policies
- bedrock-knowledge-bases: RAG with vector stores and retrieval
- bedrock-prompts: Manage and version prompts
- anthropic-expert: Claude API patterns and best practices
- claude-cost-optimization: Cost tracking and optimization for Claude
- boto3-eks: For containerized Bedrock applications