Claude Vision Patterns

Quick Guide: Use type: "image" content blocks for images (base64, URL, or file_id) and type: "document" content blocks for PDFs. Supported image formats: JPEG, PNG, GIF, WebP. Images before text in the content array improves results. Token cost formula: tokens = (width * height) / 750. Images are auto-resized if the long edge exceeds 1568px or exceeds ~1600 tokens. PDFs use type: "document" with media_type: "application/pdf". No OCR library needed -- Claude reads text directly from images and PDFs.

<critical_requirements>

CRITICAL: Before Using This Skill

All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, import type, named constants)

(You MUST use type: "image" for images and type: "document" for PDFs -- they are different content block types)

(You MUST place images and documents BEFORE text in the content array -- Claude performs better with visual content first)

(You MUST always provide max_tokens in every request -- it is required and has no default)

(You MUST iterate over response.content blocks -- never assume a single text block in the response)

ai-provider-claude-vision

Claude Vision Patterns

CRITICAL: Before Using This Skill

More from agents-inc/skills

web-animation-css-animations

web-animation-view-transitions

web-testing-playwright-e2e

web-styling-cva

web-animation-framer-motion

web-i18n-next-intl