OpenAI Image Vision

Analyze images using OpenAI's GPT-4 Vision API. The model can understand visual elements including objects, shapes, colors, textures, and text within images.

Setup

This skill requires at least one of the following API keys (OpenAI is preferred when both are set):

OpenAI (preferred): env_config(action="set", key="OPENAI_API_KEY", value="your-key")
LinkAI (fallback): env_config(action="set", key="LINKAI_API_KEY", value="your-key")

Optional: Set custom API base URL:

env_config(action="set", key="OPENAI_API_BASE", value="your-base-url")

Usage

Important: Scripts are located relative to this skill's base directory.

When you see this skill in <available_skills>, note the <base_dir> path.

CRITICAL: Always use bash command to execute the script:

# General pattern (MUST start with bash):
bash "<base_dir>/scripts/vision.sh" "<image_path_or_url>" "<question>" [model]

# DO NOT execute the script directly like this (WRONG):
# "<base_dir>/scripts/vision.sh" ...

# Parameters:
# - image_path_or_url: Local image file path or HTTP(S) URL (required)
# - question: Question to ask about the image (required)
# - model: OpenAI model to use (default: gpt-4.1-mini)
#   Options: gpt-4.1-mini, gpt-4.1, gpt-4o-mini, gpt-4-turbo

Examples

Analyze a local image

bash "<base_dir>/scripts/vision.sh" "/path/to/image.jpg" "What's in this image?"

Analyze an image from URL

bash "<base_dir>/scripts/vision.sh" "https://example.com/image.jpg" "Describe this image in detail"

Use specific model

bash "<base_dir>/scripts/vision.sh" "/path/to/photo.png" "What colors are prominent?" "gpt-4o-mini"

Extract text from image

bash "<base_dir>/scripts/vision.sh" "/path/to/document.jpg" "Extract all text from this image"

Analyze multiple aspects

bash "<base_dir>/scripts/vision.sh" "image.jpg" "List all objects you can see and describe the overall scene"

Supported Image Formats

JPEG (.jpg, .jpeg)
PNG (.png)
GIF (.gif)
WebP (.webp)

Performance Optimization: Files larger than 1MB are automatically compressed to 800px (longest side) to avoid command-line parameter limits. This happens transparently without affecting analysis quality.

Response Format

The script returns a JSON response:

{
  "model": "gpt-4.1-mini",
  "content": "The image shows...",
  "usage": {
    "prompt_tokens": 1234,
    "completion_tokens": 567,
    "total_tokens": 1801
  }
}

Or in case of error:

{
  "error": "Error description",
  "details": "Additional error information"
}

Notes

Image size: Images are automatically resized if too large
Timeout: 60 seconds for API calls
Rate limits: Subject to your OpenAI API plan limits
Privacy: Images are sent to OpenAI's servers for processing
Local files: Automatically converted to base64 for API submission
URLs: Can be passed directly to the API without downloading

openai-image-vision