Apify Actor Development

Build serverless Apify actors for web scraping, browser automation, and data extraction using Python.

Prerequisites & Setup (MANDATORY)

Before creating or modifying actors, verify that the Apify CLI is installed by running apify --help.

If it is not installed, you can run:

curl -fsSL https://apify.com/install-cli.sh | bash

# Or (Mac): brew install apify-cli
# Or (Windows): irm https://apify.com/install-cli.ps1 | iex
# Or: npm install -g apify-cli

When the apify CLI is installed, check that it is logged in with:

apify info  # Should return your username

If it is not logged in, check whether the APIFY_TOKEN environment variable is defined. If it is not, ask the user to generate a token at https://console.apify.com/settings/integrations and set APIFY_TOKEN to that value.

Then run:

apify login -t "$APIFY_TOKEN"
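
The first two checks above can also be scripted. The following is a minimal sketch using only the standard library; preflight is a hypothetical helper name, not part of the Apify CLI, and it only reports which setup step is still missing (it does not check the login state, which apify info covers).

```python
import os
import shutil
from collections.abc import Callable, Mapping
from typing import Optional


def preflight(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the setup steps that still need attention (empty if none)."""
    problems: list[str] = []
    # shutil.which returns None when the executable is not on PATH.
    if which("apify") is None:
        problems.append("apify CLI not found on PATH; install it first")
    # An unset or empty token means `apify login -t` has nothing to use.
    if not env.get("APIFY_TOKEN"):
        problems.append("APIFY_TOKEN is not set; generate a token in Apify Console")
    return problems
```

Calling preflight() with no arguments inspects the real environment; the env and which parameters exist only so the check can be exercised in tests.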

Quick Start Workflow

Creating a New Actor

Copy template - Copy all files including hidden ones from the skill's assets/python-template/ directory to your new actor directory. The template is located at {base_dir}/assets/python-template/ where {base_dir} is the skill's base directory.
Setup pre-commit - Run uv run pre-commit install for automatic quality checks
Add dependencies - Use uv add package-name for each required dependency
Implement logic - Write the actor code in src/main.py (the src/__main__.py entry point is already set up)
Configure schemas - Update input/output schemas in .actor/input_schema.json and .actor/output_schema.json
Configure platform settings - Update .actor/actor.json with actor metadata
Write documentation - Create comprehensive .actor/ACTOR.md for the marketplace
Test locally - Run apify run to verify functionality
Deploy - Run apify push to deploy the actor on the Apify platform

CRITICAL REMINDERS:

NEVER create requirements.txt
NEVER use pip install or uv pip install
ALWAYS use uv add to add dependencies
ALWAYS use uv sync to install dependencies
ALWAYS format with uv run ruff format . after file changes
ALWAYS lint with uv run ruff check --fix . after file changes
ALWAYS check the apify push output for build errors before considering deployment complete
ALWAYS update the input/output schemas when changing actor functionality

Core Concepts

Input/Output Pattern

Every actor follows this pattern:

Input: JSON from key-value store (defined by input schema)
Process: Actor logic extracts/transforms data
Output: Results pushed to dataset or key-value store
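
The three steps can be sketched as follows. This is an illustrative skeleton, not the template's actual src/main.py: the urls input field, the SUMMARY key, and build_records are hypothetical names, and the processing logic is a placeholder.

```python
from typing import Any


def build_records(actor_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Process: turn the input into result items (placeholder logic)."""
    urls = actor_input.get("urls", [])  # "urls" is a hypothetical input field
    return [{"url": url, "length": len(url)} for url in urls]


async def main() -> None:
    # Imported here so build_records can be exercised without the SDK installed.
    from apify import Actor

    async with Actor:
        # Input: JSON read from the default key-value store.
        actor_input = await Actor.get_input() or {}
        records = build_records(actor_input)
        # Output: structured items go to the default dataset...
        await Actor.push_data(records)
        # ...and a small summary object goes to the key-value store.
        await Actor.set_value("SUMMARY", {"count": len(records)})
```

Keeping the processing step in a plain function makes it testable with pytest without starting an actor run.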

Storage Types

Dataset: Structured data (arrays of objects) - use for scraping results and tabular data
Key-Value Store: Arbitrary data (files, objects) - use for screenshots, PDFs, state, and binary files
Request Queue: URLs to crawl - use for deep web crawling and multi-page scraping workflows

Project Structure

my-actor/
├── .actor/
│   ├── actor.json                    # Actor metadata
│   ├── input_schema.json             # Input schema
│   ├── output_schema.json            # Output schema
│   ├── ACTOR.md                      # PUBLIC marketplace documentation (CRITICAL)
│   └── datasets/
│       └── dataset_schema.json       # Dataset schema with views
├── src/ or package_name/             # Source code
│   ├── __init__.py
│   ├── __main__.py                   # Entry point for CLI (REQUIRED)
│   └── main.py                       # Main actor logic
├── tests/                            # Test files
│   └── test_*.py
├── .dockerignore                     # Docker build exclusions
├── .pre-commit-config.yaml           # Pre-commit hooks
├── Dockerfile                        # Container config
├── pyproject.toml                    # Python project config
├── uv.lock                           # Dependency lock file
└── README.md                         # Development docs

Common Patterns

See references/python-sdk.md for complete examples of:

Simple HTTP scraping with BeautifulSoup
Browser automation with Playwright and Selenium
Deep crawling with Request Queue
Proxy management and error handling
Storage APIs (Dataset, Key-Value Store, Request Queue)

Input Schema Design

Input schemas use JSON Schema format to define and validate actor inputs. See references/input-schema.md for:

Field types (string, number, boolean, array, object)
Special editors (requestListSources, globs, pseudoUrls, proxy, json, textarea)
Validation patterns (regex, length, range, required fields)
Complete examples with best practices

Key principles:

Always include a description and an example for every field
Set sensible defaults for ease of use
Use appropriate editors for better UX
Add units for numeric fields (pages, seconds, MB)
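
As a rough illustration of these principles, the sketch below builds a small input schema as a Python dict and checks that every field carries a description. The field names (startUrls, maxPages) and the helper fields_missing_description are hypothetical; references/input-schema.md remains the authority on the available keys and editors.

```python
import json

# Hypothetical input schema; field names are illustrative only.
INPUT_SCHEMA = {
    "title": "Example scraper input",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "startUrls": {
            "title": "Start URLs",
            "type": "array",
            "description": "Pages where the crawl begins.",
            "editor": "requestListSources",
            "prefill": [{"url": "https://example.com"}],
        },
        "maxPages": {
            "title": "Maximum pages",
            "type": "integer",
            "description": "Stop after this many pages.",
            "default": 10,  # sensible default for ease of use
            "minimum": 1,
            "unit": "pages",  # unit shown next to the numeric field
        },
    },
    "required": ["startUrls"],
}


def fields_missing_description(schema: dict) -> list[str]:
    """List property names that lack a non-empty description."""
    return [
        name
        for name, spec in schema["properties"].items()
        if not spec.get("description")
    ]


# Text that would be saved as .actor/input_schema.json.
SCHEMA_JSON = json.dumps(INPUT_SCHEMA, indent=2)
```

A check like fields_missing_description(INPUT_SCHEMA) could run in a test so that undocumented fields are caught before apify push.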

Output Schema Design

Output schemas define where actors store outputs and provide templates for accessing that data. See references/output-schema.md for:

Schema structure and template variables (links.apiDefaultDatasetUrl, links.apiDefaultKeyValueStoreUrl, etc.)
Dataset and key-value store output configurations
Multiple output types in a single actor
Integration with Python code
Complete examples with emojis and descriptions

Key principles:

Define all outputs explicitly (even if empty)
Use descriptive titles with emojis for visual clarity
Include helpful descriptions for users and LLM integrations
Match templates to actual storage locations in code
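
To make the template idea concrete, here is a hedged sketch of an output schema expressed as a Python dict, together with a toy renderer showing how a template variable might resolve to a URL. The overall structure is an assumption to be checked against references/output-schema.md, and render_template is a hypothetical helper: the platform performs the real substitution.

```python
# Assumed shape of .actor/output_schema.json; verify against the reference.
OUTPUT_SCHEMA = {
    "actorOutputSchemaVersion": 1,
    "title": "Example scraper output",
    "properties": {
        "results": {
            "type": "string",
            "title": "📦 Scraped items",
            "description": "All items pushed to the default dataset.",
            "template": "{{links.apiDefaultDatasetUrl}}/items",
        },
    },
}


def render_template(template: str, variables: dict[str, str]) -> str:
    """Replace each {{name}} placeholder with its value (illustration only)."""
    for name, value in variables.items():
        template = template.replace("{{" + name + "}}", value)
    return template
```

The template points at the default dataset, so it only matches reality if the actor code actually pushes its results there.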

ACTOR.md Documentation (CRITICAL)

The .actor/ACTOR.md file is the public-facing documentation that users see in the Apify marketplace. This is your actor's main sales page and user guide.

Required sections:

Title & Description - Clear, compelling one-liner
What it does - Bullet points of key capabilities
Input - Example JSON with field explanations
Output - Example JSON showing expected results
Use Cases - Who benefits and why (with emojis)
Standby Mode (if applicable) - API usage examples
Tips & Best Practices - Performance and configuration guidance

See assets/python-template/.actor/ACTOR.md for a complete template.

Key principles:

Write for non-technical users - assume no coding knowledge
Use emojis to make sections scannable (🎯 🔍 ⚡ 🚀)
Provide copy-paste ready code examples
Show actual input/output samples, not schemas
Highlight benefits and use cases clearly

Modifying Existing Actors

When modifying an existing actor:

Understand current logic - Read src/main.py
Check input schema - Review .actor/input_schema.json for expected inputs
Add dependencies with uv - Use uv add package-name (NEVER pip install)
Make code changes - Implement the requested features
Format code - Run uv run ruff format . (MANDATORY)
Lint code - Run uv run ruff check --fix . (MANDATORY)
Test changes locally - Use apify run before deploying
Update schema if needed - Add new fields to input schema
Deploy - Push changes with apify push

Debugging Actors

Test locally - Use apify run to test actor locally before deployment
Check storage - Inspect ./storage/ directory for datasets, key-value stores, and request queues
Add logging - Use Actor.log.info(), Actor.log.debug(), Actor.log.error() (see SDK references)
View logs on platform - Check actor run logs in Apify Console for production issues
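
For the storage check, a small script can load what a local run wrote. The sketch below assumes the local layout storage/datasets/default/ with one JSON file per item plus metadata files whose names start with a double underscore; load_local_dataset is a hypothetical helper, and the layout may differ between SDK versions.

```python
import json
from pathlib import Path


def load_local_dataset(storage_dir: Path, name: str = "default") -> list[dict]:
    """Read every item JSON file of a local dataset, in filename order."""
    dataset_dir = storage_dir / "datasets" / name
    items = []
    for path in sorted(dataset_dir.glob("*.json")):
        # Skip metadata files such as __metadata__.json (assumed naming).
        if path.name.startswith("__"):
            continue
        items.append(json.loads(path.read_text(encoding="utf-8")))
    return items
```

After apify run, load_local_dataset(Path("storage")) returns the pushed items so they can be printed or compared against expectations.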

Best Practices

Code Quality

Validate input - Always check required fields and formats with clear error messages
Handle errors - Use try/except with proper error logging and graceful degradation
Structured logging - Use Actor.log with extra fields for better debugging
Type hints - Add type annotations for better code clarity and IDE support
Docstrings - Document functions and modules for maintainability
Format with ruff - ALWAYS run uv run ruff format . before committing
Lint with ruff - ALWAYS run uv run ruff check --fix . before deploying
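
The first few points can be combined in one small module. This is a sketch under stated assumptions: the url and maxItems fields, InputError, and both function names are hypothetical, and the standard logging module stands in for Actor.log so the example runs without the SDK.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger("actor")  # stand-in for Actor.log


class InputError(ValueError):
    """Raised when the actor input is missing or malformed."""


def validate_input(actor_input: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Check required fields and return a normalized copy of the input."""
    if not actor_input:
        raise InputError("Input is empty; provide at least 'url'.")
    url = actor_input.get("url")  # "url" and "maxItems" are hypothetical fields
    if not isinstance(url, str) or not url.startswith(("http://", "https://")):
        raise InputError(f"'url' must be an http(s) URL, got {url!r}.")
    max_items = actor_input.get("maxItems", 100)
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 1:
        raise InputError(f"'maxItems' must be a positive integer, got {max_items!r}.")
    return {"url": url, "maxItems": max_items}


def run_validation(actor_input: Optional[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Validate, logging a clear message instead of crashing on bad input."""
    try:
        return validate_input(actor_input)
    except InputError as exc:
        # Extra fields give structured context without logging input values.
        logger.error("Invalid input: %s", exc, extra={"input_keys": sorted(actor_input or {})})
        return None
```

Returning None lets the caller decide how to degrade, for example by exiting the run early with a failure status.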

Performance & Scalability

Batch processing - Push data in batches (100-1000 items) for large datasets to reduce API calls
Use proxies - Avoid IP blocking for web scraping with proxy configuration
Resource limits - Set appropriate memory limits and timeouts in .actor/actor.json
Optimize Docker - Use multi-stage builds, bytecode compilation, and minimal base images
Consider Standby mode - For low-latency (<100ms), high-frequency use cases
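
Batch processing might look like the sketch below. The helper names chunked and push_in_batches are hypothetical, and the default batch size of 500 is just a value inside the range suggested above.

```python
from collections.abc import Iterable, Iterator
from typing import Any


def chunked(items: Iterable[Any], size: int) -> Iterator[list[Any]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: list[Any] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter batch
        yield batch


async def push_in_batches(items: Iterable[dict[str, Any]], size: int = 500) -> None:
    """Push items to the default dataset one batch at a time."""
    # Imported lazily so chunked() can be tested without the SDK installed.
    from apify import Actor

    for batch in chunked(items, size):
        await Actor.push_data(batch)  # one call per batch instead of per item
```

push_in_batches is meant to be awaited inside the async with Actor block of the actor's main coroutine.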

Security & Configuration

Environment variables - Never hardcode secrets; use Actor.config and environment variables
Input validation - Use JSON Schema patterns, required fields, and runtime validation
Run as non-root - Use myuser in Dockerfile for container security
Minimize image size - Use .dockerignore to exclude unnecessary files and reduce build time

Development Workflow

Testing - Write tests with pytest; use coverage and snapshot testing for reliability
Pre-commit hooks - Use ruff and pre-commit for consistent code quality (MANDATORY)
Use uv exclusively - NEVER use pip or requirements.txt; only use uv add and uv sync (MANDATORY)
Lock dependencies - Always commit uv.lock for reproducible builds (MANDATORY)
Test locally - Always test with apify run before deploying to catch issues early
Dataset schemas - Define dataset_schema.json with views for better Apify Console UI
CLI support - Add CLI entry points via __main__.py for local testing and development

Standby Mode (Real-time API)

Standby mode allows actors to run as persistent HTTP servers, providing instant responses without cold start delays.

Perfect for:

Real-time APIs requiring <100ms response times
Webhook endpoints that need immediate processing
High-frequency requests (multiple requests per second)
Integration with real-time services (Slack bots, chat applications, webhooks)
Low-latency scraping APIs and on-demand data extraction

See references/standby-mode.md for complete implementation patterns, authentication, and examples.
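
As a rough picture of what a persistent HTTP server means here, the sketch below uses only the standard library. It assumes the platform passes the listening port in the ACTOR_STANDBY_PORT environment variable (check references/standby-mode.md for the current variable name); StandbyHandler and make_server are hypothetical names, and authentication and Actor lifecycle handling are left out.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Optional


class StandbyHandler(BaseHTTPRequestHandler):
    """Answer every GET with a small JSON payload."""

    def do_GET(self) -> None:
        body = json.dumps({"status": "ok", "path": self.path}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args: object) -> None:
        """Silence the default per-request stderr logging."""


def make_server(port: Optional[int] = None) -> ThreadingHTTPServer:
    """Create a server on the given port, or on the port from the environment."""
    if port is None:
        # Assumed variable name; 8080 is an arbitrary local fallback.
        port = int(os.environ.get("ACTOR_STANDBY_PORT", "8080"))
    return ThreadingHTTPServer(("", port), StandbyHandler)
```

Calling make_server().serve_forever() keeps the process alive between requests, which is what removes the cold start from each call.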

References

Detailed documentation in references/:

python-sdk.md - SDK patterns and complete code examples
standby-mode.md - Real-time API implementation
input-schema.md - Input validation and UI configuration
output-schema.md - Output configuration and templates

Troubleshooting

If you need information not covered in this skill, use the WebFetch tool with https://docs.apify.com/llms.txt to access the complete official documentation.
