Apify Actor Development

Build serverless Apify actors for web scraping, browser automation, and data extraction using Python.

Prerequisites & Setup (MANDATORY)

Before creating or modifying actors, verify that the Apify CLI is installed by running apify --help.

If it is not installed, you can run:

curl -fsSL https://apify.com/install-cli.sh | bash

# Or (Mac): brew install apify-cli
# Or (Windows): irm https://apify.com/install-cli.ps1 | iex
# Or: npm install -g apify-cli

When the apify CLI is installed, check that it is logged in with:

apify info  # Should return your username

If it is not logged in, check whether the APIFY_TOKEN environment variable is defined. If it is not, ask the user to generate a token at https://console.apify.com/settings/integrations and set APIFY_TOKEN to it.

Then run:

apify login -t "$APIFY_TOKEN"

Quick Start Workflow

Creating a New Actor

  1. Copy template - Copy all files including hidden ones from the skill's assets/python-template/ directory to your new actor directory. The template is located at {base_dir}/assets/python-template/ where {base_dir} is the skill's base directory.
  2. Setup pre-commit - Run uv run pre-commit install for automatic quality checks
  3. Add dependencies - Use uv add package-name for each required dependency
  4. Implement logic - Write the actor code in src/main.py (the src/__main__.py entry point is already set up)
  5. Configure schemas - Update input/output schemas in .actor/input_schema.json and .actor/output_schema.json
  6. Configure platform settings - Update .actor/actor.json with actor metadata
  7. Write documentation - Create comprehensive .actor/ACTOR.md for the marketplace
  8. Test locally - Run apify run to verify functionality
  9. Deploy - Run apify push to deploy the actor on the Apify platform
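
Step 1 is easy to get wrong when a file manager or a glob such as * skips dotfiles. The following is a minimal sketch of the copy in Python. The helper name copy_template and the target directory in the usage note are hypothetical; shutil.copytree copies hidden entries such as .actor/ and .dockerignore by default.

```python
import shutil
from pathlib import Path


def copy_template(template_dir: Path, actor_dir: Path) -> Path:
    """Copy the whole template tree, dotfiles included, into a new actor directory."""
    if not template_dir.is_dir():
        raise FileNotFoundError(f"Template directory not found: {template_dir}")
    # copytree does not skip hidden entries. It raises FileExistsError when
    # actor_dir already exists, which protects an existing actor from being overwritten.
    shutil.copytree(template_dir, actor_dir)
    return actor_dir
```

A call would look like copy_template(Path(base_dir) / "assets" / "python-template", Path("my-actor")), where base_dir is the skill's base directory.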

CRITICAL REMINDERS:

  • NEVER create requirements.txt
  • NEVER use pip install or uv pip install
  • ALWAYS use uv add to add dependencies
  • ALWAYS use uv sync to install dependencies
  • ALWAYS format with uv run ruff format . after file changes
  • ALWAYS lint with uv run ruff check --fix . after file changes
  • ALWAYS check the apify push output for build errors before considering deployment complete
  • Input/output schemas should be updated when changing actor functionality

Core Concepts

Input/Output Pattern

Every actor follows this pattern:

  1. Input: JSON from key-value store (defined by input schema)
  2. Process: Actor logic extracts/transforms data
  3. Output: Results pushed to dataset or key-value store
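
The three steps above can be sketched as a small src/main.py. This is an illustration, not the template's actual code: the input field urls and the process function are hypothetical, while Actor.get_input, Actor.push_data, and Actor.log come from the Apify Python SDK.

```python
from typing import Any


def process(actor_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Step 2: turn the input into result rows (placeholder logic)."""
    urls = actor_input.get("urls", [])  # "urls" is a hypothetical input field
    return [{"url": url, "length": len(url)} for url in urls]


async def main() -> None:
    # Imported lazily so process() can be unit-tested without the SDK installed.
    from apify import Actor

    async with Actor:
        # Step 1: read the JSON input from the default key-value store.
        actor_input = await Actor.get_input() or {}
        # Step 2: run the actor logic.
        rows = process(actor_input)
        # Step 3: push the results to the default dataset.
        await Actor.push_data(rows)
        Actor.log.info("Pushed %d items", len(rows))
```

Keeping the processing step in a plain function separates it from the platform calls, so it can be tested with pytest without running the actor.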

Storage Types

  • Dataset: Structured data (arrays of objects) - use for scraping results and tabular data
  • Key-Value Store: Arbitrary data (files, objects) - use for screenshots, PDFs, state, and binary files
  • Request Queue: URLs to crawl - use for deep web crawling and multi-page scraping workflows

Project Structure

my-actor/
├── .actor/
│   ├── actor.json                    # Actor metadata
│   ├── input_schema.json             # Input schema
│   ├── output_schema.json            # Output schema
│   ├── ACTOR.md                      # PUBLIC marketplace documentation (CRITICAL)
│   └── datasets/
│       └── dataset_schema.json       # Dataset schema with views
├── src/ or package_name/             # Source code
│   ├── __init__.py
│   ├── __main__.py                   # Entry point for CLI (REQUIRED)
│   └── main.py                       # Main actor logic
├── tests/                            # Test files
│   └── test_*.py
├── .dockerignore                     # Docker build exclusions
├── .pre-commit-config.yaml           # Pre-commit hooks
├── Dockerfile                        # Container config
├── pyproject.toml                    # Python project config
├── uv.lock                           # Dependency lock file
└── README.md                         # Development docs

Common Patterns

See references/python-sdk.md for complete examples of:

  • Simple HTTP scraping with BeautifulSoup
  • Browser automation with Playwright and Selenium
  • Deep crawling with Request Queue
  • Proxy management and error handling
  • Storage APIs (Dataset, Key-Value Store, Request Queue)

Input Schema Design

Input schemas use JSON Schema format to define and validate actor inputs. See references/input-schema.md for:

  • Field types (string, number, boolean, array, object)
  • Special editors (requestListSources, globs, pseudoUrls, proxy, json, textarea)
  • Validation patterns (regex, length, range, required fields)
  • Complete examples with best practices

Key principles:

  • Always include a description and an example for every field
  • Set sensible defaults for ease of use
  • Use appropriate editors for better UX
  • Add units for numeric fields (pages, seconds, MB)
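
As a rough illustration of these principles, the sketch below builds a small input schema as a Python dict and checks it against them. The field names startUrls and maxPages and the helper schema_problems are hypothetical, and the schema keys reflect my understanding of the Apify input schema format; verify them against references/input-schema.md before relying on them.

```python
import json

# Hypothetical schema for a simple scraper; keys assumed from the Apify input schema format.
INPUT_SCHEMA = {
    "title": "Example scraper input",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "startUrls": {
            "title": "Start URLs",
            "type": "array",
            "description": "Pages where the scraper starts.",
            "editor": "requestListSources",
            "prefill": [{"url": "https://example.com"}],
        },
        "maxPages": {
            "title": "Maximum pages",
            "type": "integer",
            "description": "Stop after this many pages.",
            "default": 10,
            "minimum": 1,
            "unit": "pages",
        },
    },
    "required": ["startUrls"],
}


def schema_problems(schema: dict) -> list[str]:
    """List fields that lack a description or any default, prefill, or example."""
    problems = []
    for name, field in schema["properties"].items():
        if not field.get("description"):
            problems.append(f"{name}: missing description")
        if not any(key in field for key in ("default", "prefill", "example")):
            problems.append(f"{name}: missing default, prefill, or example")
    for name in schema.get("required", []):
        if name not in schema["properties"]:
            problems.append(f"{name}: required but not defined")
    return problems


schema_json = json.dumps(INPUT_SCHEMA, indent=2)  # content for .actor/input_schema.json
```

An empty list from schema_problems(INPUT_SCHEMA) means every field carries a description and a starting value.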

Output Schema Design

Output schemas define where actors store outputs and provide templates for accessing that data. See references/output-schema.md for:

  • Schema structure and template variables (links.apiDefaultDatasetUrl, links.apiDefaultKeyValueStoreUrl, etc.)
  • Dataset and key-value store output configurations
  • Multiple output types in a single actor
  • Integration with Python code
  • Complete examples with emojis and descriptions

Key principles:

  • Define all outputs explicitly (even if empty)
  • Use descriptive titles with emojis for visual clarity
  • Include helpful descriptions for users and LLM integrations
  • Match templates to actual storage locations in code
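
A possible shape for such a schema is sketched below as a Python dict, together with a check that each template only uses known variables. The top-level key actorOutputSchemaVersion, the output names, and the record key screenshot.png are assumptions for illustration, and KNOWN_LINKS holds only the two variables named above; confirm the exact format in references/output-schema.md.

```python
import re

# Only the two template variables named in this document; the real list is longer.
KNOWN_LINKS = {"links.apiDefaultDatasetUrl", "links.apiDefaultKeyValueStoreUrl"}

# Hypothetical output schema; the structure is an assumption, not a verified spec.
OUTPUT_SCHEMA = {
    "actorOutputSchemaVersion": 1,
    "title": "Example scraper output",
    "properties": {
        "results": {
            "type": "string",
            "title": "📄 Scraped items",
            "description": "All items pushed to the default dataset.",
            "template": "{{links.apiDefaultDatasetUrl}}/items",
        },
        "screenshot": {
            "type": "string",
            "title": "🖼️ Page screenshot",
            "description": "Screenshot saved to the default key-value store.",
            "template": "{{links.apiDefaultKeyValueStoreUrl}}/records/screenshot.png",
        },
    },
}


def unknown_template_variables(schema: dict) -> set[str]:
    """Return template variables that are not in KNOWN_LINKS."""
    found: set[str] = set()
    for field in schema["properties"].values():
        found.update(re.findall(r"\{\{\s*([\w.]+)\s*\}\}", field.get("template", "")))
    return found - KNOWN_LINKS
```

The check only catches misspelled variables. Whether a template matches where the code actually stores data still has to be verified by hand.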

ACTOR.md Documentation (CRITICAL)

The .actor/ACTOR.md file is the public-facing documentation that users see in the Apify marketplace. This is your actor's main sales page and user guide.

Required sections:

  1. Title & Description - Clear, compelling one-liner
  2. What it does - Bullet points of key capabilities
  3. Input - Example JSON with field explanations
  4. Output - Example JSON showing expected results
  5. Use Cases - Who benefits and why (with emojis)
  6. Standby Mode (if applicable) - API usage examples
  7. Tips & Best Practices - Performance and configuration guidance

See assets/python-template/.actor/ACTOR.md for a complete template.

Key principles:

  • Write for non-technical users - assume no coding knowledge
  • Use emojis to make sections scannable (🎯 🔍 ⚡ 🚀)
  • Provide copy-paste ready code examples
  • Show actual input/output samples, not schemas
  • Highlight benefits and use cases clearly

Modifying Existing Actors

When modifying an existing actor:

  1. Understand current logic - Read src/main.py
  2. Check input schema - Review .actor/input_schema.json for expected inputs
  3. Add dependencies with uv - Use uv add package-name (NEVER pip install)
  4. Make code changes - Implement the requested features
  5. Format code - Run uv run ruff format . (MANDATORY)
  6. Lint code - Run uv run ruff check --fix . (MANDATORY)
  7. Test changes locally - Use apify run before deploying
  8. Update schema if needed - Add new fields to input schema
  9. Deploy - Push changes with apify push

Debugging Actors

  1. Test locally - Use apify run to test actor locally before deployment
  2. Check storage - Inspect ./storage/ directory for datasets, key-value stores, and request queues
  3. Add logging - Use Actor.log.info(), Actor.log.debug(), Actor.log.error() (see references/python-sdk.md)
  4. View logs on platform - Check actor run logs in Apify Console for production issues

Best Practices

Code Quality

  • Validate input - Always check required fields and formats with clear error messages
  • Handle errors - Use try/except with proper error logging and graceful degradation
  • Structured logging - Use Actor.log with extra fields for better debugging
  • Type hints - Add type annotations for better code clarity and IDE support
  • Docstrings - Document functions and modules for maintainability
  • Format with ruff - ALWAYS run uv run ruff format . before committing
  • Lint with ruff - ALWAYS run uv run ruff check --fix . before deploying
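
The first three points can be combined in a small sketch. The field names startUrls and maxPages, the InputError class, and the fetch callable are hypothetical. A standard-library logger stands in for Actor.log here, on the assumption that Actor.log accepts the same extra argument.

```python
import logging
from typing import Any, Callable

log = logging.getLogger("my_actor")  # stand-in for Actor.log in this sketch


class InputError(ValueError):
    """Raised when the actor input is missing or malformed."""


def validate_input(actor_input: dict[str, Any]) -> dict[str, Any]:
    """Check required fields and return a normalized input dict."""
    start_urls = actor_input.get("startUrls")
    if not isinstance(start_urls, list) or not start_urls:
        raise InputError("'startUrls' is required and must be a non-empty list")
    max_pages = actor_input.get("maxPages", 10)
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(max_pages, bool) or not isinstance(max_pages, int) or max_pages < 1:
        raise InputError(f"'maxPages' must be a positive integer, got {max_pages!r}")
    return {"startUrls": start_urls, "maxPages": max_pages}


def scrape_all(urls: list[str], fetch: Callable[[str], dict[str, Any]]) -> list[dict[str, Any]]:
    """Fetch each URL; log and skip failures so one bad page does not end the run."""
    results = []
    for url in urls:
        try:
            results.append(fetch(url))
        except Exception as exc:  # broad on purpose: degrade gracefully per URL
            log.error("Fetch failed", extra={"url": url, "error": str(exc)})
    return results
```

Validating once at the start keeps error messages close to the user's mistake, while the per-URL try/except limits the damage of a single failing request.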

Performance & Scalability

  • Batch processing - Push data in batches (100-1000 items) for large datasets to reduce API calls
  • Use proxies - Avoid IP blocking for web scraping with proxy configuration
  • Resource limits - Set appropriate memory limits and timeouts in .actor/actor.json
  • Optimize Docker - Use multi-stage builds, bytecode compilation, and minimal base images
  • Consider Standby mode - For low-latency (<100ms), high-frequency use cases

Security & Configuration

  • Environment variables - Never hardcode secrets; use Actor.config and environment variables
  • Input validation - Use JSON Schema patterns, required fields, and runtime validation
  • Run as non-root - Use myuser in Dockerfile for container security
  • Minimize image size - Use .dockerignore to exclude unnecessary files and reduce build time
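
For the first point, a minimal sketch of reading a secret from the environment and failing early when it is absent. The helper require_env and the variable name TARGET_API_KEY are hypothetical.

```python
import os


def require_env(name: str) -> str:
    """Return a required environment variable or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        # The message names the variable but never echoes a secret value.
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

Calling require_env("TARGET_API_KEY") at startup surfaces a missing secret immediately instead of halfway through a run.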

Development Workflow

  • Testing - Write tests with pytest; use coverage and snapshot testing for reliability
  • Pre-commit hooks - Use ruff and pre-commit for consistent code quality (MANDATORY)
  • Use uv exclusively - NEVER use pip or requirements.txt; only use uv add and uv sync (MANDATORY)
  • Lock dependencies - Always commit uv.lock for reproducible builds (MANDATORY)
  • Test locally - Always test with apify run before deploying to catch issues early
  • Dataset schemas - Define dataset_schema.json with views for better Apify Console UI
  • CLI support - Add CLI entry points via __main__.py for local testing and development

Standby Mode (Real-time API)

Standby mode allows actors to run as persistent HTTP servers, providing instant responses without cold start delays.

Perfect for:

  • Real-time APIs requiring <100ms response times
  • Webhook endpoints that need immediate processing
  • High-frequency requests (multiple requests per second)
  • Integration with real-time services (Slack bots, chat applications, webhooks)
  • Low-latency scraping APIs and on-demand data extraction

See references/standby-mode.md for complete implementation patterns, authentication, and examples.
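
To give a feel for what "persistent HTTP server" means here, the sketch below uses only the standard library. It assumes the platform passes the port in the ACTOR_WEB_SERVER_PORT environment variable and marks readiness checks with the x-apify-container-server-readiness-probe header; both are assumptions to confirm in references/standby-mode.md, which also covers how the server fits into the Actor lifecycle. StandbyHandler and build_server are hypothetical names.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Any, Optional


class StandbyHandler(BaseHTTPRequestHandler):
    """Answer GET requests with JSON; reply quickly to readiness probes."""

    def do_GET(self) -> None:
        # Assumed header that identifies a platform readiness check.
        if self.headers.get("x-apify-container-server-readiness-probe"):
            self._send(200, {"status": "ready"})
            return
        self._send(200, {"path": self.path, "message": "hello from standby"})

    def _send(self, status: int, payload: dict[str, Any]) -> None:
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args: Any) -> None:
        pass  # silence the default per-request stderr logging


def build_server(port: Optional[int] = None) -> ThreadingHTTPServer:
    """Create the server on the given port, or on the port from the environment."""
    if port is None:
        # Assumed variable name; 8080 is an arbitrary fallback for local runs.
        port = int(os.environ.get("ACTOR_WEB_SERVER_PORT", "8080"))
    return ThreadingHTTPServer(("0.0.0.0", port), StandbyHandler)
```

Locally, build_server().serve_forever() starts the server, and a GET request to any path returns that path in the JSON body.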

References

Detailed documentation in references/:

  • python-sdk.md - SDK patterns and complete code examples
  • standby-mode.md - Real-time API implementation
  • input-schema.md - Input validation and UI configuration
  • output-schema.md - Output configuration and templates

Troubleshooting

If you need information not covered in this skill, use the WebFetch tool with https://docs.apify.com/llms.txt to access the complete official documentation.
