compare-models

Workflow

  1. Search or browse collections to build a shortlist of candidate models.
  2. Fetch each model's schema to compare inputs, outputs, and capabilities.
  3. Check pricing from model metadata or the Replicate website.
  3. Run a small batch of test predictions to compare output quality (see the sketch after this list).
  5. Pick the model that best fits your constraints (cost, latency, quality).
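
A minimal sketch of steps 2 and 4, assuming the official `replicate` Python client with `REPLICATE_API_TOKEN` set in the environment. The two candidate model names and the prompt are placeholders, and the schema path assumes the standard structure of a version's OpenAPI schema as returned by the API.

```python
import time

import replicate  # assumes REPLICATE_API_TOKEN is set in the environment

# Hypothetical shortlist -- substitute the models you found in step 1.
CANDIDATES = ["black-forest-labs/flux-schnell", "black-forest-labs/flux-1.1-pro"]
PROMPT = "a lighthouse at dusk, heavy fog"

for name in CANDIDATES:
    model = replicate.models.get(name)
    version = model.latest_version

    # Step 2: inspect the input schema to compare supported fields and defaults.
    input_schema = version.openapi_schema["components"]["schemas"]["Input"]
    print(name, "inputs:", sorted(input_schema.get("properties", {})))

    # Step 4: run one small test prediction and poll until it reaches a terminal state.
    prediction = replicate.predictions.create(version=version.id, input={"prompt": PROMPT})
    while prediction.status not in {"succeeded", "failed", "canceled"}:
        time.sleep(2)
        prediction.reload()

    # metrics includes predict_time: the actual inference time in seconds.
    print(name, prediction.status, prediction.metrics)
```

Keeping the test batch small and the prompts identical across candidates makes the outputs directly comparable in step 5.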

What to compare

  • Speed: Check metrics.predict_time on completed predictions for actual inference time. Official models are always warm. Community models can cold-boot.
  • Cost: Official models have predictable per-run pricing. Community models bill by compute time (GPU-seconds), so run a few predictions and multiply metrics.predict_time by the hardware's per-second rate to estimate actual cost (see the sketch after this list).
  • Quality: Run the same prompts through each model and compare outputs. Quality is subjective. Match it to your use case, not a leaderboard.
  • Capabilities: Compare input schemas for supported features (reference images, masks, aspect ratios, streaming, multi-image input). Check output formats.
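
To make the cost comparison concrete, a small sketch of the arithmetic for a community model billed by compute time. The per-second rate below is a placeholder, not a real price; look up the GPU the model runs on and Replicate's current hardware pricing.

```python
# Placeholder rate -- replace with the per-second price of the GPU the
# community model actually runs on (listed on Replicate's pricing page).
GPU_RATE_PER_SECOND = 0.000975


def estimated_cost_per_run(prediction) -> float:
    """Rough per-run cost for a compute-time-billed model: inference seconds x GPU rate."""
    predict_time = (prediction.metrics or {}).get("predict_time", 0.0)
    return predict_time * GPU_RATE_PER_SECOND

# Official models skip this arithmetic: their per-run price is fixed and
# published on the model page, regardless of how long inference takes.
```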

Key tradeoffs

  • Lowest cost: smaller/distilled models. Accept slower inference and lower quality.
  • Lowest latency: official models or schnell/turbo variants. Accept higher cost per run.
  • Highest quality: pro/max/quality variants. Accept slower inference and higher cost.
  • Most control: models with ControlNet, masks, or reference images. Accept more complex input setup.

Official vs community models

  • Official models: always warm, stable APIs, predictable pricing, maintained by Replicate.
  • Community models: may cold-boot, require version pinning, maintained by the author.
  • If a community model meets your needs and an official model doesn't, consider creating a deployment for consistent uptime.
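
A sketch of the two access patterns, again assuming the `replicate` Python client. The community model name, the version hash, and the deployment name are all placeholders, and the deployment is assumed to have been created already (in the dashboard or via the deployments API).

```python
import replicate

# Community model: pin an exact version so the API can't shift underneath you.
# The hash below is a placeholder -- copy the real one from the model's
# versions tab on Replicate.
output = replicate.run(
    "some-author/some-community-model:0123456789abcdef0123456789abcdef01234567",
    input={"prompt": "a lighthouse at dusk, heavy fog"},
)

# If the community model wins the comparison, routing traffic through a
# deployment gives it consistent uptime on dedicated hardware. Assumes a
# deployment named "your-org/lighthouse-gen" already exists.
deployment = replicate.deployments.get("your-org/lighthouse-gen")
prediction = deployment.predictions.create(input={"prompt": "a lighthouse at dusk, heavy fog"})
prediction.wait()
print(prediction.output)
```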

Prompting guidance

For prompting techniques and task-specific guidance:
