replicate

SKILL.md

Docs

Workflow

Here's a common workflow for using Replicate's API to run a model:

  1. Choose the right model - Search with the API or ask the user
  2. Get model metadata - Fetch model input and output schema via API
  3. Create prediction - POST to /v1/predictions
  4. Poll for results - GET prediction until status is "succeeded"
  5. Return output - Usually URLs to generated content

Choosing models

  • Use the search and collections APIs to find and compare the best models. Do not list all the models via API, as it's basically a firehose.
  • Collections are curated by Replicate staff, so they're vetted.
  • Official models are in the "official" collection.
  • Use official models because they:
    • are always running
    • have stable API interfaces
    • have predictable output pricing
    • are maintained by Replicate staff
  • If you must use a community model, be aware that it can take a long time to boot.
  • You can create always-on deployments of community models, but you pay for model uptime.

Running models

Models take time to run. There are three ways to run a model via API and get its output:

  1. Create a prediction, store its id from the response, and poll until completion.
  2. Set a Prefer: wait header when creating a prediction for a blocking synchronous response. Only recommended for very fast models.
  3. Set an HTTPS webhook URL when creating a prediction, and Replicate will POST to that URL when the prediction completes.

Follow these guideliness when running models:

  • Use the "POST /v1/predictions" endpoint, as it supports both official and community models.
  • Every model has its own OpenAPI schema. Always fetch and check model schemas to make sure you're setting valid inputs. Even popular models change their schemas.
  • Validate input parameters against schema constraints (minimum, maximum, enum values). Don't generate values that violate them.
  • When unsure about a parameter value, use the model's default example or omit the optional parameter.
  • Don't set optional inputs unless you have a reason to. Stick to the required inputs and let the model's defaults do the work.
  • Use HTTPS URLs for file inputs whenever possible. You can also send base64-encoded files, but they should be avoided.
  • Fire off multiple predictions concurrently. Don't wait for one to finish before starting the next.
  • Output file URLs expire after 1 hour, so back them up if you need to keep them, using a service like Cloudflare R2.
  • Webhooks are a good mechanism for receiving and storing prediction output.
Weekly Installs
233
GitHub Stars
27
First Seen
Feb 16, 2026
Installed on
opencode223
codex222
gemini-cli221
github-copilot220
amp218
kimi-cli217