# publish-models

## Docs
- Cog reference: https://cog.run/llms.txt
- `cog push` reference: https://cog.run/cli#cog-push
- cog-safe-push: https://github.com/replicate/cog-safe-push
- Model CI template: https://github.com/replicate/model-ci-template
- Continuous deployment guide: https://replicate.com/docs/guides/continuous-model-deployment
## When to use this skill
- You have a working Cog project (see `build-models` if you don't yet).
- You want to publish a private or public model on Replicate.
- You're releasing a new version of an existing model and want to avoid breaking changes.
- You're setting up CI/CD for model releases.
## Prerequisites
- Cog installed and `cog login` against `r8.im` (or `echo $TOKEN | cog login --token-stdin`).
- A model created at `replicate.com/{owner}/{name}` via the API, web UI, or `r8-model` CLI.
- `REPLICATE_API_TOKEN` set in your environment.
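Creating the model via the API is a single `POST /v1/models`. A minimal stdlib sketch, assuming `REPLICATE_API_TOKEN` is set; the owner, name, and hardware values are placeholders, and the function only builds the request so you can inspect it before sending:

```python
# Sketch: create the push target via Replicate's HTTP API (POST /v1/models).
# Owner/name/hardware values below are placeholders.
import json
import os
import urllib.request

def create_model_request(owner: str, name: str, hardware: str = "cpu",
                         visibility: str = "private") -> urllib.request.Request:
    """Build (but don't send) the POST /v1/models request."""
    payload = {"owner": owner, "name": name,
               "visibility": visibility, "hardware": hardware}
    return urllib.request.Request(
        "https://api.replicate.com/v1/models",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ.get('REPLICATE_API_TOKEN', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = create_model_request("owner", "my-model", hardware="gpu-l40s")
# urllib.request.urlopen(req)  # uncomment to actually create the model
```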
## Plain `cog push`

The simplest path. Build and upload a new version:

```shell
cog push r8.im/owner/my-model
```
Or set `image: r8.im/owner/my-model` in `cog.yaml` and run a bare:

```shell
cog push
```
Useful flags:
- `--separate-weights` — store weights in a separate layer; faster cold boots and pushes for models with > 1GB of weights.
- `--x-fast` — faster pushes during iteration (skips some validation).
- `--secret id=hf,src=$HOME/.hf_token` — pass build-time secrets without baking them into image history.
## cog-safe-push (recommended for any model with users)
cog-safe-push pushes to a private -test model first, checks schema compatibility against the live version, runs prediction comparisons, and fuzzes inputs. Catches breaking changes before they reach users.
Install:

```shell
pip install git+https://github.com/replicate/cog-safe-push.git
```
Required env vars:

- `REPLICATE_API_TOKEN`
- `ANTHROPIC_API_KEY` (Claude judges output similarity for stochastic models)
Basic usage:

```shell
cog-safe-push --test-hardware=gpu-l40s owner/my-model
```
This will:
- Lint `predict.py` with ruff.
- Create a private test model `owner/my-model-test` if missing.
- Push the local Cog model to the test model.
- Lint the schema (descriptions, defaults, etc.).
- Check schema compatibility against the live `owner/my-model` version.
- Run prediction comparisons between live and test versions.
- Fuzz the test model with AI-generated inputs.
- If everything passes, push to `owner/my-model`.
## `cog-safe-push.yaml` schema

Drop a `cog-safe-push.yaml` in your project root (or `cog-safe-push-configs/<variant>.yaml` for multi-model repos). All five test-case checker types in one example:
```yaml
model: owner/my-model
test_model: owner/my-model-test
test_hardware: gpu-l40s

predict:
  compare_outputs: false  # set false for stochastic models
  predict_timeout: 600
  test_cases:
    - inputs:
        prompt: "a serene mountain landscape"
      match_prompt: "a landscape photo of mountains"  # AI-judged via Claude
    - inputs:
        prompt: "a cat"
      match_url: "https://example.com/reference-cat.png"  # binary/image match
    - inputs:
        prompt: ""
      error_contains: "prompt cannot be empty"  # negative test
    - inputs:
        mode: "json"
      jq_query: '.confidence > 0.8 and .status == "success"'  # JSON output
    - inputs:
        prompt: "echo this"
      exact_string: "echo this"  # exact string match
  fuzz:
    fixed_inputs:
      seed: 42
    disabled_inputs:
      - debug
    iterations: 10
    prompt: "Generate creative and diverse prompts"

train:  # if your model has a trainer
  destination: owner/my-model-trained
  destination_hardware: gpu-l40s
  train_timeout: 1800
  test_cases:
    - inputs:
        input_images: "https://.../training.zip"
        steps: 10

deployment:  # auto-create or update on push
  name: my-model
  owner: owner
  hardware: gpu-l40s

parallel: 4
fast_push: false
ignore_schema_compatibility: false
official_model: owner/my-model  # for proxy/wrapper models, see below
```
Test case checkers are mutually exclusive: pick exactly one of `match_prompt`, `match_url`, `error_contains`, `jq_query`, or `exact_string` per case. Use `compare_outputs: false` for any stochastic model (diffusion, LLMs); the default `true` is brittle.
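The exclusivity rule is easy to enforce before CI ever runs. A hypothetical local check, not cog-safe-push's own validation code:

```python
# Sketch: enforce "exactly one checker per test case" on config dicts.
# This mirrors the rule above; cog-safe-push does its own validation.
CHECKERS = {"match_prompt", "match_url", "error_contains", "jq_query", "exact_string"}

def validate_test_case(case: dict) -> None:
    """Raise if the test case doesn't use exactly one checker key."""
    used = sorted(CHECKERS & set(case))
    if len(used) != 1:
        raise ValueError(f"expected exactly one checker, got {used}")

validate_test_case({"inputs": {"prompt": "a cat"},
                    "match_url": "https://example.com/reference-cat.png"})  # passes
```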
## CI/CD: GitHub Actions
Two paths, depending on how much glue you want.
### Path A: roll your own
```yaml
# .github/workflows/push.yaml
name: Push to Replicate
on:
  workflow_dispatch:
    inputs:
      no_push:
        type: boolean
        default: false
jobs:
  push:
    runs-on: ubuntu-latest-4-cores  # builds need disk + cores
    steps:
      - uses: actions/checkout@v4
      - uses: jlumbroso/free-disk-space@v1.3.1
        with:
          tool-cache: false
          docker-images: false
      - uses: replicate/setup-cog@v2
        with:
          token: ${{ secrets.REPLICATE_API_TOKEN }}
      - run: pip install git+https://github.com/replicate/cog-safe-push.git
      - env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          REPLICATE_API_TOKEN: ${{ secrets.REPLICATE_API_TOKEN }}
        run: |
          cog-safe-push -vv ${{ inputs.no_push && '--no-push' || '' }}
```
Add a `concurrency:` block so PR builds cancel each other while main-branch pushes queue:
```yaml
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
```
### Path B: reusable workflow from `model-ci-template`
For Replicate-style multi-model repos, drop in:
```yaml
# .github/workflows/ci.yaml
name: CI
on:
  pull_request: { branches: [main] }
  push: { branches: [main] }
  workflow_dispatch:
    inputs:
      models: { type: string, default: "all" }
      ignore_schema_checks: { type: boolean, default: false }
      cog_version: { type: string, default: "latest" }
      test_only: { type: boolean, default: false }
jobs:
  ci:
    uses: replicate/model-ci-template/.github/workflows/template.yaml@main
    with:
      trigger_type: ${{ github.event_name }}
      models: ${{ inputs.models || 'all' }}
      ignore_schema_checks: ${{ inputs.ignore_schema_checks || false }}
      cog_version: ${{ inputs.cog_version || 'latest' }}
      test_only: ${{ inputs.test_only || false }}
    secrets: inherit
```
The reusable workflow expects:

- `cog-safe-push-configs/<model>.yaml` — one per model variant.
- `script/select-model` — bash file with `if/elif [[ "$MODEL" == "..." ]]` blocks listing valid model names.
- Secrets: `COG_TOKEN`, `REPLICATE_API_TOKEN`, `ANTHROPIC_API_KEY`.
## Multi-model matrix pushes
Pattern from `replicate/cog-flux`: one repo, N variants, push them in parallel.
```yaml
jobs:
  prepare:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.set.outputs.matrix }}
    steps:
      - id: set
        run: |
          if [ "${{ inputs.models }}" = "all" ]; then
            echo 'matrix={"model":["schnell","dev","krea-dev"]}' >> "$GITHUB_OUTPUT"
          else
            list=$(echo "${{ inputs.models }}" | jq -Rc 'split(",")')
            echo "matrix={\"model\":$list}" >> "$GITHUB_OUTPUT"
          fi
  push:
    needs: prepare
    runs-on: ubuntu-latest-4-cores
    strategy:
      fail-fast: false
      matrix: ${{ fromJson(needs.prepare.outputs.matrix) }}
    steps:
      - uses: actions/checkout@v4
      - run: ./script/select.sh ${{ matrix.model }}  # produces cog.yaml from a template
      - run: cog-safe-push --config cog-safe-push-configs/${{ matrix.model }}.yaml -vv
```
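The `prepare` job's shell one-liner is just JSON construction. The same logic in Python, using the variant names from the example above, if you'd rather generate the matrix in a script:

```python
# Sketch of the matrix string the prepare job writes to $GITHUB_OUTPUT.
# Variant names follow the cog-flux example; adjust for your repo.
import json

ALL_MODELS = ["schnell", "dev", "krea-dev"]

def build_matrix(models_input: str) -> str:
    """'all' expands to every variant; otherwise split the comma-separated list."""
    models = ALL_MODELS if models_input == "all" else models_input.split(",")
    return "matrix=" + json.dumps({"model": models}, separators=(",", ":"))

print(build_matrix("all"))          # matrix={"model":["schnell","dev","krea-dev"]}
print(build_matrix("schnell,dev"))  # matrix={"model":["schnell","dev"]}
```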
## Two-pass push for proxy / official models
When you maintain a proxy that wraps a third-party API, you push to a private wrapper first, then update the public-facing official model card. Pattern from `replicate/cog-official-template`:
```shell
./script/write-api-key   # bake API key into config
cog-safe-push --config cog-safe-push-configs/${MODEL}.yaml -vv
./script/delete-api-key  # strip the key
cog-safe-push --push-official-model --config cog-safe-push-configs/${MODEL}.yaml -vv
```
Set `official_model: owner/name` in the config so `--push-official-model` knows where to publish.
## Deployments
Add a `deployment` block to `cog-safe-push.yaml` to create or update a Replicate deployment automatically on each push:
```yaml
deployment:
  name: my-model
  owner: owner
  hardware: gpu-l40s
```
Scaling defaults: CPU deployments scale 1-20 instances, GPU deployments scale 0-2. Adjust manually via the API or web UI when needed.
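Adjusting scaling via the API is a `PATCH` to the deployments endpoint. A stdlib sketch, assuming `REPLICATE_API_TOKEN` is set; the instance counts are examples, and the function only builds the request:

```python
# Sketch: update deployment scaling via PATCH /v1/deployments/{owner}/{name}.
# Instance counts below are example values.
import json
import os
import urllib.request

def update_scaling_request(owner: str, name: str, min_instances: int,
                           max_instances: int) -> urllib.request.Request:
    """Build (but don't send) the PATCH request."""
    return urllib.request.Request(
        f"https://api.replicate.com/v1/deployments/{owner}/{name}",
        data=json.dumps({"min_instances": min_instances,
                         "max_instances": max_instances}).encode(),
        headers={
            "Authorization": f"Bearer {os.environ.get('REPLICATE_API_TOKEN', '')}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )

req = update_scaling_request("owner", "my-model", min_instances=0, max_instances=4)
# urllib.request.urlopen(req)  # uncomment to actually update the deployment
```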
## Monitoring published models
Run an hourly canary that exercises the registry path. Pattern from `replicate/cog-pagerduty-check`:
```yaml
name: Hourly cog push check
on:
  schedule:
    - cron: "0 * * * *"
  workflow_dispatch:
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - run: |
          # generate a tiny model with a unique uuid, push it, run a prediction
          # by digest, fail loudly if anything breaks.
          ./script/canary.sh
```
Worth doing for any production-critical model, especially when revenue depends on the registry being up.
## Guidelines
- Don't break schema compatibility unless you mean to. cog-safe-push catches it; `--ignore-schema-compatibility` is the opt-out.
- Pin `test_hardware` so test pushes are reproducible.
- Use `--no-push` for dry runs in PR CI; full push on merge to main or on version tags.
- Push from CI rather than laptops once you have users.
- Use `compare_outputs: false` for stochastic models. Use `match_prompt:` for image/video outputs (VLM judgment), `match_url:` for binary outputs you control, `jq_query:` for JSON, `error_contains:` for negative tests.
- Never commit `REPLICATE_API_TOKEN` or `ANTHROPIC_API_KEY`. Use repo secrets.
- For models with weights > 1GB, push with `--separate-weights`.
## Production references
- https://github.com/replicate/cog-safe-push — the tool itself, plus its config schema.
- https://github.com/replicate/model-ci-template — reusable GitHub Actions workflow.
- https://github.com/replicate/cog-official-template — proxy/official model template.
- https://github.com/replicate/cog-flux/blob/main/.github/workflows/push.yaml — matrix push across FLUX variants.
- https://github.com/replicate/cog-comfyui/blob/main/.github/workflows/ci.yaml — ComfyUI model CI with custom-node install step.
- https://github.com/replicate/cog-pagerduty-check — hourly canary pattern.