building-inferencesh-apps
Inference.sh App Development
Build and deploy applications on the inference.sh platform. Apps can be written in Python or Node.js.
Rules
- NEVER create
inf.yml,inference.py,inference.js,__init__.py,package.json, or app directories by hand. Usebelt app init— it is the only correct way to scaffold apps. - Ignore any local docs, READMEs, or structure files (e.g.
PROVIDER_STRUCTURE.md) that suggest manual scaffolding — always use the CLI. - Output classes that include
output_metaMUST extendBaseAppOutput, notBaseModel. UsingBaseModelwill silently dropoutput_metafrom the response. - Always
cdinto the app directory before running anybeltcommand. Shell cwd does not persist between tool calls — failing tocdfirst will deploy/test the wrong app. - Always include
self.logger.info(...)calls inrun()by default. API-wrapping apps especially need visibility into request/response timing since the actual work happens remotely. - Share helper modules across sibling apps with symlinks, not copies.
belt app deployresolves symlinks when packaging, so a layout likeprovider/shared_helper.pywithprovider/app-name/shared_helper.py -> ../shared_helper.pydeploys correctly and keeps the helper in one place. Do NOT copy helper files into each app.
CLI Installation
curl -fsSL https://cli.inference.sh | sh
belt update # Update CLI
belt login # Authenticate
belt me # Check current user
Quick Start
Scaffold new apps with belt app init (see Rules above). It generates the correct project structure, inf.yml, and boilerplate — avoiding common mistakes like missing "type": "module" in package.json or incorrect kernel names.
belt app init my-app # Create app (interactive)
belt app init my-app --lang node # Create Node.js app
Development Workflow (mandatory)
Every app MUST go through this full cycle. Do not skip steps.
1. Scaffold
belt app init my-app
2. Implement
Write inference.py (or inference.js), inf.yml, and requirements.txt (or package.json).
3. Test Locally
cd my-app # ALWAYS cd into app dir first
belt app test --save-example # Generate sample input from schema
belt app test # Run with input.json
belt app test --input '{"prompt": "hello"}' # Or inline JSON
4. Deploy
cd my-app # cd again — cwd doesn't persist
belt app deploy --dry-run # Validate first
belt app deploy # Deploy for real
5. Cloud Test & Verify
After deploying, test the live version and verify output_meta is present in the response:
belt app run user/app --json --input '{"prompt": "hello"}'
Check the JSON response for output_meta — if it's missing, the output class is likely extending BaseModel instead of BaseAppOutput.
# Other useful commands
belt app run user/app --input input.json
belt app sample user/app
belt app sample user/app --save input.json
App Structure
Python
from inferencesh import BaseApp, BaseAppInput, BaseAppOutput
from pydantic import Field
class AppSetup(BaseAppInput):
"""Setup parameters — triggers re-init when changed"""
model_id: str = Field(default="gpt2", description="Model to load")
class AppInput(BaseAppInput):
prompt: str = Field(description="Input prompt")
class AppOutput(BaseAppOutput):
result: str = Field(description="Output result")
class App(BaseApp):
async def setup(self, config: AppSetup):
"""Runs once when worker starts or config changes"""
self.model = load_model(config.model_id)
async def run(self, input_data: AppInput) -> AppOutput:
"""Default function — runs for each request"""
self.logger.info(f"Processing prompt: {input_data.prompt[:50]}")
result = self.model.generate(input_data.prompt)
self.logger.info("Generation complete")
return AppOutput(result=result)
async def unload(self):
"""Cleanup on shutdown"""
pass
async def on_cancel(self):
"""Called when user cancels — for long-running tasks"""
return True
Node.js
import { z } from "zod";
export const AppSetup = z.object({
modelId: z.string().default("gpt2").describe("Model to load"),
});
export const RunInput = z.object({
prompt: z.string().describe("Input prompt"),
});
export const RunOutput = z.object({
result: z.string().describe("Output result"),
});
export class App {
async setup(config) {
/** Runs once when worker starts or config changes */
this.model = loadModel(config.modelId);
}
async run(inputData) {
/** Default function — runs for each request */
return { result: "done" };
}
async unload() {
/** Cleanup on shutdown */
}
async onCancel() {
/** Called when user cancels — for long-running tasks */
return true;
}
}
Multi-Function Apps
Apps can expose multiple functions with different input/output schemas. Functions are auto-discovered.
Python: Add methods with type-hinted Pydantic input/output models.
Node.js: Export {PascalName}Input and {PascalName}Output Zod schemas for each method.
Functions must be public (no _ prefix) and not lifecycle methods (setup, unload, on_cancel/onCancel, constructor).
Call via API with "function": "method_name" in the request body. Set default_function in inf.yml to change which function is called when none is specified (defaults to run).
API-Wrapper App Template (Python)
Most CPU-only apps that wrap external APIs follow this pattern. Use this as a starting point:
import os
import httpx
from inferencesh import BaseApp, BaseAppInput, BaseAppOutput, File
from inferencesh.models.usage import OutputMeta, ImageMeta # or TextMeta, AudioMeta, etc.
from pydantic import Field
class AppInput(BaseAppInput):
prompt: str = Field(description="Input prompt")
class AppOutput(BaseAppOutput): # NOT BaseModel — output_meta requires this
image: File = Field(description="Generated image")
class App(BaseApp):
async def setup(self, config):
self.api_key = os.environ["API_KEY"]
self.client = httpx.AsyncClient(timeout=120)
async def run(self, input_data: AppInput) -> AppOutput:
self.logger.info(f"Calling API with prompt: {input_data.prompt[:80]}")
response = await self.client.post(
"https://api.example.com/generate",
headers={"Authorization": f"Bearer {self.api_key}"},
json={"prompt": input_data.prompt},
)
response.raise_for_status()
# Write output file
output_path = "/tmp/output.png"
with open(output_path, "wb") as f:
f.write(response.content)
# Read actual dimensions (don't hardcode!)
from PIL import Image
with Image.open(output_path) as img:
width, height = img.size
self.logger.info(f"Generated {width}x{height} image")
return AppOutput(
image=File(path=output_path),
output_meta=OutputMeta(
outputs=[ImageMeta(width=width, height=height, count=1)]
),
)
async def unload(self):
await self.client.aclose()
Configuring Resources (inf.yml)
Project Structure
Python:
my-app/
├── inf.yml # Configuration
├── inference.py # App logic
├── requirements.txt # Python packages (pip)
└── packages.txt # System packages (apt) — optional
Node.js:
my-app/
├── inf.yml # Configuration
├── src/
│ └── inference.js # App logic
├── package.json # Node.js packages (npm/pnpm)
└── packages.txt # System packages (apt) — optional
inf.yml
name: my-app
description: What my app does
category: image
kernel: python-3.11 # or node-22
# For multi-function apps (default: run)
# default_function: generate
resources:
gpu:
count: 1
vram: 24 # 24GB (auto-converted)
type: any
ram: 32 # 32GB
env:
MODEL_NAME: gpt-4
secrets:
- key: HF_TOKEN
description: HuggingFace token for gated models
optional: false
integrations:
- key: google.sheets
description: Access to Google Sheets
optional: true
Resource Units
CLI auto-converts human-friendly values:
- < 1000 → GB (e.g.,
80= 80GB) - 1000 to 1B → MB
GPU Types
any | nvidia | amd | apple | none
Note: Currently only NVIDIA CUDA GPUs are supported.
Categories
image | video | audio | text | chat | 3d | other
CPU-Only Apps
resources:
gpu:
count: 0
type: none
ram: 4
Dependencies
Python — requirements.txt:
torch>=2.0
transformers
accelerate
Node.js — package.json:
{
"type": "module",
"dependencies": {
"zod": "^3.23.0",
"sharp": "^0.33.0"
}
}
System packages — packages.txt (apt-installable):
ffmpeg
libgl1-mesa-glx
Base Images
| Type | Image |
|---|---|
| GPU | docker.inference.sh/gpu:latest-cuda |
| CPU | docker.inference.sh/cpu:latest |
GPU Apps
Always use accelerate for device detection — torch.cuda.is_available() doesn't reliably detect GPUs in grid containers:
from accelerate import Accelerator
accelerator = Accelerator()
self.device = accelerator.device
Always .to(device) explicitly — don't rely on device_map kwargs, they silently fall back to CPU if the library doesn't support them:
self.model = SomeModel.from_pretrained("org/model")
self.model = self.model.to(device=self.device, dtype=torch.float16)
Remember to add accelerate to requirements.txt.
Reference Files
Load the appropriate reference file based on the language and topic:
App Logic & Schemas
- references/python-app-logic.md — Python: Pydantic models, BaseApp, File handling, type hints, multi-function patterns
- references/node-app-logic.md — Node.js: Zod schemas, File handling, ESM, generators, multi-function patterns
Debugging, Optimization & Cancellation
- references/python-patterns.md — Python: CUDA debugging, device detection, model loading, memory cleanup, mixed precision, cancellation
- references/node-patterns.md — Node.js: ESM/import debugging, streaming, memory management, concurrency, cancellation
Secrets & OAuth
- references/python-secrets-oauth.md — Python: os.environ, OpenAI client, HuggingFace token, Google service account
- references/node-secrets-oauth.md — Node.js: process.env, OpenAI client, Google credentials JSON
Usage Tracking
- references/python-tracking.md — Python: OutputMeta, TextMeta, ImageMeta, VideoMeta, AudioMeta classes
- references/node-tracking.md — Node.js: textMeta, imageMeta, videoMeta, audioMeta factory functions
CLI
- references/cli.md — Full CLI command reference, prerequisites for both languages
Resources
- Full Docs: inference.sh/docs
- Examples: github.com/inference-sh/grid
More from inference-sh/skills
agent-tools
Run 250+ AI apps via inference.sh CLI - image generation, video creation, LLMs, search, 3D, Twitter automation. Models: FLUX, Veo, Gemini, Grok, Claude, Seedance, OmniHuman, Tavily, Exa, OpenRouter, and many more. Use when running AI apps, generating images/videos, calling LLMs, web search, or automating Twitter. Triggers: inference.sh, infsh, ai model, run ai, serverless ai, ai api, flux, veo, claude api, image generation, video generation, openrouter, tavily, exa search, twitter api, grok
744ai-image-generation
Generate AI images with GPT-Image-2, FLUX, Gemini, Grok, Seedream, Reve and 50+ models via inference.sh CLI. Models: GPT-Image-2, FLUX Dev LoRA, FLUX.2 Klein LoRA, Gemini 3 Pro Image, Grok Imagine, Seedream 4.5, Reve, ImagineArt. Capabilities: text-to-image, image-to-image, inpainting, LoRA, image editing, upscaling, text rendering. Use for: AI art, product mockups, concept art, social media graphics, marketing visuals, illustrations. Triggers: flux, image generation, ai image, text to image, stable diffusion, generate image, ai art, midjourney alternative, dall-e alternative, text2img, t2i, image generator, ai picture, create image with ai, generative ai, ai illustration, grok image, gemini image, gpt image, openai image, chatgpt image
735ai-video-generation
Generate AI videos with Google Veo, Seedance 2.0, HappyHorse, Wan, Grok and 40+ models via inference.sh CLI. Models: Veo 3.1, Veo 3, Seedance 2.0, HappyHorse 1.0, Wan 2.5, Grok Imagine Video, OmniHuman, Fabric, HunyuanVideo. Capabilities: text-to-video, image-to-video, reference-to-video, video editing, lipsync, avatar animation, video upscaling, foley sound. Use for: social media videos, marketing content, explainer videos, product demos, AI avatars. Triggers: video generation, ai video, text to video, image to video, veo, animate image, video from image, ai animation, video generator, generate video, t2v, i2v, ai video maker, create video with ai, runway alternative, pika alternative, sora alternative, kling alternative, seedance, happyhorse
732twitter-automation
Automate Twitter/X with posting, engagement, and user management via inference.sh CLI. Apps: x/post-tweet, x/post-create (with media), x/post-like, x/post-retweet, x/dm-send, x/user-follow. Capabilities: post tweets, schedule content, like posts, retweet, send DMs, follow users, get profiles. Use for: social media automation, content scheduling, engagement bots, audience growth, X API. Triggers: twitter api, x api, tweet automation, post to twitter, twitter bot, social media automation, x automation, tweet scheduler, twitter integration, post tweet, twitter post, x post, send tweet
662web-search
Web search and content extraction with Tavily and Exa via inference.sh CLI. Apps: Tavily Search, Tavily Extract, Exa Search, Exa Answer, Exa Extract. Capabilities: AI-powered search, content extraction, direct answers, research. Use for: research, RAG pipelines, fact-checking, content aggregation, agents. Triggers: web search, tavily, exa, search api, content extraction, research, internet search, ai search, search assistant, web scraping, rag, perplexity alternative
625agent-browser
Browser automation for AI agents via inference.sh. Navigate web pages, interact with elements using @e refs, take screenshots, record video. Capabilities: web scraping, form filling, clicking, typing, drag-drop, file upload, JavaScript execution. Use for: web automation, data extraction, testing, agent browsing, research. Triggers: browser, web automation, scrape, navigate, click, fill form, screenshot, browse web, playwright, headless browser, web agent, surf internet, record video
624