skills/inference-sh/skills/building-inferencesh-apps

building-inferencesh-apps

SKILL.md

Inference.sh App Development

Build and deploy applications on the inference.sh platform. Apps can be written in Python or Node.js.

CLI Installation

curl -fsSL https://cli.inference.sh | sh
infsh update   # Update CLI
infsh login    # Authenticate
infsh me       # Check current user

Quick Start

Always use infsh app init to create new apps — it scaffolds the correct project structure, inf.yml, and boilerplate code. Don't create these files by hand; the CLI ensures everything is wired up correctly and avoids common mistakes like missing "type": "module" in package.json or incorrect kernel names in inf.yml.

infsh app init my-app              # Create app (interactive)
infsh app init my-app --lang node  # Create Node.js app
infsh app test                     # Test locally
infsh app deploy                   # Deploy

App Structure

Python

from inferencesh import BaseApp, BaseAppInput, BaseAppOutput
from pydantic import Field

class AppSetup(BaseAppInput):
    """Setup parameters — triggers re-init when changed"""
    model_id: str = Field(default="gpt2", description="Model to load")

class AppInput(BaseAppInput):
    prompt: str = Field(description="Input prompt")

class AppOutput(BaseAppOutput):
    result: str = Field(description="Output result")

class App(BaseApp):
    async def setup(self, config: AppSetup):
        """Runs once when worker starts or config changes"""
        self.model = load_model(config.model_id)

    async def run(self, input_data: AppInput) -> AppOutput:
        """Default function — runs for each request"""
        return AppOutput(result="done")

    async def unload(self):
        """Cleanup on shutdown"""
        pass

    async def on_cancel(self):
        """Called when user cancels — for long-running tasks"""
        return True

Node.js

import { z } from "zod";

export const AppSetup = z.object({
  modelId: z.string().default("gpt2").describe("Model to load"),
});

export const RunInput = z.object({
  prompt: z.string().describe("Input prompt"),
});

export const RunOutput = z.object({
  result: z.string().describe("Output result"),
});

export class App {
  async setup(config) {
    /** Runs once when worker starts or config changes */
    this.model = loadModel(config.modelId);
  }

  async run(inputData) {
    /** Default function — runs for each request */
    return { result: "done" };
  }

  async unload() {
    /** Cleanup on shutdown */
  }

  async onCancel() {
    /** Called when user cancels — for long-running tasks */
    return true;
  }
}

Multi-Function Apps

Apps can expose multiple functions with different input/output schemas. Functions are auto-discovered.

Python: Add methods with type-hinted Pydantic input/output models. Node.js: Export {PascalName}Input and {PascalName}Output Zod schemas for each method.

Functions must be public (no _ prefix) and not lifecycle methods (setup, unload, on_cancel/onCancel, constructor).

Call via API with "function": "method_name" in the request body. Set default_function in inf.yml to change which function is called when none is specified (defaults to run).

Configuring Resources (inf.yml)

Project Structure

Python:

my-app/
├── inf.yml           # Configuration
├── inference.py      # App logic
├── requirements.txt  # Python packages (pip)
└── packages.txt      # System packages (apt) — optional

Node.js:

my-app/
├── inf.yml           # Configuration
├── src/
│   └── inference.js  # App logic
├── package.json      # Node.js packages (npm/pnpm)
└── packages.txt      # System packages (apt) — optional

inf.yml

name: my-app
description: What my app does
category: image
kernel: python-3.11     # or node-22

# For multi-function apps (default: run)
# default_function: generate

resources:
  gpu:
    count: 1
    vram: 24    # 24GB (auto-converted)
    type: any
  ram: 32       # 32GB

env:
  MODEL_NAME: gpt-4

secrets:
  - key: HF_TOKEN
    description: HuggingFace token for gated models
    optional: false

integrations:
  - key: google.sheets
    description: Access to Google Sheets
    optional: true

Resource Units

CLI auto-converts human-friendly values:

  • < 1000 → GB (e.g., 80 = 80GB)
  • 1000 to 1B → MB

GPU Types

any | nvidia | amd | apple | none

Note: Currently only NVIDIA CUDA GPUs are supported.

Categories

image | video | audio | text | chat | 3d | other

CPU-Only Apps

resources:
  gpu:
    count: 0
    type: none
  ram: 4

Dependencies

Pythonrequirements.txt:

torch>=2.0
transformers
accelerate

Node.jspackage.json:

{
  "type": "module",
  "dependencies": {
    "zod": "^3.23.0",
    "sharp": "^0.33.0"
  }
}

System packagespackages.txt (apt-installable):

ffmpeg
libgl1-mesa-glx

Base Images

Type Image
GPU docker.inference.sh/gpu:latest-cuda
CPU docker.inference.sh/cpu:latest

Reference Files

Load the appropriate reference file based on the language and topic:

App Logic & Schemas

Debugging, Optimization & Cancellation

Secrets & OAuth

Usage Tracking

CLI

Resources

Weekly Installs
1
GitHub Stars
182
First Seen
Today
Installed on
mcpjam1
openhands1
junie1
windsurf1
zencoder1
crush1