Novita AI

Access 200+ AI models through a unified API — LLM, image generation and editing, video generation, text-to-speech, speech recognition, and GPU cloud infrastructure.

  • OpenAI-compatible LLM API works as a drop-in replacement with any OpenAI SDK
  • 30+ image endpoints covering generation, editing, upscaling, background removal, face merging, and more
  • Video generation from 7+ providers including Kling, Wan, Minimax Hailuo, Vidu, and Seedance
  • Full GPU cloud management — instances, templates, storage, and serverless endpoints

Setup

  1. Get an API key at novita.ai/settings/key-management
  2. Set the environment variable: export NOVITA_API_KEY=your_key
  3. Base endpoint: https://api.novita.ai
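Setup can be sanity-checked in a few lines of Python. The Bearer auth scheme below is an assumption based on the API's OpenAI-compatible design; confirm it against the API reference before relying on it.

```python
import os

def novita_headers() -> dict:
    """Build the auth headers for Novita REST calls.

    Bearer auth is assumed from the OpenAI-compatible design;
    verify the exact scheme in the API reference.
    """
    key = os.environ.get("NOVITA_API_KEY", "")
    if not key:
        raise RuntimeError("Set NOVITA_API_KEY before calling the API")
    return {"Authorization": f"Bearer {key}", "Content-Type": "application/json"}
```

Reading the key at call time (rather than import time) means the function picks up the environment variable even if it is exported after the module loads.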

Services

Service | Use When | Mode
LLM | Chat, completion, embeddings, reranking | Sync / Stream
Image Generation | Text-to-image (FLUX, SD, Seedream, Hunyuan, Qwen, GLM) | Sync / Async
Image Editing | Remove BG, upscale, inpaint, outpaint, cleanup, reimagine, merge face | Sync / Async
Video Generation | Text-to-video, image-to-video (Kling, Wan, Hailuo, Vidu, PixVerse, Seedance) | Async
Audio | TTS, ASR, voice cloning (MiniMax, GLM, Fish Audio) | Sync
Batch | Bulk LLM processing (OpenAI-compatible) | Async
GPU Cloud | Instances, templates, storage, serverless endpoints | Sync

LLM (OpenAI-Compatible)

Drop-in replacement for the OpenAI API — use any OpenAI SDK with base https://api.novita.ai/openai.

import os
from openai import OpenAI

# Any OpenAI SDK works unchanged -- only the base_url and API key differ.
client = OpenAI(base_url="https://api.novita.ai/openai", api_key=os.environ["NOVITA_API_KEY"])
response = client.chat.completions.create(
    model="moonshotai/kimi-k2.5",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=512,
)
print(response.choices[0].message.content)

Models: Kimi K2.5, MiniMax M2.7, GLM-5, DeepSeek V3, DeepSeek R1, and more via /openai/v1/models.

Features: vision (multimodal), reasoning, function calling, structured outputs, prompt caching, batch API.

Image Capabilities

Feature | Description
Generation | FLUX.1 Schnell (fast, sync), FLUX Kontext, Stable Diffusion, Seedream, and more
Background | Remove background, replace with prompt-guided new background
Editing | Inpainting, outpainting, cleanup, reimagine, upscale
Face | Merge face from one image onto another
Analysis | Image-to-prompt — describe any image as text
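Most image endpoints take a JSON body with a prompt plus size parameters. The sketch below is illustrative only: the endpoint path and field names are placeholders, not the real schema; take both from the image API reference.

```python
import os
import requests

# Placeholder path -- substitute the real text-to-image route from the
# image API reference. Payload field names are illustrative as well.
TXT2IMG_URL = "https://api.novita.ai/v3/<txt2img-endpoint>"

def build_txt2img_request(prompt: str, width: int = 1024, height: int = 1024,
                          steps: int = 4) -> dict:
    """Assemble a minimal, illustrative text-to-image request body."""
    return {"prompt": prompt, "width": width, "height": height, "steps": steps}

def generate_image(prompt: str) -> dict:
    """POST the request and return the parsed JSON response."""
    resp = requests.post(
        TXT2IMG_URL,
        json=build_txt2img_request(prompt),
        headers={"Authorization": f"Bearer {os.environ['NOVITA_API_KEY']}"},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()
```

Separating the payload builder from the network call keeps the request shape easy to test and adapt once the real schema is in hand.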

Video Capabilities

Feature | Description
Text-to-video | Generate video from text via Kling, Wan, Hailuo, Vidu, Seedance
Image-to-video | Animate a still image with motion
Unified API | Single endpoint (/v3/video/create) for all video models
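Since video generation is async, the usual pattern is submit-then-poll. The create URL below comes from the table above; the `task_id` response field and the status values are assumptions to verify against the API reference.

```python
import os
import time
import requests

VIDEO_CREATE_URL = "https://api.novita.ai/v3/video/create"

def submit_video(payload: dict) -> str:
    """Submit a job to the unified video endpoint and return its task id.
    (The 'task_id' field name is an assumption -- check the reference.)"""
    r = requests.post(
        VIDEO_CREATE_URL, json=payload, timeout=30,
        headers={"Authorization": f"Bearer {os.environ['NOVITA_API_KEY']}"},
    )
    r.raise_for_status()
    return r.json()["task_id"]

def poll_until_done(fetch, interval: float = 5.0, max_tries: int = 120) -> dict:
    """Generic async polling loop: `fetch` returns the task JSON; stop once
    the status leaves the in-progress states. Status names vary by service,
    so the ones below are illustrative."""
    for _ in range(max_tries):
        data = fetch()
        if data.get("status") not in ("queued", "running", None) or "video_url" in data:
            return data
        time.sleep(interval)
    raise TimeoutError("video task did not finish in time")
```

Passing `fetch` as a callable keeps the polling loop reusable for any async service (image, video, or batch) regardless of its result endpoint.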

Audio Capabilities

Feature | Description
Text-to-speech | MiniMax (English, 17 voices, emotion control) and GLM (Chinese, low latency)
Speech-to-text | GLM ASR transcription
Voice cloning | Clone a voice from an audio sample
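A TTS call follows the same REST shape as the other services. Everything marked as a placeholder below (endpoint path, field names, voice id) must be replaced with real values from the audio API reference; this is only a sketch of the call pattern.

```python
import os
import requests

# Placeholder path -- substitute the real TTS route from the audio API reference.
TTS_URL = "https://api.novita.ai/v3/<tts-endpoint>"

def build_tts_request(text: str, voice: str = "<voice-id>") -> dict:
    """Illustrative TTS request body; field names are assumptions."""
    return {"text": text, "voice": voice}

def synthesize(text: str, out_path: str = "speech.mp3") -> str:
    """POST the text and write the returned audio bytes to disk."""
    resp = requests.post(
        TTS_URL, json=build_tts_request(text), timeout=60,
        headers={"Authorization": f"Bearer {os.environ['NOVITA_API_KEY']}"},
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)
    return out_path
```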

GPU Cloud

Manage dedicated GPU instances, templates, network storage, and serverless endpoints for custom model deployment.

Security

  • Never hardcode API keys — use environment variables or secret managers
  • All media inputs should come from trusted, local sources only
  • Enable NSFW detection for user-facing image applications

API References

For detailed endpoint parameters, request and response schemas, and code examples, consult the per-service API reference documents.

