Novita AI

Access 200+ AI models through a unified API — LLM, image generation and editing, video generation, text-to-speech, speech recognition, and GPU cloud infrastructure.

  • OpenAI-compatible LLM API works as a drop-in replacement with any OpenAI SDK
  • 30+ image endpoints covering generation, editing, upscaling, background removal, face merging, and more
  • Video generation from 7+ providers including Kling, Wan, Minimax Hailuo, Vidu, and Seedance
  • Full GPU cloud management — instances, templates, storage, and serverless endpoints

Setup

  1. Get an API key at novita.ai/settings/key-management
  2. Set the environment variable: export NOVITA_API_KEY=your_key
  3. Base endpoint: https://api.novita.ai
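Setup can be sanity-checked in a few lines of Python. The Bearer auth scheme below is an assumption based on the API's OpenAI-compatible design; confirm it against the API reference before relying on it.

```python
import os

def novita_headers() -> dict:
    """Build the auth headers for Novita REST calls.

    Bearer auth is assumed from the OpenAI-compatible design;
    verify the exact scheme in the API reference.
    """
    key = os.environ.get("NOVITA_API_KEY", "")
    if not key:
        raise RuntimeError("Set NOVITA_API_KEY before calling the API")
    return {"Authorization": f"Bearer {key}", "Content-Type": "application/json"}
```

Reading the key at call time (rather than import time) means the function picks up the environment variable even if it is exported after the module loads.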

Services

Service | Use When | Mode
LLM | Chat, completion, embeddings, reranking | Sync / Stream
Image Generation | Text-to-image (FLUX, SD, Seedream, Hunyuan, Qwen, GLM) | Sync / Async
Image Editing | Remove BG, upscale, inpaint, outpaint, cleanup, reimagine, merge face | Sync / Async
Video Generation | Text-to-video, image-to-video (Kling, Wan, Hailuo, Vidu, PixVerse, Seedance) | Async
Audio | TTS, ASR, voice cloning (MiniMax, GLM, Fish Audio) | Sync
Batch | Bulk LLM processing (OpenAI-compatible) | Async
GPU Cloud | Instances, templates, storage, serverless endpoints | Sync

LLM (OpenAI-Compatible)

Drop-in replacement for the OpenAI API — use any OpenAI SDK with base https://api.novita.ai/openai.

import os
from openai import OpenAI

# Any OpenAI SDK works unchanged -- only the base_url and API key differ.
client = OpenAI(base_url="https://api.novita.ai/openai", api_key=os.environ["NOVITA_API_KEY"])
response = client.chat.completions.create(
    model="moonshotai/kimi-k2.5",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=512,
)
print(response.choices[0].message.content)

Models: Kimi K2.5, MiniMax M2.7, GLM-5, DeepSeek V3, DeepSeek R1, and more via /openai/v1/models.

Features: vision (multimodal), reasoning, function calling, structured outputs, prompt caching, batch API.

Image Capabilities

Feature | Description
Generation | FLUX.1 Schnell (fast, sync), FLUX Kontext, Stable Diffusion, Seedream, and more
Background | Remove background, replace with prompt-guided new background
Editing | Inpainting, outpainting, cleanup, reimagine, upscale
Face | Merge face from one image onto another
Analysis | Image-to-prompt — describe any image as text
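Most image endpoints take a JSON body with a prompt plus size parameters. The sketch below is illustrative only: the endpoint path and field names are placeholders, not the real schema; take both from the image API reference.

```python
import os
import requests

# Placeholder path -- substitute the real text-to-image route from the
# image API reference. Payload field names are illustrative as well.
TXT2IMG_URL = "https://api.novita.ai/v3/<txt2img-endpoint>"

def build_txt2img_request(prompt: str, width: int = 1024, height: int = 1024,
                          steps: int = 4) -> dict:
    """Assemble a minimal, illustrative text-to-image request body."""
    return {"prompt": prompt, "width": width, "height": height, "steps": steps}

def generate_image(prompt: str) -> dict:
    """POST the request and return the parsed JSON response."""
    resp = requests.post(
        TXT2IMG_URL,
        json=build_txt2img_request(prompt),
        headers={"Authorization": f"Bearer {os.environ['NOVITA_API_KEY']}"},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()
```

Separating the payload builder from the network call keeps the request shape easy to test and adapt once the real schema is in hand.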

Video Capabilities

Feature | Description
Text-to-video | Generate video from text via Kling, Wan, Hailuo, Vidu, Seedance
Image-to-video | Animate a still image with motion
Unified API | Single endpoint (/v3/video/create) for all video models
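Since video generation is async, the usual pattern is submit-then-poll. The create URL below comes from the table above; the `task_id` response field and the status values are assumptions to verify against the API reference.

```python
import os
import time
import requests

VIDEO_CREATE_URL = "https://api.novita.ai/v3/video/create"

def submit_video(payload: dict) -> str:
    """Submit a job to the unified video endpoint and return its task id.
    (The 'task_id' field name is an assumption -- check the reference.)"""
    r = requests.post(
        VIDEO_CREATE_URL, json=payload, timeout=30,
        headers={"Authorization": f"Bearer {os.environ['NOVITA_API_KEY']}"},
    )
    r.raise_for_status()
    return r.json()["task_id"]

def poll_until_done(fetch, interval: float = 5.0, max_tries: int = 120) -> dict:
    """Generic async polling loop: `fetch` returns the task JSON; stop once
    the status leaves the in-progress states. Status names vary by service,
    so the ones below are illustrative."""
    for _ in range(max_tries):
        data = fetch()
        if data.get("status") not in ("queued", "running", None) or "video_url" in data:
            return data
        time.sleep(interval)
    raise TimeoutError("video task did not finish in time")
```

Passing `fetch` as a callable keeps the polling loop reusable for any async service (image, video, or batch) regardless of its result endpoint.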

Audio Capabilities

Feature | Description
Text-to-speech | MiniMax (English, 17 voices, emotion control) and GLM (Chinese, low latency)
Speech-to-text | GLM ASR transcription
Voice cloning | Clone a voice from an audio sample
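A TTS call follows the same REST shape as the other services. Everything marked as a placeholder below (endpoint path, field names, voice id) must be replaced with real values from the audio API reference; this is only a sketch of the call pattern.

```python
import os
import requests

# Placeholder path -- substitute the real TTS route from the audio API reference.
TTS_URL = "https://api.novita.ai/v3/<tts-endpoint>"

def build_tts_request(text: str, voice: str = "<voice-id>") -> dict:
    """Illustrative TTS request body; field names are assumptions."""
    return {"text": text, "voice": voice}

def synthesize(text: str, out_path: str = "speech.mp3") -> str:
    """POST the text and write the returned audio bytes to disk."""
    resp = requests.post(
        TTS_URL, json=build_tts_request(text), timeout=60,
        headers={"Authorization": f"Bearer {os.environ['NOVITA_API_KEY']}"},
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)
    return out_path
```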

GPU Cloud

Manage dedicated GPU instances, templates, network storage, and serverless endpoints for custom model deployment.

Security

  • Never hardcode API keys — use environment variables or secret managers
  • All media inputs should come from trusted, local sources only
  • Enable NSFW detection for user-facing image applications

API References

For detailed endpoint parameters, request and response schemas, and code examples, consult the per-service API reference documents.

