# z-ai-api: Z.ai API Skill

## Quick Reference
- Base URL: `https://api.z.ai/api/paas/v4`
- Coding Plan URL: `https://api.z.ai/api/coding/paas/v4`
- Auth header: `Authorization: Bearer YOUR_API_KEY`
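As a sanity check, the base URL and bearer-token auth above can be exercised with nothing but the standard library. This is a minimal sketch, not the recommended SDK path; the payload shape follows the chat example later in this document:

```python
# Minimal raw-HTTP sketch of a chat request (assumes the base URL and
# bearer auth above; replace YOUR_API_KEY with a real key).
import json
import urllib.request

BASE_URL = "https://api.z.ai/api/paas/v4"
API_KEY = "YOUR_API_KEY"

payload = {
    "model": "glm-4.7",
    "messages": [{"role": "user", "content": "Hello!"}],
}
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would send the request; omitted here to
# avoid a live network call.
```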
## Core Endpoints

| Endpoint | Purpose |
|---|---|
| `/chat/completions` | Text/vision chat |
| `/images/generations` | Image generation |
| `/videos/generations` | Video generation (async) |
| `/audio/transcriptions` | Speech-to-text |
| `/web_search` | Web search |
| `/async-result/{id}` | Poll async tasks |
| `/v1/agents` | Translation, slides, effects |
## Model Selection

**Chat (pick by need):**

- `glm-4.7`: Latest flagship, best quality, agentic coding
- `glm-4.7-flash`: Fast, high quality
- `glm-4.6`: Reliable general use
- `glm-4.5-flash`: Fastest, lower cost

**Vision:**

- `glm-4.6v`: Best multimodal (images, video, files)
- `glm-4.6v-flash`: Fast vision

**Media:**

- `glm-image`: High-quality images (HD, ~20 s)
- `cogview-4-250304`: Fast images (~5-10 s)
- `cogvideox-3`: Video, up to 4K, 5-10 s
- `viduq1-text` / `viduq1-image`: Vidu video generation
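The selection guide above maps cleanly to a lookup table. This helper is purely illustrative (not part of the SDK); the need labels are made up for this sketch:

```python
# Illustrative model picker based on the selection guide above.
# The keys are hypothetical labels, not API concepts.
MODELS = {
    "chat-best": "glm-4.7",
    "chat-fast": "glm-4.7-flash",
    "chat-cheap": "glm-4.5-flash",
    "vision": "glm-4.6v",
    "image-hd": "glm-image",
    "image-fast": "cogview-4-250304",
    "video": "cogvideox-3",
}

def pick_model(need: str) -> str:
    """Return the model name for a given need label."""
    return MODELS[need]
```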
## Implementation Patterns

### Basic Chat

```python
from zai import ZaiClient

client = ZaiClient(api_key="YOUR_KEY")

response = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hello!"}
    ]
)
print(response.choices[0].message.content)
```
### OpenAI SDK Compatibility

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_ZAI_KEY",
    base_url="https://api.z.ai/api/paas/v4/"
)
# Use exactly like the OpenAI SDK
```
### Streaming

```python
response = client.chat.completions.create(
    model="glm-4.7",
    messages=[...],
    stream=True
)
for chunk in response:
    # Delta fields may be None on some chunks; guard before printing
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
### Function Calling

```python
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"}
            },
            "required": ["city"]
        }
    }
}]

response = client.chat.completions.create(
    model="glm-4.7",
    messages=[{"role": "user", "content": "Weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto"
)
# Handle tool_calls in response.choices[0].message.tool_calls
```
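Handling a returned tool call means parsing its JSON arguments, running the function, and sending the result back as a `tool` message. This sketch assumes the tool-call shape shown above; `get_weather` and the `SimpleNamespace` stand-in are illustrative, not part of the SDK:

```python
import json
from types import SimpleNamespace

def get_weather(city: str) -> str:
    # Illustrative stand-in for a real weather lookup.
    return f"Sunny in {city}"

def handle_tool_call(tool_call) -> dict:
    """Run the requested function and build the follow-up `tool` message."""
    args = json.loads(tool_call.function.arguments)
    result = get_weather(**args)
    return {"role": "tool", "tool_call_id": tool_call.id, "content": result}

# Stand-in for one entry of response.choices[0].message.tool_calls
call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="get_weather",
                             arguments='{"city": "Tokyo"}'),
)
tool_message = handle_tool_call(call)
# Append tool_message to the conversation and call
# chat.completions.create again for the model's final answer.
```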
### Vision (Images/Video/Files)

```python
response = client.chat.completions.create(
    model="glm-4.6v",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://..."}},
            {"type": "text", "text": "Describe this image"}
        ]
    }]
)
```
### Image Generation

```python
response = client.images.generate(
    model="glm-image",
    prompt="A serene mountain at sunset",
    size="1280x1280",
    quality="hd"
)
print(response.data[0].url)  # URL expires in 30 days
```
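Because the returned URL expires, persist the image promptly. A stdlib-only sketch (the filename is arbitrary):

```python
import urllib.request

def save_image(url: str, path: str) -> None:
    """Download a time-limited generated-image URL to a local file."""
    with urllib.request.urlopen(url) as resp, open(path, "wb") as f:
        f.write(resp.read())

# Usage: save_image(response.data[0].url, "mountain.png")
```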
### Video Generation (Async)

```python
import time

# Submit
response = client.videos.generate(
    model="cogvideox-3",
    prompt="A cat playing with yarn",
    size="1920x1080",
    duration=5
)
task_id = response.id

# Poll for the result; also break on failure so the loop cannot spin forever
while True:
    result = client.async_result.get(task_id)
    if result.task_status == "SUCCESS":
        print(result.video_result[0].url)
        break
    if result.task_status == "FAIL":
        raise RuntimeError("video generation failed")
    time.sleep(5)
```
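The polling pattern above generalizes to any `/async-result/{id}` task. This helper adds a timeout so a stuck task cannot block forever; the status names mirror those used above, and the dict-style result access is an assumption of this sketch:

```python
import time

def poll_task(fetch, interval=5.0, timeout=300.0,
              clock=time.monotonic, sleep=time.sleep):
    """Call `fetch()` until the task reaches a terminal status or times out.

    `fetch` is any zero-argument callable returning a result with a
    "task_status" key (illustrative shape, not the SDK's exact type).
    """
    deadline = clock() + timeout
    while clock() < deadline:
        result = fetch()
        if result["task_status"] in ("SUCCESS", "FAIL"):
            return result
        sleep(interval)
    raise TimeoutError("async task did not finish in time")
```

Injecting `clock` and `sleep` keeps the helper testable without real waiting.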
### Web Search Integration

```python
response = client.chat.completions.create(
    model="glm-4.7",
    messages=[{"role": "user", "content": "Latest AI news?"}],
    tools=[{
        "type": "web_search",
        "web_search": {
            "enable": True,
            "search_result": True
        }
    }]
)
# Access response.web_search for sources
```
### Thinking Mode (Chain-of-Thought)

```python
response = client.chat.completions.create(
    model="glm-4.7",
    messages=[...],
    thinking={"type": "enabled"},
    stream=True  # Streaming is recommended with thinking mode
)
# Access reasoning_content in the response
```
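When streaming with thinking mode, it can help to collect the reasoning and the final answer separately. This sketch assumes streamed deltas expose a `reasoning_content` field alongside `content`; the exact chunk shape is an assumption, so adapt it to what the SDK actually returns:

```python
def split_stream(chunks):
    """Collect thinking-mode reasoning and answer text from streamed chunks.

    Assumes each chunk looks like chunk.choices[0].delta with optional
    `reasoning_content` and `content` attributes (hypothetical shape).
    """
    reasoning, answer = [], []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        if getattr(delta, "reasoning_content", None):
            reasoning.append(delta.reasoning_content)
        if getattr(delta, "content", None):
            answer.append(delta.content)
    return "".join(reasoning), "".join(answer)
```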
## Key Parameters

| Parameter | Values | Notes |
|---|---|---|
| `temperature` | 0.0-1.0 | Default 1.0 on GLM-4.7, 0.6 on GLM-4.5 |
| `top_p` | 0.01-1.0 | Default ~0.95 |
| `max_tokens` | varies | Max 128K on GLM-4.7, 96K on GLM-4.5 |
| `stream` | bool | Enable SSE streaming |
| `response_format` | `{"type": "json_object"}` | Force JSON output |
## Error Handling

- `429`: Rate limited; implement exponential backoff
- `401`: Bad API key; verify credentials
- `sensitive` finish reason: Content filtered; modify the input

```python
finish_reason = response.choices[0].finish_reason
if finish_reason == "tool_calls":
    ...  # execute the function and continue the conversation
elif finish_reason == "length":
    ...  # increase max_tokens or truncate the input
elif finish_reason == "sensitive":
    ...  # content was filtered; modify the input
```
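The exponential backoff suggested for 429s above can be sketched as a small retry wrapper. `RateLimitError` is a placeholder for whatever exception your client raises on HTTP 429; check the SDK's actual exception types:

```python
import random
import time

class RateLimitError(Exception):
    """Placeholder for the client's HTTP 429 exception (hypothetical name)."""

def with_backoff(call, retries=5, base=1.0, sleep=time.sleep):
    """Retry `call` with exponential backoff plus jitter on rate limits."""
    for attempt in range(retries):
        try:
            return call()
        except RateLimitError:
            # Wait base * 2^attempt seconds, plus jitter to avoid thundering herd
            sleep(base * (2 ** attempt) + random.random())
    raise RuntimeError("still rate limited after retries")
```

Injecting `sleep` makes the wrapper testable without real delays.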
## Reference Files

For detailed API specifications, consult:

- `references/chat-completions.md`: Full chat API, parameters, models
- `references/tools-and-functions.md`: Function calling, web search, retrieval
- `references/media-generation.md`: Image, video, audio APIs
- `references/agents.md`: Translation, slides, effects agents
- `references/error-codes.md`: Error handling, rate limits