skills/modelslab/skills/modelslab-chat-generation

modelslab-chat-generation

SKILL.md

ModelsLab Chat/LLM Generation

Send chat completions to 60+ LLM models through a single OpenAI-compatible endpoint.

When to Use This Skill

  • Chat with AI models (DeepSeek, Llama, Gemini, Qwen, Mistral)
  • Build conversational AI applications
  • Generate text completions with system prompts
  • Use function/tool calling with LLMs
  • Stream responses for real-time output
  • Get structured JSON responses

API Endpoint

Chat Completions: POST https://modelslab.com/api/v7/llm/chat/completions

The endpoint follows the OpenAI chat completions format, making it easy to switch from OpenAI or use the OpenAI SDK.

Quick Start

import requests

def chat(message, api_key, model="meta-llama-3-8B-instruct"):
    """Send a chat completion request.

    Args:
        message: The user message
        api_key: Your ModelsLab API key
        model: LLM model ID (use modelslab models search --feature llmaster to find models)
    """
    response = requests.post(
        "https://modelslab.com/api/v7/llm/chat/completions",
        json={
            "key": api_key,
            "model_id": model,
            "messages": [
                {"role": "user", "content": message}
            ],
            "max_tokens": 200,
            "temperature": 0.7
        }
    )

    data = response.json()

    if "choices" in data:
        return data["choices"][0]["message"]["content"]
    else:
        raise Exception(f"Error: {data.get('message', 'Unknown error')}")

# Usage
reply = chat(
    "Explain quantum computing in simple terms.",
    "your_api_key"
)
print(reply)

With System Prompt

def chat_with_system(system_prompt, message, api_key, model="meta-llama-3-8B-instruct"):
    """Chat with a system prompt for role/behavior control."""
    response = requests.post(
        "https://modelslab.com/api/v7/llm/chat/completions",
        json={
            "key": api_key,
            "model_id": model,
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": message}
            ],
            "max_tokens": 500,
            "temperature": 0.7
        }
    )

    data = response.json()
    if "choices" in data:
        return data["choices"][0]["message"]["content"]

# Usage
reply = chat_with_system(
    "You are a Python expert. Give concise code examples.",
    "How do I read a CSV file?",
    "your_api_key"
)

Multi-Turn Conversation

def conversation(messages, api_key, model="meta-llama-3-8B-instruct"):
    """Send a multi-turn conversation."""
    response = requests.post(
        "https://modelslab.com/api/v7/llm/chat/completions",
        json={
            "key": api_key,
            "model_id": model,
            "messages": messages,
            "max_tokens": 500,
            "temperature": 0.7
        }
    )
    return response.json()

# Multi-turn example
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Python?"},
    {"role": "assistant", "content": "Python is a high-level programming language..."},
    {"role": "user", "content": "Show me a hello world example."}
]

result = conversation(messages, "your_api_key")
print(result["choices"][0]["message"]["content"])

OpenAI SDK Compatibility

You can use the official OpenAI Python SDK by changing the base URL and API key:

from openai import OpenAI

client = OpenAI(
    base_url="https://modelslab.com/api/v7/llm",
    api_key="your_modelslab_api_key"
)

response = client.chat.completions.create(
    model="meta-llama-3-8B-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ],
    max_tokens=100,
    temperature=0.7
)

print(response.choices[0].message.content)

cURL Example

curl -X POST "https://modelslab.com/api/v7/llm/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "key": "your_api_key",
    "model_id": "meta-llama-3-8B-instruct",
    "messages": [
      {"role": "user", "content": "Say hello in one sentence."}
    ],
    "max_tokens": 50,
    "temperature": 0.7
  }'

Discovering LLM Models

# Search all chat/LLM models
modelslab models search --feature llmaster

# Search by provider
modelslab models search --search "deepseek"
modelslab models search --search "llama"
modelslab models search --search "gemini"

# Get model details
modelslab models detail --id meta-llama-3-8B-instruct

Popular LLM Model IDs

Meta Llama

  • meta-llama-3-8B-instruct — Llama 3 8B (fast, efficient)
  • meta-llama-Llama-3.3-70B-Instruct-Turbo — Llama 3.3 70B (powerful)
  • meta-llama-Meta-Llama-3.1-405B-Instruct-Turbo — Llama 3.1 405B (largest)

DeepSeek

  • deepseek-ai-DeepSeek-R1-Distill-Llama-70B — DeepSeek R1 (reasoning)
  • deepseek-ai-DeepSeek-V3 — DeepSeek V3 (general purpose)

Google

  • gemini-2.0-flash-001 — Gemini 2.0 Flash (fast)
  • gemini-2.5-pro — Gemini 2.5 Pro (capable)

Qwen

  • Qwen-Qwen2.5-72B-Instruct-Turbo — Qwen 2.5 72B
  • Qwen-Qwen2.5-Coder-32B-Instruct — Qwen 2.5 Coder

Mistral

  • mistralai-Mixtral-8x7B-Instruct-v0.1 — Mixtral 8x7B
  • mistralai-Mistral-Small-24B-Instruct-2501 — Mistral Small

Key Parameters

Parameter Type Required Description
model_id string Yes LLM model identifier (alias: model)
messages array Yes Array of message objects with role and content
temperature float No Sampling temperature (0-2, default varies by model)
max_tokens integer No Maximum tokens to generate
top_p float No Nucleus sampling (0-1)
top_k integer No Top-k sampling
frequency_penalty float No Frequency penalty (-2 to 2)
presence_penalty float No Presence penalty (-2 to 2)
streaming boolean No Enable streaming responses (alias: stream)
n integer No Number of completions (1-10)
stop array No Stop sequences (max 4)
seed integer No Random seed for reproducibility
tools array No Function/tool definitions
tool_choice string No Tool selection strategy
response_format object No {"type": "json_object"} for JSON mode

Response Format

The response follows the OpenAI chat completions format:

{
  "id": "gen-...",
  "model": "meta-llama-3-8B-instruct",
  "object": "chat.completion",
  "created": 1771658583,
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 17,
    "completion_tokens": 10,
    "total_tokens": 27
  }
}

Error Handling

def safe_chat(message, api_key, model="meta-llama-3-8B-instruct"):
    """Chat with error handling."""
    try:
        response = requests.post(
            "https://modelslab.com/api/v7/llm/chat/completions",
            json={
                "key": api_key,
                "model_id": model,
                "messages": [{"role": "user", "content": message}],
                "max_tokens": 200
            }
        )
        data = response.json()

        if "choices" in data:
            return data["choices"][0]["message"]["content"]
        elif data.get("status") == "error":
            raise Exception(data.get("message", "Unknown error"))
        else:
            raise Exception(f"Unexpected response: {data}")

    except requests.exceptions.RequestException as e:
        raise Exception(f"Network error: {e}")

try:
    reply = safe_chat("Hello!", "your_api_key")
    print(reply)
except Exception as e:
    print(f"Chat failed: {e}")

Resources

Related Skills

  • modelslab-model-discovery - Find and filter models
  • modelslab-account-management - Manage API keys and billing
  • modelslab-image-generation - Generate images from text
Weekly Installs
12
GitHub Stars
4
First Seen
Feb 21, 2026
Installed on
opencode12
gemini-cli12
github-copilot12
codex12
amp12
cline12