# Troubleshooting gpt-oss and vLLM Errors
## When to Use This Skill
Invoke this skill when you encounter:
- `openai_harmony.HarmonyError` messages in any context
- gpt-oss tool calling failures or unexpected behavior
- Token parsing errors with vLLM serving gpt-oss models
- Users asking about gpt-oss compatibility with frameworks like llama-stack
## Critical First Step: Identify the Error Source
**IMPORTANT:** `openai_harmony.HarmonyError` messages originate from the vLLM server, NOT from client applications (such as llama-stack or LangChain).
### Error Source Identification

1. **Check the error origin:**
   - If the error contains `openai_harmony.HarmonyError`, it is from vLLM's serving layer.
   - The client application is just reporting what vLLM returned.
   - Do NOT search the client codebase for fixes.

2. **Follow the correct investigation path:**
   - Search vLLM GitHub issues and PRs.
   - Check the openai/harmony repository for parser issues.
   - Review the vLLM server configuration and startup flags.
   - Examine the HuggingFace model files (`generation_config.json`).
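The triage rule above can be sketched as a small helper. This is a hypothetical illustration (the `classify_error_source` name and return labels are assumptions, not part of vLLM or any client library): it routes investigation based only on the error text.

```python
# Hypothetical triage helper: decide where to investigate based on the
# error text alone. "vllm-server" means the error was raised inside vLLM's
# Harmony serving layer and the client is merely relaying it.
def classify_error_source(error_text: str) -> str:
    if "openai_harmony.HarmonyError" in error_text:
        return "vllm-server"
    return "unknown"

print(classify_error_source(
    "openai_harmony.HarmonyError: Unexpected token 12606 ..."
))  # → vllm-server
```

A client stack trace around such a message is noise; the string itself is what identifies the source.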
## Common Error Patterns

### Token Mismatch Errors

**Error pattern:** `Unexpected token X while expecting start token Y`

**Example:** `Unexpected token 12606 while expecting start token 200006`

**Meaning:**

- vLLM expects special Harmony format control tokens.
- The model is generating regular text tokens instead.
- Token 12606 decodes to "comment", indicating the model is generating reasoning text instead of tool calls.

**Known issues:**

- vLLM #22519: gpt-oss-20b tool_call token errors
- vLLM #22515: same error, fixed by updating `generation_config.json`

**Fixes:**

- Update the model files from HuggingFace (see `reference/model-updates.md`)
- Verify the vLLM server flags for tool calling
- Check the EOS tokens in `generation_config.json`
### Tool Calling Not Working

**Symptoms:**

- Model describes tools in text but doesn't call them
- Empty `tool_calls=[]` arrays
- Tool responses in the wrong format

**Root causes:**

- Missing vLLM server flags
- Outdated model configuration files
- Configuration mismatch between client and server
**Configuration requirements:** the vLLM server must be started with:

```shell
--tool-call-parser openai --enable-auto-tool-choice
```

For the demo tool server:

```shell
--tool-server demo
```

For MCP tool servers:

```shell
--tool-server ip-1:port-1,ip-2:port-2
```

**Important:** Only `tool_choice='auto'` is supported.
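On the client side, a tool-calling request against such a server looks like the sketch below. The model name and `get_weather` tool schema are placeholders for illustration; the key point is that `tool_choice` must be `"auto"`:

```python
import json

# Sketch of a chat-completions request body for a vLLM-served gpt-oss model.
# The model name and tool schema are placeholders; only tool_choice="auto"
# is supported by vLLM's gpt-oss tool calling.
payload = {
    "model": "openai/gpt-oss-20b",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",  # the only supported value
}

print(json.dumps(payload, indent=2))
```

Requests with `tool_choice` set to a specific function or to `"required"` should be expected to fail or be ignored.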
## Investigation Workflow

1. **Identify the error message:**
   - Copy the exact error text.
   - Note any token IDs mentioned.

2. **Search vLLM GitHub:**
   - Use the error text in the issue search.
   - Include "gpt-oss" and the model size (20b/120b).
   - Check both open and closed issues.

3. **Check the model configuration:**
   - Verify `generation_config.json` is current.
   - Compare against the latest HuggingFace version.
   - Look for recent commits that updated the config.

4. **Review the server configuration:**
   - Check the vLLM startup flags.
   - Verify the `--tool-call-parser` settings.
   - Confirm vLLM version compatibility.

5. **Check the vLLM version:**
   - Many tool calling issues are resolved in recent vLLM releases.
   - Update to the latest version if you keep hitting errors.
   - Check the vLLM changelog for gpt-oss-specific fixes.
## Quick Reference

### Key Resources

- vLLM gpt-oss recipe: https://docs.vllm.ai/projects/recipes/en/latest/OpenAI/GPT-OSS.html
- Common issues: see `reference/known-issues.md`
- Model update procedure: see `reference/model-updates.md`
### Diagnostic Commands

Check vLLM server health:

```shell
curl http://localhost:8000/health
```

List available models:

```shell
curl http://localhost:8000/v1/models
```

Check the vLLM version:

```shell
pip show vllm
```
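The health check can also be scripted, which is handy in retry loops while the server is still loading weights. A minimal sketch using only the standard library (the `check_vllm_health` name is an assumption for illustration):

```python
import urllib.request

# Hypothetical helper: return True if the vLLM /health endpoint answers
# with HTTP 200, False on any connection error, timeout, or non-2xx status.
def check_vllm_health(base_url: str, timeout: float = 2.0) -> bool:
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError/HTTPError are OSError subclasses
        return False

print(check_vllm_health("http://localhost:8000"))
```

A `False` here means the token-parsing errors above are moot: the server is not reachable at all.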
## Progressive Disclosure

For detailed information:

- Known GitHub issues: see `reference/known-issues.md`
- Model file updates: see `reference/model-updates.md`
- Tool calling configuration: see `reference/tool-calling-setup.md`
## Validation Steps

After implementing fixes:

- Test simple tool calling with a single function
- Verify Harmony format tokens in responses
- Check the logs for token mismatch errors
- Test multi-turn conversations with tools
- Monitor for "unexpected token" errors
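The first validation step can be automated against a parsed chat-completions response. A hypothetical sketch (the `made_tool_calls` helper and sample dicts are illustrations): it distinguishes a real tool call from the failure mode where the model only describes the tool in text.

```python
# Hypothetical validation helper: given a chat-completions response parsed
# into a dict, report whether the model actually emitted tool calls or fell
# back to describing the tools in plain text (an empty tool_calls list).
def made_tool_calls(response: dict) -> bool:
    message = response["choices"][0]["message"]
    return bool(message.get("tool_calls"))

good = {"choices": [{"message": {"tool_calls": [{"id": "call_1"}]}}]}
bad = {"choices": [{"message": {"content": "I would call get_weather...",
                                "tool_calls": []}}]}
print(made_tool_calls(good), made_tool_calls(bad))  # → True False
```

Running this check across a few prompts after a fix gives a quick signal before moving on to multi-turn testing.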
If errors persist:

- Update vLLM to the latest version
- Check vLLM GitHub for recent fixes and PRs
- Try a different model variant (120b vs 20b)
- Review the vLLM logs for additional error context