# Troubleshooting gpt-oss and vLLM Errors
## When to Use This Skill
Invoke this skill when you encounter:
- `openai_harmony.HarmonyError` messages in any context
- gpt-oss tool calling failures or unexpected behavior
- Token parsing errors with vLLM serving gpt-oss models
- Users asking about gpt-oss compatibility with frameworks like llama-stack
## Critical First Step: Identify the Error Source
**IMPORTANT:** `openai_harmony.HarmonyError` messages originate from the vLLM server, NOT from client applications (such as llama-stack or LangChain).
### Error Source Identification

1. **Check the error origin:**
   - If the error contains `openai_harmony.HarmonyError`, it is from vLLM's serving layer.
   - The client application is just reporting what vLLM returned.
   - Do NOT search the client codebase for fixes.

2. **Follow the correct investigation path:**
   - Search vLLM GitHub issues and PRs.
   - Check the openai/harmony repository for parser issues.
   - Review the vLLM server configuration and startup flags.
   - Examine the HuggingFace model files (`generation_config.json`).
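The triage rule above can be sketched as a small helper. This is a hypothetical illustration (the `classify_error_source` name and return labels are assumptions, not part of vLLM or any client library): it routes investigation based only on the error text.

```python
# Hypothetical triage helper: decide where to investigate based on the
# error text alone. "vllm-server" means the error was raised inside vLLM's
# Harmony serving layer and the client is merely relaying it.
def classify_error_source(error_text: str) -> str:
    if "openai_harmony.HarmonyError" in error_text:
        return "vllm-server"
    return "unknown"

print(classify_error_source(
    "openai_harmony.HarmonyError: Unexpected token 12606 ..."
))  # → vllm-server
```

A client stack trace around such a message is noise; the string itself is what identifies the source.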
## Common Error Patterns

### Token Mismatch Errors

**Error pattern:** `Unexpected token X while expecting start token Y`

**Example:** `Unexpected token 12606 while expecting start token 200006`

**Meaning:**

- vLLM expects special Harmony format control tokens.
- The model is generating regular text tokens instead.
- Token 12606 decodes to "comment", indicating the model is generating reasoning text instead of tool calls.

**Known issues:**

- vLLM #22519: gpt-oss-20b tool_call token errors
- vLLM #22515: same error, fixed by updating `generation_config.json`

**Fixes:**

- Update the model files from HuggingFace (see `reference/model-updates.md`)
- Verify the vLLM server flags for tool calling
- Check the EOS tokens in `generation_config.json`
### Tool Calling Not Working

**Symptoms:**

- Model describes tools in text but doesn't call them
- Empty `tool_calls=[]` arrays
- Tool responses in the wrong format

**Root causes:**

- Missing vLLM server flags
- Outdated model configuration files
- Configuration mismatch between client and server
**Configuration requirements:** the vLLM server must be started with:

```shell
--tool-call-parser openai --enable-auto-tool-choice
```

For the demo tool server:

```shell
--tool-server demo
```

For MCP tool servers:

```shell
--tool-server ip-1:port-1,ip-2:port-2
```

**Important:** Only `tool_choice='auto'` is supported.
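On the client side, a tool-calling request against such a server looks like the sketch below. The model name and `get_weather` tool schema are placeholders for illustration; the key point is that `tool_choice` must be `"auto"`:

```python
import json

# Sketch of a chat-completions request body for a vLLM-served gpt-oss model.
# The model name and tool schema are placeholders; only tool_choice="auto"
# is supported by vLLM's gpt-oss tool calling.
payload = {
    "model": "openai/gpt-oss-20b",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",  # the only supported value
}

print(json.dumps(payload, indent=2))
```

Requests with `tool_choice` set to a specific function or to `"required"` should be expected to fail or be ignored.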
## Investigation Workflow

1. **Identify the error message:**
   - Copy the exact error text.
   - Note any token IDs mentioned.

2. **Search vLLM GitHub:**
   - Use the error text in the issue search.
   - Include "gpt-oss" and the model size (20b/120b).
   - Check both open and closed issues.

3. **Check the model configuration:**
   - Verify `generation_config.json` is current.
   - Compare against the latest HuggingFace version.
   - Look for recent commits that updated the config.

4. **Review the server configuration:**
   - Check the vLLM startup flags.
   - Verify the `--tool-call-parser` settings.
   - Confirm vLLM version compatibility.

5. **Check the vLLM version:**
   - Many tool calling issues are resolved in recent vLLM releases.
   - Update to the latest version if you keep hitting errors.
   - Check the vLLM changelog for gpt-oss-specific fixes.
## Quick Reference

### Key Resources

- vLLM gpt-oss recipe: https://docs.vllm.ai/projects/recipes/en/latest/OpenAI/GPT-OSS.html
- Common issues: see `reference/known-issues.md`
- Model update procedure: see `reference/model-updates.md`
### Diagnostic Commands

Check vLLM server health:

```shell
curl http://localhost:8000/health
```

List available models:

```shell
curl http://localhost:8000/v1/models
```

Check the vLLM version:

```shell
pip show vllm
```
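The health check can also be scripted, which is handy in retry loops while the server is still loading weights. A minimal sketch using only the standard library (the `check_vllm_health` name is an assumption for illustration):

```python
import urllib.request

# Hypothetical helper: return True if the vLLM /health endpoint answers
# with HTTP 200, False on any connection error, timeout, or non-2xx status.
def check_vllm_health(base_url: str, timeout: float = 2.0) -> bool:
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError/HTTPError are OSError subclasses
        return False

print(check_vllm_health("http://localhost:8000"))
```

A `False` here means the token-parsing errors above are moot: the server is not reachable at all.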
## Progressive Disclosure

For detailed information:

- Known GitHub issues: see `reference/known-issues.md`
- Model file updates: see `reference/model-updates.md`
- Tool calling configuration: see `reference/tool-calling-setup.md`
## Validation Steps

After implementing fixes:

- Test simple tool calling with a single function
- Verify Harmony format tokens in responses
- Check the logs for token mismatch errors
- Test multi-turn conversations with tools
- Monitor for "unexpected token" errors
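The first validation step can be automated against a parsed chat-completions response. A hypothetical sketch (the `made_tool_calls` helper and sample dicts are illustrations): it distinguishes a real tool call from the failure mode where the model only describes the tool in text.

```python
# Hypothetical validation helper: given a chat-completions response parsed
# into a dict, report whether the model actually emitted tool calls or fell
# back to describing the tools in plain text (an empty tool_calls list).
def made_tool_calls(response: dict) -> bool:
    message = response["choices"][0]["message"]
    return bool(message.get("tool_calls"))

good = {"choices": [{"message": {"tool_calls": [{"id": "call_1"}]}}]}
bad = {"choices": [{"message": {"content": "I would call get_weather...",
                                "tool_calls": []}}]}
print(made_tool_calls(good), made_tool_calls(bad))  # → True False
```

Running this check across a few prompts after a fix gives a quick signal before moving on to multi-turn testing.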
If errors persist:

- Update vLLM to the latest version
- Check vLLM GitHub for recent fixes and PRs
- Try a different model variant (120b vs 20b)
- Review the vLLM logs for additional error context