kernelgen-flagos — Unified GPU Operator Generation Skill
This skill is a unified entry point that bundles the generation, optimization, and platform-specialization sub-skills (plus MCP setup and feedback) into one:
| Sub-skill file | Purpose |
|---|---|
| Generation | |
| `kernelgen-generate.md` | Generate GPU kernels for any Python/Triton repository |
| `kernelgen-generate-for-flaggems.md` | Specialized generation for FlagGems repositories |
| `kernelgen-generate-for-vllm.md` | Specialized generation for vLLM repositories |
| Optimization | |
| `kernelgen-optimize.md` | Optimize existing Triton kernels via MCP iterative optimization (general purpose) |
| `kernelgen-optimize-for-flaggems.md` | Optimize Triton operators and integrate into FlagGems (3 modes: built-in/external/experimental) |
| `kernelgen-optimize-for-vllm.md` | Optimize Triton operators and integrate into vLLM (with CustomOp registration) |
| Platform Specialization | |
| `kernelgen-specialize.md` | Specialize Triton operators to target platforms (e.g., GPU → Ascend NPU) via MCP specialize_kernel |
| `kernelgen-specialize-for-flaggems.md` | Platform specialization + FlagGems integration (4 modes: vendor-ops/vendor-fused/override-builtin/experimental) |
| MCP Configuration | |
| `kernelgen-mcp-setup.md` | Check and auto-configure the kernelgen-server MCP service (URL built-in, user only provides Token) |
| Feedback | |
| `kernelgen-submit-feedback.md` | Submit bug reports and feedback via GitHub or email |
All sub-skill files are located in the same directory as this SKILL.md file.
Routing Protocol — Follow This BEFORE Doing Anything Else
Phase 0: MCP Configuration Check
Before anything else, ensure the kernelgen-server MCP service is configured and ready.
Use the Glob tool to find kernelgen-mcp-setup.md in this skill's directory:
Glob: **/skills/kernelgen-flagos/kernelgen-mcp-setup.md
Then use the Read tool to read the matched file and follow its instructions exactly.
- If MCP is already configured → proceed to Phase 1.
- If MCP is not configured → the setup skill will guide the user through configuration. Once configuration is written and the user is prompted to restart, stop here — do not continue to Phase 1.
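For orientation, here is a minimal sketch of what the "already configured" check amounts to. It assumes the MCP client stores server entries in a `.mcp.json` file at the project root under an `mcpServers` key; the authoritative procedure is whatever `kernelgen-mcp-setup.md` specifies.

```python
import json
from pathlib import Path


def mcp_is_configured(project_root: str = ".") -> bool:
    """Return True if a kernelgen-mcp server entry already exists.

    Assumption: MCP servers are recorded in .mcp.json under an
    "mcpServers" key. Defer to kernelgen-mcp-setup.md for the real check.
    """
    config_path = Path(project_root) / ".mcp.json"
    if not config_path.is_file():
        return False
    try:
        config = json.loads(config_path.read_text())
    except json.JSONDecodeError:
        return False
    return "kernelgen-mcp" in config.get("mcpServers", {})
```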
Phase 1: Detect Repository Type
Use the Glob tool to check for project identity files in the current working directory:
Glob: pyproject.toml
Glob: setup.py
Glob: setup.cfg
Then use the Read tool to read whichever file exists. Determine the project name from
the file contents (e.g., name = "flag_gems" in pyproject.toml, or name='vllm' in setup.py).
Also use the Glob tool to check for characteristic directory structures:
FlagGems indicators (match ANY):
- `src/flag_gems/` directory exists
- Project name is `flag_gems` or `flag-gems` or `FlagGems`
- `import flag_gems` appears in test files
vLLM indicators (match ANY):
- `vllm/` directory exists at the repo root (with `vllm/__init__.py`)
- Project name is `vllm`
- `csrc/` directory exists alongside `vllm/`
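Taken together, the detection heuristic is roughly the following sketch. Function and variable names are illustrative, not part of the skill's API, and the test-file scan for `import flag_gems` is omitted for brevity.

```python
import re
from pathlib import Path


def detect_repo_type(root: str = ".") -> str:
    """Classify the working directory as 'flaggems', 'vllm', or 'generic'."""
    base = Path(root)
    name = ""
    for candidate in ("pyproject.toml", "setup.py", "setup.cfg"):
        path = base / candidate
        if path.is_file():
            # Crude regex extraction of name = "..." or name='...';
            # a real parser (tomllib, configparser, AST) would be sturdier.
            match = re.search(r"name\s*=\s*['\"]([^'\"]+)['\"]", path.read_text())
            if match:
                name = match.group(1).lower()
                break
    if (base / "src" / "flag_gems").is_dir() or name in {"flag_gems", "flag-gems", "flaggems"}:
        return "flaggems"
    if (
        (base / "vllm" / "__init__.py").is_file()
        or name == "vllm"
        or ((base / "csrc").is_dir() and (base / "vllm").is_dir())
    ):
        return "vllm"
    return "generic"
```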
Phase 2: Dispatch to Sub-skill
Based on the detection result, use the Read tool to read the appropriate sub-skill file from this skill's directory, then follow the instructions in that file exactly.
To locate the sub-skill files: They are in the same directory as this SKILL.md. Use the Glob tool to find the path:
Glob: **/skills/kernelgen-flagos/kernelgen-generate.md
Then use the Read tool to read the matched path.
Decision Tables
Generation requests (user wants to create/generate a new operator):
| Detection Result | Action |
|---|---|
| FlagGems repository detected | Read kernelgen-generate-for-flaggems.md and follow it |
| vLLM repository detected | Read kernelgen-generate-for-vllm.md and follow it |
| Neither detected (or unknown) | Read kernelgen-generate.md and follow it |
Optimization requests (user wants to optimize an existing operator, mentions "optimize", "speedup", "improve performance"):
| Detection Result | Action |
|---|---|
| FlagGems repository detected | Read kernelgen-optimize-for-flaggems.md and follow it |
| vLLM repository detected | Read kernelgen-optimize-for-vllm.md and follow it |
| Neither detected (or unknown) | Read kernelgen-optimize.md and follow it |
Specialization requests (user wants to migrate/specialize an operator to a different platform, mentions "specialize", "migrate to Ascend/NPU", "platform migration"):
| Detection Result | Action |
|---|---|
| FlagGems repository detected | Read kernelgen-specialize-for-flaggems.md and follow it |
| Neither detected (or unknown) | Read kernelgen-specialize.md and follow it |
Feedback requests:
| Trigger | Action |
|---|---|
| User reports a bug or requests feedback submission | Read kernelgen-submit-feedback.md and follow it |
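The three repo-dependent tables above reduce to a simple lookup. As a sketch, with illustrative tuple keys and file names taken straight from the tables (feedback requests always dispatch to `kernelgen-submit-feedback.md` regardless of detection, so they are not keyed here):

```python
# (request kind, detected repo type) -> sub-skill file to read and follow
DISPATCH = {
    ("generate", "flaggems"): "kernelgen-generate-for-flaggems.md",
    ("generate", "vllm"): "kernelgen-generate-for-vllm.md",
    ("generate", "generic"): "kernelgen-generate.md",
    ("optimize", "flaggems"): "kernelgen-optimize-for-flaggems.md",
    ("optimize", "vllm"): "kernelgen-optimize-for-vllm.md",
    ("optimize", "generic"): "kernelgen-optimize.md",
    ("specialize", "flaggems"): "kernelgen-specialize-for-flaggems.md",
    ("specialize", "vllm"): "kernelgen-specialize.md",  # no vLLM-specific variant
    ("specialize", "generic"): "kernelgen-specialize.md",
}
```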
Important rules:
- Always detect first, dispatch second. Never skip detection.
- Read the entire sub-skill file before starting execution — do not partially read it.
- Follow the sub-skill instructions exactly as if they were the main SKILL.md. All steps, rules, and protocols in the sub-skill apply fully.
- Do not mix sub-skills. Once you dispatch to a sub-skill, follow it to completion.
- If the user explicitly requests a specific sub-skill (e.g., "use the FlagGems version"), honor that request regardless of auto-detection results.
- CRITICAL — MCP is mandatory: ALL operator code generation MUST go through the `mcp__kernelgen-mcp__generate_kernel` MCP tool. Optimization uses `mcp__kernelgen-mcp__optimize_kernel`, and platform specialization uses `mcp__kernelgen-mcp__specialize_kernel`. NEVER generate Triton kernels, PyTorch wrappers, or operator implementations yourself. If MCP is not configured, not reachable, or fails after all retries, STOP and report the issue — do NOT fall back to writing code manually.
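To make the failure policy concrete, here is a sketch of the retry-then-stop behavior. `call_mcp_tool` is a hypothetical stand-in: the real invocation happens through the agent's tool interface, not Python.

```python
def generate_via_mcp(call_mcp_tool, request: dict, max_retries: int = 3) -> dict:
    """Invoke mcp__kernelgen-mcp__generate_kernel with retries.

    call_mcp_tool is a hypothetical callable standing in for the agent's
    MCP tool invocation. On exhausted retries we raise: there is
    deliberately no fallback to hand-written kernel code.
    """
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return call_mcp_tool("mcp__kernelgen-mcp__generate_kernel", request)
        except Exception as exc:  # network / server errors
            last_error = exc
    raise RuntimeError(
        f"kernelgen-mcp unreachable after {max_retries} attempts: {last_error}. "
        "Stopping; manual kernel generation is not permitted."
    )
```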
Phase 3: Feedback Handling
At any point during the workflow, if the user reports a bug, says something is broken, or asks to submit feedback about the skill:
- Use the Read tool to read `kernelgen-submit-feedback.md` from this skill's directory.
- Follow the feedback submission workflow described in that file.
- After feedback is submitted, ask the user if they want to continue with the operator generation workflow or stop.
Quick Reference for Users
# === Generation ===
# Generate a kernel operator (auto-detects repo type)
/kernelgen-flagos relu
# Generate with explicit function type
/kernelgen-flagos rms_norm --func-type normalization
# === Optimization ===
# Optimize an existing Triton kernel (auto-detects repo type):
# just say "optimize the relu kernel" or "improve kernel performance".
# The skill will automatically:
# - detect a FlagGems repo → use the FlagGems-specific workflow
# - detect a vLLM repo → use the vLLM-specific workflow
# - otherwise → use the general-purpose workflow
If you encounter any issues during generation, just say "submit feedback" or "report a bug" and the skill will guide you through the feedback submission process.