model-architecture-diagram
Model Architecture Diagram
Workflow
Return only public original diagrams indexed by this skill.
- Run the bundled resolver:
python3 skills/model-architecture-diagram/scripts/model_architecture_diagram.py "<model name>"
- If the resolver returns
kind: existing, return the raw image Markdown it prints and preserve the source attribution line. - If the resolver returns
kind: no_match, tell the user that no public original architecture diagram is indexed for that model.
Source Priority
Use references/diagram-index.json as the source of truth. It stores raw GitHub image URLs from:
datawhalechina/self-llmCalvinXKY/InfraTechTongyi-MAI/Z-ImageWan-Video/Wan2.1Wan-Video/Wan2.2Tencent-Hunyuan/HunyuanVideoTencent-Hunyuan/Hunyuan3D-2brayevalerien/Flux.1-Architecture-Diagram
Prefer detailed implementation, cookbook, or architecture-card diagrams over paper figures. Good sources show module boundaries, dataflow, MoE / attention / cache paths, or model-specific runtime structure rather than only a high-level paper overview. Official repository diagrams and curated implementation diagrams are first choice; paper figures are fallback only when no more detailed public original diagram is indexed.
Do not copy remote image binaries into the skill. Return the raw GitHub URLs so the chat renderer can display the original image.
Existing Diagram Rule
For a direct match, show the original image. Good direct matches include:
- DeepSeek V3/V3.2/V4, GLM-5, Kimi K2/K2.5, MiniMax M2.5, Qwen3.5, Qwen3-VL, and Step 3.5 Flash from InfraTech.
- Hunyuan-A13B, Kimi-VL, Qwen3, Qwen3-VL detail flows, MiniMax M2, and Llama 4 architecture/module diagrams from self-llm.
- Z-Image, Wan2.1, Wan2.2, HunyuanVideo, Hunyuan3D 2.0, and FLUX.1 diffusion architecture/module diagrams from public GitHub sources.
If multiple diagrams match, show all high-confidence matches up to the resolver's default limit. For example, DeepSeek V3 may return the full architecture plus MLA MHA/MQA diagrams.
Hosted Original Diagram Gallery
Do not commit the sgl-cookbook-model-architecture-images/ gallery into the repository. The public-original image set is hosted as a GitHub Release asset and indexed by a GitHub issue.
Current hosted artifact:
- Issue index: https://github.com/BBuf/AI-Infra-Auto-Driven-SKILLS/issues/31
- Release page: https://github.com/BBuf/AI-Infra-Auto-Driven-SKILLS/releases/tag/sgl-cookbook-architecture-images-2026-05-02
- Zip download: https://github.com/BBuf/AI-Infra-Auto-Driven-SKILLS/releases/download/sgl-cookbook-architecture-images-2026-05-02/sgl-cookbook-model-architecture-images-2026-05-02.zip
- Digest:
sha256:ea432081849a250429d3d1ecf246e267c5cc42f989aaf4b9ca695b581e7fa50f
The artifact contains 44 public original diagram image files from the indexed upstream repositories, plus a lightweight index.html, index.md, manifest.json, HTML contact sheet, and architecture-audit.md.
To inspect the gallery locally:
curl -L -o /tmp/sgl-cookbook-model-architecture-images-2026-05-02.zip \
https://github.com/BBuf/AI-Infra-Auto-Driven-SKILLS/releases/download/sgl-cookbook-architecture-images-2026-05-02/sgl-cookbook-model-architecture-images-2026-05-02.zip
unzip -q /tmp/sgl-cookbook-model-architecture-images-2026-05-02.zip -d /tmp
open /tmp/sgl-cookbook-model-architecture-images/index.html
Useful Commands
List known original diagram aliases:
python3 skills/model-architecture-diagram/scripts/model_architecture_diagram.py --list-known
Emit JSON for automation:
python3 skills/model-architecture-diagram/scripts/model_architecture_diagram.py "GLM-5" --format json
References
references/diagram-index.json: original diagram link index and aliases.references/source-notes.md: audited source repositories and local cache paths.
More from bbuf/sglang-auto-driven-skills
h100
SSH into host `h100_sglang`, enter Docker container `sglang_bbuf`, work in `/sgl-workspace/sglang`, and use the ready H100 remote environment for SGLang development and validation. Use when a task needs remote CUDA work, GPU-backed smoke tests, diffusion checks, or a safe remote copy instead of local-only execution.
25h100-sglang-diffusion
SSH into host `h100_sglang`, enter Docker container `sglang_bbuf`, work in `/data/bbuf/repos/sglang`, and use the ready H100 remote environment for SGLang **diffusion** development and validation. Use when a task needs diffusion model smoke tests, Triton/CUDA kernel validation, torch.compile diffusion checks, or a safe remote copy for diffusion-specific SGLang changes.
24sglang-prod-incident-triage
Replay-first debug flow for SGLang serving problems. Use when a live or recent server shows health-check failures, latency or throughput regressions, queue growth, timeouts, distributed stalls, crash dumps, wrong outputs after deploys, or PD/EP/HiCache issues, and the job is to turn the problem into a replay plus the right next debug tool.
24sglang-minimax-m2-series-optimization
PR-backed and current-main optimization manual for the `MiniMaxAI/MiniMax-M2` series, including M2, M2.1, M2.5, M2.7, and M2.7-highspeed. Use when Codex needs to recover, extend, or audit MiniMax-specific optimizations, TP QK norm/all-reduce behavior, parser contracts, distributed runtime behavior, quantized loading, or backend-specific validation.
15sglang-torch-profiler-analysis
Compact SGLang torch-profiler triage skill. Use when Codex should inspect an existing `trace.json(.gz)` or profile directory, trigger `sglang.profiler` against a live server, and return one compact report with kernel, overlap-opportunity, and fuse-pattern tables. Single-trace triage is enough for quick diagnosis; mapping+formal two-trace triage gives stronger overlap conclusions.
15sglang-kimi-k2-k25-optimization
PR-backed and current-main optimization manual for `moonshotai/Kimi-K2*` and `moonshotai/Kimi-K2.5*` in SGLang. Use when Codex needs to recover, extend, or audit Kimi optimizations, including K2 router/MoE fast paths, K2 thinking Marlin paths, K2.5 wrapper/multimodal/runtime plumbing, W4AFP8/W4A16 quant tracks, parser contracts, LoRA coverage, and backend-specific validation.
15