paddle-pir-cinn
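PIR & CINN Compiler
PIR (Paddle Intermediate Representation) is Paddle's new-generation intermediate representation, built on an MLIR-style SSA design. CINN is an operator compiler built on PIR that compiles high-level operators into high-performance CUDA kernels.
PIR core concepts quick reference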
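| Concept | Key classes | Description |
|---|---|---|
| Type | TypeID / AbstractType / TypeStorage / Type | Type system: TypeID uses the address of a static variable as its unique identifier; a Type is essentially a pointer to a TypeStorage, so equality is a pointer comparison |
| Value | ValueImpl / OpResultImpl / OpOperandImpl | SSA value system: an OpResult is an operator output (stored inline for indices 0-5, out-of-line otherwise); OpOperand manages the use-chain through an intrusive doubly linked list |
| Operation | Operation (contiguous memory layout) | Core execution unit: `[OutOfLineResults \| …` |
| Block/Region | Block / Region | A Block holds an Operation list, BlockArguments, and a terminator; a Region is a container of Blocks and bounds the scope of Values |
| Dialect | BuiltinDialect / PaddleDialect / CinnDialect | Modular container: groups a set of Type, Attribute, and Op definitions and supports independent registration and extension |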
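The Type and Value rows above describe two mechanisms that are easier to see in code: uniqued type storage (equality by identity) and an intrusive doubly linked use-chain. The following is a minimal Python sketch of those two ideas only. Every class and method name in it is illustrative and does not mirror Paddle's C++ API; the real implementation keys types on a `TypeID` derived from a static variable's address rather than on a dictionary.

```python
"""Toy model of two PIR ideas: uniqued types and intrusive use-chains.

All names are illustrative assumptions, not Paddle's C++ API.
"""


class TypeStorage:
    """Payload of a type; at most one instance exists per distinct key."""

    _pool = {}  # (type name, parameters) -> the single TypeStorage instance

    def __init__(self, name, params):
        self.name = name
        self.params = params

    @classmethod
    def get(cls, name, *params):
        key = (name, params)
        if key not in cls._pool:
            cls._pool[key] = cls(name, params)
        return cls._pool[key]


class Type:
    """Thin handle around a TypeStorage; equality is an identity check."""

    def __init__(self, storage):
        self.storage = storage

    def __eq__(self, other):
        return isinstance(other, Type) and self.storage is other.storage

    def __hash__(self):
        return id(self.storage)


class Operand:
    """One use of a Value; also a node in that Value's use-chain."""

    def __init__(self, owner, value):
        self.owner = owner  # name of the operation that reads the value
        self.value = None
        self.prev = None
        self.next = None
        self.set_source(value)

    def set_source(self, value):
        """Point this operand at `value`, relinking in O(1)."""
        if self.value is not None:  # unlink from the old chain
            if self.prev is not None:
                self.prev.next = self.next
            else:
                self.value.first_use = self.next
            if self.next is not None:
                self.next.prev = self.prev
        self.value, self.prev, self.next = value, None, value.first_use
        if value.first_use is not None:  # push onto the front of the new chain
            value.first_use.prev = self
        value.first_use = self


class Value:
    """An SSA value that knows its type and the head of its use-chain."""

    def __init__(self, name, type_):
        self.name = name
        self.type = type_
        self.first_use = None

    def users(self):
        node, out = self.first_use, []
        while node is not None:
            out.append(node.owner)
            node = node.next
        return out

    def replace_all_uses_with(self, new_value):
        # Assumes new_value is a different Value than self.
        while self.first_use is not None:
            self.first_use.set_source(new_value)


f32_a = Type(TypeStorage.get("f32"))
f32_b = Type(TypeStorage.get("f32"))
x = Value("x", f32_a)
y = Value("y", f32_b)
Operand("relu", x)
Operand("add", x)
```

Because each operand is itself the list node, moving a use from one value to another needs no search and no allocation, which is why passes that rewrite the graph can replace all uses of a value cheaply.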
CINN 4-stage compilation pipeline
PIR Program (pd_op.*)
│
▼ Stage 1: Frontend
├── PdOp2CinnOpConverter (operator mapping)
├── add_broadcast_to_elementwise_pass
└── build_cinn_pass → cinn_op.group (fused by OpPatternKind)
│
▼ Stage 2: Lowering (Backend)
├── PirCompiler → OpLower
│ ├── Compute: pe::Relu → lang::Relu → ComputeOp::Make → ir::Tensor
│ ├── Schedule: Op-level + Group-level (LoopAlignment/Inline/Reduce/Fusion/BindCuda)
│ └── LowerToAstVec → LoweredFunc
│
▼ Stage 3: CodeGen
├── ir::Module → CodeGenCUDA_Dev → CUDA source
└── nvrtc::Compiler → CUDAModule → CUfunction
│
▼ Stage 4: Execution
└── JitKernelOp (CINNKernelInfo: fn_ptr + int_args_map)
└── CinnJitInstruction → launch kernel
OpPatternKind fusion priority: kElementWise < kBroadcast < kInjective < kReduction < kOutFusible < kNonFusible
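One way to read the priority order above is as an ordered enumeration in which a fused group is classified by its most restrictive member. The sketch below illustrates that reading in Python. Only the relative order of the kinds comes from this document: the numeric values are placeholders, and `group_kind` and `is_fusible` are hypothetical helpers, not functions from the CINN code base.

```python
from enum import IntEnum


class OpPatternKind(IntEnum):
    # Only the relative order is taken from the document; the numeric
    # values are placeholders chosen for this sketch.
    kElementWise = 0
    kBroadcast = 1
    kInjective = 2
    kReduction = 3
    kOutFusible = 4
    kNonFusible = 5


def group_kind(op_kinds):
    """Hypothetical helper: classify a group by its highest-priority op kind."""
    return max(op_kinds)


def is_fusible(kind):
    """Hypothetical helper: assume anything below kNonFusible may join a group."""
    return kind < OpPatternKind.kNonFusible


kinds = [
    OpPatternKind.kElementWise,
    OpPatternKind.kBroadcast,
    OpPatternKind.kReduction,
]
print(group_kind(kinds).name)
```

Under these assumptions, a group made of an elementwise op, a broadcast, and a reduction is treated as a reduction group, so the scheduling choices for the whole group follow the reduction.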
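Which file to read for which scenario
| Scenario | Reference document |
|---|---|
| Understand the PIR type system, Dialect, and Trait/Interface design | references/pir-basics.md |
| Understand the memory layout of Program/Value/Operation, and ProgramTranslator | references/pir-program.md |
| Trace the full CINN compilation flow from GroupOp to CUDA kernel | references/cinn-pipeline.md |
| Understand PIR control flow (IfOp/WhileOp) and the backward Stack mechanism | references/control-flow.md |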
Source code entry points (L3)
- PIR core: `paddle/pir/include/pir/core/` (`type.h`, `value.h`, `operation.h`, `block.h`, `program.h`)
- Dialect definitions: `paddle/fluid/pir/dialect/operator/ir/` (`pd_op.h`, `cinn_op.h`)
- CINN frontend: `paddle/cinn/hlir/dialect/operator/transforms/` (`build_cinn_pass.cc`, `pd_to_cinn_pass.cc`)
- CINN backend: `paddle/cinn/hlir/framework/pir/` (`op_lowering_impl.cc`, `compilation_task.cc`)
- CodeGen: `paddle/cinn/backends/` (`codegen_cuda_dev.cc`, `nvrtc/nvrtc_util.cc`)
- Control flow: `paddle/fluid/pir/dialect/operator/ir/control_flow_op.cc`
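
Source files move between Paddle versions, and some headers may be generated at build time, so it can help to check which of the entry points listed above exist in a given checkout before reading further. The following is a small helper sketch for that check. The paths are copied from the list above; the function name `missing_entry_points` and the example checkout path are assumptions made for illustration.

```python
from pathlib import Path

# (directory, files) pairs copied from the entry-point list above.
ENTRY_POINTS = [
    ("paddle/pir/include/pir/core",
     ["type.h", "value.h", "operation.h", "block.h", "program.h"]),
    ("paddle/fluid/pir/dialect/operator/ir", ["pd_op.h", "cinn_op.h"]),
    ("paddle/cinn/hlir/dialect/operator/transforms",
     ["build_cinn_pass.cc", "pd_to_cinn_pass.cc"]),
    ("paddle/cinn/hlir/framework/pir",
     ["op_lowering_impl.cc", "compilation_task.cc"]),
    ("paddle/cinn/backends", ["codegen_cuda_dev.cc", "nvrtc/nvrtc_util.cc"]),
    ("paddle/fluid/pir/dialect/operator/ir", ["control_flow_op.cc"]),
]


def missing_entry_points(repo_root):
    """Return the listed files that are absent under repo_root."""
    root = Path(repo_root)
    missing = []
    for directory, files in ENTRY_POINTS:
        for name in files:
            rel = f"{directory}/{name}"
            if not (root / rel).is_file():
                missing.append(rel)
    return missing


if __name__ == "__main__":
    # "/path/to/Paddle" is a placeholder; point it at a local checkout.
    for rel in missing_entry_points("/path/to/Paddle"):
        print("missing:", rel)
```

A non-empty result does not necessarily mean the list is wrong; it may only mean the checkout is at a different version or has not been built yet.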