code-review
What I do
Complete code review work to ensure code quality. Due to the length of content, you can read sections as needed based on the actual modifications being reviewed.
When to use me
Use this when you need to review code, whether it is code you just completed or directly reviewing target code.
How to use me
- Always read and respond to Part 1 (General Principles) — it applies to all code.
- For module-specific review, read the
AGENTS.mdin the corresponding source directory listed in Part 2. Those files contain non-obvious conventions and traps specific to each subsystem. - Parts 3–7 cover cross-module concerns, testing, high-risk patterns, functions, and standards — refer as needed.
Part 1: General Principles
1.1 Core Invariants
Always focus on the following core invariants during review:
- Data Correctness: Data from any committed transaction must be visible to subsequent queries and not lost; data from uncommitted transactions must not be visible
- Version Consistency: Partition visible version is the sole standard for data visibility; BE must strictly read data not exceeding this version
- Delete Bitmap Consistency (MoW tables): delete bitmap must be strictly aligned with rowset versions; sentinel marks and
TEMP_VERSION-style placeholders must be replaced with actual versions before use - Memory Safety: All significant memory allocations must be tracked by
MemTracker; orphan memory is not allowed - Error Means Failure: Invariant violations should trigger exceptions or non-OK
Status; silent continuation is never allowed
1.2 Review Principles
- Follow Context: Match adjacent code's error handling, interface usage, and lock patterns unless a clearly better approach exists
- Reuse First: Search for existing implementations before adding new ones; ensure good abstraction afterward
- Code Over Docs: When this skill conflicts with actual code, defer to code and note the mismatch
- Performance First: All obviously redundant operations should be optimized away, all obvious performance optimizations must be applied, and obvious anti-patterns must be eliminated.
- Evidence Speaks: All issues with code itself (not memory or environment) must be clearly identified as either having problems or not. For any erroneous situation, if it cannot be confirmed locally, you must provide the specific path or logic where the error occurs. That is, if you believe that if A then B, you must specify a clear scenario where A occurs.
1.3 Critical Checkpoints (Self-Review and Review Priority)
The following checkpoints must be individually confirmed with conclusions during self-review and review. If the code is too long or too complex, you should delve into the code again as needed when analyzing specific issues, especially the entire logic chain where there are doubts:
- What is the goal of the current task? Does the current code accomplish this goal? Is there a test that proves it?
- Is this modification as small, clear, and focused as possible?
- Does the code segment involve concurrency? If yes:
- Which threads introduce concurrency, what are the critical variables? What locks protect them?
- Are operations within locks as lightweight as possible? Are all heavy operations outside locks while maintaining concurrency safety?
- If multiple locks exist, is the locking order consistent? Is there deadlock risk?
- Is there special or non-intuitive lifecycle management including static initialization order? If yes:
- Understand the complete lifecycle and relationships of related variables. Are there circular references? When is each released? Can all lifecycles end normally?
- For cross-TU static/global variables: does the initializer of one static variable depend on another static variable defined in a different translation unit? If yes, the initialization order is undefined by the C++ standard (static initialization order fiasco). Use constexpr, inline variables in the same header, or lazy initialization to fix.
- Are configuration items added?
- Should the configuration allow dynamic changes, and does it actually allow them? If yes:
- Can affected processes detect the change promptly without restart?
- Should the configuration allow dynamic changes, and does it actually allow them? If yes:
- Does it involve incompatible changes like function symbols or storage formats? If yes:
- Is compatibility code added? Can it correctly handle requests during rolling upgrades?
- Are there functionally parallel code paths to the modified one? If yes:
- Should this modification be applied to other paths, and has it been?
- Are there special conditional checks? If yes:
- Does the condition have clear comments explaining why this check is necessary, simplest, and most viable?
- Are there similar paths with similar issues?
- What is the test coverage?
- Are end-to-end functional tests comprehensive?
- Are there comprehensive negative test cases from various angles?
- For complex features, are key functions sufficiently modular with unit tests?
- Does the feature need increased observability? If yes:
- When bugs occur, are existing logs and VLOGs sufficient to investigate key issues?
- Are INFO logs lightweight enough?
- Are Metrics added for critical paths? Referencing similar features, are all necessary observability values covered?
- Does it involve transaction and persistence-related modifications? If yes:
- Are all EditLog reads and writes correctly covered? Is all information requiring persistent storage persisted? Does it follow our standard approach?
- In transaction processing logic, can master failover at any time point be handled correctly?
- Does it involve data writes and modifications? If yes:
- Are transactionality and atomicity guaranteed? Are there issues with concurrency?
- Would FE or BE crashes cause leaks, deadlocks, or incorrect data?
- Are new variables added that need to be passed between FE and BE? If yes:
- Is variable passing modified in all paths? For example, various scattered thrift sending points like constant folding, point queries.
- Analyze deeply from all angles (including but not limited to time complexity, anti-patterns, CPU and memory friendliness, redundant calculations, etc.) based on the code context and available information. Are there any performance issues or optimization opportunities?
- Based on your understanding, are there any other issues with the current modification?
After checking all the above items with code. Use the remaining parts of this skill as needed. The following content does not need individual responses; just check during the review process.
1.3.1 Concurrency and Thread Safety (Highest Priority)
- Thread context: Does every new thread or bthread entry attach the right memory-tracking context? (See
be/src/runtime/AGENTS.md) - Lock protection: Is shared data accessed under the correct lock and mode?
- Lock order: Do nested acquisitions follow the module's documented lock order? (See storage, schema-change, cloud AGENTS.md)
- Atomic memory order:
relaxedfor counters only; at leastacquire/releasefor signals and lifecycle flags
1.3.2 Error Handling (Highest Priority)
- Status checking: Every
Statusmust be checked; silent discard is prohibited - Exception propagation: Is there a catch-and-convert boundary above every
THROW_IF_ERROR? (Seebe/src/common/AGENTS.md) - Error context: Do error messages include table, partition, tablet, or transaction identifiers?
- Assertions:
DORIS_CHECKfor invariants only, never speculative defensive checks
1.3.3 Memory Safety (High Priority)
- Reservation:
try_reserve()before large allocations, with guaranteed release on every exit path - Ownership: See
be/src/runtime/AGENTS.mdfor cache-handle, shared_ptr-cycle, and factory-creator rules
1.3.4 Data Correctness (High Priority)
- Version consistency: Does data reading stay at or below the visible version?
- MoW correctness: See
be/src/storage/AGENTS.mdfor delete-bitmap sentinel replacement, MoW enablement, and serialization rules - EditLog: Are metadata changes logged and replayed equivalently? (See
fe/.../persist/AGENTS.md)
1.3.5 Observability (Medium Priority)
- Are critical new paths observable with appropriate log levels and metrics?
- Do distributed operations include enough identifiers for tracing?
1.4 Test Coverage Principles
- All kernel features must have tests; prioritize regression tests, add unit tests where practical
.outfiles must be auto-generated, never handwritten- Use
order_qt_or explicitORDER BYfor deterministic output - Expected errors:
test { sql "..."; exception "..." } - Drop tables before use, not after, to preserve debug state
- Hardcode table names in simple tests instead of
def tableName
Part 2: Module-Specific Review Guides (AGENTS.md)
Detailed module review guides live in AGENTS.md files in each source directory. Read the relevant file when reviewing module-specific code.
BE Module
| Directory | Coverage |
|---|---|
be/src/common/AGENTS.md |
Error handling and compile_check |
be/src/cloud/AGENTS.md |
Cloud RPC and CloudTablet sync rules |
be/src/runtime/AGENTS.md |
MemTracker, cache lifecycle, SIOF, workload-group memory tracking |
be/src/core/AGENTS.md |
COW columns, type metadata, serialization compatibility |
be/src/exec/AGENTS.md |
Pipeline execution, dependency concurrency, spill and memory pressure |
be/src/storage/AGENTS.md |
Tablet locks, rowset lifecycle, MoW delete bitmap, segment writing |
be/src/storage/index/inverted/AGENTS.md |
Inverted-index lifetime, CLucene boundary, three-valued bitmap |
be/src/storage/schema_change/AGENTS.md |
Schema-change strategy, lock order, MoW catch-up flow |
be/src/io/AGENTS.md |
Filesystem async path and remote reader caching |
FE Module
| Directory | Coverage |
|---|---|
fe/fe-core/AGENTS.md |
FE locking, exception model, visible-version semantics |
fe/fe-core/src/main/java/org/apache/doris/persist/AGENTS.md |
EditLog write/replay path and transaction persistence modes |
fe/fe-core/src/main/java/org/apache/doris/nereids/AGENTS.md |
Rule placement, mark-join constraints, property derivation |
fe/fe-core/src/main/java/org/apache/doris/transaction/AGENTS.md |
Transaction lifecycle, publish semantics, cloud manager usage |
fe/fe-core/src/main/java/org/apache/doris/datasource/AGENTS.md |
External catalog initialization and reset hazards |
fe/fe-core/src/main/java/org/apache/doris/backup/AGENTS.md |
Backup/restore log timing and replay ordering |
Cloud Module
| Directory | Coverage |
|---|---|
cloud/src/meta-store/AGENTS.md |
TxnKv, key encoding, versioned values, versionstamp helpers |
cloud/src/meta-service/AGENTS.md |
Cloud transaction commit paths, RPC retry contract, Cloud MoW contract |
cloud/src/recycler/AGENTS.md |
Recycler safety, two-phase delete, packed-file ordering |
Part 3: Cross-Module Concerns
3.1 FE-BE Protocol Compatibility
- Do new
TPlanNodeTypevalues have matching BE handling? - Are all FE-to-BE send paths updated when new transmitted variables are added?
- Is mixed-version compatibility preserved for serialization or function metadata changes?
3.2 Cloud Mode vs Shared-Nothing Mode
- Does the change handle both cloud and non-cloud paths where both exist?
- In Cloud mode, is centrally managed partition version still the visibility source of truth?
- Are prepare/commit rowset protocols respected for Cloud writes?
3.3 Merge-on-Write Unique Key Tables
- Does the write path probe or deduplicate against the correct rowset set?
- Is publish-side delete bitmap work correctly serialized?
- Is query-side delete bitmap usage version-aligned?
- Do compaction and schema-change paths preserve MoW-specific correctness steps?
3.4 Performance
- Are hot paths free of redundant copies, repeated scans, or avoidable allocations?
- Do cloud metadata loops avoid pathological retry or scan behavior?
Part 4: Testing Review Guide
4.1 Regression Tests
-
order_qt_or explicitORDER BYused? - Expected errors use
test { sql ...; exception ... }? - Tables dropped before use, not after?
- Table names hardcoded, not
def tableName? -
.outfiles generated, not handwritten?
4.2 BE Unit Tests
-
TEST/TEST_Fused appropriately, independent of execution order? - Memory or resource leaks covered where relevant?
4.3 FE Unit Tests
- JUnit 5 used consistently with descriptive test names?
- Concurrency tests use latches or barriers, not timing guesses?
Part 5: Known High-Risk Patterns Quick Reference
These are cross-module patterns that cause silent or hard-to-diagnose failures. Module-specific traps are in the corresponding AGENTS.md files.
| Pattern | Risk | Why it's dangerous |
|---|---|---|
Ignored Status |
Critical | Silent failure propagation; invariant violations go undetected |
THROW_IF_ERROR in Defer or destructors |
Critical | Terminates during stack unwinding |
THROW_IF_ERROR without catch-and-convert boundary |
High | Doris exception escapes its intended scope |
| Cross-TU static initializer dependency | High | Initialization order undefined; crashes or wrong values at startup |
Part 6: Function System Review Guide
6.1 FE-BE Consistency
- FE and BE compute return types independently — do they agree?
- Function names lowercase and consistent on both sides?
- Variadic lookup depends on argument family names — do FE families match BE registration?
- Aliases registered on both FE and BE?
- Aggregate intermediate types match (critical for shuffle serialization)?
6.2 Null Handling
- If
use_default_implementation_for_nulls()is disabled, is the null map fully handled manually? - Do FE nullable semantics match BE implementation?
- Is
need_replace_null_data_to_default()set correctly for garbage-payload null positions? - Is nested
ColumnConst(ColumnNullable(...))handled when default null handling is disabled?
6.3 BE Registration
-
IFunctionsubclasses carry no member state? - Order-sensitive combinators like
_foreachplaced correctly in registration? - Aggregate Nullable/NotNullable tags match semantic result type?
-
ColumnStringoffset maintenance exact (off-by-one corrupts later rows silently)?
Part 7: Development Standards Supplement
7.1 Configuration
- New
config::xxxhas comments, defaults, and standard naming? - Dynamic configs use the mutable form and are observed without restart?
- Cloud-visible behavior changes: does the meta-service config need updating too?
7.2 Metrics
- New metric names follow existing module naming patterns?
7.3 bthread Safety
- No long blocking under
std::mutexon bthread paths? -
bthread_usleepused instead of thread-level sleep in coroutine code?
7.4 Error Messages and Comments
- User-visible errors include identifiers needed for debugging?
- Lock-protected members and non-obvious TODOs have precise local comments?
Finally
Item-by-item responses are required only for Part 1.3. Parts 2–7 are supporting material for finding bugs, stale assumptions, and missing coverage.