code-review
What I do
Complete code review work to ensure code quality. Due to the length of content, you can read sections as needed based on the actual modifications being reviewed.
When to use me
Use this when you need to review code, whether it's code you just completed or directly reviewing target code.
Part 1: General Principles
1.1 Core Invariants
Always focus on the following core invariants during review:
- Data Correctness: Data from any committed transaction must be visible to subsequent queries and not lost; data from uncommitted transactions must not be visible
- Version Consistency: Partition visible version is the sole standard for data visibility; BE must strictly read data not exceeding this version
- Delete Bitmap Consistency (MoW tables): delete bitmap must be strictly aligned with rowset versions; sentinel mark / TEMP_VERSION must be replaced with actual versions before use
- Memory Safety: All significant memory allocations must be tracked by MemTracker; orphan memory is not allowed
- Error Means Failure: Invariant violations should trigger exceptions or non-OK Status; silent continuation is never allowed
1.2 Review Principles
- Defensive Programming Prohibited: Do not use
if(valid)style defensive checks to mask logic errors. If logically A=true should imply B=true, writeif(A) { DORIS_CHECK(B); ... }, neverif(A && B) - Follow Context Conventions: When adding code, strictly follow patterns in adjacent code—same error handling, same interface call order, same lock acquisition patterns, unless a clearly correct and more reasonable approach is identified.
- Prioritize Reuse: Before adding new functionality, search for similar implementations that can be reused or extended; after adding, ensure good abstraction for shared code.
- Code First: When this SKILL conflicts with actual code behavior, defer to your understanding of actual code behavior and explain
- Performance First: All obviously redundant operations should be optimized away, all obvious performance optimizations must be applied, and obvious anti-patterns must be eliminated.
1.3 Critical Checkpoints (Self-Review and Review Priority)
The following checkpoints must be individually confirmed with conclusions during self-review and review:
- What is the goal of the current task? Does the current code accomplish this goal? Is there a test that proves it?
- Is this modification as small, clear, and focused as possible?
- Does the code segment involve concurrency? If yes:
- Which threads introduce concurrency, what are the critical variables? What locks protect them?
- Are operations within locks as lightweight as possible? Are all heavy operations outside locks while maintaining concurrency safety?
- If multiple locks exist, is the locking order consistent? Is there deadlock risk?
- Is there special or non-intuitive lifecycle management? If yes:
- Understand the complete lifecycle and relationships of related variables. Are there circular references? When is each released? Can all lifecycles end normally?
- Are configuration items added?
- Should the configuration allow dynamic changes, and does it actually allow them? If yes:
- Can affected processes detect the change promptly without restart?
- Should the configuration allow dynamic changes, and does it actually allow them? If yes:
- Does it involve incompatible changes like function symbols or storage formats? If yes:
- Is compatibility code added? Can it correctly handle requests during rolling upgrades?
- Are there functionally parallel code paths to the modified one? If yes:
- Should this modification be applied to other paths, and has it been?
- Are there special conditional checks? If yes:
- Does the condition have clear comments explaining why this check is necessary, simplest, and most viable?
- Are there similar paths with similar issues?
- What is the test coverage?
- Are end-to-end functional tests comprehensive?
- Are there comprehensive negative test cases from various angles?
- For complex features, are key functions sufficiently modular with unit tests?
- Does the feature need increased observability? If yes:
- When bugs occur, are existing logs and VLOGs sufficient to investigate key issues?
- Are INFO logs lightweight enough?
- Are Metrics added for critical paths? Referencing similar features, are all necessary observability values covered?
- Does it involve transaction and persistence-related modifications? If yes:
- Are all EditLog reads and writes correctly covered? Is all information requiring persistent storage persisted? Does it follow our standard approach?
- In transaction processing logic, can master failover at any time point be handled correctly?
- Does it involve data writes and modifications? If yes:
- Are transactionality and atomicity guaranteed? Are there issues with concurrency?
- Would FE or BE crashes cause leaks, deadlocks, or incorrect data?
- Are new variables added that need to be passed between FE and BE? If yes:
- Is variable passing modified in all paths? For example, various scattered thrift sending points like constant folding, point queries.
- Analyze deeply from all angles (including but not limited to time complexity, anti-patterns, CPU and memory friendliness, redundant calculations, etc.) based on the code context and available information. Are there any performance issues or optimization opportunities?
- Based on your understanding, are there any other issues with the current modification?
After checking all the above items with code. Use the remaining parts of this skill as needed. The following content does not need individual responses; just check during the review process.
1.3.1 Concurrency and Thread Safety (Highest Priority)
- New thread/bthread entry: Is
SCOPED_ATTACH_TASKused? Omission causes memory to be counted in orphan tracker, debug builds crash directly - Lock protection scope: Is shared data (version map, delete bitmap, transaction state) accessed within correct locks? Is lock type correct (shared vs exclusive)?
- Lock order: Are there nested acquisitions of multiple locks? Must follow known lock order (e.g.,
_txn_lock->_txn_map_lock), otherwise deadlock - Atomic operation memory order: Use
relaxedfor statistics, at leastacquire/releasefor flag/signal - COW column modification: Is exclusive ownership ensured before
mutate()? Isassume_mutable_ref()only used when exclusive ownership is confirmed?
1.3.2 Error Handling (Highest Priority)
- Status checking:
Statusreturn values must be checked,static_cast<void>(...)silent discard is prohibited - Exception propagation chain: Is there
RETURN_IF_CATCH_EXCEPTIONaboveTHROW_IF_ERRORto catch? - Error messages: Are they clear, include context (table/partition/TabletID/transaction ID), provide solution direction?
- DORIS_CHECK usage: Only for invariant assertions (e.g.,
DORIS_CHECK(hash_table != nullptr)), not for defensive programming
1.3.3 Memory Safety (High Priority)
- Large memory reservation: Is
try_reserve()called before allocation? Is thereDEFER_RELEASE_RESERVED()to ensure release? - Cache Handle: Is
cache->release(handle)called after usingCache::Handle*? - shared_ptr circular references: Do scenarios like
Dependencyuse raw pointers orweak_ptrto break cycles? - Object creation: Do classes with
ENABLE_FACTORY_CREATORusecreate_shared()/create_unique()? Rawnewis prohibited
1.3.4 Data Correctness (High Priority)
- Delete Bitmap version: In Cloud mode, is
TEMP_VERSION_COMMONmanually replaced before use? - Version consistency: Does data reading strictly not exceed Partition visible version?
- MoW setting: Is
SegmentWriterOptions::enable_unique_key_merge_on_writecorrectly set totrue? - EditLog persistence: Do metadata modifications call
editLog.logXxx()? Is replay logic strictly equivalent to master?
1.3.5 Observability (Medium Priority)
- Metrics addition: Are
ADD_TIMER/ADD_COUNTERadded to critical paths? Does naming followmodule_xxxconvention? - Log levels: Use
LOG(ERROR)for error logs,LOG(INFO)for critical paths,VLOGorDVLOGfor debug info - Trace information: Do distributed operations (RPC, transactions) include trace ID for tracking?
1.4 Test Coverage Principles
- All kernel features must have corresponding tests
- End-to-end SQL functionality should prioritize adding regression tests under
regression-test/(Groovy DSL) - Also add BE unit tests (
be/test/, Google Test) and FE unit tests (fe/fe-core/src/test/, JUnit 5) where possible. For complex features, must be modular with unit tests. - Regression test result files (
.out) must not be handwritten, must be auto-generated via-genOut - Regression tests must use
order_qt_prefix or manualORDER BYto ensure ordered results - Expected error cases use
test { sql "..."; exception "..." }pattern - Tables should be DROPped before use (not after tests end), to preserve environment for debugging
- Simple tests should hardcode table names directly, not use
def tableName
Part 2: BE Module Review Guide
2.1 Error Handling (Highest Priority)
2.1.1 Status / Exception / Result<T> Relationship
| Type | Usage | Propagation Macro |
|---|---|---|
Status |
Default error return type, [[nodiscard]] |
RETURN_IF_ERROR |
Exception |
Vectorized execution engine hot path, expression evaluation | RETURN_IF_CATCH_EXCEPTION |
Result<T> |
Return value or error (replaces old output parameter pattern) | DORIS_TRY |
Review Checkpoints:
- Are
Statusreturn values checked? Search forstatic_cast<void>(...)pattern—this is the most dangerous anti-pattern, silently swallowing errors without logging. Currently 171+ occurrences in codebase. Never allow this pattern in new code - Where
THROW_IF_ERRORis used, is thereRETURN_IF_CATCH_EXCEPTIONorRETURN_IF_ERROR_OR_CATCH_EXCEPTIONabove in the call chain to catch? If not, exceptions propagate uncaught causing crashes. Current version has related catch in upper layers when Pipeline executes specific operators. -
THROW_IF_ERRORprohibited in Defer/RAII destructor lambdas: Throwing exceptions during stack unwinding triggersstd::terminate. Known issues:pipeline_task.cpp:412-413, 638-639. UseWARN_IF_ERRORorstatic_cast<void>+ logging in Defer -
THROW_IF_ERRORprohibited in Thrift RPC handlers without try/catch protection: Thrift framework doesn't catch Doris exceptions, causing connection drops. Known issues:backend_service.cpp:1323,1326. RPC handlers should useRETURN_IF_ERRORpattern or outermost try/catch - Is
WARN_IF_ERRORusage reasonable? Only allowed in destructors, cleanup paths, best-effort operations. Warning message must not be empty string - Is
DORIS_CHECKused for invariant assertions (not defensive programming)? Correct usage example:DORIS_CHECK(hash_table)asserts hash table is initialized - Do new catch blocks correctly convert exceptions? Should follow standard pattern from
internal_service.cpp:348-357: catchdoris::Exceptionfirst, thenstd::exception, finally...
2.1.2 Cloud RPC Error Handling
The retry_rpc template in cloud_meta_mgr.cpp is the standard pattern for RPC error handling in Cloud mode:
- Do new Cloud RPC calls use
retry_rpc? - Does
INVALID_ARGUMENTreturn directly without retry? (Correct behavior) - Does
KV_TXN_CONFLICTuse an independent conflict retry counter? - Do timeout and retry counts use config variables instead of hardcoding?
2.1.3 compile_check Mechanism
compile_check_begin.h / compile_check_end.h elevate -Wconversion to compile error (excluding sign-conversion and float-conversion).
- Do new
.hfiles include pairedcompile_check_begin.h/compile_check_end.hin declaration regions? - Coverage status: Currently only 31.1% (1,049/3,372) of source files use compile_check. Additionally, 218 files contain
compile_check_begin.hbut lack correspondingcompile_check_end.h(unpaired). New files should always include complete pairing - Are there implicit narrowing conversions? Pay special attention to
int64_t -> int32_t,size_t -> int,uint64_t -> int64_t - If bypassing due to third-party code is necessary, are
compile_check_avoid_begin.h/compile_check_avoid_end.hused?
2.2 Memory Management
2.2.1 MemTracker Hierarchy
Doris uses two-level memory tracking:
MemTrackerLimiter: Heavy-weight tracker with limits (QUERY / LOAD / COMPACTION / SCHEMA_CHANGE / METADATA / CACHE types)MemTracker: Lightweight tracker without limits, for fine-grained sub-component tracking
Thread-level tracking is implemented via ThreadMemTrackerMgr, each thread has one active MemTrackerLimiter and a MemTracker stack.
Review Checkpoints:
- Do newly started threads/bthreads use
SCOPED_ATTACH_TASKat entry? Without it, memory is counted in orphan tracker, debug builds trigger assertions - Does temporary tracker switching use
SCOPED_SWITCH_THREAD_MEM_TRACKER_LIMITER(supports nesting)? -
ThreadPool::submit_funccallbacks: Do lambdas submitted to thread pools useSCOPED_ATTACH_TASKinternally? Currently 87+submit_funccalls in codebase, many rely on "indirect attach" (attach happens deep in call chain), which is fragile. Correct approach references gold-standard wrapper pattern fromcalc_delete_bitmap_executor.h:67-83: explicit attach at callback entry. Known problematic files:s3_file_bufferpool.cpp:114,wal_manager.cpp:414,download_action.cpp:166 - Is
try_reserve()called before large memory allocation? Is thereDEFER_RELEASE_RESERVED()to ensure release after allocation? - Does memory hook path (
ThreadMemTrackerMgr::consume) avoid operations that may trigger memory allocation (e.g., LOG, string formatting)?_stop_consumeflag exists to prevent recursion
2.2.2 Object Lifecycle
- Are classes using
ENABLE_FACTORY_CREATORcreated viacreate_shared()/create_unique()? Rawnewis prohibited - Are there
shared_ptrcircular references? Standard ways to break cycles:Dependency::_shared_stateuses raw pointer (non-owning)Dependency::_blocked_taskusesvector<weak_ptr<PipelineTask>>TrackerLimiterGroup::trackersuseslist<weak_ptr<MemTrackerLimiter>>
- Is Rowset's dual reference counting used correctly?
acquire()/release()is manual reference counting (for performance),shared_ptrmanages ownership. Whenrelease()decrements to 0 and in UNLOADING state, triggersdo_close() - Are rowsets in
_unused_rowsetsjudged deletable viause_count() == 1? - Are there suspicious
shared_ptrcircular reference issues? For clear lifecycles, preferunique_ptr+ raw pointer observer pattern.
2.2.3 COW Column Semantics
Vectorized columns (IColumn) use custom intrusive reference-counted Copy-on-Write mechanism:
- Is exclusive ownership ensured when calling
mutate()? Whenuse_count() > 1, triggers deep copy, which is a performance hotspot - Is
assume_mutable_ref()only used when exclusive ownership is confirmed? This method does not check reference count, using it in shared state causes data corruption - Is
set_columns()called to restore afterBlock::mutate_columns()? After calling, original Block's column pointers are null - Can the pointer returned by
convert_to_full_column_if_const()share with original column? ForColumnConstit materializes a new column, but for normal columns returns shared pointer
2.2.4 Cache Lifecycle
- Is
cache->release(handle)called after usingCache::Handle*? Omission causes memory leaks - Do cache values inherit from
LRUCacheValueBase? This base class automatically releases tracking bytes on destruction - Are new cache types registered with
CacheManager? Unregistered caches don't participate in global GC
2.3 Concurrency and Locks
2.3.1 Tablet Lock Hierarchy
The Tablet class has 8+ independent locks, with confirmed usage rules:
| Lock | Type | Rule |
|---|---|---|
_meta_lock |
shared_mutex |
Modifying version map requires EXCLUSIVE; reading requires SHARED |
_rowset_update_lock |
mutex |
Serializes delete bitmap updates during MoW table publish_txn, independent of _meta_lock (avoids blocking read path) |
_base_compaction_lock |
mutex |
Serializes base compaction |
_cumulative_compaction_lock |
mutex |
Serializes cumulative compaction |
_migration_lock |
shared_timed_mutex |
Protects tablet migration |
_cooldown_conf_lock |
shared_mutex |
Protects cooldown configuration |
Review Checkpoints:
- Do operations on
_rs_version_mapor_stale_rs_version_maphold_meta_lock(shared or exclusive)? - Do
TabletMetamodification operations hold the owning Tablet's EXCLUSIVE_meta_lock? - Is there acquisition of other tablet locks while holding
_meta_lock? This may cause deadlock
2.3.2 TxnManager Lock Order
Confirmed consistent lock order: _txn_lock -> _txn_map_lock (applies to commit, publish, delete, rollback).
- Do new TxnManager operations follow this lock order?
2.3.3 Pipeline Dependency Concurrency
Dependency uses double-checked locking + atomic _ready flag + mutex _task_lock:
-
set_ready(): First check_ready(fast path), then lock to set_ready = trueand wake up -
is_blocked_by(): Lock, check_ready, add to wait list if not ready -
Do new
Dependencysubclasses correctly callblock()/set_ready()? -
Do
CountedFinishDependency'sadd()/sub()operate under_mtxprotection?
2.3.4 Atomic Operation Memory Order
Memory order usage patterns in codebase:
-
memory_order_relaxed: Statistics counters (correct) -
memory_order_acquire/release: Lifecycle flags likeExecEnv::_s_ready(correct) -
memory_order_acq_rel: CAS close operations (correct) -
Do new atomic operations use appropriate memory order? Use relaxed for statistics counters, at least acquire/release for flag/signal
-
Does
std::atomic<bool>as stop signal use at least acquire/release? Relaxed may theoretically delay propagation
2.4 Type System and Serialization
2.4.1 Upgrade/Downgrade Compatibility
Upgrade/downgrade compatibility no longer uses be_exec_version for control. Instead, each feature (e.g., functions, compression algorithms, etc.) adds its own option for confirmation. BE uses is_set&&is_true to branch or replace functionality at appropriate nodes.
2.4.2 Decimal Precision Overflow
- Do Decimal operation results check precision upper limit?
decimal_result_type()may promote result to DECIMAL256, causing unexpected memory layout changes - Are DecimalV2's
original_precision/original_scalecorrectly set? Default valueUINT32_MAXmeans "unknown",get_format_scale()returns different value fromscalein this case
2.4.3 Thrift Type Mapping
- Are
TScalarTypeoptional fields (precision,scale,len) set when needed? DECIMAL needs precision/scale, CHAR/VARCHAR needs len, DATETIMEV2 needs scale - Is depth-first flattened representation of
TTypeDesc.typescorrectly traversed? Off-by-one errors are common with nested complex types (ARRAY<MAP<K,V>>)
2.4.4 Block Merge Nullable Handling
MutableBlock::merge_impl() assumes target is Nullable and source is non-Nullable when types mismatch, using C-style cast (DataTypeNullable*) instead of dynamic_cast.
- Does new Block merge logic correctly handle Nullable promotion? DCHECK only protects in debug builds
2.5 Pipeline Execution Engine
2.5.1 Operator Lifecycle
Pipeline Task state machine: INITED -> RUNNABLE -> BLOCKED -> FINISHED -> FINALIZED
- Are
SharedState's source_deps and sink_deps correctly connected?inject_shared_state()is responsible for connection
2.5.2 Memory Reservation and Spill
- Do memory-intensive operations (Hash Join build, aggregation) use
_try_to_reserve_memory()? - Is there
_memory_sufficient_dependencyfor backpressure when memory is insufficient? - Does
revoke_memory()correctly implement spill logic?
2.6 Storage Engine
2.6.1 Rowset Version Management
- Are
add_rowset()andmodify_rowsets()executed under exclusive_meta_lock? - Are version ranges continuous? There should be no version gaps in
_rs_version_map - Does compaction output rowset's version range correctly cover input rowsets?
2.6.2 Delete Bitmap (MoW Tables, Extremely High Priority)
- In Cloud mode, is TEMP_VERSION_COMMON manually replaced for bitmaps obtained from
CloudTxnDeleteBitmapCache? Code comments explicitly warn about this issue (cloud_txn_delete_bitmap_cache.h:72-75). Omission causes bitmap to be applied to wrong version - Is
_rowset_update_lockheld during delete bitmap calculation? Prevents concurrent calculation on same (txn_id, tablet_id) - Does compaction's delete bitmap calculation use latest compaction stats (
ms_base_compaction_cnt, etc.)? Stale stats will miss recently compacted rowsets
2.6.3 Segment Writing
- Is
SegmentWriterOptions::enable_unique_key_merge_on_writecorrectly set for MoW tables? Default value isfalse, omission causes deduplication to be silently skipped - Is performance impact of row-stored columns (
_serialize_block_to_row_column) on wide tables within expectations?
2.7 IO Subsystem
- Do
FileSystemoperations useFILESYSTEM_Mmacro to handle bthread async IO? - Do remote storage (S3/HDFS) reads use
create_cached_file_reader()for file caching?
2.8 Inverted Index Subsystem
2.8.1 Lifecycle and Destruction Order
- Field declaration order is destruction order: In
inverted_index_writer.h:92-95,_dirmust be declared before_index_writer, because_index_writer'swrite.lockis created by_dir.lockFactory, C++ destructs in reverse declaration order. New fields must follow this constraint - Cache handle destruction extends LRU retention:
inverted_index_cache.h:142-152,InvertedIndexCacheHandledestructor setslast_visit_timeto future timestamp (UnixMillis() + config::index_cache_entry_stay_time_after_lookup_s * 1000), making actually queried searchers survive longer in LRU. Be aware of this extension mechanism when modifying cache logic
2.8.2 CLucene Exception Handling
CLucene doesn't use C++ exceptions, inverted_index_common.h:65-111 implements ErrorContext + FINALLY macro system:
-
FINALLY: Catches exceptions and returnsStatuserror -
FINALLY_EXCEPTION: Catches and rethrows -
All three macros use
static_assertto enforce existence oferror_contextlocal variable in scope -
Does new CLucene-related code use
ErrorContext+FINALLYpattern instead of raw try/catch?
2.8.3 Three-Valued Logic Bitmap
InvertedIndexResultBitmap (inverted_index_reader.h:80-214) implements SQL three-valued logic (TRUE/FALSE/NULL), storing separately via _data_bitmap + _null_bitmap:
-
op_notis const but actually modifies data: Modifies pointed object viashared_ptrto bypass const qualifier. Callers should not assume const reference returned results are immutable -
operator|=implements complete SQL three-valued OR semantics (lines 148-157), same foroperator&=andoperator-=. New bitmap operations must follow three-valued logic
2.9 Schema Change Subsystem
2.9.1 Three-Strategy Selection and Lock Order
_get_sc_procedure() (schema_change.h:303-315) selects among three strategies based on change type:
LinkedSchemaChange: Column name/type alias changes only, zero-copyVSchemaChangeDirectly: Direct conversion without sortingVLocalSchemaChangeWithSorting: Full rewrite with sorting
Lock acquisition order (schema_change.cpp:972-976): base_tablet push_lock → new_tablet push_lock → base_tablet header_lock → new_tablet header_lock
- Do schema change lock acquisitions follow this four-level order? Out-of-order causes deadlock
2.9.2 MOW Delete Bitmap Four-Step Flow (Extremely High Priority)
schema_change.cpp:1595-1657 describes critical correctness flow for MOW table schema change:
- Newly imported rowsets during dual-write skip delete bitmap calculation
- After conversion completes, calculate delete bitmap for rowsets during dual-write period (lines 1618-1634)
- Block new publish, calculate delete bitmap for incremental rowsets (lines 1636-1652)
- Switch tablet state to
TABLET_RUNNING(lines 1653-1656)
- When modifying schema change delete bitmap path, are the four-step order and blocking logic complete? Step 3's blocking is key to ensuring consistency
2.9.3 Column Deletion Order Trap
return_columns is calculated before merge_dropped_columns() (schema_change.cpp:959-964 vs line 1062), this is intentional: return_columns only covers non-dropped columns, dropped columns are handled internally by SegmentIterator for delete predicate evaluation.
- When modifying schema change column processing logic, is the timing of
return_columnsand dropped columns calculation understood? -
_check_row_numscan be bypassed byconfig::ignore_schema_change_check(default false) (schema_change.h:155-171), this config should remain off in production
2.10 Workload Group Memory Tracking
WorkloadGroup (workload_group.h) uses dual-lock pattern: _mutex protects name/version/memory limit/resource_ctxs/shutdown state, _task_sched_lock protects scheduler and cgroup pointers.
- Memory tracking inconsistency:
check_mem_used()(lines 115-123) calculates_total_mem_used + _wg_refresh_interval_memory_growthfor watermark check, butexceed_limit()(lines 132-135) only compares_total_mem_used > _memory_limit, ignoring_wg_refresh_interval_memory_growth. This means during refresh interval, workload group may exceed hard limit without being detected byexceed_limit() - Does new memory check logic use the correct method? If precise limit checking is needed, should reference
check_mem_used()approach
Part 3: FE Module Review Guide
3.1 Metadata Persistence (EditLog)
3.1.1 Write Path
All metadata modifications must go through EditLog persistence. Flow:
- Modify in-memory state
- Call
editLog.logXxx()to write log - Log is replicated to Follower via BDBJE
- Caller blocks until persistence confirmation
Review Checkpoints:
- Do new DDL/DML operations call corresponding
editLog.logXxx()methods? Omission causes Follower state inconsistency - Do new OperationTypes have corresponding replay handling in
EditLog.loadJournal()switch? - Is replay logic strictly equivalent to master execution logic? Any difference causes Follower state drift
- Do new serializable objects correctly implement Gson serialization? Are
@SerializedNameannotations complete? - Can Image checkpoint correctly include new metadata?
- Can master failover at any moment guarantee no duplicate or lost data? Short-term task failures are tolerable, but memory/transaction leaks and data inconsistencies are not allowed.
3.1.2 EditLog Batch and Outside-Lock Commit Mode
DatabaseTransactionMgr implements two EditLog commit modes, controlled by Config.enable_txn_log_outside_lock:
- Inside-lock mode (default): Directly calls
editLog.logInsertTransactionState()within transaction lock - Outside-lock mode: Only modifies in-memory state and calls
submitEdit()to submit to queue within lock, waits for persistence completion viaawaitTransactionState()outside lock (DatabaseTransactionMgr.java:383-395)
Fuzzy testing (DorisFE.java:612-615) randomly switches between these two modes, so both paths must be correct.
Review Checkpoints:
- Do modifications involving transaction EditLog consider both inside-lock and outside-lock commit paths?
- In outside-lock mode, are
awaitTransactionState()timeout and exception handling correct? Timeout doesn't mean failure—transaction may still be committing - Do new transaction state modifications complete all in-memory state changes before
submitEdit()submission? After submission, other threads may immediately see new state
3.1.3 EditLog Fatal Error Handling
When EditLog flush fails, FE calls System.exit(-1) to terminate process—this is intentional, preventing split-brain.
- Does new code catch and swallow exceptions in EditLog-related paths? This may cause metadata loss
3.2 Lock Hierarchy and Concurrency
3.2.1 Confirmed Lock Orders
| Scenario | Lock Order |
|---|---|
| ConsistencyChecker | jobs_lock -> synchronized(job) -> db_lock |
| AlterJobV2 | synchronized(job) -> db_lock |
| TabletScheduler | synchronized(this) internal operations, then release, then acquire db_lock (two-step method avoids deadlock) |
| DatabaseTransactionMgr | Its RWLock is leaf lock, should not acquire other locks internally |
| MTMVTask | Acquire table locks after sorting by table ID (avoids deadlock) |
Review Checkpoints:
- Do new lock acquisitions follow the above lock orders?
- Is
tryLock(timeout)used instead of infinite-waitlock()? FE's standard pattern istryLock+ timeout logging - Lock timeout standard values: Env lock = 5s, DB write lock = 100s, Table read lock = 1min. Operations involving multiple tables must lock in ascending table ID order (implemented via
PriorityQueue) to prevent deadlock - Does
Database.tryWriteLock()log current owner thread on timeout? - Are there paths acquiring database lock within
synchronizedblock? This is a known deadlock risk, should use two-step method
3.2.2 ConcurrentModificationException
- Is traversal of shared collections under lock protection?
CatalogRecycleBin.write()addedsynchronizedbecause dump image API concurrent access caused CME - Can there be concurrent modifications during traversal of
Map/List? PreferConcurrentHashMapor snapshot before traversal
3.2.3 Known Deadlock Cases (New Code Must Avoid Similar Patterns)
The following are confirmed deadlock risk points in codebase, be extra vigilant when reviewing similar patterns:
| Location | Pattern | Description |
|---|---|---|
MasterCatalogExecutor.java:70-78 |
Initiating RPC while holding catalog lock | RPC timeout blocks catalog operations for long time |
ExternalDatabase.java:137-158 |
Nested acquisition of db lock + makeSureInitialized() | Comments explicitly mark deadlock avoidance strategy |
HiveMetaStoreCache.java:126-130 |
Acquiring cache lock in thread pool callback | Comment marks "this may cause a deadlock" |
CacheFactory.java:85-92 |
Circular dependency in cache initialization | Avoided via lazy initialization |
- Does new code initiate RPC or IO operations while holding locks? This is the most common source of deadlock and performance issues
- Does new code acquire locks in callbacks (e.g., thread pool, cache loader) that callers may hold?
3.3 Nereids Optimizer
3.3.1 Rule Implementation Specifications
Review Checkpoints:
- Is new Rule's
RuleTypeadded toRuleType.javaenum? - Does Rule correctly handle all Join types? Pay special attention to ASOF variants (
ASOF_LEFT_INNER_JOIN, etc.).COULD_PUSH_THROUGH_LEFT/RIGHTlists inPushDownFilterThroughJoinexplicitly include ASOF types - Are constant predicates (
slots.isEmpty()) correctly handled? Inner Join can push to both sides, Outer Join cannot - Is there infinite loop risk? Can created nodes be matched again by another rule in same batch? Code comments have multiple "separate topDown passes to avoid dead loops" warnings
- Does
OneRewriteRuleFactorypattern match correct wrapper nodes? For example,EliminateJoinByFKonly matchesproject(join)not barejoin - Are predicates checked for
containsUniqueFunction()? Predicates containing unique functions cannot be pushed down
3.3.2 Rewriter Stage Order
Rewriter organizes rules into multi-stage pipeline, key dependencies:
-
PUSH_DOWN_FILTERSis called 5-6 times throughout pipeline, because it interacts with many other transformations -
InferPredicatesis called twice (withPUSH_DOWN_FILTERSin between), because eliminating outer join exposes new inference opportunities -
costBased(...)rules are controlled byrunCboRulesflag -
Is new rewrite rule placed in correct stage? For example, rules needing predicate pushdown should be after
PUSH_DOWN_FILTERS -
Is new exploration rule registered in
RuleSet?
3.3.3 Property Derivation
RequestPropertyDeriver implements top-down property requests:
- Does new Physical Operator have visit method in
RequestPropertyDeriver? - Are multiple alternative property requirements provided (e.g., HashJoin provides both shuffle and broadcast choices)? Single requirement narrows search space
- Is
PhysicalProjectalias penetration logic correct? OnlyAlias(SlotRef)and bareSlotRefcan penetrate
3.4 Transaction Management
3.4.1 Transaction Lifecycle
State machine: PREPARE -> COMMITTED -> VISIBLE (or ABORTED)
- After transaction commit, can
PublishVersionDaemoncorrectly sendPublishVersionTaskto all BEs? - Does publish wait for quorum confirmation?
- Does transaction correctly abort after timeout?
Known race condition—TransactionState.tableIdList: At TransactionState.java:632, TODO comment marks that addTableId() / removeTableId() lack synchronization protection. tableIdList is plain ArrayList, has race condition (CME or data loss) when multiple threads concurrently add/remove table IDs.
- Are operations on
TransactionState.tableIdListunder external lock protection? Or should it be replaced with thread-safe collection?
3.4.2 Cloud Mode Differences
In Cloud mode, transactions are centrally managed by Meta-Service, not FE:
- Do Cloud mode transaction operations use
CloudGlobalTransactionMgrinstead of localGlobalTransactionMgr? - Unsafe downcasting prohibited: Current code has 4 places forcibly casting
GlobalTransactionMgrIfacetoCloudGlobalTransactionMgr(e.g.,InternalCatalog.java:994). These casts cause ClassCastException in non-Cloud mode. New code should add required methods by extendingGlobalTransactionMgrIfaceinterface, not rely on downcasting - Does version retrieval use
getVisibleVersion()(RPC + cache in Cloud mode)? - Is
getVisibleVersion()'sRpcExceptioncorrectly handled? Multiple callers catch and return 0, may cause MV refresh misjudgment
3.5 Exception System
UserException (checked)
├── AnalysisException -- SQL analysis error
├── DdlException -- DDL execution failure
├── MetaNotFoundException -- Metadata lookup failure
├── LoadException -- Load failure
└── ...
NereidsException (unchecked, RuntimeException)
├── nereids/AnalysisException -- Nereids-specific, different from common/
└── nereids/ParseException
- Note Nereids has its own
AnalysisException(nereids/exceptions/AnalysisException.java, extends RuntimeException), which is a different class fromcommon/AnalysisException(checked) - Does
ErrorReport.reportAnalysisException()use correctErrorCode? Custom Doris error codes start from 5000 - Does new RPC error handling use
Statuspattern (TStatusCode/PStatus)?
3.6 OlapTable Version Visibility
-
getVisibleVersion()behaves differently in Cloud vs non-Cloud mode:- Non-Cloud: Returns local
tableAttributes.getVisibleVersion() - Cloud: RPC to meta-service, has cache TTL
- Non-Cloud: Returns local
- Can Cloud mode's version cache TTL (
cloudTableVersionCacheTtlMs) cause reading stale versions? -
VERSION_NOT_FOUNDis forcibly converted to version=1 (PARTITION_INIT_VERSION)—cannot distinguish "never loaded" from "loaded to version 1"
3.7 External Catalog Concurrency Traps
3.7.1 Initialization Race Condition
ExternalCatalog.makeSureInitialized() (ExternalCatalog.java:354-379) is synchronized, but has the following traps:
-
isInitializingguard is dead code inExternalCatalog: Although code checksisInitializing(line 358), it's never set totrue(only set tofalsein finally). Compare withExternalDatabase.makeSureInitialized()(ExternalDatabase.java:137-158) which correctly setsisInitializing = true. This means ExternalCatalog has no reentrancy protection -
lowerCaseToDatabaseName.clear()race window:getFilteredDatabaseNames()(line 504) firstclear()then refillsConcurrentMap, this method is not synchronized. Between clear and refill completion, concurrentget()calls return null
3.7.2 resetToUninitialized() NPE Risk
resetToUninitialized() (ExternalCatalog.java:581-590) sets objectCreated and initialized to false within synchronized block, then calls onClose() (lines 773-784) to set executionAuthenticator and transactionManager to null.
- Concurrent queries that have passed
makeSureInitialized()may be using these fields being set to null. synchronized only protects callers of other synchronized methods, not non-synchronized read paths already in execution
3.8 Backup/Restore State Machine
3.8.1 BackupJob State Machine and EditLog Timing
EditLog write timing in BackupJob state machine is uneven, this is the most important non-obvious convention:
- PENDING→SNAPSHOTING doesn't write EditLog (
BackupJob.java:620-624): After FE restart, state reverts to PENDING, all snapshot tasks are regenerated and resent. Don't add EditLog when modifying this transition—restart recovery relies on this behavior - BarrierLog mechanism:
prepareSnapshotTaskForOlapTableWithoutLock()createsBarrierLogfor each table and writes viaeditLog.logBarrier()(lines 648-657), returnedcommitSeqis stored in properties map, used to ensure snapshot consistency - UPLOAD_INFO stage errors only retry, not cancel (
BackupJob.java:509-514): This is the final stage, code comments explicitly state continuous retry until timeout even when encountering unrecoverable errors. Don't callcancelInternal()in this stage when modifying error handling
3.8.2 RestoreJob state vs showState
RestoreJob has two independent state fields (RestoreJob.java:157,162):
-
state: Serialized persistent actual state, master sets to FINISHED before writing EditLog -
showState: Non-serialized, only for display, master sets to FINISHED after writing EditLog (line 2191) -
This prevents follower from seeing FINISHED state before EditLog replication completes. After deserialization,
gsonPostProcess()syncsshowState = state. Must update both fields when modifying state transitions
Part 4: Cloud Module Review Guide
4.1 FDB Transaction Mode
Cloud module uses FoundationDB (FDB) as the sole metadata storage. All metadata operations must be completed within FDB transactions.
4.1.1 TxnKv Interface Specifications
- Is
TxnErrorCodereturn value from eachTransaction::get()/commit()checked? Marked as[[nodiscard]] - Is
TXN_CONFLICTcorrectly handled?commit_txn_immediatelyreturns error without retry,commit_txn_eventuallyhas lazy committer path - Can transaction exceed FDB's 10MB limit? Large transactions need to use
TxnLazyCommitterfor batched commits - Is
TXN_MAYBE_COMMITTEDcorrectly handled? This indicates commit may have succeeded or failed, requires idempotent retry
4.1.2 Key Encoding Specifications
Three key spaces:
-
0x01: User space (mutable metadata) -
0x02: System space (service registration, encryption keys) -
0x03: Version space (multi-version metadata with versionstamp, for snapshots) -
Do new keys use correct key space prefix?
-
Does key encoding use
encode_bytes()(order-preserving byte encoding) andencode_int64()(big-endian + sign marker)? -
Does compound key field order match query pattern? Range scan relies on key prefix
4.1.3 Commit Path
Two commit paths:
-
commit_txn_immediately: Single FDB transaction completes all operations (read txn info, move tmp_rowset, bump version, update stats) -
commit_txn_eventually: Uses lazy committer for batched commits when transaction is too large -
Are the three config switches for lazy commit (
is_2pc,enable_txn_lazy_commit,config::enable_cloud_txn_lazy_commit) correctly combined? -
Is fallback path from
commit_txn_immediatelytocommit_txn_eventually(triggered byTXN_BYTES_TOO_LARGE) correct?
Known risk—commit_txn infinite loop: The commit_txn entry in meta_service_txn.cpp:3257-3341 uses do { ... } while(true) loop to dispatch commit_txn_immediately and commit_txn_eventually. When pending lazy-committed transactions are found, it retries, but this loop has no independent retry count limit. If lazy committer continuously fails to complete (e.g., FDB continuous conflicts), theoretically infinite loop.
- Do modifications involving
commit_txndispatch loop consider maximum retry count or total timeout?
4.1.4 Versionstamp Usage
Transaction ID is derived from FDB versionstamp (get_txn_id_from_fdb_ts).
- Does code depending on FDB versionstamp correctly use
atomic_set_ver_key()/atomic_set_ver_value()? - Is
get_versionstamp()only called aftercommit()succeeds?
4.2 Meta-Service RPC Specifications
4.2.1 RPC Preprocessing
Each RPC uses RPC_PREPROCESS macro for preprocessing:
-
StopWatch timing
-
DORIS_CLOUD_DEFER sets response status and records metrics on return
-
RPC_RATE_LIMITperforms per-instance QPS throttling -
Do new RPCs use
RPC_PREPROCESSandRPC_RATE_LIMIT? -
Is response
MetaServiceCodecorrectly set?
4.2.2 MetaServiceProxy Automatic Retry
MetaServiceProxy decorator automatically retries (exponential backoff) on following error codes:
-
KV_TXN_STORE_*_RETRYABLE,KV_TXN_TOO_OLD,KV_TXN_CONFLICT -
Do new RPCs need to be idempotent? Retry means same request may execute multiple times
-
Do non-idempotent operations correctly handle
TXN_MAYBE_COMMITTED?
4.3 Recycler Safety Specifications
Recycler is responsible for GC of expired/deleted data:
- Before deleting rowset data, are associated transactions aborted first (
abort_txn_for_related_rowset)? Prevents commit-after-delete data loss - Are recycle operations best-effort + next-round retry mode? Should not stop entire recycle flow due to single failure
- Is packed file reference count recycling correct? (
recycle_packed_files)
4.4 Delete Bitmap (Cloud MoW, Extremely High Priority)
Cloud mode delete bitmap management fundamentally differs from shared-nothing mode:
- Bitmaps in
CloudTxnDeleteBitmapCachecontain sentinel marks andTEMP_VERSION_COMMON, must manually replace version before use. This is a critical correctness constraint explicitly marked in code comments - Empty rowset marks returned by
is_empty_rowset()are not actively cleared (rely on expiration cleanup), does this affect correctness? -
CloudTxnDeleteBitmapCacheLRU eviction side effect: When cache entry is LRU evicted,publish_statusis reset toINIT, causing next access to completely recalculate delete bitmap (not incremental update). This is correct but expensive. Additionally, marks written bymark_empty_rowset()are never actively cleared—they only disappear when entire cache entry is evicted. On high-frequency MoW tables with insufficient cache capacity, causes repeated recalculation - Do modifications involving
CloudTxnDeleteBitmapCacheconsider impact of state reset after eviction? - Are
CalcDeleteBitmapTask's compaction stats consistent with latest state in meta-service?
Part 5: Cross-Module Concerns
5.1 FE-BE Protocol Compatibility
- Do new
TPlanNodeTypes have corresponding branches in BE's processing switch? - Can deprecated node types (e.g.,
MERGE_JOIN_NODE,KUDU_SCAN_NODE) still be gracefully handled? - If new variables are added, are they correctly set in all paths where FE sends execution plans to BE?
5.2 Cloud Mode vs Shared-Nothing Mode
Code extensively uses Config.isNotCloudMode() / Config.isCloudMode() branches.
- Do new features consider both modes?
- In Cloud mode, tablets are not bound to specific BE—any compute node can load any tablet's metadata
- In Cloud mode, rowsets go through two-phase protocol (
prepare_rowset->commit_rowset), with recycle marker protection in between - In Cloud mode, partition version is centrally managed by Meta-Service, not FE
5.3 Merge-on-Write Unique Key Tables
MoW tables are one of the most complex data paths, require special attention during review:
- Write path: Does
SegmentWriter::probe_key_for_mow()correctly probe existing rowsets? - Publish path: Is delete bitmap calculation under
_rowset_update_lockprotection? - Query path: Is delete bitmap version-aligned before reading?
- Compaction path: Does
calc_delete_bitmap_for_compaction()correctly handleallow_delete_in_cumu_compaction? - Cloud mode: Does delete bitmap cache have expiration mechanism? Is
_unused_delete_bitmapcleaned up promptly?
5.4 Performance Hotspot Concerns
- Does
SendBatchThreadPool(64 threads, 102K queue) have saturation risk in heavy shuffle scenarios? - Does
SegmentPrefetchThreadPool(max 2000 threads) create too many OS threads in intensive scan scenarios? - In Cloud mode, does label deduplication check perform linear scan on labels with many aborted transactions, causing performance cliff?
- Is serialization overhead of
_serialize_block_to_row_column()for wide tables within expectations? - COW columns trigger deep copy on
mutate()whenuse_count > 1—are there unnecessary copies in high-concurrency scenarios?
Part 6: Testing Review Guide
6.1 Regression Test Specifications
- Is
order_qt_prefix or manualORDER BYused to ensure ordered results? - Do expected error cases use
test { sql "..."; exception "..." }pattern? - Are tables DROPped before use (
DROP TABLE IF EXISTS xxx), not after test ends? - Do simple tests directly hardcode table names instead of using
def tableName? - Do tests cover these seven scenarios: empty table, all null, mixed null, pure non-null, constant, constant null, nullable wrapper (
nullable())? - Are result files (
.out) auto-generated via-genOut? Handwriting prohibited
6.2 BE Unit Test Specifications
- Are Google Test's
TEST/TEST_Fmacros used? - Are tests independent? Not dependent on other tests' execution order
- Are
ASSERT_*(failure terminates test) andEXPECT_*(failure continues execution) used correctly? - Do memory-intensive tests use
MemTrackerto check for leaks?
6.3 FE Unit Test Specifications
- Is JUnit 5's
@Testannotation used? - Do test method names clearly describe test scenarios?
- Is
Assertions.assertXxx()used instead of JUnit 4's static imports? - Do tests involving concurrency use
CountDownLatch/CyclicBarrierfor synchronization?
6.4 FE-BE Integration Test Specifications
Part 7: Known High-Risk Code Patterns Quick Reference
| Pattern | Risk Level | Location | Description |
|---|---|---|---|
static_cast<void>(status_returning_call) |
CRITICAL | BE global (195+ places) | Silently swallows errors, no logging |
| Delete bitmap TEMP_VERSION not replaced | CRITICAL | cloud_txn_delete_bitmap_cache.h:72-75 |
Data visibility error |
be_exec_version mismatch |
CRITICAL | FE-BE communication all paths | Serialization format incompatibility causes data corruption |
(DataTypeNullable*) C-style cast |
HIGH | block.h merge_impl |
No protection in release builds |
sizeof(size_t*) vs sizeof(size_t) |
HIGH | data_type.cpp:195 |
Happens to work on 64-bit, crashes on 32-bit |
| Lazy commit three-switch combination | HIGH | meta_service_txn.cpp |
Missing config causes large transaction failure |
| DecimalV2 original_precision default UINT32_MAX | HIGH | data_type_decimal.h |
Affects string representation correctness |
MOW default false |
MEDIUM | segment_writer.h:70 |
Omission causes deduplication to be silently skipped |
| Cloud version cache TTL | MEDIUM | OlapTable.getVisibleVersion() |
May read stale version |
| Label scan linear complexity | MEDIUM | meta_service_txn.cpp |
Many aborted txns cause performance cliff |
THROW_IF_ERROR no outer catch |
MEDIUM | vec/ execution engine | Uncaught exception propagation causes crash |
THROW_IF_ERROR in Defer lambda |
CRITICAL | pipeline_task.cpp:412-413, 638-639 |
Throwing exception during stack unwinding causes std::terminate |
THROW_IF_ERROR in Thrift RPC handler |
HIGH | backend_service.cpp:1323,1326 |
No try/catch protection, connection drops |
DCHECK disappears in RELEASE builds |
HIGH | BE global (2,872 places) | No invariant protection in production, should migrate to DORIS_CHECK |
ThreadPool::submit_func no SCOPED_ATTACH_TASK |
HIGH | BE global (87+ places) | Memory counted in orphan tracker, debug crashes |
TransactionState.tableIdList no sync |
MEDIUM | TransactionState.java:632 |
ArrayList concurrent operations cause CME or data loss |
GlobalTransactionMgrIface unsafe downcast |
MEDIUM | InternalCatalog.java:994 etc 4 places |
ClassCastException in non-Cloud mode |
commit_txn dispatch loop no retry limit |
MEDIUM | meta_service_txn.cpp:3257-3341 |
Theoretically infinite loop when lazy commit continuously fails |
CloudTxnDeleteBitmapCache LRU eviction reset |
MEDIUM | cloud_txn_delete_bitmap_cache.h |
After eviction, publish_status resets to INIT, triggers full recalculation |
| TabletScheduler synchronized + db lock | MEDIUM | TabletScheduler.java |
Mitigated with two-step method, new code must follow |
getVisibleVersion() RpcException |
MEDIUM | FE Cloud mode multiple places | Catch and return 0 causes MV misjudgment |
| FE/BE function return type mismatch | HIGH | FE SIGNATURES vs BE get_return_type_impl | build() throws INTERNAL_ERROR, query fails directly |
| Null framework disabled without custom null map | HIGH | use_default_implementation_for_nulls()=false |
Null row data treated as valid data |
| IFunction subclass has member data | HIGH | SimpleFunctionFactory registration |
sizeof static_assert compilation failure or UB |
| Aggregate function Nullable/NotNullable tag error | HIGH | helpers.h creator_without_type |
count returns null or sum doesn't return null |
| Variadic function family name mismatch | MEDIUM | SimpleFunctionFactory::get_function |
BE function lookup fails, returns nullptr |
need_replace_null_data_to_default omitted |
MEDIUM | Arithmetic/string functions | Garbage data at null positions causes overflow or huge memory allocation |
| ColumnString offsets update error | MEDIUM | String function execute_impl | Silently corrupts subsequent row data |
| Inverted index field declaration order | HIGH | inverted_index_writer.h:92-95 |
_dir must be declared before _index_writer, otherwise use-after-free on destruction |
op_not const but modifies data |
MEDIUM | inverted_index_reader.h:170 |
Modifies via shared_ptr bypassing const, callers assume immutable |
| Schema change four-level lock order | HIGH | schema_change.cpp:972-976 |
base push→new push→base header→new header, out-of-order deadlock |
| MOW schema change delete bitmap four steps | CRITICAL | schema_change.cpp:1595-1657 |
Step 3 must block publish, omission causes bitmap inconsistency |
ExternalCatalog.isInitializing dead code |
MEDIUM | ExternalCatalog.java:358 |
Never set to true, reentrancy protection ineffective |
lowerCaseToDatabaseName.clear() race |
HIGH | ExternalCatalog.java:504 |
Concurrent get returns null between clear and refill |
resetToUninitialized() sets fields to null |
HIGH | ExternalCatalog.java:581-590 |
Concurrent queries encounter null fields midway causing NPE |
| BackupJob PENDING→SNAPSHOTING no EditLog | MEDIUM | BackupJob.java:620-624 |
Intentionally no log, restart reverts to PENDING and resends |
| RestoreJob dual state fields | MEDIUM | RestoreJob.java:157,162 |
state set to FINISHED first, showState after EditLog |
WorkloadGroup exceed_limit omission |
MEDIUM | workload_group.h:132-135 |
Only checks _total_mem_used, ignores refresh interval increment |
Part 8: Function System Review Guide (Non-Obvious Conventions Only)
This section only records implicit conventions and traps in function development that are not easily apparent from the code itself. Interface definitions, registration steps, and other content directly readable from code are not repeated.
8.1 FE-BE Consistency Traps (Most Common Issues)
- FE and BE independently calculate return types:
FunctionBuilderImpl::build()compares both sides' results, throwsINTERNAL_ERRORon mismatch. Date/DateV2/DateTime/DateTimeV2 and Decimal versions allow interoperability, other types must strictly match - Function names must be all lowercase: FE automatically
toLowerCase()inFunctionNameconstructor, but BESimpleFunctionFactorymatches exact strings. Spelling or case mismatch = BE can't find function - Variadic function lookup key = function name + each parameter type's
get_family_name(): For example,hex(String)->"hexString",hex(Int64)->"hexInt64". FE-passed parameter type family names must correspond to BE-registeredget_variadic_argument_types_impl(), otherwise lookup fails - FE/BE aliases must be registered on both sides: If
register_alias()alias doesn't have corresponding registration on FE side (alias list inBuiltinScalarFunctions.java), query fails - Aggregate function intermediate type mismatch causes shuffle serialization failure—FE and BE must agree on
intermediate_type
8.2 Null Handling Framework Traps
- Must build custom null map after disabling
use_default_implementation_for_nulls(): This is the most common null-related bug. Framework defaults to automatically strippingColumnNullablewrapper and OR-merging null map after execution; after disabling, function must handle completely - FE nullable semantics must match BE: FE's
PropagateNullablecorresponds to BE default null framework enabled. If BE disables default null handling, FE-side nullable interface choice must match -
need_replace_null_data_to_default()omission causes crashes: Underlying data at null positions is garbage. String repeat count as garbage → huge memory allocation; arithmetic operation garbage → overflow. Operations that may produce such issues must be set totrue -
ColumnConst(ColumnNullable(...))nesting: After disabling default null handling, const null columns are this nested type, need special checking. Framework won't auto-unpack when some parameters are const
8.3 BE Registration Implicit Constraints
-
IFunctionsubclasses prohibited from having member data:sizeof(Function) == sizeof(IFunction)static_assert enforces at compile time. State should be placed inFunctionContext -
_foreachcombinator must be registered last: It traverses all registered functions to auto-generate wrappers, early registration misses subsequent functions - Aggregate function Nullable/NotNullable tag directly affects result:
NullableAggregateFunctiontag makes result nullable (e.g., sum),NotNullableAggregateFunctionmakes result non-nullable (e.g., count). Wrong tag causes count to return null or sum to not return null on all-null input - Aggregate function parameter tag affects nullable wrapper choice:
UnaryExpression/MultiExpression/VarargsExpressiondetermines whether to use single-parameter or multi-parameter nullable wrapper -
ColumnStringdoesn't include terminating zero:insert_data(ptr, len)'slendoesn't include\0,offsetsmust be strictly correctly updated, otherwise silently corrupts subsequent rows
Part 9: Development Standards Supplement
9.1 Configuration Parameter Addition Specifications
When adding new config::xxx parameters:
- Define in
be/src/common/config.h, must provide default value and detailed comments - Configuration name uses
snake_case, add// NOLINTto avoid lint warnings - Does it need to support dynamic modification? If yes, use
DEFINE_mBool/Int32/...(m = mutable) - In Cloud mode, does configuration need to be synchronously added in Meta-Service? Check configuration definitions under
cloud/src/meta-service/
Example:
DEFINE_mInt32(my_config, "100"); // NOLINT
// Describe configuration purpose, unit, scope of impact, rationale for default value choice
9.2 Metrics and Observability
New features should add monitoring metrics:
BE Metrics:
- Use
ADD_TIMER/ADD_COUNTER/ADD_GAUGEmacros to add metrics in critical paths - Metric names use
snake_case, prefix with module name (e.g.,compaction_xxx,query_xxx) - Expose to HTTP interface:
be_ip:webserver_port/metrics
Critical paths must monitor:
- Memory allocation frequency and size statistics
- RPC call latency and error rate
- IO operation latency (S3/HDFS/OSS)
- Lock wait time (lock operations exceeding 100ms)
9.3 bthread/Coroutine Safety
BE extensively uses bthread, following are easily overlooked bthread-specific constraints:
- Avoid holding
std::mutexfor long time in bthread—this blocks entire bthread worker thread (not just current coroutine). Preferbthread_mutex_t - Use
bthread_usleepinstead ofsleep/usleep, yields coroutine not thread
9.4 Error Message Specifications
- Error messages include context information: table name, partition ID, TabletID, transaction ID, operation type
- User-visible errors use
ErrorCodedefinition (custom error codes start from 5000)
9.5 Code Comment Specifications
- Lock-protected data: Comment next to member variables
// protected by xxx_lock - TODO/FIXME: Include author, date, brief description, e.g.,
// TODO(zhaochangle, 2026-03-01): optimize this path