Reduce Unoptimized Query Oracle Test Failure

Reduce an unoptimized-query-oracle test failure log to the simplest possible reproduction case.

The unoptimized-query-oracle roachtest runs a series of random SQL statements to create a random dataset, and then executes a random "Query of Interest" twice, with different optimization settings. If the two executions return different results, it indicates a bug in CockroachDB.

When to Use

Use this skill when:

You have a test failure from the unoptimized-query-oracle roachtest.
You need to find the minimal SQL to reproduce the test failure.

Step 1: Locate artifacts

Ask the user where the artifacts directory is.

Find the relevant files in the artifacts directory:

Test parameters: params.log (the parameters from the roachtest)
Test log: test.log (the log from the roachtest)
Failure log: failure*.log (the failure log from the roachtest)
Full SQL log: unoptimized-query-oracle*.log (the SQL statements that led to failure)
Query of interest log: unoptimized-query-oracle*.failure.log (containing the query of interest and possibly more information about the failure)
Cockroach log: logs/1.unredacted/cockroach.log or logs/unredacted/cockroach.log (contains the git commit)

Step 2: Determine test configuration

Determine the git commit from cockroach.log:

grep "binary: CockroachDB" cockroach.log

Look for the commit hash in the version string (e.g., cb94db961b8f55e3473f279d98ae90f0eeb0adcb).

Determine if runtime assertions are enabled by checking for:

"runtimeAssertionsBuild": "true" in params.log
or Runtime assertions enabled in test.log

Determine if metamorphic settings apply by looking for:

lines like these in params.log:

"metamorphicBufferedSender": "true",
"metamorphicWriteBuffering": "true",

or lines like these in test.log:

metamorphically setting "kv.rangefeed.buffered_sender.enabled" to 'true'
metamorphically setting "kv.transaction.write_buffering.enabled" to 'true'

Determine environment variables from the beginning of cockroach.log:

grep -A10 "using local environment variables:" cockroach.log

Important environment variables include:

COCKROACH_INTERNAL_CHECK_CONSISTENCY_FATAL
COCKROACH_INTERNAL_DISABLE_METAMORPHIC_TESTING
COCKROACH_RANDOM_SEED
COCKROACH_TESTING_FORCE_RELEASE_BRANCH But there might be more important environment variables, so best to get all of them.

Determine if this is a multi-region test or single-region test by checking:

the test name (e.g., seed-multi-region in test.log indicates multi-region)
or the presence of \connect lines in the full SQL log If both of these are missing, it's a single-region test.

Step 3: Check Out and Build

For a normal build use:

git checkout <commit-hash>
./dev build short

If runtime assertions were enabled, use a test build instead:

git checkout <commit-hash>
./dev build short -- --crdb_test

Note: Only build libgeos if the reproduction uses geospatial functions (BOX2D, geometry, geography, etc.):

./dev build libgeos

Step 4: Prepare the Full SQL Log File

First, check that the following statements are at the top of the full SQL log file. If they are not, add them:

SET statement_timeout='1m0s';
SET sql_safe_updates = false;

If metamorphic settings were used, also add them to the top of the full SQL log file:

SET CLUSTER SETTING kv.rangefeed.buffered_sender.enabled = true;
SET CLUSTER SETTING kv.transaction.write_buffering.enabled = true;

Create an appropriate directory either in the artifacts directory or in the repository root for holding temp files.

Step 5: Initial Reproduction

Determine the correct demo command based on test type:

Multi-region test: Use --nodes=9
Single-region test: Omit --nodes option

Use a command like this to try reproducing the test failure from the full SQL log file. This command could take up to 20 minutes to finish.

<env vars> ./cockroach demo --multitenant=false --nodes=9 --insecure --set=errexit=false --no-example-database --format=tsv -f <full-sql-log-file>

Check that the output reproduces the test failure described in the failure log. There are many possible failure modes. Look for one of the following, which should match the failure log:

Different results between the two executions of the "Query of Interest" (which is the randomly generated SELECT statement repeated twice near the end of the log, wrapped in various SET and RESET staements). These different results could take the form of different result sets, or could also be an error in one case and no error in the other case. This is an "oracle" failure.
Or, internal error or assertion failure. Note the error message for the reduce step.
Or, a panic. Note the error message for the reduce step.
Or, a timeout. Note the statement that timed out.

Troubleshooting

IMPORTANT: Many failures are nondeterministic, especially for multi-region tests. If no failure happens on the first run, try up to 10 times before concluding it doesn't reproduce.

It can be helpful at this point to compare the output with the failure*.log which should show the failure from the original test run.

If the initial run fails to reproduce after 10 times, pause here and report to the user that the failure cannot be reproduced, and show the command that was tried. The user might have additional instructions.

If it looks like it reproduces, it's time to move on to the next step.

Step 6: Use the Reduce Tool

Build the reduce tool:

./dev build reduce

Prepare the Full SQL Log File again

For multi-region tests, remove \connect lines (they cause syntax errors in the reduce tool):

grep -v '^\\connect' <full-sql-log-file> > <cleaned-log>

Run Reduce

IMPORTANT: The reduce tool must be run from the cockroach repository root directory, because it looks for ./cockroach in the current directory.

Use the -multi-region option for multi-region tests, or omit it for single-region tests.

For "oracle" failures (different results):

./bin/reduce -unoptimized-query-oracle -multi-region -chunk 25 -v -file <cleaned-log> 2>&1 | tee reduce-output.log

The -unoptimized-query-oracle option checks whether the two executions of the "Query of Interest" produce the same results.

For internal errors/assertion failures/panics:

./bin/reduce -contains "<error-regex>" -multi-region -chunk 25 -v -file <cleaned-log> 2>&1 | tee reduce-output.log

Use a distinctive part of the error message as the -contains regex (e.g., "nil LeafTxnInputState").

The reduce tool might take up to an hour to run.

Extract the Reduced SQL

The reduce tool outputs progress lines followed by the final SQL. Extract just the SQL:

grep -A1000 "^reduction: " reduce-output.log | tail -n +2 > reduced.sql

IMPORTANT: Immediately save a backup of the reduce output before manual simplification:

cp reduced.sql reduced_original.sql

This provides a recovery point if the working file gets corrupted during simplification.

If the reduce tool fails to reproduce, pause here and report this to the user. They might have additional instructions. Occasionally we have to modify the reduce tool itself, if the test failure is not reproducing.

Step 7: Create Test Script and Determine Reproduction Rate

IMPORTANT: Many bugs are nondeterministic. Before manual simplification, create a reusable test script and determine the reproduction rate.

Create a small test script (adjust as needed):

cat > test_repro.sh << 'EOF'
#!/bin/bash
# Test if reduced_v2.sql reproduces the error (exits on first success, up to 10 attempts)
for i in {1..10}; do
  if ./cockroach demo --multitenant=false --nodes=9 --insecure \
     --set=errexit=false --no-example-database --format=tsv \
     -f reduced_v2.sql 2>&1 | grep -q "<error-pattern>"; then
    echo "Run $i: REPRODUCED"
    exit 0
  else
    echo "Run $i: no error"
  fi
done
echo "FAILED"
EOF
chmod +x test_repro.sh

For "oracle" failures, instead of checking for an error pattern, the test script probably needs to isolate and diff the results of the two executions of the "Query of Interest".

Run the test script to determine the reproduction rate. It's not always 100%.

This rate determines how many attempts you need when testing simplifications:

100% rate: Single attempt sufficient
50% rate: 2-3 attempts usually sufficient
10% rate: Need ~10 attempts to be confident
<5% rate: May need 20+ attempts

Note that in some cases, the following settings might need to be added back to the reduced file to get a repro:

SET statement_timeout='1m0s';
SET sql_safe_updates = false;

If the reduced SQL fails to reproduce after 10 attempts, pause here and report this to the user. They might have additional instructions.

Step 8: Manual Simplification

Now iteratively simplify the SQL while maintaining reproduction.

CRITICAL: For nondeterministic failures, you MUST test each simplification with enough attempts based on the repro rate. A single failed attempt does NOT mean the simplification broke the repro - it may just be nondeterminism.

Workflow for Each Simplification

Copy reduced.sql to reduced_v2.sql
Make ONE small change to reduced_v2.sql
Run ./test_repro.sh (which tests reduced_v2.sql)
If it reproduces: Copy reduced_v2.sql to reduced.sql, continue simplifying
If it doesn't reproduce after enough attempts: Discard reduced_v2.sql, try a different change (i.e. backtrack).

This workflow avoids needing to restore files - you always keep the last working version in reduced.sql.

IMPORTANT: Run copy, edit, and test as separate bash commands (not chained with &&). This reduces the number of permission checks.

What to Try Removing (in rough order)

Query projections and aggregations - Simplify SELECT list to just essential columns
Query predicates - Simplify WHERE clause
Indexes - Try removing secondary indexes
Query joins - Simplify WHERE clause
Columns from CREATE TABLE - Remove columns not referenced in the failing query
Weird characters - Remove or replace non-ASCII characters from names and data
other SQL simplifications

For "oracle" failures, when editing the Query of Interest, be sure to edit BOTH copies of the Query of Interest so that they are identical. Otherwise it won't be an apples-to-apples comparison when diffing the result sets.

Common Required Elements

These often cannot be removed:

Optimizer random seed: SET testing_optimizer_random_seed = <value> - this specific value often cannot be changed, as it determines which optimizer rules are disabled
Optimizer rule probability: SET testing_optimizer_disable_rule_probability
- affects query plan selection
Specific RESET/SET sequences for optimizer settings, such as distsql and vectorize
Certain indexes (affect query plans)
Multi-node setup (--nodes=9) for distributed query bugs (though try single-node first - it may work and is simpler)
CREATE STATISTICS statements (affect query planning)

Backtracking

If a change breaks reproduction:

Discard reduced_v2.sql (don't copy it to reduced.sql)
Verify reduced.sql still reproduces. If it doesn't, this means the repro is nondeterministic. (It might have started out nondeterministic, or might have become nondeterministic over the course of simplification.) Try reproducing it 10 times and note the new repro rate. Use the new repro rate to adjust the number of repro attempts during each simplification step going forward.
Try a DIFFERENT simplification

Never continue simplifying from a broken state.

If you get stuck (i.e. cannot reproduce again after backtracking), stop and report to the user with the exact command you were trying.

Step 9: Final Verification and Output

After about 20 minutes of simplification, or if there are no more simplifications after backtracking a few times, it's time to stop.

Run reproduction 10+ times to confirm stability and determine final repro rate
Document the minimal reproduction steps
Note which elements were required vs optional

Output

The final output should include two files that can be shown to the user:

reduced.sql - The minimal SQL script that reproduces the bug
bisect_run.sh - A script for use with git bisect run

Write the output in such a way that it could be copied and pasted into a terminal.

Example Output Format

(The commands in this output should be edited to match what was necessary to reproduce.)

# Minimal Reproduction

# reduced.sql
cat > reduced.sql << 'EOF'
CREATE TABLE t ();

SET testing_optimizer_random_seed = 1234567890;
SET testing_optimizer_disable_rule_probability = 0.5;

SELECT ...;
EOF

# bisect_run.sh
cat > bisect_run.sh << 'EOF'
#!/bin/bash
# Git bisect run script
# Exit codes: 0=good (bug not present), 1=bad (bug present), 125=skip (build failed)

REPO_DIR="/path/to/cockroach"
REPRO_SQL="/path/to/reduced.sql"

cd "$REPO_DIR" || exit 125

echo "=== Testing commit $(git rev-parse --short HEAD) ==="

# Build (use --crdb_test if runtime assertions were enabled in the original test)
if ! ./dev build short -- --crdb_test 2>&1 | grep -q "Successfully built"; then
    echo "BUILD FAILED - skipping"
    exit 125
fi

# Test for bug (try 3 times for flaky bugs)
for i in {1..3}; do
    if ./cockroach demo --multitenant=false --insecure \
        --set=errexit=false --no-example-database --format=tsv \
        -f "$REPRO_SQL" 2>&1 | grep -q "<error-pattern>"; then
        echo "BUG PRESENT - marking as BAD"
        exit 1
    fi
done

echo "Bug not present - marking as GOOD"
exit 0
EOF
chmod +x bisect_run.sh

# Command to reproduce
git checkout <commit-hash>
./bisect_run.sh

# Command to bisect
git bisect start ...
git bisect run bisect_run.sh

# Failure
# <paste stacktrace or relevant failure details here>

# Repro rate: ~X% (may need multiple attempts)

After showing this output, ask the user if they want to try reproducing the bug on master branch.

Optional Step 10: Check if Bug is Fixed on Master

Before bisecting, check whether the bug has already been fixed on master.

git stash  # if needed
git checkout master
./dev build short -- --crdb_test
./cockroach demo --multitenant=false --insecure --set=errexit=false --no-example-database --format=tsv -f reduced.sql

Run this a few times to account for flakiness. Note whether the bug reproduces on master or not.

Optional Step 11: Bisect

If the user wants to find the commit that introduced or fixed the bug, use git bisect.

If the Bug is Already Fixed on Master

Bisect to find the fix commit (the first commit where the bug no longer reproduces). Use custom terms since the "good" commit (master) is newer than the "bad" commit:

git bisect start --first-parent --term-old=broken --term-new=fixed
git bisect broken <commit-where-bug-exists>   # e.g., the original failing commit
git bisect fixed master                        # master is fixed

git bisect run ./bisect_run.sh

# When done
git bisect reset

Note: The --first-parent option follows only merge commits on the main branch, avoiding detours into feature branches. The bisect script must return 0 when the bug is NOT present (fixed) and 1 when the bug IS present (broken).

If the Bug Still Exists on Master

Bisect to find the regression commit (the first commit where the bug was introduced):

git bisect start --first-parent
git bisect good <known-good-commit>   # e.g., a previous release tag
git bisect bad master                  # master has the bug

git bisect run ./bisect_run.sh

# When done
git bisect reset

The bisect will identify the commit that introduced or fixed the bug.

Finding a Good Commit

If you don't know a good commit (where the bug doesn't exist), you can jump back in time to find one.

# Find a commit from ~6 months ago on the main branch
git rev-list --first-parent -1 --before="6 months ago" HEAD

Test whether the bug exists at that commit. If not, use it as the good commit for bisect. If the bug still exists, try going back further in time, but don't go back further than 1 year.

If a known good commit can't be found within 1 year, stop and report this to the user.

reduce-unoptimized-query-oracle