SQL Test Generator (Unit Tests Only)

Composable: Works with bigquery-etl-core (for patterns), metadata-manager (for test updates), and query-writer (for query understanding) When to use: Creating/updating unit test fixtures for BigQuery SQL queries, preventing production queries in tests

NOT for data quality checks: Use bigconfig-generator skill for Bigeye monitoring. The checks.sql file approach is DEPRECATED.

Generate unit test fixtures for BigQuery SQL queries following bigquery-etl conventions. These are development-time tests that validate query logic on small, synthetic data.

Official Documentation: https://mozilla.github.io/bigquery-etl/cookbooks/testing/

Unit Tests vs Data Quality Checks

CRITICAL DISTINCTION:

This skill creates UNIT TESTS:

Test query logic during development
Run on small, synthetic fixtures (not production data)
Validate transformations, joins, aggregations work correctly
Part of CI/CD pipeline

For data quality monitoring (NOT this skill):

Use bigconfig-generator skill to create Bigeye monitoring configurations
Monitors production data for quality issues (anomalies, nulls, freshness)
checks.sql files are DEPRECATED - do NOT create or update them
All production data quality checks should be done via Bigeye

🚨 REQUIRED READING - Start Here

BEFORE creating any test fixtures, you MUST read these files:

CRITICAL FOR SAFETY: Read references/preventing_production_queries.md
- Prevents accidentally querying production data
- Explains how to verify fixtures are working correctly
- Required reading before ANY test creation
UNDERSTAND TEST PATTERNS: Read references/test_strategy_patterns.md
- Learn which test scenarios are needed for different query types
- Understand how many tests to create
- See examples of test coverage strategies

📋 Templates - Copy These Structures

When creating test fixtures, READ and COPY the structure from these template files:

For Input Fixtures:

Glean events? → READ assets/glean_events_fixture.yaml and copy its structure
- Shows proper client_info, events, and extra formatting
- Use this for any *_stable.events_v1 or Glean ping tables
Legacy telemetry with arrays? → READ assets/legacy_array_fixture.yaml
- Shows how to structure array fields that get UNNESTed
- Use for legacy telemetry tables with array columns
Simple table? → READ assets/simple_test_input.yaml
- Basic fixture structure for flat tables
UNION ALL query? → READ both assets/union_all_fixture1.yaml AND assets/union_all_fixture2.yaml
- Shows how to create fixtures for multiple data sources
- Critical: ALL sources need fixtures in the SAME test directory

For Query Parameters:

READ assets/query_params_example.yaml - Must be array format, not key-value pairs

For Expected Output:

READ assets/simple_test_expect.yaml - Shows array format and NULL handling
READ assets/timestamp_example.yaml - Shows correct TIMESTAMP format (ISO 8601 with timezone)

Reference Documentation (Read as needed)

references/common_test_failures.md - READ THIS when tests fail - Real-world failures and solutions (NULL handling, ordering, timestamps)
references/yaml_format_guide.md - YAML syntax, type inference, nested structures (read if you encounter YAML errors)
references/external_documentation.md - Links to official docs and related resources

DataHub Usage (CRITICAL for Token Efficiency)

IMPORTANT: DataHub is for lineage discovery ONLY, NOT for schema lookups

For schema discovery, use this priority order:

🥇 Local schema.yaml files in sql/ directory
🥈 Glean Dictionary for _live and _stable tables
🥉 BigQuery ETL docs (https://mozilla.github.io/bigquery-etl/)
❌ NEVER use DataHub for schemas - use it only for lineage

When DataHub IS useful:

✅ Discovering upstream/downstream table dependencies
✅ Finding which tables are related
✅ Understanding table usage patterns

See these guides:

../metadata-manager/references/schema_discovery_guide.md - Complete schema discovery workflow
../metadata-manager/references/glean_dictionary_patterns.md - Token-efficient Glean Dictionary usage
../metadata-manager/scripts/datahub_lineage.py - Efficient lineage queries

Test Structure

Tests live in: tests/sql/<project>/<dataset>/<table>/<test_name>/

Required files:

Input fixtures: <full_table_reference>.yaml (e.g., moz-fx-data-shared-prod.telemetry.events.yaml)
Expected output: expect.yaml
Query parameters (if needed): query_params.yaml

Critical Requirements

⚠️ PREVENTING PRODUCTION QUERIES - READ THIS FIRST ⚠️

The #1 way tests accidentally query production data:

Missing fixture files for ANY table referenced in FROM/JOIN/UNION clauses
When a fixture is missing, BigQuery falls back to querying the production table

How to prevent this:

Use grep -E "FROM|JOIN" query.sql to identify ALL source tables
Create a fixture file for EVERY table found, even if it contributes minimal data
If test results show thousands of rows or production-like data, STOP - you're querying production
Check the "Initialized" lines in pytest output - every source table should appear

1. File Naming - MOST COMMON FAILURE

Input fixtures must match how the table is referenced in the query:

Query uses moz-fx-data-shared-prod.dataset.table → file must be moz-fx-data-shared-prod.dataset.table.yaml
Query uses dataset.table → file must be dataset.table.yaml
Query uses table → file must be table.yaml

Wrong naming causes tests to query production BigQuery instead of your fixtures.

2. YAML Format

All fixtures must use array syntax (starts with -). See references/yaml_format_guide.md for details.

# Correct - array syntax with dashes
- field1: value1
  field2: value2

3. Required Fields

Input fixtures: Include ALL fields the query references, even if NULL: loaded: null
expect.yaml: COMPLETELY OMIT fields that will be NULL - do NOT include them with field: null
- ❌ Wrong: avg_time_seconds: null
- ✅ Correct: omit the field entirely
- BigQuery does not return NULL fields in results, so including them in expect.yaml will cause test failures
CRITICAL for UNION/UNION ALL queries: You MUST create fixtures for ALL source tables with at least one row of test data. If you want a source to contribute no data, either:
- Add a fixture with data that gets filtered out by WHERE clauses, OR
- Create a minimal fixture that passes filters but contributes to the test
- DO NOT create empty array fixtures ([]) - they cause "Schema has no fields" errors
- DO NOT omit fixture files - missing fixtures cause the query to hit production BigQuery
Omit generated_time columns from expect.yaml

4. Query Parameters Format

Must be an array (starts with -), not key-value pairs. See assets/query_params_example.yaml for template.

- name: submission_date
  type: DATE
  value: "2024-12-01"

5. TIMESTAMP Format ⚠️ VERY COMMON FAILURE

CRITICAL: BigQuery returns TIMESTAMP and DATETIME fields in ISO 8601 format with timezone.

In expect.yaml, ALWAYS use ISO 8601 format:

# ✅ CORRECT - ISO 8601 with timezone
- campaign_created_at: "2025-06-01T10:00:00+00:00"
  campaign_updated_at: "2025-07-01T12:00:00+00:00"

# ❌ WRONG - Missing 'T' and timezone
- campaign_created_at: "2025-06-01 10:00:00"
  campaign_updated_at: "2025-07-01 12:00:00"

Format rules:

TIMESTAMP fields: 2025-06-01T10:00:00+00:00 (ISO 8601)
DATE fields: 2025-06-01 (YYYY-MM-DD)
With microseconds: 2025-06-01T10:00:00.123456+00:00

Input fixtures can use either format, but expect.yaml MUST use ISO 8601.

See references/common_test_failures.md section 3 for details and examples.

Test Generation Strategy

See references/test_strategy_patterns.md for comprehensive patterns.

Quick analysis checklist:

Join types (FULL OUTER, LEFT, INNER) → test each branch
Aggregations/GROUP BY → test single and multiple rows
CASE/IF logic → test each branch
UDFs and map operations → test empty, existing, new values
Incremental logic → test first run vs subsequent runs
Date-based filtering → test different date ranges
Version filtering → test both sides of thresholds (use "120.0.0" format)
UNION/UNION ALL → create ONE comprehensive test with fixtures for ALL data sources

Number of tests:

Simple queries: 1 test
Moderate complexity: 1-2 tests
Complex (FULL OUTER JOIN, maps, multiple CTEs): 3-5 tests
UNION ALL queries: 1 comprehensive test per date/filter branch

Test naming: Use descriptive snake_case: test_new_clients_only, test_union_all_sources

Environment Setup

Before running tests, configure Google Cloud authentication:

export GOOGLE_PROJECT_ID=bigquery-etl-integration-test
gcloud config set project $GOOGLE_PROJECT_ID
gcloud auth application-default login

See https://mozilla.github.io/bigquery-etl/cookbooks/testing/ for more details.

Workflow

Step-by-Step Process

Read the required reference files (from "REQUIRED READING" section above)
- READ references/preventing_production_queries.md
- READ references/test_strategy_patterns.md
Read the query.sql file
Gather schema information efficiently (use priority order):
- FIRST: Check local schema.yaml files in sql/ directory for derived tables
- SECOND: Check Glean Dictionary for _live and _stable tables
  - See ../metadata-manager/references/glean_dictionary_patterns.md for token-efficient extraction
  - Extract only needed fields (don't read entire large files)
- THIRD: Check https://mozilla.github.io/bigquery-etl/ dataset browser
- DataHub MCP: For lineage ONLY, NOT for schemas
  - Use DataHub to discover upstream/downstream tables
  - Then use priority order above to get schemas for those tables
  - See ../metadata-manager/references/schema_discovery_guide.md for complete workflow
  - See ../metadata-manager/scripts/datahub_lineage.py for efficient lineage queries
Identify ALL source tables - use this command:
```
grep -E "FROM|JOIN" query.sql
```
Write down EVERY table you find. This is your checklist.
Determine test scenarios needed based on query complexity
Read the appropriate template files for your data sources:
- For Glean events tables → READ assets/glean_events_fixture.yaml
- For legacy telemetry with arrays → READ assets/legacy_array_fixture.yaml
- For simple tables → READ assets/simple_test_input.yaml
- For query parameters → READ assets/query_params_example.yaml
- For UNION ALL queries → READ assets/union_all_fixture1.yaml and assets/union_all_fixture2.yaml
- For queries with TIMESTAMP fields → READ assets/timestamp_example.yaml (VERY IMPORTANT)
Create test directory and fixtures:
- For each test scenario, create a directory: tests/sql/<project>/<dataset>/<table>/<test_name>/
- Create a fixture file for EVERY source table from your step 3 checklist
- Copy the structure from the template files you read in step 5
- Each fixture needs at least one row of data
- Match the file naming to how the table is referenced in the query
Create expect.yaml and query_params.yaml (if needed)
- Use the template structures from step 5

Run the test:

pytest tests/sql/<project>/<dataset>/<table>/<test_name>/ -v

Common pytest debugging options:

# Stop on first failure (faster debugging)
pytest tests/sql/.../test_name/ -x

# Very verbose output (shows individual test details)
pytest tests/sql/.../test_name/ -vv

# Show full diff on assertion failures
pytest tests/sql/.../test_name/ -vv --tb=short

# Run specific test by pattern
pytest tests/sql/.../test_name/ -k "test_pattern"

# Show local variables in traceback
pytest tests/sql/.../test_name/ -l

# Combined: stop on first failure with very verbose output
pytest tests/sql/.../test_name/ -xvv

Fix common test failures (if test fails):

A. Timestamp format mismatches (VERY COMMON):
- If test fails with timestamp format differences like 2025-06-01T10:00:00+00:00 != 2025-06-01 10:00:00
- Update expect.yaml to use ISO 8601 format: 2025-06-01T10:00:00+00:00
- See references/common_test_failures.md section 3 for details
B. Ordering issues (VERY COMMON):
- If test fails with ordering differences (row order or map/array key order mismatch)
- Look at the "Actual:" section in pytest output
- Copy the EXACT order from "Actual:" to your expect.yaml
- This includes both row-level ordering AND nested field ordering (maps, arrays)
- See references/common_test_failures.md section 2 for detailed examples
Verify you're NOT querying production:

✅ Check pytest output for "Initialized" lines - count should match your table checklist from step 2
✅ Test should complete in <30 seconds
✅ Results should be small (< 10 rows typically)
❌ If you see thousands of rows, real production IDs, or slow execution → you're querying production!
❌ Missing "Initialized" line for a table → add that fixture file

Ask about data quality monitoring:

After tests pass successfully, check if monitoring exists:
- Check for bigconfig.yml file: sql/<project>/<dataset>/<table>/bigconfig.yml
If bigconfig.yml does NOT exist, proactively ask the user:
- "Would you like to add Bigeye monitoring/data quality checks for this table?"
Suggest relevant checks based on table type:
- Aggregation tables (DAU, MAU, etc.) → freshness, volume/anomaly detection, null checks
- Event tables → freshness, volume checks
- Derived tables → freshness, schema validation
If user agrees, use bigconfig-generator skill to create monitoring configurations
Note: Only ask if bigconfig.yml doesn't exist - don't ask if monitoring is already configured

Production Query Checklist

Before finalizing tests, verify:

I ran grep -E "FROM|JOIN" query.sql to find all source tables
I created a fixture file for each source table found
I checked pytest output for "Initialized" messages matching each source table
Test results are small and match my expect.yaml
Test runs quickly (<30 seconds)

Note: After running tests, if you need to use metadata-manager or query production tables, switch back to the main project:

export GOOGLE_PROJECT_ID=mozdata
gcloud config set project $GOOGLE_PROJECT_ID

Example Tests

Example 1: Simple Query with Join

For a query at sql/moz-fx-data-shared-prod/dataset/table_v1/query.sql that joins telemetry.clients_daily with the target table:

Test directory: tests/sql/moz-fx-data-shared-prod/dataset/table_v1/test_new_clients/

Input fixture: moz-fx-data-shared-prod.telemetry.clients_daily.yaml

- client_id: "abc123"
  submission_date: "2025-01-01"
  search_count: 5

Query params: query_params.yaml

- name: submission_date
  type: DATE
  value: "2025-01-01"

Expected output: expect.yaml

- client_id: "abc123"
  total_searches: 5
  first_seen_date: "2025-01-01"

Example 2: UNION ALL Query with Multiple Sources

For a query that unions three data sources (legacy, glean, and ads):

Test directory: tests/sql/moz-fx-data-shared-prod/dataset/table_v1/test_union_all_sources/

IMPORTANT: Create fixtures for ALL three sources in ONE test directory:

Fixture 1: moz-fx-data-shared-prod.legacy_source.table.yaml

- submission_timestamp: "2024-12-15 10:00:00"
  document_id: "doc1"
  version: "120.0.0"
  event_count: 5

Fixture 2: moz-fx-data-shared-prod.glean_source.table.yaml

- submission_timestamp: "2024-12-15 14:30:00"
  document_id: "doc2"
  client_info:
    app_display_version: "121.0.0"
  events:
    - category: "interaction"
      name: "click"

Fixture 3: moz-fx-data-shared-prod.ads_source.table.yaml

- submission_hour: "2024-12-15 10:00:00"
  ad_id: 12345
  interaction_type: "impression"
  interaction_count: 100

Query params: query_params.yaml

- name: submission_date
  type: DATE
  value: "2024-12-15"

Expected output: expect.yaml

- submission_date: "2024-12-15"
  source: "legacy"
  event_count: 5
- submission_date: "2024-12-15"
  source: "glean"
  event_count: 1
- submission_date: "2024-12-15"
  source: "ads"
  event_count: 100

Mozilla-Specific Telemetry Patterns

See example assets for templates:

assets/glean_events_fixture.yaml - Glean events with extras structure
assets/legacy_array_fixture.yaml - Legacy telemetry with array fields

Glean Events with Extras

See assets/glean_events_fixture.yaml for complete example. Events use nested structure:

Access via events.category, events.name
Extra fields via mozfun.map.get_key(events.extra, 'field_name')

Legacy Telemetry Arrays

See assets/legacy_array_fixture.yaml. Arrays get unnested with CROSS JOIN UNNEST(tiles).

Version Numbers

Always use three-part versions ("121.0.0") to avoid YAML float parsing. See references/yaml_format_guide.md.

Common Errors

See references/common_test_failures.md for real-world examples and solutions. See references/yaml_format_guide.md and references/preventing_production_queries.md for detailed guides.

1. NULL Fields in expect.yaml ⚠️ COMMON

Including field: null in expect.yaml causes test failures because BigQuery omits NULL fields from results. Solution: Completely omit NULL fields from expect.yaml - don't include them at all. See references/common_test_failures.md section 1.

2. Non-Deterministic ORDER BY

When ORDER BY fields have duplicate values, row order becomes non-deterministic and tests fail. Solution: Either create test data with different ORDER BY values, or run test to see actual order and update expect.yaml. See references/common_test_failures.md section 2.

3. Type Inference Issues

Version numbers like "120.0" parsed as FLOAT64 instead of STRING. Solution: Use three-part versions: "120.0.0" or simple integers: "120"

4. Empty Fixture Arrays

Creating [] causes "Schema has no fields" errors. Solution: Always include at least one row. To filter out data, use WHERE clause filtering.

5. Missing Fixtures Query Production ⚠️ CRITICAL

If ANY source table lacks a fixture, query hits production BigQuery. Symptoms: Thousands of rows, real production values, slow execution (>10 seconds) Solution: Create fixtures for ALL source tables. See references/preventing_production_queries.md

6. NULL Field Handling (Duplicate - see #1 above)

Omit NULL fields from expect.yaml - BigQuery doesn't return them in results.

Full troubleshooting: https://mozilla.github.io/bigquery-etl/cookbooks/testing/

Integration with Other Skills

sql-test-generator is invoked by other skills when test fixtures need to be created or updated:

Works with bigquery-etl-core

References core skill for test structure, naming conventions, and query patterns
Uses common table patterns for test creation
Invocation: Automatically available as foundation skill

Invoked by metadata-manager

When queries are modified: metadata-manager detects when test fixtures need updating
metadata-manager delegates to sql-test-generator for:
- New source tables added to query (JOINs, new FROM clauses)
- Tests querying production (thousands of rows in output)
- Creating new test scenarios for new logic
metadata-manager handles simple updates (expect.yaml changes) directly
Invocation pattern: metadata-manager explicitly invokes sql-test-generator with "Use sql-test-generator skill" instructions

Works with query-writer

After query-writer creates queries: Use sql-test-generator to create test fixtures
Analyzes query structure to identify all source tables
Creates comprehensive test coverage
Invocation: After query.sql is complete, invoke sql-test-generator to create tests

Coordinates with bigconfig-generator

After tests pass: Check if bigconfig.yml exists for the table
If bigconfig.yml does NOT exist, proactively ask user if they want data quality monitoring
Suggests relevant monitoring checks based on table type (aggregation, event, derived)
If user agrees, delegate to bigconfig-generator skill to create Bigeye configurations
Workflow: sql-test-generator → check for bigconfig.yml → ask user → bigconfig-generator (if approved)

Typical Invocation Scenarios

Direct invocation by user:

User asks: "Generate tests for query X"
User asks: "Create test fixtures for table Y"

Invoked by metadata-manager:

Query modified with new JOINs → metadata-manager invokes sql-test-generator
Tests showing production data → metadata-manager invokes sql-test-generator
New test scenarios needed → metadata-manager invokes sql-test-generator

After query-writer:

New query.sql created → invoke sql-test-generator to create initial tests

Key Capabilities

Prevents production queries:

Ensures ALL source tables have fixtures (critical for UNION/UNION ALL queries)
Detects missing fixtures via "Initialized" lines in pytest output
Provides clear symptoms of production query issues

Handles complex queries:

UNION ALL with multiple sources
Nested structures and complex types
YAML type inference issues (version numbers, etc.)

Provides best practices:

Fixture naming conventions
Test organization (one comprehensive test vs multiple)
Production query detection and prevention