BigQuery CLI Skill
You are a BigQuery specialist using the bigquery CLI tool. This skill provides comprehensive guidance for working with Google BigQuery through a unified TypeScript/Bun CLI with query execution, cost awareness, and MCP server integration.
Core Capabilities
The bigquery CLI provides:
- Authentication: Check status and gcloud integration
- Query Execution: Run SQL queries with automatic cost estimation and confirmation prompts
- Dry Run: Estimate query costs ($0 to run)
- Dataset Operations: List, create, update, delete, describe datasets
- Table Operations: List, describe, insert, load, extract, create, update, delete, copy tables
- Job Management: List, get, and cancel BigQuery jobs
- MCP Server: Built-in stdio and HTTP modes for AI integration (read-only by default, --enable-writes flag)
- Streaming Support: Native --stream flag for large result sets (outputs JSONL)
- Output Formats: JSON (default), JSONL (streaming), and text (human-readable)
Authentication
Check Authentication Status
# Check if authenticated and verify required scopes
bigquery auth check
# Output shows:
# - Authentication status
# - Active account
# - Token expiration
# - BigQuery scopes availability
Authentication Methods
Uses Google Cloud SDK (gcloud) authentication:
- Application Default Credentials (ADC)
- OAuth access tokens
- Service account keys (via GOOGLE_APPLICATION_CREDENTIALS)
No separate BigQuery authentication required - uses existing gcloud credentials.
Best Practice: Always run bigquery auth check first to verify authentication before operations.
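In scripts, this check can double as a guard so failures surface early (a minimal sketch; a non-zero exit code on auth failure is an assumption about this CLI's behavior):
# Abort early if authentication is missing or expired
bigquery auth check || { echo "Run: gcloud auth application-default login" >&2; exit 1; }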
Output Format Defaults
Different commands have different default output formats:
| Command | Default Format | Notes |
|---|---|---|
| query | JSON | Machine-readable for pipelines, includes metadata |
| datasets list | text | Human-readable |
| tables list | text | Human-readable |
| tables describe | text | Human-readable |
| jobs list | text | Human-readable |
All commands support --format json or --format text to override defaults.
Available Formats:
- JSON: Structured output with metadata (rows, bytesProcessed, cost, cacheHit)
- Text: Human-readable formatted output
- JSONL: Newline-delimited JSON (one object per line), ideal for streaming and pipelines
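For example, a text-default command can be flipped to JSON for scripting (the exact field names in the output are not documented here, so pipe through jq to inspect the actual shape):
# Override the text default on a listing command
bigquery datasets list --format json | jq .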
Query Operations
Running Queries
# Basic query execution (automatic dry-run + cost confirmation)
bigquery query "SELECT * FROM dataset.table LIMIT 10"
# Skip cost confirmation for automation
bigquery query --yes "SELECT COUNT(*) FROM dataset.table"
# JSON output (default, includes full metadata)
bigquery query "SELECT * FROM dataset.table LIMIT 5"
# JSONL output (one JSON object per line, ideal for pipelines)
bigquery query "SELECT * FROM dataset.table" --jsonl
# Stream large results (implies JSONL format)
bigquery query "SELECT * FROM large_table" --stream > results.jsonl
# Text format (human-readable)
bigquery query --format text "SELECT * FROM dataset.table LIMIT 5"
# Query from file
bigquery query --file query.sql
# Query from stdin
cat query.sql | bigquery query --stdin --yes
# Custom project
bigquery query --project my-other-project "SELECT 1"
Cost Awareness: The query command automatically:
- Runs a dry-run to estimate cost before execution (costs $0)
- Displays bytes to be processed and estimated cost
- Prompts for confirmation if cost exceeds threshold (default: 1 GB)
- Skips confirmation for queries below threshold or when --yes is used
- Includes cost metadata in response (bytesProcessed, estimatedCostUSD, cacheHit)
Query Options
# Skip cost confirmation (REQUIRED for automation/scripts)
bigquery query --yes "SELECT * FROM dataset.table"
# Set custom cost threshold (default: 1 GB)
bigquery query --cost-threshold 5 "SELECT * FROM large_table"
# Environment variable for persistent threshold
export BIGQUERY_COST_THRESHOLD_GB=5.0
# Limit rows returned (does NOT reduce cost/bytes scanned)
bigquery query "SELECT * FROM table" --max-results 100
# Stream large results (outputs JSONL)
bigquery query "SELECT * FROM large_table" --stream > results.jsonl
Query Flags:
| Flag | Description |
|---|---|
| --yes / -y | Skip cost confirmation prompt (for automation) |
| --format <format> | Output format: json (default), text, jsonl |
| --jsonl | Shorthand for --format jsonl |
| --stream | Stream results as they arrive (implies JSONL) |
| --cost-threshold <gb> | Cost confirmation threshold in GB (default: 1) |
| --max-results <n> | Max rows to return (does NOT reduce cost) |
| --file <path> | Read query from file |
| --stdin | Read query from stdin |
| -p, --project <id> | GCP project ID |
Important Notes:
- --max-results only limits returned rows, NOT bytes scanned (no cost reduction)
- LIMIT in SQL also does not reduce bytes billed on its own; cut scanned data by selecting fewer columns and filtering on partitioned/clustered columns (see the comparison below)
- Use --yes for automation (not --force, which doesn't exist)
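A quick way to see the difference: compare dry-run estimates for the same table with and without column pruning (illustrative table name):
# LIMIT and --max-results won't change these numbers; column selection will
bigquery dry-run "SELECT * FROM my-project.analytics.events"
bigquery dry-run "SELECT user_id FROM my-project.analytics.events"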
Query Output Formats
# JSON output (default, machine-readable)
bigquery query "SELECT * FROM dataset.table"
# Returns:
# {
# "rows": [...],
# "totalRows": 123,
# "bytesProcessed": 1048576,
# "bytesProcessedGB": 0.001,
# "estimatedCostUSD": 0.00000625,
# "cacheHit": false
# }
# JSONL output (one JSON object per line)
bigquery query "SELECT * FROM dataset.table" --jsonl
# Returns:
# {"col1":"value1","col2":123}
# {"col1":"value2","col2":456}
# Text output (human-readable table)
bigquery query --format text "SELECT * FROM dataset.table"
Dry Run (Cost Estimation)
# Estimate cost without executing
bigquery dry-run "SELECT * FROM large_dataset.table WHERE date >= '2025-01-01'"
# Returns (JSON):
# {
# "bytesProcessed": "1073741824",
# "bytesProcessedGB": 1.0,
# "bytesProcessedMB": 1024.0,
# "estimatedCostUSD": 0.00625
# }
# Text format
bigquery dry-run "SELECT * FROM large_table" --format text
# Returns:
# Dry-run Results:
# ──────────────────────────────────────────────────
# Bytes to process: 1.00 GB
# Estimated cost: $0.0063
Use dry-run to:
- Estimate costs before running expensive queries
- Validate query syntax
- Check partition pruning effectiveness
- Test queries in CI/CD pipelines
Cost Formula: (bytesProcessed / 1TB) * $6.25
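This makes dry-run easy to wire into a cost gate for CI (a sketch using the JSON field names shown above; the $1 budget is illustrative):
# Fail the pipeline if the estimated cost exceeds $1
COST=$(bigquery dry-run "SELECT * FROM my-project.dataset.events" | jq -r '.estimatedCostUSD')
awk -v c="$COST" 'BEGIN { exit (c+0 > 1.0) }' || { echo "Query too expensive: \$$COST" >&2; exit 1; }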
Dataset Operations
Listing Datasets
# List datasets in current project (text format, default)
bigquery datasets list
# List datasets in another project
bigquery datasets list --project other-project
# JSON output
bigquery datasets list --format json
# Example output shows:
# - Dataset ID
# - Location
# - Creation time
# - Labels (if any)
Describing Datasets
# Show dataset metadata
bigquery datasets describe project.dataset
# JSON output
bigquery datasets describe project.dataset --format json
Creating Datasets
# Create dataset in default location
bigquery datasets create my-project.new_dataset
# Create with description and location
bigquery datasets create my-project.new_dataset \
--description "Analytics data warehouse" \
--location US
# Create with default table expiration (30 days)
bigquery datasets create my-project.temp_dataset \
--default-ttl 30 \
--location US
# Create with labels
bigquery datasets create my-project.new_dataset \
--labels "env=prod,team=analytics"
Updating Datasets
# Update description
bigquery datasets update my-project.existing_dataset \
--description "Updated description"
# Update default table expiration
bigquery datasets update my-project.existing_dataset \
--default-ttl 30
# Add/update labels
bigquery datasets update my-project.existing_dataset \
--labels env=staging \
--labels team=data
Deleting Datasets
# Delete empty dataset (prompts for confirmation)
bigquery datasets delete my-project.old_dataset
# Delete non-empty dataset (includes all tables)
bigquery datasets delete my-project.old_dataset --force
# Skip confirmation (for automation)
bigquery datasets delete my-project.old_dataset --force --yes
Table Operations
Listing Tables
# List tables in a dataset (text format, first 10)
bigquery tables list my-project.my-dataset
# JSON output
bigquery tables list my-project.my-dataset --format json
# With pagination
bigquery tables list my-project.my-dataset --max-results 20 --page-token <token>
Describing Table Schema
# Show table schema and metadata (text format)
bigquery tables describe my-project.my-dataset.my-table
# JSON output
bigquery tables describe my-project.my-dataset.my-table --format json
# Output includes:
# - Column names and types
# - Nullability (NULLABLE, REQUIRED, REPEATED)
# - Mode information
# - Table metadata (row count, size, location)
Creating Tables
# Create table with JSON schema
bigquery tables create my-project.dataset.users \
--schema '[{"name":"id","type":"STRING"},{"name":"email","type":"STRING"}]' \
--description "User data"
Updating Tables
# Update description
bigquery tables update my-project.dataset.table --description "Updated description"
# Set expiration (30 days in seconds)
bigquery tables update my-project.dataset.table --expiration 2592000
# Add labels
bigquery tables update my-project.dataset.table \
--labels owner=team-data \
--labels environment=production
# Require partition filter for queries
bigquery tables update my-project.dataset.table --require-partition-filter true
Copying Tables
# Copy table within same project
bigquery tables copy my-project.source.table my-project.dest.table_copy
# Copy to different dataset
bigquery tables copy my-project.source.table my-project.archive.table_backup
# Overwrite destination if exists
bigquery tables copy my-project.source.table my-project.dest.existing_table \
--write-disposition WRITE_TRUNCATE
Deleting Tables
# Delete table
bigquery tables delete my-project.dataset.old_table
Inserting Rows (Small Datasets)
Best for <1000 rows. Uses streaming insert API for immediate availability.
JSONL (Newline-Delimited JSON) Format
From JSONL File:
# Create sample JSONL file
cat > users.jsonl <<EOF
{"id": "1", "name": "Alice Johnson", "email": "alice@example.com", "age": 30}
{"id": "2", "name": "Bob Smith", "email": "bob@example.com", "age": 25}
{"id": "3", "name": "Charlie Brown", "email": "charlie@example.com", "age": 35}
EOF
# Insert from JSONL file
bigquery tables insert my-project.dataset.users users.jsonl --format jsonl
From JSONL Stream (stdin):
# Stream from command output
echo '{"id": "1", "name": "Alice", "email": "alice@example.com"}' | \
bigquery tables insert my-project.dataset.users - --format jsonl
# Stream from heredoc
cat << EOF | bigquery tables insert my-project.dataset.users - --format jsonl
{"id": "1", "name": "Alice", "email": "alice@example.com", "age": 30}
{"id": "2", "name": "Bob", "email": "bob@example.com", "age": 25}
{"id": "3", "name": "Charlie", "email": "charlie@example.com", "age": 35}
EOF
# Stream from application output
my-etl-tool --output jsonl | bigquery tables insert my-project.dataset.events -
# Stream from jq transformation
cat raw_data.json | jq -c '.records[]' | \
bigquery tables insert my-project.dataset.processed -
JSONL Format Requirements:
- Each line is a separate JSON object
- Empty lines are automatically skipped
- No commas between objects
- Ideal for streaming and large datasets
- Format: {"field1":"value1","field2":"value2"}\n
Additional Insert Options
# Skip invalid rows instead of failing
bigquery tables insert my-project.dataset.users users.jsonl --skip-invalid-rows
# Ignore unknown fields in data
bigquery tables insert my-project.dataset.users users.jsonl --ignore-unknown-values
# Combine options for production pipelines
cat production_data.jsonl | \
bigquery tables insert my-project.dataset.production - \
--format jsonl \
--skip-invalid-rows \
--ignore-unknown-values
Insert Options:
- --format <FORMAT>: Data format (json or jsonl)
- --skip-invalid-rows: Skip invalid rows instead of failing
- --ignore-unknown-values: Ignore unknown fields in data
Loading Data (Large Datasets)
Best for >10MB files or >1000 rows. Uses BigQuery load jobs.
# Load from Cloud Storage URI (RECOMMENDED)
bigquery tables load my-project.dataset.users \
gs://my-bucket/data.csv --format csv
# Load with schema auto-detection
bigquery tables load my-project.dataset.new_table \
gs://my-bucket/data.csv --format csv --autodetect
# Load with replace write disposition (truncates table first)
bigquery tables load my-project.dataset.users \
gs://my-bucket/data.csv --format csv --write-disposition WRITE_TRUNCATE
# Load JSON file
bigquery tables load my-project.dataset.events \
gs://my-bucket/events.json --format json
# Supported formats: csv, json, jsonl, avro, parquet
bigquery tables load my-project.dataset.table \
gs://my-bucket/data.parquet --format parquet
# CSV with skip leading rows
bigquery tables load my-project.dataset.table \
gs://my-bucket/data.csv --format csv --skip-leading-rows 1
# Append to existing table (default)
bigquery tables load my-project.dataset.table \
gs://my-bucket/data.json --format json --write-disposition WRITE_APPEND
Load Options:
- --format <FORMAT>: Source format (csv, json, jsonl, parquet, avro)
- --autodetect: Auto-detect schema from source
- --skip-leading-rows <N>: Skip N leading rows (CSV)
- --write-disposition <MODE>: WRITE_TRUNCATE (replace), WRITE_APPEND (default), WRITE_EMPTY
- --create-disposition <MODE>: CREATE_IF_NEEDED (default), CREATE_NEVER
When to Use:
- Large datasets (>1000 rows or >10MB)
- Data already in Cloud Storage
- Bulk data migrations
When NOT to Use:
- Small datasets (<1000 rows) → Use tables insert instead
Extracting Data
Export table data to Cloud Storage in various formats:
# Extract table to Cloud Storage as CSV
bigquery tables extract my-project.dataset.users \
gs://my-bucket/exports/users.csv --format csv
# Extract as JSON (newline-delimited)
bigquery tables extract my-project.dataset.events \
gs://my-bucket/exports/events-*.json --format json
# Extract with compression
bigquery tables extract my-project.dataset.large_table \
gs://my-bucket/exports/data-*.csv.gz --format csv --compression GZIP
# Extract as Parquet
bigquery tables extract my-project.dataset.analytics \
gs://my-bucket/exports/analytics.parquet --format parquet
# CSV with header
bigquery tables extract my-project.dataset.data \
gs://my-bucket/data.csv --format csv --print-header
Supported Formats: CSV, JSON (newline-delimited), Parquet, Avro
Compression: NONE (default), GZIP
Job Management
BigQuery jobs are asynchronous operations for queries, loads, exports, and copies.
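For long-running loads or exports, a small polling loop works (a sketch; the .status.state field path mirrors the BigQuery API job resource and is an assumption about this CLI's JSON output):
# Poll a job until it leaves the RUNNING state (hypothetical job ID)
while bigquery jobs get job_abc123xyz --format json | jq -e '.status.state == "RUNNING"' > /dev/null; do
  sleep 5
done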
Listing Jobs
# List recent jobs
bigquery jobs list
# List with pagination
bigquery jobs list --max-results 20 --page-token <token>
# Filter by state
bigquery jobs list --state-filter running
bigquery jobs list --state-filter done
# Show jobs from all users
bigquery jobs list --all-users
# JSON output
bigquery jobs list --format json
Showing Job Details
# Show job details
bigquery jobs get job_abc123xyz
# JSON output
bigquery jobs get job_abc123xyz --format json
Canceling Jobs
# Cancel a running job
bigquery jobs cancel job_abc123xyz
MCP Server Integration
The BigQuery MCP server provides AI integration via Model Context Protocol.
Starting MCP Server
STDIO Mode (for local clients):
# Start MCP server in stdio mode (read-only)
bigquery mcp stdio
# Start with write operations enabled
bigquery mcp stdio --enable-writes
# Server will:
# - Accept MCP protocol messages on stdin
# - Send responses on stdout
# - Expose BigQuery tools to MCP clients
HTTP Mode (for network clients):
# Start HTTP MCP server on default port 8080
bigquery mcp http
# Specify custom port
bigquery mcp http --port 3000
# With OAuth and email domain restriction
bigquery mcp http \
--google-client-id <id> \
--google-client-secret <secret> \
--domain example.com
# Enable write operations
bigquery mcp http --enable-writes
# Server provides:
# - HTTP endpoint for MCP protocol
# - JSON-RPC over HTTP
# - Remote access to BigQuery tools
MCP Configuration
Configure in Claude Code or other MCP-enabled applications:
STDIO Mode (.claude/mcp.json or similar):
{
"mcpServers": {
"bigquery": {
"command": "bigquery",
"args": ["mcp", "stdio"],
"env": {
"GOOGLE_CLOUD_PROJECT": "my-project"
}
}
}
}
STDIO Mode with Writes Enabled:
{
"mcpServers": {
"bigquery": {
"command": "bigquery",
"args": ["mcp", "stdio", "--enable-writes"],
"env": {
"GOOGLE_CLOUD_PROJECT": "my-project"
}
}
}
}
HTTP Mode:
{
"mcpServers": {
"bigquery": {
"url": "http://localhost:8080",
"transport": "http"
}
}
}
Common Workflows
Workflow 1: Exploratory Data Analysis
# 1. Verify authentication
bigquery auth check
# 2. List available datasets
bigquery datasets list
# 3. List tables in dataset
bigquery tables list my-project.analytics
# 4. Check table schema
bigquery tables describe my-project.analytics.events
# 5. Preview data (text format for readability)
bigquery query --format text \
"SELECT * FROM my-project.analytics.events LIMIT 10"
# 6. Get row count
bigquery query "SELECT COUNT(*) as total FROM my-project.analytics.events"
# 7. Check data distribution
bigquery query --format text "
SELECT
DATE(timestamp) as date,
COUNT(*) as events
FROM my-project.analytics.events
GROUP BY date
ORDER BY date DESC
LIMIT 30
"
Workflow 2: Cost-Aware Query Development
# 1. Dry run to estimate cost
bigquery dry-run "
SELECT *
FROM my-project.large_dataset.table
WHERE date >= '2025-01-01'
"
# 2. If cost is acceptable, run query
bigquery query "
SELECT *
FROM my-project.large_dataset.table
WHERE date >= '2025-01-01'
"
# 3. For automation, skip confirmation
bigquery query --yes "
SELECT *
FROM my-project.large_dataset.table
WHERE date >= '2025-01-01'
" > results.json
Workflow 3: Data Loading Pipeline
# 1. Load initial data
bigquery tables load my-project.dataset.events \
gs://bucket/events-2025-01-01.csv \
--format csv \
--write-disposition WRITE_TRUNCATE
# 2. Append incremental data
bigquery tables load my-project.dataset.events \
gs://bucket/events-2025-01-02.csv \
--format csv \
--write-disposition WRITE_APPEND
# 3. Verify data loaded
bigquery query "
SELECT
DATE(timestamp) as date,
COUNT(*) as count
FROM my-project.dataset.events
GROUP BY date
ORDER BY date
"
Workflow 4: Real-Time Data Insertion
# 1. Stream JSONL from application
my-app --output jsonl | bigquery tables insert my-project.dataset.events -
# 2. Stream with transformation and error handling
cat raw_events.json | jq -c '.events[]' | \
bigquery tables insert my-project.dataset.events - \
--skip-invalid-rows \
--ignore-unknown-values
Workflow 5: Streaming Large Results
# 1. Stream query results to JSONL file
bigquery query "SELECT * FROM my-project.dataset.large_table" \
--stream > results.jsonl
# 2. Process with DuckDB
duckdb -c "SELECT user_id, COUNT(*) FROM read_json_auto('results.jsonl') GROUP BY user_id"
# 3. Or process line-by-line
cat results.jsonl | while IFS= read -r line; do
echo "$line" | jq '.field'
done
Append-Only Raw Table Pattern
Use this architecture when a table needs to track evolving state over time without destructive updates.
Structure:
- table_raw — append-only write target; never UPDATE or DELETE rows in-place
- table (VIEW) — the read surface; deduplicates raw via ROW_NUMBER() OVER (PARTITION BY <natural_key> ORDER BY loaded_at DESC) = 1 (see the sketch below)
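A minimal sketch of the view definition, assuming a raw table my-project.dataset.users_raw with natural key id (all names are illustrative, and it assumes the query command accepts DDL):
# Create or replace the dedup view over the append-only raw table
bigquery query --yes "
CREATE OR REPLACE VIEW \`my-project.dataset.users\` AS
SELECT * EXCEPT (rn)
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY id ORDER BY loaded_at DESC) AS rn
  FROM \`my-project.dataset.users_raw\`
  WHERE id IS NOT NULL  -- domain-validity guard in the inner WHERE
)
WHERE rn = 1
"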
Write rules:
- All INSERT operations target table_raw only; never INSERT into the view (BigQuery rejects DML on views)
- Include a loaded_at TIMESTAMP REQUIRED column on the raw table; always set it to CURRENT_TIMESTAMP() at write time
- To "update" a record: INSERT a new row with corrected values and a fresh loaded_at; the view surfaces the newest row automatically
- Validate domain constraints before writing (e.g., expected ID prefixes, enum values, FK existence); skip invalid rows with a warning rather than failing the batch
Read rules:
- Always query the view, never the raw table directly
- Apply domain-validity guards in the view's inner WHERE clause to prevent corrupt rows from surfacing even if they reach the raw table
Integrity checks:
- Verify periodically: row count on view == COUNT(DISTINCT <natural_key>) on raw (filtered to valid rows only)
- Within-version duplicates (same (<natural_key>, version_tag) appearing more than once in raw) indicate a write path ran more than once; clean with MERGE … WHEN NOT MATCHED BY SOURCE THEN DELETE using a MAX(loaded_at) keeper subquery
- Foreign key validation: LEFT JOIN the raw table to the referenced source table; rows where the join produces NULL are orphans and should be investigated or deleted
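The first check translates directly to SQL (illustrative names again):
# View rows should equal distinct valid natural keys in raw
bigquery query --format text "
SELECT
  (SELECT COUNT(*) FROM \`my-project.dataset.users\`) AS view_rows,
  (SELECT COUNT(DISTINCT id) FROM \`my-project.dataset.users_raw\` WHERE id IS NOT NULL) AS distinct_raw_keys
"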
Best Practices
Query Development
- Always dry-run first: Use bigquery dry-run to estimate costs
- Validate before running: Check syntax and cost before execution
- Use text format for exploration: --format text for human-readable tables
- Use JSON for automation: --format json for machine processing
- Use JSONL for streaming: --jsonl or --stream for large results
- Skip confirmations in scripts: Use the --yes flag for automation
Cost Management
- Dry run expensive queries: Always estimate with bigquery dry-run
- Monitor bytes processed: Check query cost estimates before running
- Use partition pruning: Filter on partitioned columns in WHERE clauses
- Limit result sets: Use LIMIT for exploratory queries
- Select only needed columns: SELECT col1, col2 not SELECT *
- Set cost thresholds: Use the BIGQUERY_COST_THRESHOLD_GB environment variable
Table Architecture
- Raw + view split: Use an append-only *_raw table for writes and a dedup VIEW for reads; never write to the view, never read from raw directly
- loaded_at as dedup key: The loaded_at TIMESTAMP REQUIRED column determines which row wins in the view; always write CURRENT_TIMESTAMP() at INSERT time
- Updates via INSERT: To correct a row, INSERT a new version with corrected values and a fresh loaded_at; do not UPDATE the raw table in-place
- Validate before INSERT: Check domain constraints (ID format, enum values, FK existence) before writing; skip bad rows with a logged warning rather than allowing them to corrupt the raw table
- Guard the view: Add a WHERE clause to the view's inner query enforcing domain invariants as a last-resort filter against rows that bypass write-time validation
- Never target a view for DML: BigQuery rejects INSERT/UPDATE/DELETE on views; always use the underlying raw table name in write code paths
Authentication
- Check auth first: Run bigquery auth check before operations
- Use service accounts: For automation and CI/CD
- Verify scopes: Ensure all required BigQuery scopes are granted
- Re-authenticate when needed: If the check fails, run gcloud auth application-default login
Data Loading
- Choose the right method (see the shell guard below):
  - Use insert for <1000 rows (streaming insert API, immediate availability)
  - Use load for >10MB files or >1000 rows (load jobs with Cloud Storage)
- Use JSONL for streaming: Newline-delimited JSON is ideal for streaming pipelines
- Stream from stdin: Use - as the file argument to pipe data from applications
- Handle bad records: Use --skip-invalid-rows for messy data
- Choose write disposition: WRITE_TRUNCATE for full refresh, WRITE_APPEND for incremental
- Use appropriate formats: CSV for simple data, JSON/JSONL for complex, Parquet for large datasets
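The size cutoff can be encoded as a shell guard (a sketch; bucket and table names are placeholders, and gsutil is assumed to be available for staging):
# Route small files to streaming insert, large files through a load job
SIZE=$(wc -c < data.jsonl)
if [ "$SIZE" -gt $((10 * 1024 * 1024)) ]; then
  gsutil cp data.jsonl gs://my-bucket/staging/data.jsonl
  bigquery tables load my-project.dataset.events gs://my-bucket/staging/data.jsonl --format jsonl
else
  bigquery tables insert my-project.dataset.events data.jsonl --format jsonl
fi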
Output Format Selection
- JSON (--format json): Default, full metadata, parse with jq
- Text (--format text): Human-readable, terminal inspection
- JSONL (--jsonl or --stream): Streaming, pipelines, DuckDB ingestion
Examples:
# Parse with jq
bigquery query "SELECT * FROM table" | jq '.rows[] | select(.amount > 100)'
# Stream to file
bigquery query "SELECT * FROM large_table" --stream > data.jsonl
# Load into DuckDB
bigquery query "SELECT * FROM table" --jsonl | \
duckdb -c "SELECT * FROM read_json_auto('/dev/stdin')"
MCP Server
- Use stdio for local: Prefer stdio mode for local MCP clients
- Use HTTP for remote: Use HTTP mode for networked deployments
- Read-only by default: Only enable writes when needed (--enable-writes)
- Secure HTTP endpoints: Put the HTTP server behind authentication/firewall
- Monitor server logs: Check for errors and performance issues
Configuration
Environment Variables
# Set default project
export GOOGLE_CLOUD_PROJECT=my-project
# Set cost threshold (in GB, default: 1)
export BIGQUERY_COST_THRESHOLD_GB=5.0
# Set credentials (for service accounts)
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
# Add to ~/.zshrc or ~/.bashrc for persistence
echo 'export GOOGLE_CLOUD_PROJECT=my-project' >> ~/.zshrc
echo 'export BIGQUERY_COST_THRESHOLD_GB=5.0' >> ~/.zshrc
Authentication Methods
User Credentials (interactive):
gcloud auth application-default login
bigquery auth check
Service Account (automation):
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/sa-key.json
bigquery auth check
Troubleshooting
Issue: "Not authenticated" or "Permission denied"
Solution: Check authentication and scopes
# Check current auth status
bigquery auth check
# Re-authenticate if needed
gcloud auth application-default login
# Verify gcloud is set to correct project
gcloud config get-value project
# Set project if needed
gcloud config set project my-project
Issue: "Table not found"
Solution: Use fully qualified table names
# Wrong - missing project/dataset
bigquery query "SELECT * FROM table"
# Correct - fully qualified
bigquery query "SELECT * FROM my-project.my-dataset.my-table"
# Or use backticks for reserved words
bigquery query "SELECT * FROM \`my-project.my-dataset.my-table\`"
Issue: "Query too expensive"
Solution: Check cost with dry-run and optimize
# Check estimated cost
bigquery dry-run "SELECT * FROM large_table WHERE date >= '2025-01-01'"
# Optimize with partition filters
bigquery dry-run "
SELECT * FROM large_table
WHERE _PARTITIONDATE = '2025-01-15'
"
Quick Reference
# Authentication
bigquery auth check # Check auth status
# Queries (default: JSON output)
bigquery query "SELECT ..." # Execute query (JSON)
bigquery query --yes "SELECT ..." # Skip confirmation
bigquery query --format text "SELECT ..." # Human-readable table
bigquery query --jsonl "SELECT ..." # JSONL output
bigquery query --stream "SELECT ..." # Stream large results
bigquery dry-run "SELECT ..." # Estimate cost
# Datasets
bigquery datasets list # List datasets
bigquery datasets describe PROJECT.DATASET # Describe dataset
bigquery datasets create PROJECT.DATASET # Create dataset
bigquery datasets update PROJECT.DATASET --description "..." # Update dataset
bigquery datasets delete PROJECT.DATASET # Delete dataset
# Tables - Read Operations
bigquery tables list PROJECT.DATASET # List tables
bigquery tables describe TABLE # Show schema
# Tables - Write Operations
bigquery tables create TABLE --schema "..." # Create table
bigquery tables insert TABLE file.jsonl --format jsonl # Insert from file
cat data.jsonl | bigquery tables insert TABLE - # Stream insert
bigquery tables load TABLE gs://bucket/file.csv # Bulk load
bigquery tables copy SOURCE DEST # Copy table
bigquery tables delete TABLE # Delete table
# Tables - Extract
bigquery tables extract TABLE gs://bucket/output.csv # Export to GCS
# Jobs
bigquery jobs list # List jobs
bigquery jobs get JOB_ID # Job details
bigquery jobs cancel JOB_ID # Cancel job
# MCP Server
bigquery mcp stdio # MCP server (stdio, read-only)
bigquery mcp stdio --enable-writes # MCP server (stdio, with writes)
bigquery mcp http --port 3000 # MCP server (HTTP)
Summary
Primary commands:
- bigquery auth check - Authentication management
- bigquery query - Execute SQL with automatic cost awareness
- bigquery dry-run - Estimate query costs ($0 to run)
- bigquery datasets {list,describe,create,update,delete} - Dataset operations
- bigquery tables {list,describe,insert,load,extract,create,update,delete,copy} - Table operations
- bigquery jobs {list,get,cancel} - Job management
- bigquery mcp {stdio,http} - MCP server modes
Key features:
- Cost-aware query execution with automatic dry-run and confirmation prompts
- Configurable cost thresholds (BIGQUERY_COST_THRESHOLD_GB)
- Streaming insert API for real-time data (<1000 rows)
- Bulk load from Cloud Storage (>10MB files)
- Native streaming support for large results (--stream flag outputs JSONL)
- Built-in MCP server for AI integration (stdio and HTTP modes)
- Three output formats: JSON (default, full metadata), JSONL (streaming), text (human-readable)
- Instant startup with TypeScript/Bun (~10ms vs ~500ms for Python bq)