sm-bigquery-analyst
SKILL.md
SourceMedium BigQuery Analyst
Use this skill to help end users work with SourceMedium BigQuery data from setup to analysis.
Workflow
- Verify environment (run these before any analysis)
- Confirm project and dataset/table visibility
- Use docs-first guidance for definitions and table discovery
- Answer analytical questions with reproducible SQL receipts
- Call out assumptions and caveats explicitly
Setup Verification
Run these commands in order before writing analysis SQL:
# 1. Check CLI tools are installed
gcloud --version && bq version
# 2. Check authenticated account
gcloud auth list
# 3. Check active project
gcloud config get-value project
# 4. Validate BigQuery API access (dry-run)
bq query --use_legacy_sql=false --dry_run 'SELECT 1 AS ok'
# 5. Test table access (your project is named sm-<tenant_id>)
# Example: if your tenant is "acme-corp", your project is sm-acme-corp
bq query --use_legacy_sql=false --dry_run "
SELECT 1
FROM \`sm-<tenant_id>.sm_transformed_v2.obt_orders\`
LIMIT 1
"
# 6. Confirm you can actually read data (not just dry-run)
bq query --use_legacy_sql=false "
SELECT 1
FROM \`sm-<tenant_id>.sm_transformed_v2.obt_orders\`
WHERE is_order_sm_valid = TRUE
LIMIT 1
"
If any step fails, see references/TROUBLESHOOTING.md and guide the user to request access.
Safety Rules
These are hard constraints. Do not bypass.
Query Safety
- SELECT-only — deny: INSERT, UPDATE, DELETE, MERGE, CREATE, DROP, EXPORT, COPY
- Dry-run first when iterating on new queries:
bq query --use_legacy_sql=false --dry_run '<SQL>' - Enforce cost limit with maximum bytes billed:
(1GB = 1073741824 bytes. If it fails due to bytes billed, tighten filters or ask for approval.)bq query --use_legacy_sql=false --maximum_bytes_billed=1073741824 '<SQL>' - Always bound queries:
- Add
LIMITclause (max 100 rows for exploratory) - Use date/partition filters when querying partitioned tables
- Prefer
WHEREfilters on partition columns
- Add
Data Safety
- Default to aggregates — avoid outputting raw rows unless explicitly requested
- PII handling:
- Do not output columns likely containing PII (email, phone, address, name) without explicit confirmation
- If PII is requested, confirm scope and purpose before proceeding
- Prefer anonymization. Example:
-- Hash PII instead of exposing raw values SELECT TO_HEX(SHA256(LOWER(email))) AS email_hash, ...
Cost Guardrails
-- Good: bounded scan
SELECT ... FROM `sm-<tenant_id>.sm_transformed_v2.obt_orders`
WHERE DATE(order_processed_at_local_datetime) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
LIMIT 100
-- Bad: full table scan
SELECT ... FROM `sm-<tenant_id>.sm_transformed_v2.obt_orders` -- no filters
Output Contract
For analytical questions, always return:
- Answer — concise plain-English conclusion
- SQL (copy/paste) — BigQuery Standard SQL used for the result
- Notes — timeframe, metric definitions, grain, scope, timezone, attribution lens
- Verify —
bq query --use_legacy_sql=false --dry_run '<SQL>'command - Bytes scanned — if >1GB, note this and ask for approval before running
If access/setup fails, do not fabricate results. Return:
- Exact failing step
- Exact project/dataset that failed
- Direct user to
assets/BIGQUERY_ACCESS_REQUEST_TEMPLATE.md
Query Guardrails
- Fully qualify tables as
`sm-<tenant_id>.dataset.table` - For order analyses, default to
WHERE is_order_sm_valid = TRUE - Use
sm_store_id(notsmcid— that name does not exist in customer tables) - Use
SAFE_DIVIDEfor ratio math - Handle DATE/TIMESTAMP typing explicitly (
DATE(ts_col)when comparing to dates) - Use
order_net_revenuefor revenue metrics (notorder_gross_revenueunless explicitly asked) - Prefer
*_local_datetimecolumns when available for date-based reporting; otherwise be explicit about UTC vs local - For enumerations (channel, platform, status), discover values with
SELECT DISTINCTfirst, then use exact match. ReserveLIKE/REGEXPfor free-text fields (utm_campaign,product_title,page_path) - LTV tables (
rpt_cohort_ltv_*): always filtersm_order_line_typeto exactly ONE value
References
references/SCHEMA.md— key tables, grains, columns, and naming conventionsreferences/QUERY_PATTERNS.md— common SQL patterns and LTV/cohort rulesreferences/TROUBLESHOOTING.md— auth, permission, and API issuesassets/BIGQUERY_ACCESS_REQUEST_TEMPLATE.md— copy/paste request for users without access
Weekly Installs
6
Repository
source-medium/skillsFirst Seen
Feb 12, 2026
Security Audits
Installed on
gemini-cli6
github-copilot6
codex6
cursor6
opencode6
amp4