bigquery-analytics
SKILL.md
BigQuery Best Practices
Table Design
Partitioning (REQUIRED for tables > 1GB)
CREATE TABLE `project.dataset.events` (
event_id STRING,
shop_id STRING,
event_type STRING,
created_at TIMESTAMP,
data JSON
)
PARTITION BY DATE(created_at)
CLUSTER BY shop_id, event_type;
| Data Size | Partition By |
|---|---|
| < 1GB | Not needed |
| 1GB - 1TB | DATE/TIMESTAMP |
| > 1TB | DATE + consider sharding |
Query Patterns
Always Use Partition Filter
-- ❌ BAD: No partition filter (full scan)
SELECT * FROM events WHERE shop_id = 'shop_123';
-- ✅ GOOD: Partition filter included
SELECT * FROM events
WHERE created_at >= '2024-01-01'
AND created_at < '2024-02-01'
AND shop_id = 'shop_123';
Select Only Needed Columns
-- ❌ BAD: SELECT *
SELECT * FROM events;
-- ✅ GOOD: Select specific columns
SELECT event_id, event_type, created_at FROM events;
Node.js Integration
Always Batch Inserts
// ✅ GOOD: Single batch insert
await table.insert(batch.map(row => ({
...row,
time: new Date()
})));
// ❌ BAD: Insert one row at a time
for (const row of batch) {
await table.insert([row]);
}
| Scenario | Max Batch Size |
|---|---|
| Streaming inserts | 500-1000 rows |
| High throughput | Up to 10,000 rows |
Cost Control
// Dry run before expensive queries
const [job] = await bigquery.createQueryJob({
query: sql,
dryRun: true
});
const estimatedCost = (job.statistics.totalBytesProcessed / 1e12) * 5;
Checklist
□ Large tables (>1GB) have partitioning
□ Queries include partition column in WHERE
□ Tables clustered by frequently filtered columns
□ No SELECT * - select specific columns
□ Using parameterized queries
□ Batch inserts (not row-by-row)
Weekly Installs
7
Repository
trantuananh-17/…-reviewsFirst Seen
Jan 30, 2026
Security Audits
Installed on
github-copilot7
codex7
cursor7
opencode6
gemini-cli6
claude-code5