data-sql
SKILL.md
Data SQL Skill
Version: 1.0 Stack: SQL (PostgreSQL, Redshift, Athena, Spark SQL)
SQL is the only language where a working query can be 1,000x slower than an equivalent working query. A missing index turns a 50ms lookup into a 50-second full table scan. SELECT * breaks downstream code when someone adds a column. Correlated subqueries silently execute N+1 patterns that disappear in development and cripple production. The query planner doesn't warn you — it just does what you asked.
SQL written with index awareness, explicit columns, and CTEs is both faster to execute and easier to debug. Clarity and performance reinforce each other.
Scope and Boundaries
This skill covers:
- Query optimization patterns
- Schema design and normalization
- Data modeling (star schema, slowly changing dimensions)
- CTEs and query organization
- Index strategy
- Window functions
Defers to other skills:
security: SQL injection prevention, parameterized queriesdata-pipelines: When to use SQL vs. application code
Use this skill when: Writing SQL queries or designing schemas.
Core Principles
- CTEs for Readability — Break complex queries into named steps.
- Filter Early — Push WHERE clauses as close to source as possible.
- Explicit Columns — Never
SELECT *in production code. - Index-Aware Queries — Know what's indexed, write queries that use them.
- Idempotent Operations — INSERT/UPDATE should be safe to retry.
Patterns
CTE Organization
-- Good - logical steps, easy to debug
WITH active_users AS (
SELECT user_id, email
FROM users
WHERE status = 'active'
),
recent_orders AS (
SELECT user_id, COUNT(*) as order_count
FROM orders
WHERE created_at > CURRENT_DATE - INTERVAL '30 days'
GROUP BY user_id
)
SELECT
u.email,
COALESCE(o.order_count, 0) as recent_orders
FROM active_users u
LEFT JOIN recent_orders o USING (user_id);
Upsert Pattern (PostgreSQL)
INSERT INTO inventory (sku, quantity, updated_at)
VALUES ('ABC123', 100, NOW())
ON CONFLICT (sku) DO UPDATE SET
quantity = EXCLUDED.quantity,
updated_at = EXCLUDED.updated_at;
Window Functions
-- Running total and row number
SELECT
date,
revenue,
SUM(revenue) OVER (ORDER BY date) as running_total,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC) as rank
FROM daily_sales;
Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
SELECT * |
Breaks on schema changes, wastes bandwidth | List explicit columns |
| Correlated subqueries | N+1 query behavior | Use JOINs or window functions |
OR in JOIN conditions |
Prevents index use | Restructure or use UNION |
| Functions on indexed columns | WHERE YEAR(date) = 2024 |
Use range: date >= '2024-01-01' |
Missing GROUP BY columns |
Undefined behavior | Include all non-aggregated columns |
Checklist
- No
SELECT *in production queries - CTEs used for complex queries
- WHERE clauses filter early
- JOINs prefer indexed columns
- Upserts are idempotent
- Large queries tested with EXPLAIN
References
references/query-optimization.md— Index usage, join optimization, performancereferences/window-functions.md— Window function patterns and examples
Assets
assets/query-patterns.md— Common SQL patterns and templates
Weekly Installs
8
Repository
alexanderstephe…aude-hubGitHub Stars
1
First Seen
Feb 17, 2026
Security Audits
Installed on
github-copilot8
codex8
gemini-cli8
opencode8
kimi-cli7
cursor7