postgresql-advanced-sql
PostgreSQL Advanced SQL
Use When
- Advanced PostgreSQL SQL patterns sourced from "Mastering PostgreSQL" (Supabase/Manning) and "Introduction to PostgreSQL for the Data Professional". Covers modern SQL (CTEs, recursive queries, window functions), JSONB operators, array operations, full-text search (tsvector/tsquery, GIN indexes, ranking), LATERAL joins, and anonymous code blocks. Companion to postgresql-fundamentals and postgresql-performance.
- The task needs reusable judgment, domain constraints, or a proven workflow rather than ad hoc advice.
Do Not Use When
- The task is unrelated to
postgresql-advanced-sqlor would be better handled by a more specific companion skill. - The request only needs a trivial answer and none of this skill's constraints or references materially help.
Required Inputs
- Gather relevant project context, constraints, and the concrete problem to solve.
- Confirm the desired deliverable: design, code, review, migration plan, audit, or documentation.
Workflow
- Read this
SKILL.mdfirst, then load only the referenced deep-dive files that are necessary for the task. - Apply the ordered guidance, checklists, and decision rules in this skill instead of cherry-picking isolated snippets.
- Produce the deliverable with assumptions, risks, and follow-up work made explicit when they matter.
Quality Standards
- Keep outputs execution-oriented, concise, and aligned with the repository's baseline engineering standards.
- Preserve compatibility with existing project conventions unless the skill explicitly requires a stronger standard.
- Prefer deterministic, reviewable steps over vague advice or tool-specific magic.
Anti-Patterns
- Treating examples as copy-paste truth without checking fit, constraints, or failure modes.
- Loading every reference file by default instead of using progressive disclosure.
Outputs
- A concrete result that fits the task: implementation guidance, review findings, architecture decisions, templates, or generated artifacts.
- Clear assumptions, tradeoffs, or unresolved gaps when the task cannot be completed from available context alone.
- References used, companion skills, or follow-up actions when they materially improve execution.
Evidence Produced
| Category | Artifact | Format | Example |
|---|---|---|---|
| Correctness | Advanced query test plan | Markdown doc covering CTE, window function, LATERAL, and JSONB-path test cases | docs/data/postgres-advanced-tests.md |
References
- Use the links and companion skills already referenced in this file when deeper context is needed.
Common Table Expressions (CTEs)
CTEs name a subquery for reuse within a single statement. Use them for readability and to avoid repeating complex subqueries.
-- Basic CTE
WITH active_orders AS (
SELECT id, customer_id, total, created_at
FROM orders
WHERE status NOT IN ('cancelled', 'refunded')
),
customer_totals AS (
SELECT customer_id, SUM(total) AS lifetime_value
FROM active_orders
GROUP BY customer_id
)
SELECT c.email, ct.lifetime_value
FROM customers c
JOIN customer_totals ct ON ct.customer_id = c.id
ORDER BY ct.lifetime_value DESC;
Writable CTEs (Data-Modifying)
-- Move rows atomically: delete from one table, insert into another
WITH deleted AS (
DELETE FROM staging_orders
WHERE created_at < NOW() - INTERVAL '7 days'
RETURNING *
)
INSERT INTO archive_orders SELECT * FROM deleted;
Materialised vs Inline CTEs
-- Force materialisation (PostgreSQL 12+) — evaluate once, cache result
WITH MATERIALIZED expensive_query AS (
SELECT ... FROM large_table WHERE complex_condition
)
SELECT * FROM expensive_query WHERE ...;
-- Default: planner decides inline vs materialised
WITH sales AS ( SELECT ... ) SELECT * FROM sales;
Recursive Queries
Recursive CTEs model hierarchies (org charts, category trees, bill of materials).
-- Org chart: find all reports under a manager
WITH RECURSIVE reports AS (
-- Anchor: start with the manager
SELECT id, name, manager_id, 0 AS depth
FROM employees
WHERE id = 42
UNION ALL
-- Recursive: employees whose manager_id is in prior results
SELECT e.id, e.name, e.manager_id, r.depth + 1
FROM employees e
JOIN reports r ON e.manager_id = r.id
)
SELECT id, name, depth
FROM reports
ORDER BY depth, name;
-- Category tree: full path
WITH RECURSIVE category_path AS (
SELECT id, name, parent_id, name::TEXT AS path
FROM categories WHERE parent_id IS NULL
UNION ALL
SELECT c.id, c.name, c.parent_id, cp.path || ' > ' || c.name
FROM categories c
JOIN category_path cp ON c.parent_id = cp.id
)
SELECT id, path FROM category_path ORDER BY path;
Window Functions
Window functions compute across a set of rows related to the current row without collapsing them (unlike GROUP BY).
-- Syntax
function_name(args) OVER (
[PARTITION BY column]
[ORDER BY column]
[ROWS/RANGE BETWEEN ...]
)
Ranking
SELECT
employee_id,
department_id,
salary,
RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank,
DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS dense_rank,
ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS row_num,
NTILE(4) OVER (ORDER BY salary DESC) AS quartile
FROM employees;
Aggregate Windows
SELECT
order_id,
customer_id,
amount,
SUM(amount) OVER (PARTITION BY customer_id ORDER BY created_at) AS running_total,
AVG(amount) OVER (PARTITION BY customer_id) AS customer_avg,
COUNT(*) OVER (PARTITION BY customer_id) AS order_count,
-- 7-day moving average
AVG(amount) OVER (
PARTITION BY customer_id
ORDER BY created_at
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_avg_7d
FROM orders;
Lead/Lag — Access Adjacent Rows
SELECT
month,
revenue,
LAG(revenue, 1) OVER (ORDER BY month) AS prev_month_revenue,
LEAD(revenue, 1) OVER (ORDER BY month) AS next_month_revenue,
revenue - LAG(revenue, 1) OVER (ORDER BY month) AS month_delta
FROM monthly_revenue;
First/Last Value
SELECT
product_id,
sale_date,
price,
FIRST_VALUE(price) OVER (PARTITION BY product_id ORDER BY sale_date) AS original_price,
LAST_VALUE(price) OVER (
PARTITION BY product_id
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS latest_price
FROM price_history;
LATERAL Joins
LATERAL allows a subquery on the right side to reference columns from the left side — like a correlated subquery in a FROM clause.
-- Get the 3 most recent orders per customer
SELECT c.id, c.email, o.id AS order_id, o.total, o.created_at
FROM customers c
CROSS JOIN LATERAL (
SELECT id, total, created_at
FROM orders
WHERE customer_id = c.id
ORDER BY created_at DESC
LIMIT 3
) o;
-- Function producing rows per input row
SELECT u.id, stats.*
FROM users u
CROSS JOIN LATERAL get_user_stats(u.id) AS stats;
JSONB Operations
-- Operators
payload->>'key' -- extract as text
payload->'key' -- extract as jsonb
payload#>>'{a,b}' -- nested path as text
payload @> '{"key":"val"}' -- contains
payload ? 'key' -- key exists
payload ?| ARRAY['a','b'] -- any key exists
payload ?& ARRAY['a','b'] -- all keys exist
-- Update: merge (PostgreSQL 16+)
UPDATE configs
SET payload = payload || '{"theme": "light"}'::jsonb
WHERE id = 1;
-- Update: remove key
UPDATE configs
SET payload = payload - 'deprecated_key'
WHERE id = 1;
-- Expand to rows
SELECT key, value
FROM configs, jsonb_each_text(configs.payload);
-- Aggregate to JSON
SELECT jsonb_agg(row_to_json(u)) FROM users u WHERE active;
SELECT jsonb_object_agg(id, name) FROM categories;
JSONB Index Types
-- GIN for @>, ?, ?|, ?& operators (containment/existence)
CREATE INDEX configs_payload_gin ON configs USING GIN (payload);
-- GIN with jsonb_path_ops (smaller, only for @>)
CREATE INDEX configs_payload_path ON configs USING GIN (payload jsonb_path_ops);
-- B-tree on extracted value (for = and ORDER BY)
CREATE INDEX configs_theme ON configs ((payload->>'theme'));
Array Operations
-- Operators
ARRAY['a','b'] @> ARRAY['a'] -- contains
ARRAY['a','b'] <@ ARRAY['a','c'] -- is contained by
ARRAY['a','b'] && ARRAY['b','c'] -- overlap (any common)
-- Functions
array_length(tags, 1) -- length of dimension 1
array_append(tags, 'new') -- append element
array_remove(tags, 'old') -- remove all occurrences
array_to_string(tags, ',') -- join to string
string_to_array('a,b,c', ',') -- split to array
unnest(tags) -- expand to rows
array_agg(name ORDER BY name) -- aggregate to array
-- Index for array operators
CREATE INDEX products_tags_gin ON products USING GIN (tags);
Full-Text Search
Basic Setup
-- tsvector: pre-processed document
-- tsquery: search query
-- On-the-fly (not recommended for large tables)
SELECT title FROM articles
WHERE to_tsvector('english', title || ' ' || body) @@ to_tsquery('english', 'PostgreSQL & performance');
-- Stored tsvector column (recommended)
ALTER TABLE articles ADD COLUMN search_vector tsvector;
UPDATE articles
SET search_vector = to_tsvector('english', coalesce(title,'') || ' ' || coalesce(body,''));
-- Auto-update with trigger
CREATE FUNCTION articles_search_trigger() RETURNS trigger AS $$
BEGIN
NEW.search_vector := to_tsvector('english', coalesce(NEW.title,'') || ' ' || coalesce(NEW.body,''));
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER articles_search_update
BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW EXECUTE FUNCTION articles_search_trigger();
-- GIN index (required for performance)
CREATE INDEX articles_search_gin ON articles USING GIN (search_vector);
Search Query Syntax
-- Basic search
SELECT title FROM articles
WHERE search_vector @@ to_tsquery('english', 'PostgreSQL');
-- AND
to_tsquery('english', 'PostgreSQL & performance')
-- OR
to_tsquery('english', 'PostgreSQL | MySQL')
-- NOT
to_tsquery('english', 'PostgreSQL & !MySQL')
-- Phrase (proximity)
to_tsquery('english', 'PostgreSQL <-> performance') -- adjacent
to_tsquery('english', 'PostgreSQL <2> tuning') -- within 2 words
-- Prefix search (autocomplete)
to_tsquery('english', 'Postgre:*')
-- plainto_tsquery: natural language input, no operators
WHERE search_vector @@ plainto_tsquery('english', 'postgres full text search')
-- websearch_to_tsquery: Google-style (PostgreSQL 11+)
WHERE search_vector @@ websearch_to_tsquery('english', '"full text" postgres -mysql')
Ranking Results
SELECT
title,
ts_rank(search_vector, query) AS rank,
ts_rank_cd(search_vector, query, 32) AS rank_cover_density
FROM articles, to_tsquery('english', 'PostgreSQL & performance') query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;
Highlighting Matches
SELECT
title,
ts_headline('english', body, to_tsquery('english', 'PostgreSQL'),
'StartSel=<mark>, StopSel=</mark>, MaxWords=50, MinWords=20'
) AS excerpt
FROM articles
WHERE search_vector @@ to_tsquery('english', 'PostgreSQL');
Trigram Fuzzy Search (pg_trgm)
For typo-tolerant search and LIKE/ILIKE acceleration:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
-- Index
CREATE INDEX users_email_trgm ON users USING GIN (email gin_trgm_ops);
-- Fuzzy similarity
SELECT email, similarity(email, 'john@example.com') AS sim
FROM users
WHERE email % 'john@example.com' -- similarity > threshold (default 0.3)
ORDER BY sim DESC;
-- Accelerate LIKE/ILIKE
SELECT * FROM products WHERE name ILIKE '%coffee%'; -- uses trgm index
Anonymous Code Blocks (DO)
DO $$
DECLARE
counter INT := 0;
rec RECORD;
BEGIN
FOR rec IN SELECT id FROM users WHERE migrated = false LOOP
UPDATE users SET migrated = true WHERE id = rec.id;
counter := counter + 1;
END LOOP;
RAISE NOTICE 'Migrated % users', counter;
END $$;
Useful Utility Queries
-- Table sizes
SELECT relname, pg_size_pretty(pg_total_relation_size(oid)) AS total
FROM pg_class WHERE relkind = 'r' ORDER BY pg_total_relation_size(oid) DESC LIMIT 20;
-- Index usage
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes ORDER BY idx_scan;
-- Long-running queries
SELECT pid, age(clock_timestamp(), query_start), usename, query
FROM pg_stat_activity
WHERE query != '<IDLE>' AND query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start;
-- Duplicate rows
SELECT col1, col2, COUNT(*) FROM table GROUP BY col1, col2 HAVING COUNT(*) > 1;
-- Generate series
SELECT generate_series('2026-01-01'::date, '2026-12-31'::date, '1 month'::interval) AS month;
Anti-Patterns
SELECT *in CTEs with large tables — materialisation pulls all columns- Recursive CTE without depth limit — add
WHERE depth < 100guard to_tsvector()in WHERE without a GIN index — full sequential scan- Using
LIKE '%term%'without pg_trgm index — sequential scan - Missing
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWINGforLAST_VALUE— returns wrong result - Extracting JSONB value in WHERE without functional index — full scan
More from peterbamuhigire/skills-web-dev
google-play-store-review
Google Play Store compliance and review readiness for Android apps. Use
76multi-tenant-saas-architecture
Use when designing or reviewing a multi-tenant SaaS platform — tenant
62jetpack-compose-ui
Jetpack Compose UI standards for beautiful, sleek, minimalistic Android
49gis-mapping
Use for web apps that need Leaflet-first GIS mapping, location selection,
48saas-accounting-system
Implement a complete double-entry accounting system inside any SaaS app.
47manual-guide
Generate end-user manuals and reference guides for ERP modules. Use when
38