sql-bi-reporting
SQL BI & Reporting
When to Use
- Writing summary reports with subtotals across multiple dimensions
- Building time-series queries (bucketing, gap filling, moving averages)
- Pivoting row data into columns or unpivoting columns into rows
- Generating date ranges or number sequences (tally tables)
- Solving gaps-and-islands problems (consecutive ranges, session analysis)
- Exporting query results as JSON, XML, or CSV
- Writing queries optimized for Power BI DirectQuery or SSRS
- Year-over-year, cohort, or funnel analysis
When NOT to use: application schema design (table design, naming conventions, access control), query performance tuning (execution plans, index tuning, wait stats), or ETL pipeline design.
Multi-Level Aggregation
Three operators produce subtotals from a single GROUP BY — choose based on your needs:
| Operator | Produces | Use when |
|---|---|---|
ROLLUP(A, B, C) |
(A,B,C), (A,B), (A), () | Columns form a hierarchy (year > quarter > month) |
CUBE(A, B) |
All 2^n combinations | Need every cross-dimensional combination |
GROUPING SETS(...) |
Exactly what you list | Need specific subtotals, not a formula |
-- Hierarchical subtotals: year > quarter > grand total
SELECT
YEAR(OrderDate) AS OrderYear,
DATEPART(quarter, OrderDate) AS OrderQtr,
SUM(TotalAmount) AS Revenue
FROM Sales.Orders
GROUP BY ROLLUP(
YEAR(OrderDate),
DATEPART(quarter, OrderDate)
)
ORDER BY OrderYear, OrderQtr;
Use GROUPING(col) to distinguish subtotal NULLs from real NULLs — returns 1 for subtotal rows, 0 for data rows.
Full reference: Aggregation Patterns — GROUPING_ID, conditional aggregation, HAVING vs WHERE, COUNT semantics with NULLs.
Window Functions
Quick Reference
| Category | Functions | Frame needed? |
|---|---|---|
| Ranking | ROW_NUMBER, RANK, DENSE_RANK, NTILE |
No |
| Offset | LAG, LEAD, FIRST_VALUE, LAST_VALUE |
Yes for LAST_VALUE |
| Aggregate | SUM, AVG, COUNT, MIN, MAX |
Yes for running/sliding |
Running total
SELECT
OrderDate,
TotalAmount,
SUM(TotalAmount) OVER (
ORDER BY OrderDate
ROWS UNBOUNDED PRECEDING
) AS RunningTotal
FROM Sales.Orders;
Always specify ROWS explicitly. The default frame (when ORDER BY is present but no frame is specified) is RANGE UNBOUNDED PRECEDING — this includes ties, produces different results, and is slower.
Moving average
-- 7-day moving average
AVG(DailyRevenue) OVER (
ORDER BY OrderDate
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS MovingAvg7Day
Period-over-period comparison
-- Year-over-year monthly revenue (offset 12 for monthly data)
LAG(Revenue, 12) OVER (ORDER BY OrderYear, OrderMonth) AS PriorYearRevenue
Greatest-N-per-group
-- Most recent order per customer
WITH Ranked AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY CustomerID
ORDER BY OrderDate DESC
) AS RN
FROM Sales.Orders
)
SELECT * FROM Ranked WHERE RN = 1;
Named WINDOW clause (2022+)
When multiple window functions share the same partition and order, avoid repetition:
SELECT
OrderID,
ROW_NUMBER() OVER Win AS RowNum,
SUM(TotalAmount) OVER Win AS RunningTotal,
LAG(TotalAmount) OVER Win AS PrevAmount
FROM Sales.Orders
WINDOW Win AS (PARTITION BY CustomerID ORDER BY OrderDate);
Date Math Patterns
Bucketing
| Need | SQL Server 2022+ | Pre-2022 |
|---|---|---|
| Truncate to day | DATETRUNC(day, col) |
CAST(col AS DATE) |
| Truncate to month | DATETRUNC(month, col) |
DATEFROMPARTS(YEAR(col), MONTH(col), 1) |
| 15-minute buckets | DATE_BUCKET(minute, 15, col) |
DATEADD(minute, DATEDIFF(minute, 0, col)/15*15, 0) |
| Custom week start | DATE_BUCKET(week, 1, col, @origin) |
Manual DATEADD/DATEDIFF calculation |
SARGable date filters
-- GOOD: index seek
WHERE OrderDate >= '2024-01-01' AND OrderDate < '2025-01-01'
-- BAD: index scan (function on column)
WHERE YEAR(OrderDate) = 2024
Gap filling
LEFT JOIN from a continuous date spine (tally table or calendar table) to your sparse data. ISNULL replaces NULL with zero for missing dates. Use a range join to keep the date column SARGable — never CAST(col AS DATE) in a JOIN:
SELECT
C.FullDate,
ISNULL(SUM(O.TotalAmount), 0) AS Revenue
FROM dbo.Calendar C
LEFT JOIN Sales.Orders O
ON O.OrderDate >= C.FullDate
AND O.OrderDate < DATEADD(day, 1, C.FullDate)
WHERE C.FullDate >= '2024-01-01'
AND C.FullDate < '2025-01-01'
GROUP BY C.FullDate
ORDER BY C.FullDate;
Full reference: Time Series — calendar table design, fiscal calendars, cohort analysis, temporal table AS OF queries, moving averages.
Tally Tables
Generate number sequences with zero I/O using the stacking CTE pattern:
WITH
L0 AS (SELECT 1 AS c UNION ALL SELECT 1),
L1 AS (SELECT 1 AS c FROM L0 CROSS JOIN L0 AS B),
L2 AS (SELECT 1 AS c FROM L1 CROSS JOIN L1 AS B),
L3 AS (SELECT 1 AS c FROM L2 CROSS JOIN L2 AS B),
L4 AS (SELECT 1 AS c FROM L3 CROSS JOIN L3 AS B),
Nums AS (
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N
FROM L4
)
SELECT N FROM Nums
WHERE N <= @Count; -- REQUIRED: limits output to needed rows
Every tally CTE must include WHERE N <= <limit> — without it, the full 65,536 rows are generated. When used for date spines, the limit is DATEDIFF(day, @Start, @End). Never omit the limit and rely on an outer query to filter — the optimizer may still materialize all rows.
On SQL Server 2022+, use GENERATE_SERIES(1, @Count) for simpler syntax. Do not use recursive CTEs for number generation — they are row-by-row and 10-50x slower.
Full reference: Tally Tables — inline function wrapper, date range generation, gap filling, string splitting, permanent vs inline tradeoffs.
Pivot & Unpivot
Static PIVOT (known columns):
SELECT ProductName, [Q1], [Q2], [Q3], [Q4]
FROM (...) AS Src
PIVOT (SUM(Revenue) FOR Qtr IN ([Q1],[Q2],[Q3],[Q4])) AS Pvt;
Dynamic PIVOT (runtime columns): build with STRING_AGG + QUOTENAME + sp_executesql. QUOTENAME prevents SQL injection.
Unpivot: use CROSS APPLY VALUES instead of the UNPIVOT operator. UNPIVOT silently drops NULL rows; CROSS APPLY VALUES preserves them.
CROSS APPLY (
VALUES ('Q1', S.Q1), ('Q2', S.Q2), ('Q3', S.Q3), ('Q4', S.Q4)
) AS Q(Quarter, Revenue)
Full reference: Pivot & Unpivot — dynamic PIVOT pattern, multi-column unpivot, conditional aggregation alternative.
Gaps and Islands
Detecting consecutive sequences (islands) and breaks (gaps) in ordered data.
Island detection — the ROW_NUMBER difference technique:
-- GroupKey is constant within each consecutive run
DATEADD(day,
-ROW_NUMBER() OVER (PARTITION BY SensorID ORDER BY ReadingDate),
ReadingDate
) AS GroupKey
Group by GroupKey to find each island's start, end, and length.
Gap detection — LEAD to find the next value, then check for breaks:
LEAD(OrderDay) OVER (ORDER BY OrderDay) AS NextDay
-- Gap exists when NextDay - OrderDay > 1
Full reference: Gaps and Islands — session analysis, status change tracking, date-based vs sequence-based patterns.
Data Export
| Format | T-SQL | Notes |
|---|---|---|
| JSON | FOR JSON PATH |
Dot-notation aliases control nesting: AS "customer.name" produces {"customer":{"name":"..."}} |
| XML | FOR XML PATH |
Only when downstream requires XML (SOAP, EDI) |
| CSV column | STRING_AGG(col, ',') |
WITHIN GROUP for ordering (2017+) |
| Bulk file | BCP / BULK INSERT | TABLOCK for columnstore direct-path |
FOR JSON PATH nesting
Use dot-notation aliases to control JSON structure — no subqueries needed for flat nesting:
SELECT
O.OrderID AS "id",
C.CustomerName AS "customer.name",
C.Email AS "customer.email"
FROM Sales.Orders O
JOIN Sales.Customers C ON C.CustomerID = O.CustomerID
FOR JSON PATH;
-- {"id":1001,"customer":{"name":"Acme","email":"info@acme.com"}}
Full reference: Export Patterns — FOR JSON nesting, JSON_OBJECT (2022+), Power BI DirectQuery optimization, BCP parameters.
Validation Checklist
Before finalizing any BI query, verify:
- Subtotal rows —
GROUPING(col) = 1filters identify rollup rows; confirm NULL isn't confused with real data - Window frames — every windowed aggregate has an explicit
ROWS BETWEENclause - Date filters — all WHERE and JOIN predicates on date columns use range predicates, never
CAST(col AS DATE)orYEAR(col) - Gap fills — LEFT JOIN from calendar/tally produces rows for every expected period; ISNULL handles missing data
- Dynamic PIVOT — generated SQL is printed and inspected before
sp_executesql; all column names pass through QUOTENAME - Export shape — FOR JSON PATH output matches the consumer's expected schema; test with a LIMIT before full run
Common Mistakes
| Mistake | Fix |
|---|---|
| Default window frame with ORDER BY (RANGE, not ROWS) | Always specify ROWS BETWEEN ... explicitly |
| Using RANGE for moving averages (includes ties) | Use ROWS BETWEEN N PRECEDING AND CURRENT ROW |
| YEAR(col) = 2024 in WHERE (kills seeks) | Range predicate: col >= '2024-01-01' AND col < '2025-01-01' |
| CAST(col AS DATE) in JOIN conditions | Use range join: ON col >= date AND col < DATEADD(day, 1, date) |
| Tally CTE without WHERE N <= limit | Always add WHERE N <= @count — without it, generates full 65K rows |
| COUNT(column) when you want total rows | COUNT(*) includes NULLs; COUNT(col) excludes them |
| AVG ignoring NULL semantics | AVG uses COUNT(col) as denominator — use ISNULL(col, 0) if NULLs mean zero |
| UNPIVOT dropping NULL rows | Use CROSS APPLY VALUES to preserve NULLs |
| NOT IN with nullable subquery | Use NOT EXISTS — NOT IN silently returns nothing when subquery contains NULL |
| Recursive CTE for number generation | Use the stacking CTE pattern — set-based and 10-50x faster |
| FOR JSON AUTO in production | Use FOR JSON PATH — AUTO changes shape when aliases change |
| HAVING for non-aggregate filters | Move to WHERE — it filters before grouping, which is cheaper |
| FORMAT for date truncation | FORMAT is CLR-backed (10-50x slower) — use DATETRUNC (2022+) or DATEADD/DATEDIFF |
| LAST_VALUE without explicit frame | Default frame ends at CURRENT ROW — use ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING |
| LAG with fixed offset on sparse data | If periods are missing, LAG(val, 12) skips to the wrong year — use a self-join |
Reference Files
| File | Topics |
|---|---|
| Aggregation Patterns | GROUPING SETS, ROLLUP, CUBE, conditional aggregation, HAVING vs WHERE, COUNT with NULLs |
| Time Series | Date bucketing, DATETRUNC, DATE_BUCKET, calendar tables, gap filling, YoY/MoM, moving averages, fiscal calendars, cohort analysis, temporal AS OF |
| Tally Tables | Stacking CTE pattern, GENERATE_SERIES, date ranges, gap filling, inline vs permanent |
| Pivot & Unpivot | Static PIVOT, dynamic PIVOT with QUOTENAME, CROSS APPLY VALUES unpivot, multi-column unpivot |
| Gaps and Islands | ROW_NUMBER difference, LAG/LEAD gaps, session analysis, status tracking, consecutive ranges |
| Export Patterns | FOR JSON, FOR XML, STRING_AGG, BCP/BULK INSERT, Power BI DirectQuery |