analytics-engineer
SKILL.md
Analytics Engineer
The agent operates as a senior analytics engineer, building scalable dbt transformation layers, designing dimensional models, writing tested SQL, and managing semantic-layer metric definitions.
Workflow
- Understand the data request -- Identify the business question, required grain, and downstream consumers (dashboard, notebook, reverse-ETL). Confirm source tables exist and check freshness.
- Design the dimensional model -- Choose star or snowflake schema. Map source entities to dimension and fact tables at the correct grain. Document grain, primary keys, and foreign keys.
- Build staging models -- One
stg_model per source table. Rename columns, cast types, filter soft-deletes, and add metadata columns. Validate:dbt build --select stg_*. - Build intermediate models -- Encapsulate reusable business logic in
int_models (e.g.,int_orders_enriched). Keep each CTE single-purpose. - Build mart models -- Create
dim_andfct_models for consumption. Configure materialization (view for staging, incremental for large facts, table for small marts). - Add tests and documentation -- Every primary key gets
unique+not_null. Foreign keys getrelationships. Addaccepted_valuesfor enums. Write model descriptions in YAML. - Define semantic-layer metrics -- Register metrics (sum, average, count_distinct) with time grains and dimension slices so BI consumers get a single source of truth.
- Validate end-to-end -- Run
dbt build, confirm test pass rate = 100%, check row counts against source, and verify dashboard numbers match.
dbt Project Structure
analytics/
dbt_project.yml
models/
staging/ # stg_<source>__<table>.sql (one per source table)
intermediate/ # int_<entity>_<verb>.sql (reusable logic)
marts/
core/ # dim_*.sql, fct_*.sql (consumption-ready)
marketing/
finance/
macros/ # Reusable Jinja helpers
tests/ # Custom generic + singular tests
seeds/ # Static CSV lookups
snapshots/ # SCD Type 2 captures
Concrete Example: Customer Dimension
Staging model (models/staging/crm/stg_crm__customers.sql):
WITH source AS (
SELECT * FROM {{ source('crm', 'customers') }}
),
renamed AS (
SELECT
id AS customer_id,
TRIM(LOWER(name)) AS customer_name,
TRIM(LOWER(email)) AS email,
created_at::timestamp AS created_at,
updated_at::timestamp AS updated_at,
is_active::boolean AS is_active,
_fivetran_synced AS _loaded_at
FROM source
WHERE _fivetran_deleted = false
)
SELECT * FROM renamed
Mart model (models/marts/core/dim_customer.sql):
WITH customers AS (
SELECT * FROM {{ ref('stg_crm__customers') }}
),
customer_orders AS (
SELECT
customer_id,
MIN(order_date) AS first_order_date,
MAX(order_date) AS most_recent_order_date,
COUNT(*) AS lifetime_orders,
SUM(order_amount) AS lifetime_value
FROM {{ ref('stg_orders__orders') }}
GROUP BY customer_id
),
final AS (
SELECT
c.customer_id,
c.customer_name,
c.email,
c.created_at,
co.first_order_date,
co.most_recent_order_date,
co.lifetime_orders,
co.lifetime_value,
CASE
WHEN co.lifetime_value >= 10000 THEN 'platinum'
WHEN co.lifetime_value >= 5000 THEN 'gold'
WHEN co.lifetime_value >= 1000 THEN 'silver'
ELSE 'bronze'
END AS customer_tier
FROM customers c
LEFT JOIN customer_orders co
ON c.customer_id = co.customer_id
)
SELECT * FROM final
Test configuration (models/marts/core/_core__models.yml):
version: 2
models:
- name: dim_customer
description: Customer dimension with lifetime order metrics and tier classification.
columns:
- name: customer_id
tests: [unique, not_null]
- name: email
tests: [unique, not_null]
- name: customer_tier
tests:
- accepted_values:
values: ['platinum', 'gold', 'silver', 'bronze']
- name: lifetime_value
tests:
- dbt_utils.expression_is_true:
expression: ">= 0"
Incremental Fact Table Pattern
-- models/marts/core/fct_orders.sql
{{
config(
materialized='incremental',
unique_key='order_id',
partition_by={'field': 'order_date', 'data_type': 'date'},
cluster_by=['customer_id', 'product_id']
)
}}
WITH orders AS (
SELECT * FROM {{ ref('stg_orders__orders') }}
{% if is_incremental() %}
WHERE order_date >= (SELECT MAX(order_date) FROM {{ this }})
{% endif %}
),
order_items AS (
SELECT * FROM {{ ref('stg_orders__order_items') }}
),
final AS (
SELECT
o.order_id,
o.order_date,
o.customer_id,
oi.product_id,
o.store_id,
oi.quantity,
oi.unit_price,
oi.quantity * oi.unit_price AS line_total,
o.discount_amount,
o.tax_amount,
o.total_amount
FROM orders o
INNER JOIN order_items oi ON o.order_id = oi.order_id
)
SELECT * FROM final
Materialization Strategy
| Layer | Materialization | Rationale |
|---|---|---|
| Staging | View | Thin wrappers; no storage cost |
| Intermediate | Ephemeral / View | Business logic; referenced multiple times |
| Marts (small) | Table | Query performance for BI tools |
| Marts (large) | Incremental | Efficient appends for large fact tables |
Semantic-Layer Metric Definition
# models/marts/core/_core__metrics.yml
metrics:
- name: revenue
label: Total Revenue
model: ref('fct_orders')
calculation_method: sum
expression: total_amount
timestamp: order_date
time_grains: [day, week, month, quarter, year]
dimensions: [customer_tier, product_category, store_region]
filters:
- field: is_cancelled
operator: '='
value: 'false'
- name: average_order_value
label: Average Order Value
model: ref('fct_orders')
calculation_method: average
expression: total_amount
timestamp: order_date
time_grains: [day, week, month]
Useful Macros
-- macros/cents_to_dollars.sql
{% macro cents_to_dollars(column_name) %}
({{ column_name }} / 100.0)::decimal(18,2)
{% endmacro %}
-- macros/get_incremental_filter.sql
{% macro get_incremental_filter(column_name, lookback_days=3) %}
{% if is_incremental() %}
WHERE {{ column_name }} >= (
SELECT DATEADD(day, -{{ lookback_days }}, MAX({{ column_name }}))
FROM {{ this }}
)
{% endif %}
{% endmacro %}
CI/CD: Slim CI for Pull Requests
# Only run modified models and their downstream dependents
dbt run --select state:modified+ --defer --state ./target-base
dbt test --select state:modified+ --defer --state ./target-base
For full CI/CD pipeline configuration, see REFERENCE.md.
Reference Materials
REFERENCE.md-- Extended patterns: source config, custom tests, CI/CD workflows, exposures, documentation templatesreferences/modeling_patterns.md-- Data modeling best practicesreferences/dbt_style_guide.md-- SQL and dbt conventionsreferences/testing_guide.md-- Testing strategiesreferences/optimization.md-- Performance tuning
Scripts
python scripts/impact_analyzer.py --model dim_customer
python scripts/schema_diff.py --source prod --target dev
python scripts/doc_generator.py --format markdown
python scripts/quality_scorer.py --model fct_orders
Weekly Installs
84
Repository
borghei/claude-skillsGitHub Stars
38
First Seen
Jan 24, 2026
Security Audits
Installed on
claude-code66
opencode61
gemini-cli55
codex52
github-copilot51
cursor51