microsoft-data-stack

Installation

SKILL.md

Microsoft Data Stack Skill for Claude

Expert guidance for data engineering on Microsoft and Azure platforms. Covers ADF, Synapse, Fabric, SQL Server, ADLS Gen2, SSIS migration, and dbt-sqlserver.

When to Use This Skill

Activate when:

Designing Azure Data Factory pipelines (linked services, integration runtimes, triggers)
Choosing between Synapse Analytics, Microsoft Fabric, and SQL Server
Implementing SQL Server CDC, temporal tables, or partition strategies
Configuring ADLS Gen2 hierarchical namespace, ACLs, or storage integration
Migrating SSIS packages to modern ETL (ADF, dbt, dlt)
Setting up dbt-sqlserver or dbt-fabric adapters

Don't use for: General orchestration patterns (use data-pipelines), dbt modeling methodology (use dbt-transforms), Kafka/Flink streaming (use event-streaming), Azure DevOps CI/CD.

Scope Constraints

Generate ADF pipeline JSON, SQL Server DDL, and configuration guidance only.
Credential management: reference Azure Key Vault and Managed Identity. Never hardcode secrets. See Security & Compliance Patterns.
Limit scope to data engineering concerns. Hand off BI/reporting to Power BI documentation and general orchestration to data-pipelines.

Model Routing

reasoning_demand	preferred	acceptable	minimum
medium	Sonnet	Sonnet, Opus	Sonnet

Core Principles

Managed Identity over keys -- Use Azure Managed Identity for service-to-service auth. Key Vault for external secrets only.
Lakehouse by default -- Store raw data in ADLS Gen2 (Parquet/Delta), transform in Synapse or Fabric, serve from SQL pools or lakehouses.
Metadata-driven pipelines -- ADF pipelines should be parameterized and table-driven, not one pipeline per table.
Incremental over full load -- Use watermark columns, CDC, or Change Tracking for incremental extraction.
Right-size the engine -- SQL Server for OLTP + light analytics, Synapse for warehouse-scale, Fabric for unified lakehouse.

Platform Decision Matrix

Factor	SQL Server	Synapse Analytics	Microsoft Fabric
Best for	OLTP, <1TB analytics, existing SQL Server shops	Dedicated warehouse, complex queries, PB-scale	Unified lakehouse, OneLake, Power BI integration
Compute model	Always-on instance	Dedicated or serverless SQL pools	Capacity units (CU), pause/resume
Storage	Local/SAN	ADLS Gen2 (external tables)	OneLake (auto-managed ADLS)
dbt adapter	`dbt-sqlserver`	`dbt-synapse`	`dbt-fabric`
Cost profile	License + infra	Per-DWU or per-query	Per-CU capacity reservation
Governance	SQL Server audit, RLS	Synapse RBAC, column masking	Purview integration, OneLake RBAC

For detailed comparison, see Synapse vs Fabric Decision Guide.

ADF Pipeline Patterns

Metadata-driven ingestion -- Single parameterized pipeline reads a control table listing source tables, watermark columns, and destinations. ForEach activity iterates and copies incrementally.

Integration Runtime tiers:

Azure IR: cloud-to-cloud (default)
Self-Hosted IR: on-prem SQL Server, file shares, legacy systems
Azure-SSIS IR: lift-and-shift SSIS packages (migration stepping stone only)

For pipeline patterns and linked service config, see ADF Patterns Reference.

SQL Server Data Engineering

CDC (Change Data Capture): Captures row-level changes to dedicated change tables. Enable per-table with sys.sp_cdc_enable_table. Query changes via cdc.fn_cdc_get_all_changes_<capture_instance>.

Temporal Tables: System-versioned tables with automatic history tracking. Use for SCD Type 2, audit trails, and point-in-time queries (FOR SYSTEM_TIME AS OF).

Partition Strategies: Partition large fact tables by date using partition functions and schemes. Enables partition switching for fast loads and archive operations.

SSIS Migration Quick Guide

SSIS Component	Modern Replacement	Notes
Data Flow Task	ADF Copy Activity / dbt model	ADF for E/L, dbt for T
Script Task (C#)	Azure Function / Python	Wrap custom logic in serverless functions
Execute SQL Task	dbt model / ADF Stored Proc	Prefer dbt for transformations
For Loop / ForEach	ADF ForEach activity	Metadata-driven pattern
SSISDB Catalog	ADF + Git integration	Version control via Azure DevOps/GitHub
Package variables	ADF parameters + Key Vault	Externalize configuration

For full migration playbook, see SSIS Migration Playbook.

dbt-sqlserver Setup

# profiles.yml
my_project:
  target: dev
  outputs:
    dev:
      type: sqlserver
      driver: "ODBC Driver 18 for SQL Server"
      server: my-server.database.windows.net
      port: 1433
      database: my_database
      schema: dbt_dev
      authentication: ActiveDirectoryServicePrincipal
      tenant_id: "{{ env_var('AZURE_TENANT_ID') }}"
      client_id: "{{ env_var('AZURE_CLIENT_ID') }}"
      client_secret: "{{ env_var('AZURE_CLIENT_SECRET') }}"

Gotchas: No QUALIFY clause -- use subquery with ROW_NUMBER(). No MERGE in views. BOOLEAN type unavailable -- use BIT. Collation-sensitive string comparisons by default.

Security Posture

See Security & Compliance Patterns for the full framework.

Auth: Azure Managed Identity (preferred), Service Principal, Azure AD tokens.
Secrets: Azure Key Vault only. ADF linked services reference Key Vault secrets.
Network: Private endpoints for Synapse/SQL Server. VNet-integrated IR for ADF.
Data classification: Purview for automated scanning. SQL Server column-level classification.

Reference Files

Azure Data Factory Patterns -- Pipeline patterns, linked services, integration runtimes, triggers
Synapse vs Fabric Decision Guide -- When to use Synapse vs Fabric vs SQL Server
SSIS Migration Playbook -- SSIS to modern ETL migration patterns

Handoffs

Pipeline orchestration patterns -> data-pipelines (Dagster, Airflow, Prefect scheduling and monitoring)
dbt modeling methodology -> dbt-transforms (staging/marts layers, testing, CI/CD)
Data governance features -> data-governance (cataloging, lineage, classification, access control)

Related skills

More from dtsong/data-engineering-skills

Installs

Repository

dtsong/data-eng…g-skills

GitHub Stars

First Seen

Mar 10, 2026

Security Audits

Gen Agent Trust HubPass

SocketPass

SnykPass