microsoft-data-stack
Microsoft Data Stack Skill for Claude
Expert guidance for data engineering on Microsoft and Azure platforms. Covers ADF, Synapse, Fabric, SQL Server, ADLS Gen2, SSIS migration, and dbt-sqlserver.
When to Use This Skill
Activate when:
- Designing Azure Data Factory pipelines (linked services, integration runtimes, triggers)
- Choosing between Synapse Analytics, Microsoft Fabric, and SQL Server
- Implementing SQL Server CDC, temporal tables, or partition strategies
- Configuring ADLS Gen2 hierarchical namespace, ACLs, or storage integration
- Migrating SSIS packages to modern ETL (ADF, dbt, dlt)
- Setting up dbt-sqlserver or dbt-fabric adapters
Don't use for: General orchestration patterns (use data-pipelines), dbt modeling methodology (use dbt-transforms), Kafka/Flink streaming (use event-streaming), Azure DevOps CI/CD.
Scope Constraints
- Generate ADF pipeline JSON, SQL Server DDL, and configuration guidance only.
- Credential management: reference Azure Key Vault and Managed Identity. Never hardcode secrets. See Security & Compliance Patterns.
- Limit scope to data engineering concerns. Hand off BI/reporting to Power BI documentation and general orchestration to data-pipelines.
Model Routing
| reasoning_demand | preferred | acceptable | minimum |
|---|---|---|---|
| medium | Sonnet | Sonnet, Opus | Sonnet |
Core Principles
- Managed Identity over keys -- Use Azure Managed Identity for service-to-service auth. Key Vault for external secrets only.
- Lakehouse by default -- Store raw data in ADLS Gen2 (Parquet/Delta), transform in Synapse or Fabric, serve from SQL pools or lakehouses.
- Metadata-driven pipelines -- ADF pipelines should be parameterized and table-driven, not one pipeline per table.
- Incremental over full load -- Use watermark columns, CDC, or Change Tracking for incremental extraction.
- Right-size the engine -- SQL Server for OLTP + light analytics, Synapse for warehouse-scale, Fabric for unified lakehouse.
Platform Decision Matrix
| Factor | SQL Server | Synapse Analytics | Microsoft Fabric |
|---|---|---|---|
| Best for | OLTP, <1TB analytics, existing SQL Server shops | Dedicated warehouse, complex queries, PB-scale | Unified lakehouse, OneLake, Power BI integration |
| Compute model | Always-on instance | Dedicated or serverless SQL pools | Capacity units (CU), pause/resume |
| Storage | Local/SAN | ADLS Gen2 (external tables) | OneLake (auto-managed ADLS) |
| dbt adapter | dbt-sqlserver |
dbt-synapse |
dbt-fabric |
| Cost profile | License + infra | Per-DWU or per-query | Per-CU capacity reservation |
| Governance | SQL Server audit, RLS | Synapse RBAC, column masking | Purview integration, OneLake RBAC |
For detailed comparison, see Synapse vs Fabric Decision Guide.
ADF Pipeline Patterns
Metadata-driven ingestion -- Single parameterized pipeline reads a control table listing source tables, watermark columns, and destinations. ForEach activity iterates and copies incrementally.
Integration Runtime tiers:
- Azure IR: cloud-to-cloud (default)
- Self-Hosted IR: on-prem SQL Server, file shares, legacy systems
- Azure-SSIS IR: lift-and-shift SSIS packages (migration stepping stone only)
For pipeline patterns and linked service config, see ADF Patterns Reference.
SQL Server Data Engineering
CDC (Change Data Capture): Captures row-level changes to dedicated change tables. Enable per-table with sys.sp_cdc_enable_table. Query changes via cdc.fn_cdc_get_all_changes_<capture_instance>.
Temporal Tables: System-versioned tables with automatic history tracking. Use for SCD Type 2, audit trails, and point-in-time queries (FOR SYSTEM_TIME AS OF).
Partition Strategies: Partition large fact tables by date using partition functions and schemes. Enables partition switching for fast loads and archive operations.
SSIS Migration Quick Guide
| SSIS Component | Modern Replacement | Notes |
|---|---|---|
| Data Flow Task | ADF Copy Activity / dbt model | ADF for E/L, dbt for T |
| Script Task (C#) | Azure Function / Python | Wrap custom logic in serverless functions |
| Execute SQL Task | dbt model / ADF Stored Proc | Prefer dbt for transformations |
| For Loop / ForEach | ADF ForEach activity | Metadata-driven pattern |
| SSISDB Catalog | ADF + Git integration | Version control via Azure DevOps/GitHub |
| Package variables | ADF parameters + Key Vault | Externalize configuration |
For full migration playbook, see SSIS Migration Playbook.
dbt-sqlserver Setup
# profiles.yml
my_project:
target: dev
outputs:
dev:
type: sqlserver
driver: "ODBC Driver 18 for SQL Server"
server: my-server.database.windows.net
port: 1433
database: my_database
schema: dbt_dev
authentication: ActiveDirectoryServicePrincipal
tenant_id: "{{ env_var('AZURE_TENANT_ID') }}"
client_id: "{{ env_var('AZURE_CLIENT_ID') }}"
client_secret: "{{ env_var('AZURE_CLIENT_SECRET') }}"
Gotchas: No QUALIFY clause -- use subquery with ROW_NUMBER(). No MERGE in views. BOOLEAN type unavailable -- use BIT. Collation-sensitive string comparisons by default.
Security Posture
See Security & Compliance Patterns for the full framework.
- Auth: Azure Managed Identity (preferred), Service Principal, Azure AD tokens.
- Secrets: Azure Key Vault only. ADF linked services reference Key Vault secrets.
- Network: Private endpoints for Synapse/SQL Server. VNet-integrated IR for ADF.
- Data classification: Purview for automated scanning. SQL Server column-level classification.
Reference Files
- Azure Data Factory Patterns -- Pipeline patterns, linked services, integration runtimes, triggers
- Synapse vs Fabric Decision Guide -- When to use Synapse vs Fabric vs SQL Server
- SSIS Migration Playbook -- SSIS to modern ETL migration patterns
Handoffs
- Pipeline orchestration patterns -> data-pipelines (Dagster, Airflow, Prefect scheduling and monitoring)
- dbt modeling methodology -> dbt-transforms (staging/marts layers, testing, CI/CD)
- Data governance features -> data-governance (cataloging, lineage, classification, access control)