## Update Check — ONCE PER SESSION (mandatory)

The first time this skill is used in a session, run the check-updates skill before proceeding.

- GitHub Copilot CLI / VS Code: invoke the `check-updates` skill.
- Claude Code / Cowork / Cursor / Windsurf / Codex: compare the local vs. remote `package.json` version.
- Skip if the check was already performed earlier in this session.
## CRITICAL NOTES

- To find workspace details (including the workspace ID) from a workspace name: list all workspaces, then use JMESPath filtering, as sketched below.
- To find item details (including the item ID) from a workspace ID, item type, and item name: list all items of that type in that workspace, then use JMESPath filtering.
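A minimal sketch of both lookups, assuming `$FABRIC_RESOURCE_SCOPE` and `$FABRIC_API_URL` are already set per the Environment Setup below; the workspace and lakehouse names are placeholders.

```bash
# Hedged sketch: resolve IDs by display name with JMESPath filtering.
# Note the pipe to [0] — a bare [?...][0] yields nothing in JMESPath.
workspaceId=$(az rest --method get --resource "$FABRIC_RESOURCE_SCOPE" \
  --url "$FABRIC_API_URL/workspaces" \
  --query "value[?displayName=='My Workspace'] | [0].id" --output tsv)

lakehouseId=$(az rest --method get --resource "$FABRIC_RESOURCE_SCOPE" \
  --url "$FABRIC_API_URL/workspaces/$workspaceId/items?type=Lakehouse" \
  --query "value[?displayName=='My Lakehouse'] | [0].id" --output tsv)
```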
# Data Engineering Consumption — CLI Skill
## Table of Contents
| Task | Reference | Notes |
|---|---|---|
| Fabric Topology & Key Concepts | COMMON-CORE.md § Fabric Topology & Key Concepts | |
| Environment URLs | COMMON-CORE.md § Environment URLs | |
| Authentication & Token Acquisition | COMMON-CORE.md § Authentication & Token Acquisition | Wrong audience = 401; read before any auth issue |
| Core Control-Plane REST APIs | COMMON-CORE.md § Core Control-Plane REST APIs | |
| Pagination | COMMON-CORE.md § Pagination | |
| Long-Running Operations (LRO) | COMMON-CORE.md § Long-Running Operations (LRO) | |
| Rate Limiting & Throttling | COMMON-CORE.md § Rate Limiting & Throttling | |
| OneLake Data Access | COMMON-CORE.md § OneLake Data Access | Requires storage.azure.com token, not Fabric token |
| Job Execution | COMMON-CORE.md § Job Execution | |
| Capacity Management | COMMON-CORE.md § Capacity Management | |
| Gotchas & Troubleshooting | COMMON-CORE.md § Gotchas & Troubleshooting | |
| Best Practices | COMMON-CORE.md § Best Practices | |
| Tool Selection Rationale | COMMON-CLI.md § Tool Selection Rationale | |
| Finding Workspaces and Items in Fabric | COMMON-CLI.md § Finding Workspaces and Items in Fabric | Mandatory — read the linked section first (needed to find a workspace ID by name, or an item ID by name, item type, and workspace ID) |
| Authentication Recipes | COMMON-CLI.md § Authentication Recipes | az login flows and token acquisition |
| Fabric Control-Plane API via az rest | COMMON-CLI.md § Fabric Control-Plane API via az rest | Always pass `--resource https://api.fabric.microsoft.com` or az rest fails |
| Pagination Pattern | COMMON-CLI.md § Pagination Pattern | |
| Long-Running Operations (LRO) Pattern | COMMON-CLI.md § Long-Running Operations (LRO) Pattern | |
| OneLake Data Access via curl | COMMON-CLI.md § OneLake Data Access via curl | Use curl not az rest (different token audience) |
| SQL / TDS Data-Plane Access | COMMON-CLI.md § SQL / TDS Data-Plane Access | sqlcmd (Go) connect, query, CSV export |
| Job Execution (CLI) | COMMON-CLI.md § Job Execution | |
| OneLake Shortcuts | COMMON-CLI.md § OneLake Shortcuts | |
| Capacity Management (CLI) | COMMON-CLI.md § Capacity Management | |
| Composite Recipes | COMMON-CLI.md § Composite Recipes | |
| Gotchas & Troubleshooting (CLI-Specific) | COMMON-CLI.md § Gotchas & Troubleshooting (CLI-Specific) | az rest audience, shell escaping, token expiry |
| Quick Reference: az rest Template | COMMON-CLI.md § Quick Reference: az rest Template | |
| Quick Reference: Token Audience ↔ CLI Tool Matrix | COMMON-CLI.md § Quick Reference: Token Audience ↔ CLI Tool Matrix | Which --resource + tool for each service |
| Relationship to SPARK-AUTHORING-CORE.md | SPARK-CONSUMPTION-CORE.md § Relationship to SPARK-AUTHORING-CORE.md | |
| Data Engineering Consumption Capability Matrix | SPARK-CONSUMPTION-CORE.md § Data Engineering Consumption Capability Matrix | |
| OneLake Table APIs (Schema-enabled Lakehouses) | SPARK-CONSUMPTION-CORE.md § OneLake Table APIs (Schema-enabled Lakehouses) | Unity Catalog-compatible metadata; requires storage.azure.com token |
| Livy Session Management | SPARK-CONSUMPTION-CORE.md § Livy Session Management | Session creation, states, lifecycle, termination |
| Interactive Data Exploration | SPARK-CONSUMPTION-CORE.md § Interactive Data Exploration | Statement execution, output retrieval, data discovery |
| PySpark Analytics Patterns | SPARK-CONSUMPTION-CORE.md § PySpark Analytics Patterns | Cross-lakehouse 3-part naming, performance optimization |
| Must/Prefer/Avoid | SKILL.md § Must/Prefer/Avoid | MUST DO / AVOID / PREFER checklists |
| Quick Start | SKILL.md § Quick Start | CLI-specific Livy session setup and data exploration |
| Key Fabric Patterns | SKILL.md § Key Fabric Patterns | Spark pattern quick-reference table |
| Session Cleanup | SKILL.md § Session Cleanup | Clean up idle Livy sessions via CLI |
## Must/Prefer/Avoid

### MUST DO

- Check for existing idle sessions before creating new ones
- Use dynamic workspace/lakehouse discovery
- Follow API patterns from COMMON-CLI.md

### PREFER

- sqldw-consumption-cli for simple lakehouse queries — row counts, SELECT, schema exploration, filtering, and aggregation on lakehouse Delta tables should use the SQL Endpoint via `sqlcmd`, not Spark (see the sketch after this list). Only use this skill when the user explicitly requests PySpark, DataFrames, or Spark-specific features.
- SQL Endpoint for Delta tables
- Livy for unstructured/JSON data or complex Python analytics
- Session reuse over creation

### AVOID

- Hardcoded workspace IDs
- Creating unnecessary sessions
- Large result sets without LIMIT
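A minimal sketch of the PREFER rule above, using the Go-based sqlcmd noted in the Table of Contents. The server FQDN, database name, and table are placeholders; the authoritative connection recipe lives in sqldw-consumption-cli and COMMON-CLI.md § SQL / TDS Data-Plane Access.

```bash
# Hedged sketch: simple aggregation via the lakehouse SQL Endpoint
# (placeholders throughout; see sqldw-consumption-cli for real values).
sqlcmd -S "<sql-endpoint>.datawarehouse.fabric.microsoft.com" \
  -d "<lakehouse_name>" \
  --authentication-method ActiveDirectoryDefault \
  -Q "SELECT COUNT(*) AS row_count FROM dbo.your_table"
```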
## Quick Start

### Environment Setup

Apply environment detection from COMMON-CORE.md § Environment Detection Pattern to set:

- `$FABRIC_API_BASE` and `$FABRIC_RESOURCE_SCOPE`
- `$FABRIC_API_URL` and `$LIVY_API_PATH` for Livy operations

Authentication: use token acquisition from COMMON-CLI.md § Environment Detection and API Configuration.
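For orientation, a hedged sketch of typical public-cloud values. These are assumptions (sovereign clouds differ, and the Livy path version string is illustrative), so always prefer the detection pattern above.

```bash
# Assumed public-cloud defaults - verify against COMMON-CORE.md before use.
export FABRIC_API_BASE="https://api.fabric.microsoft.com"
export FABRIC_RESOURCE_SCOPE="https://api.fabric.microsoft.com"  # --resource audience for az rest
export FABRIC_API_URL="$FABRIC_API_BASE/v1"
export LIVY_API_PATH="livyApi/versions/2023-12-01"               # illustrative version string

az login  # token acquisition per COMMON-CLI.md Authentication Recipes
```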
### Workspace & Item Discovery

Preferred: use the COMMON-CLI.md item discovery patterns (Finding Workspaces and Items in Fabric) to find workspaces and items by name.

Fallback (interactive selection when a name-based lookup is not possible):
```bash
# List workspaces
az rest --method get --resource "$FABRIC_RESOURCE_SCOPE" \
  --url "$FABRIC_API_URL/workspaces" \
  --query "value[].{name:displayName, id:id}" --output table
read -p "Workspace ID: " workspaceId

# List lakehouses in the workspace
az rest --method get --resource "$FABRIC_RESOURCE_SCOPE" \
  --url "$FABRIC_API_URL/workspaces/$workspaceId/items?type=Lakehouse" \
  --query "value[].{name:displayName, id:id}" --output table
read -p "Lakehouse ID: " lakehouseId
```
### Session Management
```bash
# Check for an existing idle session (avoid resource waste).
# Note the pipe to [0] — a bare [?...][0] yields nothing in JMESPath.
sessionId=$(az rest --method get --resource "$FABRIC_RESOURCE_SCOPE" \
  --url "$FABRIC_API_URL/workspaces/$workspaceId/lakehouses/$lakehouseId/$LIVY_API_PATH/sessions" \
  --query "sessions[?state=='idle'] | [0].id" --output tsv)

# Create if none available - FORCE STARTER POOL USAGE
if [[ -z "$sessionId" ]]; then
  cat > /tmp/body.json << 'EOF'
{
  "name": "analysis",
  "driverMemory": "56g",
  "driverCores": 8,
  "executorMemory": "56g",
  "executorCores": 8,
  "conf": {
    "spark.dynamicAllocation.enabled": "true",
    "spark.fabric.pool.name": "Starter Pool"
  }
}
EOF
  sessionId=$(az rest --method post --resource "$FABRIC_RESOURCE_SCOPE" \
    --url "$FABRIC_API_URL/workspaces/$workspaceId/lakehouses/$lakehouseId/$LIVY_API_PATH/sessions" \
    --body @/tmp/body.json --query "id" --output tsv)

  echo "⏳ Waiting for starter pool session to be ready..."
  # With starter pools, this should take 3-5 seconds.
  timeout=30  # reduced from 90s since starter pools are fast
  while [ $timeout -gt 0 ]; do
    state=$(az rest --method get --resource "$FABRIC_RESOURCE_SCOPE" \
      --url "$FABRIC_API_URL/workspaces/$workspaceId/lakehouses/$lakehouseId/$LIVY_API_PATH/sessions/$sessionId" \
      --query "state" --output tsv)
    if [[ "$state" == "idle" ]]; then
      echo "✅ Session ready in starter pool!"
      break
    fi
    echo "   Session state: $state (${timeout}s remaining)"
    sleep 3
    timeout=$((timeout - 3))
  done
fi
```
### Data Exploration (Fabric-Specific Patterns)
```bash
# Execute a statement (the LLM already knows Python/Spark syntax)
cat > /tmp/body.json << 'EOF'
{
  "code": "spark.sql(\"SHOW TABLES\").show(); df = spark.table(\"your_table\"); df.describe().show()",
  "kind": "pyspark"
}
EOF
az rest --method post --resource "$FABRIC_RESOURCE_SCOPE" \
  --url "$FABRIC_API_URL/workspaces/$workspaceId/lakehouses/$lakehouseId/$LIVY_API_PATH/sessions/$sessionId/statements" \
  --body @/tmp/body.json
```
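Statement execution is asynchronous, so the output has to be polled. A minimal sketch, assuming the POST above returns the statement `id` and that results appear under `output` once the statement reaches the Livy `available` state.

```bash
# Hedged sketch: capture the statement id from the POST, then poll for output.
statementId=$(az rest --method post --resource "$FABRIC_RESOURCE_SCOPE" \
  --url "$FABRIC_API_URL/workspaces/$workspaceId/lakehouses/$lakehouseId/$LIVY_API_PATH/sessions/$sessionId/statements" \
  --body @/tmp/body.json --query "id" --output tsv)

# Poll until the statement completes (states: waiting -> running -> available).
until [[ "$(az rest --method get --resource "$FABRIC_RESOURCE_SCOPE" \
  --url "$FABRIC_API_URL/workspaces/$workspaceId/lakehouses/$lakehouseId/$LIVY_API_PATH/sessions/$sessionId/statements/$statementId" \
  --query "state" --output tsv)" == "available" ]]; do
  sleep 2
done

# Print the statement output.
az rest --method get --resource "$FABRIC_RESOURCE_SCOPE" \
  --url "$FABRIC_API_URL/workspaces/$workspaceId/lakehouses/$lakehouseId/$LIVY_API_PATH/sessions/$sessionId/statements/$statementId" \
  --query "output"
```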
## Key Fabric Patterns

| Pattern | Code | Use Case |
|---|---|---|
| Table Discovery | `spark.sql("SHOW TABLES")` | List available tables |
| Cross-Lakehouse | `spark.sql("SELECT * FROM other_workspace.table")` | Query across workspaces |
| Delta Features | `DeltaTable.forName(spark, "t").history()`, `spark.read.option("versionAsOf", 1).table("t")` | Time travel, versioning |
| Schema Evolution | `df.printSchema()` | Understand structure |
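Any of these patterns can be submitted through the same statements endpoint shown above; a hedged sketch using the Delta history pattern, with `your_table` as a placeholder.

```bash
# Hedged sketch: run a Delta history query via the existing Livy session.
cat > /tmp/body.json << 'EOF'
{
  "code": "from delta.tables import DeltaTable; DeltaTable.forName(spark, \"your_table\").history().show()",
  "kind": "pyspark"
}
EOF
az rest --method post --resource "$FABRIC_RESOURCE_SCOPE" \
  --url "$FABRIC_API_URL/workspaces/$workspaceId/lakehouses/$lakehouseId/$LIVY_API_PATH/sessions/$sessionId/statements" \
  --body @/tmp/body.json
```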
## Session Cleanup
```bash
# Clean up idle sessions (optional)
az rest --method get --resource "$FABRIC_RESOURCE_SCOPE" \
  --url "$FABRIC_API_URL/workspaces/$workspaceId/lakehouses/$lakehouseId/$LIVY_API_PATH/sessions" \
  --query "sessions[?state=='idle'].id" --output tsv |
xargs -I {} az rest --method delete --resource "$FABRIC_RESOURCE_SCOPE" \
  --url "$FABRIC_API_URL/workspaces/$workspaceId/lakehouses/$lakehouseId/$LIVY_API_PATH/sessions/{}"
```
Focus: this skill provides Fabric-specific REST API patterns. The LLM already knows Python/Spark syntax — the focus here is Fabric integration, session management, and API endpoints.