
Alibaba Cloud EMR Serverless Spark Workspace Full Lifecycle Management

Manage EMR Serverless Spark workspaces through Alibaba Cloud API. You are a Spark-savvy data engineer who not only knows how to call APIs, but also knows when to call them and what parameters to use.

Domain Knowledge

Product Architecture

EMR Serverless Spark is a fully managed, serverless Spark service from Alibaba Cloud that supports batch processing, interactive queries, and stream computing:

  • Serverless Architecture: No need to manage underlying clusters, compute resources allocated on-demand, billed by CU
  • Multi-engine Support: Supports Spark batch processing, Kyuubi (compatible with Hive/Spark JDBC), session clusters
  • Elastic Scaling: Resource queues scale on-demand, no need to reserve fixed resources

Core Concepts

| Concept | Description |
| --- | --- |
| Workspace | Top-level resource container, containing resource queues, jobs, Kyuubi services, etc. |
| Resource Queue | Compute resource pool within a workspace, allocated in CU units |
| CU (Compute Unit) | Compute resource unit; 1 CU = 1 core CPU + 4 GiB memory |
| JobRun | A single submission and execution of a Spark job |
| Kyuubi Service | Interactive SQL gateway compatible with open-source Kyuubi; supports JDBC connections |
| SessionCluster | Long-running interactive session environment |
| ReleaseVersion | An available Spark engine version |

Job Types

| Type | Description | Applicable Scenarios |
| --- | --- | --- |
| Spark JAR | Packaged Java/Scala JAR jobs | ETL, data processing pipelines |
| PySpark | Python Spark jobs | Data science, machine learning |
| Spark SQL | Pure SQL jobs | Data analysis, report queries |

Recommended Configurations

  • Development & Testing: Pay-as-you-go + 50 CU resource queue
  • Small-scale Production: 200 CU resource queue
  • Large-scale Production: 2000+ CU resource queue, elastic scaling on-demand
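Since 1 CU = 1 core CPU + 4 GiB memory, a queue size can be sanity-checked from a planned executor layout. A minimal sketch (the executor figures are illustrative, and the driver's own CUs are ignored):

```shell
# Rough CU estimate for a queue, using 1 CU = 1 core CPU + 4 GiB memory.
# Executor figures below are illustrative; the driver's own CUs are ignored.
executors=20
cores_per_executor=4
mem_gib_per_executor=20

cu_by_cores=$((executors * cores_per_executor))                    # 20 * 4 = 80
cu_by_mem=$(( (executors * mem_gib_per_executor + 3) / 4 ))        # ceil(400 / 4) = 100
cu_needed=$(( cu_by_cores > cu_by_mem ? cu_by_cores : cu_by_mem )) # memory-bound: 100
echo "$cu_needed"
```

Memory-heavy workloads are bounded by memory rather than cores, so size the queue to the larger of the two figures; here the 200 CU small-scale production tier would fit comfortably.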

Prerequisites

1. Credential Configuration

The Alibaba Cloud CLI/SDK obtains authentication information from the default credential chain automatically, so credentials do not need to be configured explicitly. Multiple credential sources are supported, including configuration files, environment variables, and instance roles.

The recommended way to configure credentials is through the Alibaba Cloud CLI:

aliyun configure

For more credential configuration methods, refer to Alibaba Cloud CLI Credential Management.
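As one alternative to `aliyun configure`, credentials can come from environment variables in the default credential chain. The variable names below are the conventional ones; verify them against your CLI/SDK version:

```shell
# Credentials via environment variables (one source in the default credential chain).
# Values are placeholders; never commit real keys.
export ALIBABA_CLOUD_ACCESS_KEY_ID="<your-access-key-id>"
export ALIBABA_CLOUD_ACCESS_KEY_SECRET="<your-access-key-secret>"
```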

2. Grant Service Roles (Required for First-time Use)

Before using EMR Serverless Spark, you need to grant the account the following two roles (see RAM Permission Policies for details):

| Role Name | Type | Description |
| --- | --- | --- |
| AliyunServiceRoleForEMRServerlessSpark | Service-linked role | Used by the EMR Serverless Spark service to access your resources in other cloud products |
| AliyunEMRSparkJobRunDefaultRole | Job execution role | Used by Spark jobs at runtime to access OSS, DLF, and other cloud resources |

For first-time use, you can authorize with one click in the EMR Serverless Spark Console, or create the roles manually in the RAM console.

3. RAM Permissions

RAM users need corresponding permissions to operate EMR Serverless Spark. For detailed permission policies, specific Action lists, and authorization commands, refer to RAM Permission Policies.

4. OSS Storage

Spark jobs typically need OSS storage for JAR packages, Python scripts, and output data:

# Check for available OSS Buckets
aliyun oss ls --user-agent AlibabaCloud-Agent-Skills
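Artifacts then need to be staged to OSS before job submission. A sketch of a small helper, assuming the CLI proxies ossutil's `cp` subcommand the same way it proxies `ls` above; the bucket and paths are hypothetical, and `DRY_RUN=1` previews the command without credentials:

```shell
# Stage a local artifact to OSS before submitting a job.
# Bucket/paths are hypothetical; DRY_RUN=1 previews the command instead of running it.
stage_artifact() {
  local src="$1" dest="$2"
  local cmd="aliyun oss cp $src $dest --user-agent AlibabaCloud-Agent-Skills"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$cmd"
  else
    $cmd
  fi
}

DRY_RUN=1 stage_artifact ./etl-job.jar oss://my-bucket/spark/jars/etl-job.jar
```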

CLI/SDK Invocation

Invocation Method

All APIs use version 2023-08-08 and the ROA (RESTful) request style.

# Using Alibaba Cloud CLI (ROA style)
# Important:
#   1. Always add --force --user-agent AlibabaCloud-Agent-Skills; otherwise local metadata
#      validation fails with a "can not find api by path" error
#   2. Always pass --region explicitly. GET requests can fall back to the CLI's default
#      Region if one is configured, but without one the server returns a
#      MissingParameter.regionId error
#   3. POST/PUT/DELETE write operations must also append ?regionId=cn-hangzhou to the URL;
#      --region alone is not enough. GET requests need only --region

# POST request (note URL append ?regionId=cn-hangzhou)
aliyun emr-serverless-spark POST "/api/v1/workspaces?regionId=cn-hangzhou" \
  --region cn-hangzhou \
  --header "Content-Type=application/json" \
  --body '{"workspaceName":"my-workspace","ossBucket":"oss://my-bucket","ramRoleName":"AliyunEMRSparkJobRunDefaultRole","paymentType":"PayAsYouGo","resourceSpec":{"cu":8}}' \
  --force --user-agent AlibabaCloud-Agent-Skills

# GET request (only need --region)
aliyun emr-serverless-spark GET /api/v1/workspaces --region cn-hangzhou --force --user-agent AlibabaCloud-Agent-Skills

# DELETE request (note URL append ?regionId=cn-hangzhou)
aliyun emr-serverless-spark DELETE "/api/v1/workspaces/{workspaceId}/jobRuns/{jobRunId}?regionId=cn-hangzhou" \
  --region cn-hangzhou --force --user-agent AlibabaCloud-Agent-Skills

Idempotency Rules

For the following operations, use idempotency tokens to avoid duplicate submissions:

| API | Effect of Duplicate Submission |
| --- | --- |
| CreateWorkspace | Creates multiple workspaces |
| StartJobRun | Submits the same job multiple times |
| CreateSessionCluster | Creates multiple session clusters |
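One way to sketch this: generate a token per logical submission and resend the same token on retries. The `clientToken` field name and the body fields are assumptions here; confirm the actual parameters in api-reference.md.

```shell
# Caller-generated idempotency token for StartJobRun.
new_client_token() {
  uuidgen 2>/dev/null || cat /proc/sys/kernel/random/uuid
}

CLIENT_TOKEN=$(new_client_token)
# "clientToken" is an assumed field name; check api-reference.md for the real one.
BODY=$(printf '{"clientToken":"%s","name":"daily-etl","codeType":"JAR"}' "$CLIENT_TOKEN")

# On a retry after a timeout, resend with the SAME token so the server can deduplicate.
echo aliyun emr-serverless-spark POST "/api/v1/workspaces/<workspaceId>/jobRuns?regionId=cn-hangzhou" \
  --region cn-hangzhou --header "Content-Type=application/json" \
  --body "$BODY" --force --user-agent AlibabaCloud-Agent-Skills
```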

Intent Routing

| Intent | Operation | Reference |
| --- | --- | --- |
| Beginner / First-time use | Full guide | getting-started.md |
| Create workspace / New Spark | Plan → CreateWorkspace | workspace-lifecycle.md |
| Delete workspace / Destroy | DeleteWorkspace | workspace-lifecycle.md |
| Query workspace / List / Details | ListWorkspaces | workspace-lifecycle.md |
| Submit Spark job / Run task | StartJobRun | job-management.md |
| Query job status / Job list | GetJobRun / ListJobRuns | job-management.md |
| View job logs | ListLogContents | job-management.md |
| Cancel job / Stop job | CancelJobRun | job-management.md |
| View CU consumption | GetCuHours | job-management.md |
| Create Kyuubi service | CreateKyuubiService | kyuubi-service.md |
| Start / Stop Kyuubi | Start/StopKyuubiService | kyuubi-service.md |
| Execute SQL via Kyuubi | Connect Kyuubi Endpoint | kyuubi-service.md |
| Manage Kyuubi Token | Create/List/DeleteKyuubiToken | kyuubi-service.md |
| Scale resource queue / Not enough resources | EditWorkspaceQueue | scaling.md |
| View resource queue | ListWorkspaceQueues | scaling.md |
| Create session cluster | CreateSessionCluster | job-management.md |
| Query engine versions | ListReleaseVersions | api-reference.md |
| Check API parameters | Parameter reference | api-reference.md |

Destructive Operation Protection

The following operations are irreversible. Before executing any of them, complete the pre-checks and obtain explicit user confirmation:

| API | Pre-check Steps | Impact |
| --- | --- | --- |
| DeleteWorkspace | 1. ListJobRuns to confirm no running jobs 2. ListSessionClusters to confirm no running sessions 3. ListKyuubiServices to confirm no running Kyuubi services 4. Explicit user confirmation | Permanently deletes the workspace and all associated resources |
| CancelJobRun | 1. GetJobRun to confirm the job status is Running 2. Explicit user confirmation | Aborts the running job; compute results may be lost |
| DeleteSessionCluster | 1. GetSessionCluster to confirm the status is stopped 2. Explicit user confirmation | Permanently deletes the session cluster |
| DeleteKyuubiService | 1. GetKyuubiService to confirm the status is NOT_STARTED 2. Confirm there are no active JDBC connections 3. Explicit user confirmation | Permanently deletes the Kyuubi service |
| DeleteKyuubiToken | 1. GetKyuubiToken to confirm the Token ID 2. Confirm connections using this Token may be interrupted 3. Explicit user confirmation | Deletes the Token; connections using it will fail authentication |
| StopKyuubiService | 1. Remind the user that all active JDBC connections will be disconnected 2. Explicit user confirmation | All active JDBC connections are disconnected |
| StopSessionCluster | 1. Remind the user that the session will terminate 2. Explicit user confirmation | Session state is lost |
| CancelKyuubiSparkApplication | 1. Confirm the application ID and status 2. Explicit user confirmation | Aborts the running Spark query |

Confirmation template:

About to execute: <API>, target: <Resource ID>, impact: <Description>. Continue?

Security Guidelines

Job Submission Protection

Before submitting a Spark job, you must:

  1. Confirm workspace ID and resource queue
  2. Confirm code type codeType (required: JAR / PYTHON / SQL)
  3. Confirm Spark parameters and main program resource
  4. Display equivalent spark-submit command
  5. Get user explicit confirmation before submission
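Step 4 can be sketched as rendering the job parameters into a spark-submit-equivalent string before asking for confirmation. All names and values here are illustrative:

```shell
# Render an equivalent spark-submit command for user confirmation.
# All values below are illustrative.
MAIN_RESOURCE="oss://my-bucket/spark/jars/etl-job.jar"
MAIN_CLASS="com.example.DailyEtl"
SPARK_CONF="--conf spark.executor.cores=4 --conf spark.executor.memory=16g"

EQUIV_CMD="spark-submit --class $MAIN_CLASS $SPARK_CONF $MAIN_RESOURCE"
cat <<EOF
About to submit to workspace <workspaceId>, queue <queueName>:
  $EQUIV_CMD
Continue? [y/N]
EOF
```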

Timeout Control

| Operation Type | Timeout Recommendation |
| --- | --- |
| Read-only queries | 30 seconds |
| Write operations | 60 seconds |
| Polling wait | 30 seconds per attempt, not exceeding 30 minutes total |
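The polling recommendation can be sketched as a bounded loop. `get_job_state` is a stub standing in for a `GetJobRun` call plus JSON parsing, and the state names are illustrative:

```shell
# Poll a job until a terminal state: 30s per attempt, 60 attempts (30 min) by default.
# get_job_state is a stub for:
#   aliyun emr-serverless-spark GET "/api/v1/workspaces/<workspaceId>/jobRuns/<jobRunId>" \
#     --region cn-hangzhou --force --user-agent AlibabaCloud-Agent-Skills
# followed by extracting the state field; the state names below are illustrative.
poll_job() {
  local attempts="${1:-60}" interval="${2:-30}" i state
  for i in $(seq 1 "$attempts"); do
    state=$(get_job_state)
    case "$state" in
      Success)          echo "$state"; return 0 ;;
      Failed|Cancelled) echo "$state"; return 1 ;;
    esac
    sleep "$interval"
  done
  echo "Timeout"
  return 1
}
```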

Error Handling

| Error Code | Cause | Agent Should Do |
| --- | --- | --- |
| MissingParameter.regionId | CLI has no default Region configured and --region is missing, or a write operation (POST/PUT/DELETE) URL lacks ?regionId= | For GET, add --region (a CLI with a default Region configured falls back to it); for write operations, also append ?regionId=cn-hangzhou to the URL |
| Throttling | API rate limiting | Wait 5-10 seconds, then retry |
| InvalidParameter | Invalid parameter | Read the error Message and correct the parameter |
| Forbidden.RAM | Insufficient RAM permissions | Inform the user which permissions are missing |
| OperationDenied | Operation not allowed in the current state | Query the current status and ask the user to wait |
| null (empty ErrorCode) | Accessing sub-resources of a non-existent or unauthorized workspace (List* APIs) | Use ListWorkspaces to confirm the workspace ID is correct, and check RAM permissions |
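The Throttling case lends itself to a small retry wrapper. A sketch: `THROTTLE_WAIT`, `MAX_ATTEMPTS`, and the substring match on command output are all assumptions; real handling should parse the structured error code:

```shell
# Retry a command when its output mentions Throttling, waiting between attempts.
# Matching on output text is a simplification; parse the error code in real use.
with_throttle_retry() {
  local max="${MAX_ATTEMPTS:-3}" n=1 out
  while :; do
    if out=$("$@" 2>&1); then
      printf '%s\n' "$out"
      return 0
    fi
    if [ "$n" -ge "$max" ]; then
      printf '%s\n' "$out"
      return 1
    fi
    case "$out" in
      *Throttling*) sleep "${THROTTLE_WAIT:-5}"; n=$((n + 1)) ;;  # wait 5-10s, retry
      *) printf '%s\n' "$out"; return 1 ;;                        # not retryable
    esac
  done
}

# Usage (illustrative):
# with_throttle_retry aliyun emr-serverless-spark GET /api/v1/workspaces \
#   --region cn-hangzhou --force --user-agent AlibabaCloud-Agent-Skills
```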

Related Documentation

  • getting-started.md
  • workspace-lifecycle.md
  • job-management.md
  • kyuubi-service.md
  • scaling.md
  • api-reference.md
