# sre-hadoop CLI
A command-reference skill. Invoke the CLI directly and pick commands freely based on the problem at hand; no fixed diagnostic workflow is assumed.
## Conventions

- macOS / Linux entry point: `scripts/sre-hadoop`
- Windows entry point: `scripts/sre-hadoop.cmd`
- Bundled multi-platform binaries: `bin/sre-hadoop-<os>-<arch>[.exe]`, selected automatically
- Configuration: `~/.sre-hadoop/config.json`
- Installation and configuration examples: `README.md`
- All output is unified JSON
- Parameter details, categories, and error codes: `references/cli-reference.md`
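The bundled-binary naming convention above can be sketched as a small resolver. Note the exact `<os>` / `<arch>` token spellings (e.g. `amd64` vs `x86_64`) are an assumption here; check the actual file names under `bin/` for your build.

```python
import platform

# Map Python's platform names onto the <os>-<arch> tokens in bin/.
# Token spellings (e.g. "amd64") are assumptions, not confirmed by the docs.
_OS = {"Darwin": "darwin", "Linux": "linux", "Windows": "windows"}
_ARCH = {"x86_64": "amd64", "AMD64": "amd64", "arm64": "arm64", "aarch64": "arm64"}

def bundled_binary_name(system: str, machine: str) -> str:
    """Return the bin/sre-hadoop-<os>-<arch>[.exe] file name for a platform."""
    os_token = _OS[system]
    arch_token = _ARCH[machine]
    suffix = ".exe" if os_token == "windows" else ""
    return f"sre-hadoop-{os_token}-{arch_token}{suffix}"

# For the current host: bundled_binary_name(platform.system(), platform.machine())
print(bundled_binary_name("Linux", "x86_64"))   # → sre-hadoop-linux-amd64
print(bundled_binary_name("Windows", "AMD64"))  # → sre-hadoop-windows-amd64.exe
```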
## Command overview

| Component | Commands | Data source |
|---|---|---|
| YARN | apps, app-info, app-containers, rm-metrics, queue-snapshot | RM REST API / InfluxDB |
| Spark | apps, jobs, stages, tasks, executors, sql, env, download-log, timeline | Spark HS REST API / YARN RM |
| HDFS | nn-metrics | InfluxDB |
| HMS | jvm, api, ddl | InfluxDB |
| OS | host-metrics | InfluxDB |
## Common entry points

### YARN

```shell
scripts/sre-hadoop yarn apps --states RUNNING --limit 10
scripts/sre-hadoop yarn app-info --app-id <APP_ID>
scripts/sre-hadoop yarn app-containers --app-id <APP_ID>
scripts/sre-hadoop yarn rm-metrics --category rpc
scripts/sre-hadoop yarn queue-snapshot --queue <QUEUE>
```
### Spark

```shell
scripts/sre-hadoop spark apps --limit 10
scripts/sre-hadoop spark timeline --app-id <APP_ID>
scripts/sre-hadoop spark jobs --app-id <APP_ID>
scripts/sre-hadoop spark stages --app-id <APP_ID>
scripts/sre-hadoop spark tasks --app-id <APP_ID> --stage-id <STAGE_ID> --top 20
scripts/sre-hadoop spark executors --app-id <APP_ID>
scripts/sre-hadoop spark sql --app-id <APP_ID>
scripts/sre-hadoop spark env --app-id <APP_ID>
scripts/sre-hadoop spark download-log --app-id <APP_ID> --executor-id driver --log-type stderr
```
### Infrastructure metrics

```shell
scripts/sre-hadoop hdfs nn-metrics --category dfs
scripts/sre-hadoop hms jvm
scripts/sre-hadoop hms api
scripts/sre-hadoop hms ddl
scripts/sre-hadoop os host-metrics --host <HOST> --category cpu,mem,diskio
```
## Output format

Success:

```json
{
  "ok": true,
  "data": {},
  "meta": {
    "command": "yarn app-info",
    "cluster": "default",
    "duration_ms": 5
  }
}
```
Failure:

```json
{
  "ok": false,
  "error": {
    "code": "TIMEOUT",
    "message": "request timeout",
    "hint": "check backend connectivity"
  }
}
```
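Since every command returns this envelope, callers can handle both shapes with one helper. A minimal sketch based on the envelope above; the raise-on-error behavior and the `SreHadoopError` name are assumptions for illustration, not part of the CLI.

```python
import json

class SreHadoopError(RuntimeError):
    """Raised when the CLI returns ok=false; carries code/message/hint."""

def parse_envelope(raw: str):
    """Parse one JSON envelope: return `data` on success, raise on failure.

    Sketch based on the documented ok/data/meta and ok/error shapes.
    """
    env = json.loads(raw)
    if env.get("ok"):
        return env.get("data")
    err = env.get("error", {})
    raise SreHadoopError(
        f"{err.get('code')}: {err.get('message')} (hint: {err.get('hint')})"
    )

data = parse_envelope('{"ok": true, "data": {"state": "RUNNING"}, "meta": {}}')
print(data)  # → {'state': 'RUNNING'}
```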
## Usage rules

- Prefer the ready-made commands; do not hand-craft HTTP requests
- Compose commands freely to fit the current problem
- Independent queries can run in parallel
- Read `references/cli-reference.md` only when you need the full parameter tables
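The "independent queries can run in parallel" rule can be sketched with a thread pool. The `runner` callable is injected so the pattern stands on its own; in real use it would shell out to `scripts/sre-hadoop` via `subprocess.run` (that wiring is an assumption, not shown by the docs).

```python
from concurrent.futures import ThreadPoolExecutor

def run_parallel(queries, runner, max_workers=4):
    """Run each argument list through `runner` concurrently, preserving order.

    In real use, `runner` would invoke the CLI, e.g. (hypothetical wiring):
      runner = lambda args: subprocess.run(
          ["scripts/sre-hadoop", *args], capture_output=True, text=True).stdout
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(runner, queries))

# Two independent queries for the same application, dispatched concurrently.
queries = [
    ["yarn", "app-info", "--app-id", "<APP_ID>"],
    ["spark", "executors", "--app-id", "<APP_ID>"],
]
# Stub runner used here only to demonstrate ordering of results.
results = run_parallel(queries, runner=lambda args: " ".join(args))
print(results[0])  # → yarn app-info --app-id <APP_ID>
```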