manage-alerts
Manage Alerts
CRUD for real, persistent Kibana alerting rules — saved objects that keep running after the MCP
session ends, evaluate on their schedule, and (if connected to actions) notify via Slack, email,
webhook, etc. Rules created through this tool are tagged elastic-o11y-mcp by default so they're
easy to find and clean up.
Prerequisites
- Kibana with Alerting enabled. No specific backend — works on any numeric metric field in any index pattern.
- Tool gating. This tool only registers when the operator has explicitly set a Kibana URL in the MCP install config. If kibana_url is blank, the tool doesn't appear in the LLM's tool catalog at all. This is deliberate: operators can run the server strictly read-only (no rule creation and, more importantly, no rule deletion). If the user can't see manage-alerts, their server is read-only on purpose.
- A notification connector must be attached separately in Kibana for rules that should page someone. Without an action, rules fire silently (visible in Kibana → Alerts & Insights).
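The gating rule above can be pictured with a minimal sketch. It is not the server's actual registration code: the function name visible_tools and the READ_ONLY_TOOLS list are hypothetical, and treating observe and apm-health-summary as always-available tools is an assumption made only for illustration.

```python
from typing import List, Optional

# Assumption: these tools are exposed regardless of Kibana configuration.
READ_ONLY_TOOLS = ["apm-health-summary", "observe"]


def visible_tools(kibana_url: Optional[str]) -> List[str]:
    """Return the tool names an LLM would see for a given install config."""
    tools = list(READ_ONLY_TOOLS)
    # manage-alerts is only registered when a non-blank Kibana URL is set.
    if kibana_url and kibana_url.strip():
        tools.append("manage-alerts")
    return tools
```

With a blank or missing kibana_url the returned list simply omits manage-alerts, which is how a read-only posture would look from the model's side.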
Operations
| Operation | Purpose |
|---|---|
| create | Create a new persistent custom-threshold rule. |
| list | List rules, optionally filtered by tags/name/type. |
| get | Fetch a single rule by id, with execution status. |
| delete | Permanently remove a rule by id. Irreversible. |
When to use this vs observe
| Use manage-alerts (create) when... | Use observe when... |
|---|---|
| User wants durable alerting ("page me from now on") | User wants one-off monitoring ("for the next 10 min") |
| Rule should keep running after session ends | Rule only matters inside the current conversation |
| An operator should be paged out-of-band | The agent is validating a remediation in real time |
| The threshold is well-understood | The threshold is still being calibrated |
If the user is still calibrating, suggest observe first — don't create durable rules until the
threshold is validated.
operation='create'
{
"operation": "create",
"rule_name": "Frontend Pod Memory > 80MB",
"metric_field": "k8s.pod.memory.working_set",
"threshold": 80000000,
"comparator": ">",
"kql_filter": "kubernetes.namespace: otel-demo AND service.name: frontend",
"check_interval": "5m",
"agg_type": "avg",
"time_size": 5,
"time_unit": "m",
"index_pattern": "metrics-*"
}
Parameter-filling guidance:
- rule_name: derive from user intent; make it descriptive and specific. Bad: "Memory rule." Good: "Frontend Pod Memory > 80MB (otel-demo)."
- metric_field: a real numeric field present in the index. Don't guess: if the user names a metric vaguely ("memory"), ask which field or cross-reference with apm-health-summary output first.
- threshold: in the field's native units. 80 MB as a working_set is 80000000 (bytes), not 80. Always clarify units.
- comparator: default >. Use < for low-watermark rules ("fire if free memory drops below").
- kql_filter: narrow the scope. Without a filter the rule applies to every document in the index. Strongly recommended for shared environments.
- check_interval: default 5m (matches Kibana's own default and pairs with the 5m lookback). Use 1m only for pageable SLO breaches where the extra reactivity is worth the cycles; 15m–1h for capacity or trend rules.
- agg_type: default avg. Use max for worst-case (p99-ish), count for event frequency.
- time_size + time_unit: the aggregation window. Default 5m. Wider windows smooth; narrow windows react fast but can be noisy.
- index_pattern: default metrics-*. Override for logs-*, traces-apm*, or custom indices.
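The defaults and the unit handling described above can be collected into a small payload builder. This is a sketch under stated assumptions: build_create_payload and mb_to_bytes are hypothetical helper names, not part of the tool, and megabytes are taken as decimal (1 MB = 1,000,000 bytes) to match the 80 MB = 80000000 example.

```python
from typing import Any, Dict, Optional

BYTES_PER_MB = 1_000_000  # decimal MB, consistent with 80 MB -> 80000000


def mb_to_bytes(megabytes: float) -> int:
    """Convert a human-stated MB threshold into the field's native bytes."""
    return round(megabytes * BYTES_PER_MB)


def build_create_payload(
    rule_name: str,
    metric_field: str,
    threshold: float,
    kql_filter: Optional[str] = None,
    comparator: str = ">",
    check_interval: str = "5m",
    agg_type: str = "avg",
    time_size: int = 5,
    time_unit: str = "m",
    index_pattern: str = "metrics-*",
) -> Dict[str, Any]:
    """Assemble arguments for operation='create' using the documented defaults."""
    if not rule_name.strip():
        raise ValueError("rule_name must be descriptive, not empty")
    if not metric_field.strip():
        raise ValueError("metric_field must name a real numeric field")
    payload: Dict[str, Any] = {
        "operation": "create",
        "rule_name": rule_name,
        "metric_field": metric_field,
        "threshold": threshold,
        "comparator": comparator,
        "check_interval": check_interval,
        "agg_type": agg_type,
        "time_size": time_size,
        "time_unit": time_unit,
        "index_pattern": index_pattern,
    }
    # Only send a filter when one was given; an unfiltered rule matches
    # every document in the index, so callers should normally pass one.
    if kql_filter:
        payload["kql_filter"] = kql_filter
    return payload
```

Calling build_create_payload with the rule name, the field k8s.pod.memory.working_set, threshold=mb_to_bytes(80), and the otel-demo filter reproduces the example request shown earlier in this section.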
operation='list'
{
"operation": "list",
"tags": ["elastic-o11y-mcp"],
"search": "memory",
"per_page": 50
}
- tags: defaults to ["elastic-o11y-mcp"] when omitted, so the user sees only rules this MCP created. Pass tags: [] to list every rule in the Kibana cluster (may be large; warn the user).
- search: optional substring match against rule name.
- rule_type_ids: optional filter, e.g. ["observability.rules.custom_threshold"]. Omit to include every rule type.
- per_page / page: standard pagination. Default 50 per page.
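The difference between omitting tags and passing an empty list is easy to get wrong, so here is a small sketch of it. The helper names effective_tags and build_list_payload are hypothetical; the default resolution itself happens inside the tool, and effective_tags only mirrors the behavior described above.

```python
from typing import Any, Dict, List, Optional

DEFAULT_TAGS = ["elastic-o11y-mcp"]


def effective_tags(tags: Optional[List[str]]) -> List[str]:
    """Mirror the documented default: omitted -> MCP tag, [] -> no tag filter."""
    if tags is None:
        return list(DEFAULT_TAGS)
    return list(tags)


def build_list_payload(
    tags: Optional[List[str]] = None,
    search: Optional[str] = None,
    rule_type_ids: Optional[List[str]] = None,
    per_page: int = 50,
    page: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble arguments for operation='list', sending only what was set."""
    payload: Dict[str, Any] = {"operation": "list", "per_page": per_page}
    # An explicit empty list must be forwarded: it means "every rule".
    if tags is not None:
        payload["tags"] = list(tags)
    if search:
        payload["search"] = search
    if rule_type_ids:
        payload["rule_type_ids"] = list(rule_type_ids)
    if page is not None:
        payload["page"] = page
    return payload
```

Checking `tags is not None` rather than truthiness is the design point: a plain `if tags:` would silently drop tags: [] and fall back to the MCP-only view.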
Response includes a rules array with a compact summary (name, condition, status, tags) per rule,
and the view renders each as a card with Inspect / Delete buttons.
operation='get'
{ "operation": "get", "rule_id": "c5f2e1b8-..." }
Returns the full rule definition plus execution status. Typical chain: list → user picks one → get.
operation='delete'
Two-step flow, enforced in the tool itself. The tool refuses to delete anything without an
explicit confirm: true — on the first call you get a preview back, then you re-invoke with
confirm: true after the user approves.
Step 1 — preview (omit confirm or pass confirm: false):
{ "operation": "delete", "rule_id": "c5f2e1b8-..." }
Response shape: { operation: "delete", deleted: false, confirmation_required: true, preview: {...} }.
The preview contains the full rule summary (name, condition, tags, etc.). Nothing has been deleted
yet — the tool only fetched the rule.
Your job on seeing a preview:
- Quote the rule name (not just the id) back to the user. "Delete rule 'Frontend Pod Memory > 80MB' (id c5f2…)? This is irreversible."
- Wait for explicit approval. "yes", "go ahead", "delete it" — not a vague "sure".
- If the user approves, dispatch Step 2. If they decline or hesitate, do nothing.
Step 2 — confirmed delete (pass confirm: true):
{ "operation": "delete", "rule_id": "c5f2e1b8-...", "confirm": true }
Only call this after the user has explicitly approved in the current turn. The Kibana saved object is gone the moment the API returns 204; there is no undo.
Never pass confirm: true on the first invocation from a vague instruction like "clean up the
alerts". Always list first, preview each candidate, and confirm before every delete.
Never batch-delete multiple rules in one exchange unless the user has explicitly authorized it with specific IDs or a clear scope ("delete all three of those").
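The two-step flow above can be sketched as a single helper. Everything here is illustrative: call_tool stands in for whatever dispatches a manage-alerts invocation, ask_user stands in for the conversational approval step, and the preview key "name" is an assumption about the summary's shape.

```python
from typing import Any, Callable, Dict

ToolCall = Callable[[Dict[str, Any]], Dict[str, Any]]


def delete_with_confirmation(
    call_tool: ToolCall,
    rule_id: str,
    ask_user: Callable[[str], bool],
) -> bool:
    """Preview a rule, ask for approval, and delete only if approved.

    Returns True only when the confirmed delete reports deleted: true.
    """
    # Step 1: preview. No confirm flag, so nothing is deleted.
    preview_response = call_tool({"operation": "delete", "rule_id": rule_id})
    if not preview_response.get("confirmation_required"):
        # Unexpected shape: stop rather than escalate to a confirmed delete.
        return False

    # Quote the rule name, not just the id, back to the user.
    name = preview_response.get("preview", {}).get("name", "<unknown>")
    prompt = f"Delete rule '{name}' (id {rule_id})? This is irreversible."
    if not ask_user(prompt):
        return False

    # Step 2: confirmed delete, only after explicit approval.
    result = call_tool(
        {"operation": "delete", "rule_id": rule_id, "confirm": True}
    )
    return result.get("deleted") is True
```

The helper handles one rule per call on purpose, so each deletion gets its own preview and its own approval.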
After the tool returns
All operations emit a common response envelope:
- status: "success" | "error".
- operation: echoes the operation for view rendering.
- message: human-readable summary.
- investigation_actions: click-to-send next-step prompts (chain to list, get, delete, or observe as appropriate).
create additionally returns rule_id and cleanup_hint (a one-line DELETE instruction plus the
equivalent manage-alerts call).
list returns total, returned, page, and a rules array.
get returns the full rule as rule (summary) and raw_rule (unfiltered Kibana response).
delete returns rule_id and deleted: true.
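As a quick reference, the envelope and per-operation fields listed above can be checked with a small validator. This is a sketch only: missing_fields is a hypothetical helper, and it assumes that per-operation fields are present only on successful responses and that a delete preview carries the fields from its documented response shape instead of rule_id.

```python
from typing import Any, Dict, List

ENVELOPE_FIELDS = ("status", "operation", "message", "investigation_actions")

OPERATION_FIELDS = {
    "create": ("rule_id", "cleanup_hint"),
    "list": ("total", "returned", "page", "rules"),
    "get": ("rule", "raw_rule"),
    "delete": ("rule_id", "deleted"),
}

# Shape of the step-1 delete preview, per the delete section.
DELETE_PREVIEW_FIELDS = ("deleted", "confirmation_required", "preview")


def missing_fields(response: Dict[str, Any]) -> List[str]:
    """List documented fields that are absent from a tool response."""
    missing = [key for key in ENVELOPE_FIELDS if key not in response]
    # Assumption: error responses carry only the common envelope.
    if response.get("status") != "success":
        return missing
    operation = response.get("operation")
    if operation == "delete" and response.get("confirmation_required"):
        expected = DELETE_PREVIEW_FIELDS
    else:
        expected = OPERATION_FIELDS.get(operation, ())
    missing.extend(key for key in expected if key not in response)
    return missing
```

An empty result means the response carries every field this section documents for that operation.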
The MCP App view renders the appropriate layout per operation: a created-rule card for create, a
detail card for get, a list of rule cards with Inspect/Delete buttons for list, and a deletion
confirmation for delete.
Confirm to the user after each operation:
- create: quote the rule name and condition. Mention Kibana → Alerts & Insights → Rules. If no actions are attached, say so: "this rule fires but doesn't notify anyone yet — attach an action in Kibana to page Slack/email/webhook." Offer the cleanup hint.
- list: how many rules were found and what filter was applied. If you defaulted to the elastic-o11y-mcp tag, mention that the user can pass tags: [] to see everything.
- get: summarize the rule's state: enabled/disabled, last execution, active alert count.
- delete: confirm the deletion and suggest a follow-up list to verify.
Key principles
- These are persistent, real saved objects. Always confirm the rule name and condition back to the user after create and delete.
- Attach a KQL filter on create. Unfiltered rules against metrics-* evaluate across everything, a recipe for noise and false alerts.
- Units matter. Bytes vs MB vs percentage: always be explicit about the threshold's units.
- Default the list filter to elastic-o11y-mcp. This keeps the UX focused on rules this MCP manages. Only broaden when the user explicitly asks.
- Confirm before deleting. The delete path is irreversible; quote the rule name, wait for explicit approval.
- If the tool isn't available, the operator disabled it on purpose. Don't suggest workarounds to create rules via raw ES / Kibana API calls; respect the read-only posture.