manage-alerts

Installation
SKILL.md

Manage Alerts

CRUD for real, persistent Kibana alerting rules — saved objects that keep running after the MCP session ends, evaluate on their schedule, and (if connected to actions) notify via Slack, email, webhook, etc. Rules created through this tool are tagged elastic-o11y-mcp by default so they're easy to find and clean up.

Prerequisites

  • Kibana with Alerting enabled. No specific backend — works on any numeric metric field in any index pattern.
  • Tool gating. This tool only registers when the operator has explicitly set a Kibana URL in the MCP install config. If kibana_url is blank the tool doesn't appear in the LLM's tool catalog at all — a deliberate feature so operators can run the server strictly read-only (no rule creation, and more importantly no rule deletion). If the user can't see manage-alerts, their server is read-only on purpose.
  • A notification connector must be attached separately in Kibana for rules that should page someone. Without an action, rules fire silently (visible in Kibana → Alerts & Insights).

Operations

Operation Purpose
create Create a new persistent custom-threshold rule.
list List rules, optionally filtered by tags/name/type.
get Fetch a single rule by id, with execution status.
delete Permanently remove a rule by id. Irreversible.

When to use this vs observe

Use manage-alerts (create) when... Use observe when...
User wants durable alerting ("page me from now on") User wants one-off monitoring ("for the next 10 min")
Rule should keep running after session ends Rule only matters inside the current conversation
An operator should be paged out-of-band The agent is validating a remediation in real time
The threshold is well-understood The threshold is still being calibrated

If the user is still calibrating, suggest observe first — don't create durable rules until the threshold is validated.

operation='create'

{
  "operation": "create",
  "rule_name": "Frontend Pod Memory > 80MB",
  "metric_field": "k8s.pod.memory.working_set",
  "threshold": 80000000,
  "comparator": ">",
  "kql_filter": "kubernetes.namespace: otel-demo AND service.name: frontend",
  "check_interval": "5m",
  "agg_type": "avg",
  "time_size": 5,
  "time_unit": "m",
  "index_pattern": "metrics-*"
}

Parameter-filling guidance:

  • rule_name: derive from user intent — make it descriptive and specific. Bad: "Memory rule." Good: "Frontend Pod Memory > 80MB (otel-demo)."
  • metric_field: a real numeric field present in the index. Don't guess — if the user names a metric vaguely ("memory"), ask which field or cross-reference with apm-health-summary output first.
  • threshold: in the field's native units. 80 MB as a working_set is 80000000 (bytes), not 80. Always clarify units.
  • comparator: default >. Use < for low-watermark rules ("fire if free memory drops below").
  • kql_filter: narrow the scope — without a filter the rule applies to every document in the index. Strongly recommended for shared environments.
  • check_interval: default 5m (matches Kibana's own default and pairs with the 5m lookback). Use 1m only for pageable SLO breaches where the extra reactivity is worth the cycles; 15m1h for capacity or trend rules.
  • agg_type: default avg. Use max for worst-case (p99-ish), count for event frequency.
  • time_size + time_unit: the aggregation window. Default 5m. Wider windows smooth; narrow windows react fast but can be noisy.
  • index_pattern: default metrics-*. Override for logs-*, traces-apm*, or custom indices.

operation='list'

{
  "operation": "list",
  "tags": ["elastic-o11y-mcp"],
  "search": "memory",
  "per_page": 50
}
  • tags: defaults to ["elastic-o11y-mcp"] when omitted, so the user sees only rules this MCP created. Pass tags: [] to list every rule in the Kibana cluster (may be large — warn the user).
  • search: optional substring match against rule name.
  • rule_type_ids: optional filter — e.g. ["observability.rules.custom_threshold"]. Omit to include every rule type.
  • per_page / page: standard pagination. Default 50 per page.

Response includes a rules array with a compact summary (name, condition, status, tags) per rule, and the view renders each as a card with Inspect / Delete buttons.

operation='get'

{ "operation": "get", "rule_id": "c5f2e1b8-..." }

Returns the full rule definition plus execution status. Typical chain: list → user picks one → get.

operation='delete'

Two-step flow, enforced in the tool itself. The tool refuses to delete anything without an explicit confirm: true — on the first call you get a preview back, then you re-invoke with confirm: true after the user approves.

Step 1 — preview (omit confirm or pass confirm: false):

{ "operation": "delete", "rule_id": "c5f2e1b8-..." }

Response shape: { operation: "delete", deleted: false, confirmation_required: true, preview: {...} }. The preview contains the full rule summary (name, condition, tags, etc.). Nothing has been deleted yet — the tool only fetched the rule.

Your job on seeing a preview:

  1. Quote the rule name (not just the id) back to the user. "Delete rule 'Frontend Pod Memory > 80MB' (id c5f2…)? This is irreversible."
  2. Wait for explicit approval. "yes", "go ahead", "delete it" — not a vague "sure".
  3. If the user approves, dispatch Step 2. If they decline or hesitate, do nothing.

Step 2 — confirmed delete (pass confirm: true):

{ "operation": "delete", "rule_id": "c5f2e1b8-...", "confirm": true }

Only call this after the user has explicitly approved in the current turn. The Kibana saved object is gone the moment the API returns 204; there is no undo.

Never pass confirm: true on the first invocation from a vague instruction like "clean up the alerts". Always list first, preview each candidate, and confirm before every delete.

Never batch-delete multiple rules in one exchange unless the user has explicitly authorized it with specific IDs or a clear scope ("delete all three of those").

After the tool returns

All operations emit a common response envelope:

  • status: "success" | "error".
  • operation: echoes the operation for view rendering.
  • message: human-readable summary.
  • investigation_actions: click-to-send next-step prompts (chain to list, get, delete, or observe as appropriate).

create additionally returns rule_id and cleanup_hint (a one-line DELETE instruction plus the equivalent manage-alerts call).

list returns total, returned, page, and a rules array.

get returns the full rule as rule (summary) and raw_rule (unfiltered Kibana response).

delete returns rule_id and deleted: true.

The MCP App view renders the appropriate layout per operation: a created-rule card for create, a detail card for get, a list of rule cards with Inspect/Delete buttons for list, and a deletion confirmation for delete.

Confirm to the user after each operation:

  • create: quote the rule name and condition. Mention Kibana → Alerts & Insights → Rules. If no actions are attached, say so: "this rule fires but doesn't notify anyone yet — attach an action in Kibana to page Slack/email/webhook." Offer the cleanup hint.
  • list: how many rules were found and what filter was applied. If you defaulted to the elastic-o11y-mcp tag, mention that the user can pass tags: [] to see everything.
  • get: summarize the rule's state — enabled/disabled, last execution, active alert count.
  • delete: confirm the deletion and suggest a follow-up list to verify.

Key principles

  • These are persistent, real saved objects. Always confirm the rule name and condition back to the user after create and delete.
  • Attach a KQL filter on create. Unfiltered rules against metrics-* evaluate across everything — a recipe for noise and false alerts.
  • Units matter. Bytes vs MB vs percentage — always be explicit about the threshold's units.
  • Default the list filter to elastic-o11y-mcp. This keeps the UX focused on rules this MCP manages. Only broaden when the user explicitly asks.
  • Confirm before deleting. The delete path is irreversible; quote the rule name, wait for explicit approval.
  • If the tool isn't available, the operator disabled it on purpose. Don't suggest workarounds to create rules via raw ES / Kibana API calls — respect the read-only posture.
Weekly Installs
1
GitHub Stars
4
First Seen
1 day ago