# Create Coralogix Dashboard (`cx-create-dashboard`)

Produces a Coralogix dashboard for a target service and deploys it via the cx CLI. Workflow: discover the service's telemetry, align on intent with the user, draft a plan, emit the JSON, live-verify every query through cx, then create the dashboard in a chosen folder.

Only use metric names, log fields, and span attributes you can cite from the service's code, README, configuration, or a live query that returned a result. Do not invent them.


## Reference files

Load these files for domain-specific guidance:

| Task | Reference |
| --- | --- |
| DataPrime query syntax | references/dataprime-reference.md |
| PromQL query syntax, counters vs gauges, histograms | references/promql-guidelines.md |
| Log field discovery, query patterns, wildfind policy | references/logs-querying.md |
| Span field discovery, latency analysis, trace queries | references/spans-querying.md |
| Dashboard-specific query gotchas (`${__range}`, `promqlQueryType`) | references/query-syntax.md |
| Widget JSON templates | references/widget-templates.md |

For choosing the right signal (metrics / logs / traces), use cx-telemetry-querying.


## Dashboard Management

Beyond creating dashboards, use these commands to manage existing ones:

| Command | Purpose |
| --- | --- |
| `cx dashboards catalog -o json` | List all dashboards in the catalog |
| `cx dashboards get <id> -o json` | Get a dashboard definition (useful as a template) |
| `cx dashboards folders list -o json` | List dashboard folders |
| `cx dashboards folders create --name "Name"` | Create a dashboard folder |
| `cx dashboards folders create --name "Sub" --parent-id <id>` | Create a nested folder |

To duplicate or modify an existing dashboard:

```shell
cx dashboards get <dashboard-id> -o json > dashboard.json
# Edit dashboard.json (change name, modify widgets, etc.)
cx dashboards create --from-file dashboard.json
```

## Workflow

Track progress through this checklist:

Dashboard Progress:
- [ ] Phase 1: Discover telemetry & business meaning
- [ ] Phase 2: Gather dashboard specifications from user
- [ ] Phase 3: Draft internal dashboard plan (sections/rows/widgets)
- [ ] Phase 4: Generate the Coralogix JSON
- [ ] Phase 5: Live-verify every query through the cx CLI
- [ ] Phase 6: Self-verify structure against the checklist
- [ ] Phase 7: Deploy via `cx dashboards create`

Proceed in order. Don't jump to Phase 4 before the user approves the Phase 3 plan, and don't run Phase 7 before Phases 5 and 6 both pass.


## Phase 1: Discover telemetry & business meaning

For the target service, gather:

  1. Business purpose - read README.md and the top-level entrypoint (main.*, index.*, cmd/main.go, etc.). Summarize in 2–3 sentences what it does, its key stages, and what can go wrong.
  2. Metrics - for each candidate keyword (service name, subsystem, verbs like request, error, latency, dlq) run cx metrics search --name '*<keyword>*'. When a metric looks promising, list its labels with cx metrics get-labels <metric>. Only use names cx metrics search returns - this is what prevents invented metrics from reaching Phase 5. Cross-check the service's instrumentation (prometheus_client, promauto.NewCounter/Histogram/Gauge, OTel meters, prom-client, Micrometer, metrics.py) for semantics and histogram buckets (_sum, _count, _bucket).
  3. Logs - discover custom $d.* fields with cx search-fields "<description>" --dataset logs before assuming a field exists. Sample message templates and severity with cx logs "filter \$l.applicationname == '<app>'" --limit 5 -o json. Standard fields ($m.severity, $m.timestamp, $l.applicationname, $l.subsystemname) don't need discovery.
  4. Spans / traces - discover span attributes with cx search-fields "<description>" --dataset spans. Sample with cx spans "filter \$l.serviceName == '<svc>'" --limit 5 -o json. Error conventions vary ($d.tags.error, $d.http.status_code); check samples before filtering.
  5. Message buses & DLQs - grep for Kafka, RabbitMQ, SQS, Pub/Sub clients and any dlq/DLQ references. Note topic/queue names for DLQ panels.
  6. Service configuration - check meta.yaml, Helm values.yaml, Deployment, Dockerfile, chart.yaml. Extract:
     - The applicationname / subsystemname label values as they appear in Coralogix.
     - Tenant/account/team identifiers used as metric or log labels.
     - Deployment environments (prod, staging, dev, …).

If the signal for a question is ambiguous (e.g. "how much revenue last week"), delegate to cx-telemetry-querying first.

Produce a short internal summary before moving on. If critical telemetry is missing (e.g. no metrics), surface that to the user and ask whether they want a log-only or trace-only dashboard.


## Phase 2: Gather dashboard specifications

Ask the user a focused set of questions (≤6). Prefer the AskQuestion tool:

  1. Audience & use - on-call triage, product/business tracking, capacity planning, customer success?
  2. Default time range - typical viewing window (e.g. 24h, 7d). Queries still use ${__range} so users can zoom.
  3. Slicing dimensions - top-level filters (tenant_id, account_id, subsystem_name, region, env, …).
  4. Environment scope - which environments to include/exclude (common default: exclude dev, staging, test).
  5. SLO-ish signals - success-rate, latency, or throughput targets to highlight?
  6. Priorities - what to see first (drives row ordering and which section is collapsed: true).

Don't block on answers you can reasonably infer - state the inference and continue.


## Phase 3: Draft the internal plan

Write a markdown plan the user can approve before JSON generation:

```markdown
## Dashboard: <Service> - <Purpose>

### Section 1: <Overview> (collapsed: false)
- Row 1: [widget type] <title> - <what it shows> - source: metrics|logs|spans
- Row 2: ...

### Section 2: <Deep dive> (collapsed: false)
...

### Section N: <Logs & errors> (collapsed: true)
...

### Top-level filters
- <label> (<source>)

### Assumptions / gaps
- ...
```

Section design:

- First section: at-a-glance health (gauges + key rates), always expanded.
- Pair related time-series in the same row (rate + latency).
- Final section (raw logs, rare breakdowns): collapsed: true.
- Aim for 3–5 sections, 6–20 widgets total.

Widget-type selection:

| Signal | Widget type |
| --- | --- |
| Single headline number (count, % success, totals) | gauge (Coralogix calls this "stat") |
| Breakdown across ≤8 categories | pieChart |
| Change over time (rate, latency, count per bucket) | lineChart |
| Top-N tables, last errors, per-entity listings | dataTable |

Don't use other widget types unless the user asks.

Wait for the user to approve or adjust the plan before emitting JSON.


## Phase 4: Generate the Coralogix JSON

Produce a single JSON document following references/widget-templates.md. Key rules:

  1. Top-level shape:
     ```json
     {
       "id": "<21-char-nanoid>",
       "name": "<Dashboard Name>",
       "layout": { "sections": [ ... ] },
       "variables": [],
       "variablesV2": [],
       "filters": [ ... ],
       "relativeTimeFrame": "<seconds>s",
       "annotations": [],
       "off": {},
       "actions": []
     }
     ```
  2. IDs - fresh UUIDs for every section, row, widget, and query id.
  3. Row height - "appearance": { "height": 19 } unless there's a reason to change.
  4. Section options - include options.custom.name, collapsed, and color.predefined: "SECTION_PREDEFINED_COLOR_UNSPECIFIED".
  5. Filters - one entry per slicing dimension from Phase 2. Default operator equals with empty values so users can fill in. Use notEquals for environment exclusions (see references/widget-templates.md).
  6. relativeTimeFrame - default "172800s" (48h) unless the user specified otherwise.

For query syntax follow references/query-syntax.md; for the full query languages load references/dataprime-reference.md and references/promql-guidelines.md.
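The top-level shape and ID rules above can be sketched as a small generator. This is a sketch only: `nanoid`, `dashboard_skeleton`, and `fresh_id` are hypothetical helpers, and the field list mirrors the shape shown in this skill rather than an official schema.

```python
import json
import secrets
import string
import uuid

_NANOID_ALPHABET = string.ascii_letters + string.digits + "_-"

def nanoid(size=21):
    """21-char nanoid-style id for the dashboard's top-level "id" field."""
    return "".join(secrets.choice(_NANOID_ALPHABET) for _ in range(size))

def dashboard_skeleton(name, hours=48):
    """Empty dashboard document matching the top-level shape above.

    Sections, rows, widgets, and filters get filled in afterwards; every
    nested id.value should be a fresh UUID (see fresh_id below).
    """
    return {
        "id": nanoid(),
        "name": name,
        "layout": {"sections": []},
        "variables": [],
        "variablesV2": [],
        "filters": [],
        "relativeTimeFrame": "%ds" % (hours * 3600),  # default 48h -> "172800s"
        "annotations": [],
        "off": {},
        "actions": [],
    }

def fresh_id():
    """Fresh UUID for every section, row, widget, and query id (rule 2)."""
    return {"value": str(uuid.uuid4())}

doc = dashboard_skeleton("checkout-service - On-call Overview")
print(json.dumps(doc, indent=2))
```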


## Phase 5: Live-verify every query through the cx CLI

Every PromQL and DataPrime query in the draft must run successfully through cx before Phase 7. This catches invented metric names, typoed field paths, and malformed pipelines.

### Frequent vs Archive (what / when / where in JSON)

What:

- Frequent (TIER_FREQUENT_SEARCH): hot tier for fast search on recent logs/spans.
- Archive (TIER_ARCHIVE): cold tier for older logs/spans (long-term).

When to choose:

- Choose Frequent for on-call and recent investigations (hours/days).
- Choose Archive for long lookbacks (weeks/months) or when the time range is beyond hot retention.

The two languages are verified against different windows:

- PromQL: map relativeTimeFrame to a $RANGE token (e.g. 48h for 172800s), substitute [${__range}] with [$RANGE] for the CLI call, then restore ${__range} in the JSON before Phase 6. Range vectors are window-sensitive, so the check has to match what the dashboard will evaluate.
- DataPrime: verify against a fixed short window (the last 15 minutes, with --limit 1). The goal is syntax / field / pipeline validation, not data presence on the dashboard's window - a short window is faster and gives a cleaner fail signal.

Full procedure (CLI invocations, $RANGE mapping table, retry budget, failure modes): references/verification.md.

If a query can't be made to pass within the retry budget, surface it to the user with the CLI error verbatim - don't ship a broken widget.
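The PromQL substitution step can be sketched as below. These are assumed helpers, not part of the cx CLI, and the simplified hours-only mapping is a placeholder for the authoritative $RANGE table in references/verification.md.

```python
def range_token(relative_time_frame):
    """Map a relativeTimeFrame like "172800s" to a PromQL range token.

    Simplified: whole hours become "<N>h" (172800s -> 48h); anything else
    stays in seconds. The full mapping table is in references/verification.md.
    """
    seconds = int(relative_time_frame.rstrip("s"))
    if seconds % 3600 == 0:
        return "%dh" % (seconds // 3600)
    return "%ds" % seconds

def promql_for_cli(query, relative_time_frame):
    """Rewrite a dashboard query for a one-off CLI verification run.

    The JSON keeps ${__range}; only the command handed to cx gets the
    concrete window, so range vectors evaluate over the dashboard's range.
    """
    return query.replace("${__range}", range_token(relative_time_frame))

q = 'sum(rate(http_requests_total{job="checkout"}[${__range}]))'
print(promql_for_cli(q, "172800s"))  # window becomes [48h]
```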


## Phase 6: Self-verify structure

Run this checklist against the final JSON. Fix and re-check if any item fails before Phase 7.

### Query syntax (dashboard-specific)

- Every PromQL range vector in a metrics widget uses [${__range}] - never [$__range], never [5m] (unless the panel is intentionally a sliding window).
- promqlQueryType is PROM_QL_QUERY_TYPE_INSTANT for single-value widgets (gauge, pieChart, dataTable); omit it for lineChart.
- DataPrime log queries use $d.message / $l.applicationname / unquoted severity enums (full rules: references/dataprime-reference.md).
- Every DataPrime widget query starts with source logs or source spans (dashboard widgets require the source prefix; Phase 5 verification strips it before handing the pipeline to cx logs / cx spans).
- Success-rate denominators are wrapped in clamp_min(..., 1).
- Histogram queries use the correct suffix (_sum, _count, _bucket).
- Widget queries are valid without the dashboard-level filters - Coralogix injects them at render time.
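A rough lint for the range-vector and promqlQueryType items can look like the sketch below. It is a heuristic, not part of the cx tooling: the regex will misfire on character classes inside label regexes, and fixed windows like [5m] are sometimes intentional, as noted above.

```python
import re

SINGLE_VALUE_WIDGETS = {"gauge", "pieChart", "dataTable"}

def promql_widget_problems(query, widget_kind, promql_query_type=None):
    """Flag range-vector and promqlQueryType mistakes from the checklist."""
    problems = []
    # Every [...] window should be the dashboard range token.
    for window in re.findall(r"\[([^\]]+)\]", query):
        if window != "${__range}":
            problems.append(
                "range vector uses [%s]; expected [${__range}] "
                "(keep it only for an intentional sliding window)" % window
            )
    if widget_kind in SINGLE_VALUE_WIDGETS:
        if promql_query_type != "PROM_QL_QUERY_TYPE_INSTANT":
            problems.append("single-value widgets need PROM_QL_QUERY_TYPE_INSTANT")
    elif promql_query_type is not None:
        problems.append("lineChart should omit promqlQueryType")
    return problems
```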

### Structure

- Each section has id.value, rows, and options.custom.
- Each row has id.value, appearance.height, and widgets.
- Each widget has a unique id.value and a definition with exactly one of gauge / pieChart / lineChart / dataTable.
- Success-rate gauges use thresholdType: "THRESHOLD_TYPE_ABSOLUTE" with green at high values; error/DLQ gauges use red at high values.
- "Total" / "stat" widgets are encoded as gauge, not as a stat type.
- Top-level filters includes each slicing dimension from Phase 2.
- All IDs are freshly generated UUIDs, unique within the document.
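The id and widget-definition items can be checked mechanically. A minimal sketch, assuming the layout.sections[].rows[].widgets[] shape used in this skill (`structure_errors` is a hypothetical helper):

```python
WIDGET_KINDS = {"gauge", "pieChart", "lineChart", "dataTable"}

def structure_errors(dashboard):
    """Check unique id.value on every section/row/widget, and that each
    widget definition contains exactly one widget type."""
    errors = []
    seen = set()

    def check_id(node, path):
        value = node.get("id", {}).get("value")
        if not value:
            errors.append(path + ": missing id.value")
        elif value in seen:
            errors.append(path + ": duplicate id " + value)
        else:
            seen.add(value)

    for s, section in enumerate(dashboard.get("layout", {}).get("sections", [])):
        check_id(section, "section[%d]" % s)
        for r, row in enumerate(section.get("rows", [])):
            check_id(row, "section[%d].row[%d]" % (s, r))
            for w, widget in enumerate(row.get("widgets", [])):
                path = "section[%d].row[%d].widget[%d]" % (s, r, w)
                check_id(widget, path)
                kinds = WIDGET_KINDS & set(widget.get("definition", {}))
                if len(kinds) != 1:
                    errors.append(path + ": expected exactly one widget type")
    return errors
```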

### Content

- Dashboard name is descriptive ("<Service> - <Purpose>").
- Widget titles are short, human-readable, and match what the query computes.
- The logs/errors section is collapsed: true unless the user said otherwise.

## Phase 7: Deploy via `cx dashboards create`

Don't tell the user to paste JSON into the Coralogix UI - deploy it directly.

  1. List folders: cx dashboards folders list -o json.
  2. Suggest the best folder match (team, product area, or a folder named after the service). Default to root (omit --folder) if nothing fits.
  3. Write the verified JSON to a temp file and run cx dashboards create --from-file /tmp/cx-dashboard-<slug>.json --folder <id>. The CLI generates the requestId envelope and prints the created dashboard ID.

Full procedure (folder-picking UX, command templates, idempotency note): references/deploy.md.

On failure: show the CLI error verbatim and return to Phase 5. The most common cause is a query that parses locally but that the live API rejects.


## Output format for the user

```markdown
## Plan
<the approved Phase 3 plan>

## Verification
- PromQL queries verified: <N>/<N>
- DataPrime queries verified: <N>/<N>

## Deployed
- Dashboard: **<Name>**
- ID: `<id>`
- Folder: `<folder name or "root">`
- Profile: `<cx profile>`

The dashboard is live in Coralogix. Adjust filter values (e.g. `account_id`) after opening it.
```

## Related Skills

- cx-observability-setup - full monitoring setup workflow (views, webhooks, notifications, integrations)
- cx-incident-management - SLO and alert-connected dashboards, incident triage
- cx-telemetry-querying - discover the right telemetry signal before building dashboards