synth-manage-checks
Synthetic Monitoring Check Manager
Manage SM checks using gcx. Experienced operators — no hand-holding.
Core Principles
- Use gcx commands; never call Grafana APIs directly (no curl, no HTTP calls)
- Trust the user's expertise — no explanations of what SM or gcx is
- Use -o json for agent processing; default table format for user display
- Always dry-run before pushing: --dry-run first, actual push only on success
- Probe names are case-sensitive; always copy-paste from gcx synth probes list
Workflow 1: Create New Check
Step 1: Determine Check Type
Use the decision table in references/check-types.md:
| Target | Check Type |
|---|---|
| URL (https://..., http://...) | HTTP |
| Hostname or IP (no port) | Ping |
| Domain name (DNS lookup) | DNS |
| host:port | TCP |
| URL with routing path analysis | Traceroute |
If unsure, ask the user what they want to test (availability, DNS, port connectivity, routing).
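The decision table can be sketched as a small helper. This is a minimal illustration, not part of gcx: the function name suggest_check_type and the intent keywords ("dns", "routing") are hypothetical, and the host:port pattern only covers hostnames and IPv4 addresses. A bare hostname is ambiguous between Ping and DNS, so the sketch defaults to Ping unless an intent is given.

```python
import re
from typing import Optional
from urllib.parse import urlparse


def suggest_check_type(target: str, intent: Optional[str] = None) -> str:
    """Suggest a check type for a target, following the decision table.

    `intent` is a hypothetical hint ("dns" or "routing") for the rows that
    cannot be inferred from the target string alone.
    """
    if intent == "routing":
        return "Traceroute"
    if intent == "dns":
        return "DNS"
    # URLs with an http or https scheme map to HTTP checks.
    if urlparse(target).scheme.lower() in ("http", "https"):
        return "HTTP"
    # host:port (hostname or IPv4 only in this sketch) maps to TCP.
    if re.fullmatch(r"[^\s:/]+:\d{1,5}", target):
        return "TCP"
    # Bare hostname or IP: default to Ping.
    return "Ping"
```

When neither the target shape nor an intent settles the type, asking the user remains the right fallback.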
Step 2: List and Select Probes
gcx synth probes list
Recommend at least 3 geographically distributed probes. Copy names exactly as shown — case-sensitive. Suggest probes across different continents or regions to provide meaningful coverage (e.g., one each from North America, Europe, Asia-Pacific).
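Because probe names are matched case-sensitively, it can help to compare the requested names against the listed ones before building the YAML. The following is a rough sketch under the assumption that the available names have already been collected from gcx synth probes list; check_probe_names is a hypothetical helper, not a gcx feature.

```python
from typing import Dict, Iterable, List, Optional


def check_probe_names(requested: Iterable[str],
                      available: List[str]) -> Dict[str, Optional[str]]:
    """Map each requested name that has no exact match to a suggestion.

    The suggestion is the available name that matches when case is ignored,
    or None when nothing similar exists.
    """
    by_lower = {name.lower(): name for name in available}
    problems: Dict[str, Optional[str]] = {}
    for name in requested:
        if name not in available:  # exact, case-sensitive comparison
            problems[name] = by_lower.get(name.strip().lower())
    return problems
```

An empty result means every requested name can be used as written.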
Step 3: Build YAML Definition
Use the template from references/check-types.md for the chosen type. Scaffold the file locally:
apiVersion: syntheticmonitoring.ext.grafana.app/v1alpha1
kind: Check
metadata:
  name: <job-name> # Non-numeric = create; numeric = update
spec:
  job: <job-name>
  target: <target>
  frequency: 60000 # milliseconds; 10000-120000 typical
  timeout: 10000 # milliseconds; must be < frequency
  enabled: true
  labels:
    environment: production
    team: platform
  probes:
    - Atlanta
    - Frankfurt
    - Singapore
  alertSensitivity: medium # none, low, medium, high
  basicMetricsOnly: false # true = fewer metrics, lower cardinality
  settings:
    http: {} # Replace with type-specific settings
Configuration guidance:
- frequency: critical checks 10,000–60,000ms; standard checks 60,000–300,000ms
- timeout: must be strictly less than frequency; typically 5,000–30,000ms
- alertSensitivity: high = alert if >5% failing; medium = >10%; low = >25%; none = no alerts
- basicMetricsOnly: true reduces metric cardinality (fewer label dimensions); false emits full metrics
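The rules above can be checked locally before a dry-run. This is a minimal sketch, assuming the spec section has already been loaded into a Python dict; validate_spec and ALERT_THRESHOLDS are hypothetical names, and treating a missing alertSensitivity as a problem is an assumption rather than documented server behavior.

```python
from typing import List

# Failure-rate thresholds per sensitivity level, as listed in the guidance.
ALERT_THRESHOLDS = {"none": None, "low": 0.25, "medium": 0.10, "high": 0.05}


def validate_spec(spec: dict) -> List[str]:
    """Return a list of human-readable problems; empty means no issue found."""
    problems = []
    frequency = spec.get("frequency")
    timeout = spec.get("timeout")
    if not isinstance(frequency, int) or not isinstance(timeout, int):
        problems.append("frequency and timeout must be integers (milliseconds)")
    elif timeout >= frequency:
        # timeout must be strictly less than frequency
        problems.append("timeout must be less than frequency")
    if spec.get("alertSensitivity") not in ALERT_THRESHOLDS:
        problems.append("alertSensitivity must be one of: none, low, medium, high")
    return problems
```

A local check like this does not replace --dry-run; it only catches the most common mistakes earlier.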
Step 4: Dry-run and Push
# Always dry-run first
gcx synth checks push <file.yaml> --dry-run
# Push only after dry-run succeeds
gcx synth checks push <file.yaml>
Push semantics:
- Non-numeric metadata.name (e.g., my-api-check): creates a new check; server assigns a numeric ID and updates the local file
- Numeric metadata.name (e.g., 12345): updates the existing check with that ID
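To predict what a push will do before running it, the create-versus-update rule can be expressed as a one-line test on metadata.name. This is only an illustration of the rule as described here; push_action is a hypothetical helper and the actual decision is made by gcx.

```python
def push_action(metadata_name: str) -> str:
    """Return "update" for an all-digit name, "create" for anything else."""
    if not metadata_name:
        raise ValueError("metadata.name must not be empty")
    # isascii() guards against non-ASCII digit characters that isdigit() accepts.
    if metadata_name.isascii() and metadata_name.isdigit():
        return "update"
    return "create"
```

Note that a name such as check-42 still counts as non-numeric and would therefore create a new check.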
After creation, verify with:
gcx synth checks list
gcx synth checks status <ID>
Workflow 2: Update Existing Check
Step 1: Pull Current Definition
Fetch the specific check or all checks:
# Get single check as YAML (use ID from list output)
gcx synth checks get <ID> -o yaml > check-<ID>.yaml
# Or pull all checks to a directory
gcx synth checks pull -d ./sm-checks/
Step 2: Edit and Push
Edit the pulled YAML file (the metadata.name will be the numeric ID). Modify only the fields that need changing.
# Dry-run the update
gcx synth checks push check-<ID>.yaml --dry-run
# Apply
gcx synth checks push check-<ID>.yaml
Workflow 3: GitOps Sync (Pull/Push)
Pull all checks to a local directory, edit them in source control, then push to apply:
# Pull all checks to directory
gcx synth checks pull -d ./sm-checks/
# Edit files as needed, then push each changed file
gcx synth checks push ./sm-checks/<file>.yaml --dry-run
gcx synth checks push ./sm-checks/<file>.yaml
For bulk push from a directory, push files individually to control which checks are updated. Review dry-run output before each push.
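The per-file routine (dry-run, then push, one file at a time) could be scripted roughly as follows. This sketch assumes that gcx exits with a non-zero status when a dry-run or push fails, which is not confirmed by this document; run_gcx and push_files are hypothetical names. It stops at the first failure so later files are not pushed on top of a broken state.

```python
import subprocess
from typing import Callable, Iterable, List, Sequence


def run_gcx(args: Sequence[str]) -> bool:
    """Run a gcx command; True when it exits with status 0 (assumed success)."""
    return subprocess.run(["gcx", *args]).returncode == 0


def push_files(paths: Iterable[str],
               run: Callable[[Sequence[str]], bool] = run_gcx) -> List[str]:
    """Dry-run and then push each file in order; stop at the first failure.

    Returns the list of files that were actually pushed.
    """
    pushed = []
    for path in paths:
        # Always dry-run first; push only if the dry-run succeeded.
        if not run(["synth", "checks", "push", path, "--dry-run"]):
            break
        if not run(["synth", "checks", "push", path]):
            break
        pushed.append(path)
    return pushed
```

The run parameter exists so the loop can be exercised with a stub instead of the real CLI. Reviewing the dry-run output by hand is still advisable when the changes are significant.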
Workflow 4: Delete Checks
# List checks to confirm IDs
gcx synth checks list
# Delete one or more checks (by numeric ID)
gcx synth checks delete <ID>
# Skip confirmation prompt
gcx synth checks delete <ID> -f
# Delete multiple checks
gcx synth checks delete <ID1> <ID2> <ID3>
Confirm the check identity (job name and target) before deleting — use gcx synth checks get <ID> to review.
Output Format
After creating or updating:
Check: <job-name>
Target: <target>
Type: <HTTP|Ping|DNS|TCP|Traceroute>
Probes: <count> selected (<list>)
Push: SUCCESS — ID: <assigned-id>
Verify status:
gcx synth checks status <ID>
After pull:
Pulled <N> checks to <dir>/
Files: <list of filenames>
After delete:
Deleted check <ID> (<job-name> -> <target>)
Error Handling
- "probe not found": Probe names are case-sensitive. Run gcx synth probes list and copy names exactly.
- "timeout must be less than frequency": Reduce the timeout value or increase frequency.
- "invalid frequency": frequency must be between 10,000ms and 120,000ms (10s–2min).
- Dry-run fails with validation error: Fix the YAML field indicated in the error before pushing.
- Push fails with "check already exists": The check job+target combination may already exist. Use gcx synth checks list to find it and update instead of create.
- No probes available: Run gcx synth probes list; if empty, verify gcx context and SM API access.
- Complex check types (MultiHTTP, Browser, Scripted): Settings map is not fully documented. Pull an existing check of that type as a template: gcx synth checks get <ID> -o yaml.