cx-cost-optimization
Cost Optimization Skill
Use this skill when investigating or reducing Coralogix data costs. It covers the full cost management lifecycle: measuring current spend, reviewing TCO policies, adjusting retention periods, setting ingestion quotas, and configuring archive storage for cold data.
CLI Commands
| Command | Subcommands | Purpose |
|---|---|---|
| `cx usage` | `summary`, `daily`, `logs-count`, `spans-count`, `export-status` | Measure current data consumption |
| `cx tco` | `list`, `get`, `create`, `update`, `delete`, `reorder`, `test`, `settings`, `settings-update` | Manage TCO (Total Cost of Ownership) policies |
| `cx retentions` | `list`, `update`, `activate`, `status` | Manage data retention periods |
| `cx quotas` | `get`, `create`, `update`, `delete` | Set ingestion guardrails |
| `cx archive logs` | `get`, `set` | Configure logs archive target |
| `cx archive metrics` | `get`, `create`, `update`, `enable`, `disable`, `validate` | Configure metrics archive storage |
| `cx metrics query` | `<promql>` (positional), `--time` | Query billing and usage metrics via PromQL (instant) |
| `cx metrics query-range` | `<promql>` (positional), `--start`/`--end` | Query billing and usage metrics via PromQL (range) |
Key flags:
- All commands support `-o json` for structured output and `-p <profile>` for profile selection
- `cx usage daily` accepts `--type processed-gbs|units|evaluation-tokens` and `--start`/`--end` time filters (see the sketch below)
- `cx usage summary` accepts `--start`/`--end` time filters
- `cx tco create/update`, `cx retentions update`, `cx quotas create/update`, `cx archive logs set`, and `cx archive metrics create/update/validate` take `--from-file <path>` (or `-` for stdin)
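For example, the global and per-command flags combine. A minimal sketch pulling last week's unit consumption for one profile (`<profile>` is a placeholder):

```bash
# Last 7 days of billable-unit consumption for a single profile, as JSON
cx usage daily --type units --start now-7d -o json -p <profile>
```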
Cost Investigation Workflow
Follow these steps to diagnose and reduce costs:
Step 1: Measure Current Usage
cx usage summary -o json
cx usage summary --start now-30d -o json
cx usage daily --type processed-gbs --start now-7d -o json
cx usage logs-count -o json
cx usage spans-count -o json
Identify which data types consume the most volume. Use jq to sort:
cx usage summary -o json | jq '[.[] | {name, daily_avg: .avg_daily_gb}] | sort_by(.daily_avg) | reverse'
Step 2: Review TCO Policies
cx tco list -o json
cx tco settings -o json
TCO policies control which logs go to Frequent Search (expensive, fast) vs. Archive (cheap, slower). Check if high-volume, low-value logs are on Frequent Search:
cx tco list -o json | jq '.[] | select(.priority == "LOW") | {name, application, subsystem, archive_retention}'
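If high-volume, low-value logs are found, a new policy can route them to the archive tier. A minimal sketch of the `policy.json` passed to `cx tco create --from-file` -- the field names and values below are assumptions for illustration, not the documented schema; template from an existing policy (`cx tco get <policy-id> -o json`) to get the real shape:

```bash
# Hypothetical policy shape -- verify field names against an existing policy first
cat > policy.json <<'EOF'
{
  "name": "debug-logs-to-archive",
  "priority": "LOW",
  "application": "checkout",
  "subsystem": "debug",
  "archive_retention": "90d"
}
EOF
cx tco create --from-file policy.json   # requires confirmation, or --yes after user approval
```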
Step 3: Check Retention Settings
cx retentions list -o json
cx retentions status -o json
Long retention periods increase storage costs. Identify indices with unnecessarily long retention.
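A quick filter to surface long-retention outliers, assuming retention is exposed as a numeric day count (`retention_days` is an assumed field name -- inspect the list output first):

```bash
# Hypothetical filter: retention entries longer than 30 days
cx retentions list -o json | jq '[.[] | select((.retention_days // 0) > 30)]'
```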
Step 4: Review Quota Rules
cx quotas get -o json
Quota rules cap ingestion volume. If there are no quotas and you see burst ingestion, recommend adding guardrails.
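A minimal sketch of adding such a guardrail -- the fields below are illustrative assumptions, not the documented schema; check `cx quotas get -o json` for the real shape first:

```bash
# Hypothetical quota definition -- verify field names before applying
cat > quota.json <<'EOF'
{
  "name": "daily-ingestion-cap",
  "limit_gb_per_day": 500
}
EOF
cx quotas create --from-file quota.json   # requires confirmation, or --yes after user approval
```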
Step 5: Check Archive Configuration
cx archive logs get -o json
cx archive metrics get -o json
Verify that archive storage is configured for cold data. If no archive is set up, that's a cost-saving opportunity.
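Setting up an archive target follows the same `--from-file` pattern. A minimal sketch, assuming an S3-style bucket target (the field names are illustrative, not the documented schema):

```bash
# Hypothetical archive target -- confirm the expected fields before applying
cat > archive.json <<'EOF'
{
  "bucket": "my-org-coralogix-archive",
  "region": "us-east-1"
}
EOF
cx archive logs set --from-file archive.json --yes   # only after explicit user approval
```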
Step 6: Recommend Optimizations
Based on findings, recommend changes in priority order (highest impact first).
Common Optimization Patterns
| Symptom | Diagnosis Command | Optimization |
|---|---|---|
| High-volume low-value logs | `cx usage summary -o json` | Move to archive tier via `cx tco create --from-file policy.json` |
| Long retention on cold data | `cx retentions list -o json` | Reduce retention with `cx retentions update --from-file` |
| Burst ingestion spikes | `cx usage daily -o json` | Add quota rules with `cx quotas create --from-file` |
| No cold storage configured | `cx archive logs get -o json` | Enable archive with `cx archive logs set --from-file --yes` (after user approval) |
| Expensive metrics not queried | `cx archive metrics get -o json` | Enable metrics archiving with `cx archive metrics create --from-file --yes` (after user approval) |
jq Examples
Usage Analysis
# Top consumers by daily volume
cx usage summary -o json | jq '[.[] | {name, daily_avg: .avg_daily_gb}] | sort_by(.daily_avg) | reverse | .[0:10]'
# Daily trend for the past week
cx usage daily --type processed-gbs --start now-7d -o json | jq '[.[] | {date, gb: .processed_gbs}]'
# Total logs and spans counts
cx usage logs-count -o json | jq '.total_count'
cx usage spans-count -o json | jq '.total_count'
TCO Policy Analysis
# Policies routing to archive tier
cx tco list -o json | jq '[.[] | select(.archive_retention != null)]'
# Policies by priority
cx tco list -o json | jq 'group_by(.priority) | map({priority: .[0].priority, count: length})'
# Test if a log pattern matches a policy
cx tco test --from-file test-definition.json -o json
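The shape of `test-definition.json` is not documented here; a purely hypothetical sketch of the idea (field names invented for illustration -- check the CLI help for the real schema):

```bash
# Hypothetical test definition -- field names are invented for illustration only
cat > test-definition.json <<'EOF'
{
  "application": "checkout",
  "subsystem": "debug",
  "severity": "DEBUG"
}
EOF
cx tco test --from-file test-definition.json -o json
```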
Retention Review
# All retention settings
cx retentions list -o json | jq '.[]'
# Check if retention is active
cx retentions status -o json
Quota Analysis
# Current quota rules
cx quotas get -o json | jq '.rules // empty'
Archive Status
# Logs archive configuration
cx archive logs get -o json | jq '{active: .active, bucket: .bucket}'
# Metrics archive configuration
cx archive metrics get -o json | jq '{enabled: .enabled, bucket: .bucket}'
Applying Changes
IMPORTANT: NEVER pass --yes without explicit user approval. All write operations across archive, TCO, retentions, and quotas require interactive confirmation and the --yes flag to execute non-interactively. Before executing any write operation, describe the exact change to the user and wait for their approval before passing --yes.
Read-only mode: Use --read-only (or CX_READ_ONLY=1) to safely explore cost data without risk of accidental writes. All query commands (usage, tco list/get, retentions list, quotas get, archive get) work normally in read-only mode.
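A minimal sketch of a safe exploration pass using the environment-variable form:

```bash
# Read-only exploration: queries work normally, accidental writes are blocked
CX_READ_ONLY=1 cx usage summary -o json
CX_READ_ONLY=1 cx tco list -o json
```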
Agent mode: When running inside an AI agent, cx fails fast on write operations instead of hanging on a stdin prompt. Get user confirmation first, then re-run with --yes.
When modifying TCO policies, retention, quotas, or archive:
- Template from existing: Get the current configuration as JSON, modify it, then apply:
  cx tco get <policy-id> -o json > policy.json
  # Edit policy.json
  cx tco update --from-file policy.json
- Verify after changes: Re-run the diagnosis commands to confirm the change took effect.
- TCO policy ordering matters: Use `cx tco reorder --from-file` to set priority order. Policies are evaluated top-to-bottom; the first match wins (see the sketch below).
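A hypothetical sketch of a reorder payload -- the actual file format (IDs vs. full policy objects) is an assumption here; template it from `cx tco list -o json` before applying:

```bash
# Hypothetical reorder payload: policy IDs listed in evaluation order (first match wins)
cat > order.json <<'EOF'
["policy-id-critical", "policy-id-default", "policy-id-debug-to-archive"]
EOF
cx tco reorder --from-file order.json   # requires confirmation, or --yes after user approval
```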
Metrics-Based Cost Analysis
The cx usage API gives summaries, but for billing-accurate analysis, anomaly detection, and breakdown by pillar/feature, query the customer metrics exporter via PromQL.
Key Metrics
| Metric | Meaning | Query suffix |
|---|---|---|
| `cx_data_usage_units` | Daily billable usage in units (canonical billing metric) | No `_total` |
| `cx_data_plan_units_per_day` | Current daily plan quota in units (snapshot) | No `_total` |
| `cx_data_usage_payg_units` | Daily overage/PAYG usage in units | No `_total` |
| `cx_data_usage_total` | Processed data size in bytes | `_total` |
| `cx_data_usage_tokens_total` | AI evaluation tokens | `_total` |
| `cx_data_usage_samples_total` | Processed metric samples | `_total` |
Concept-to-Metric Mapping
- Billing / plan usage / consumption -> `cx_data_usage_units` + `cx_data_plan_units_per_day`
- Processed bytes / data volume -> `cx_data_usage_total`
- AI evaluation tokens -> `cx_data_usage_tokens_total`
- Metric samples -> `cx_data_usage_samples_total`
- Overage / PAYG -> `cx_data_usage_payg_units`
Common PromQL Queries
# Today's billable units consumed so far
cx metrics query 'sum(cx_data_usage_units)' --time now -o json
# Units breakdown by pillar
cx metrics query 'sum by (pillar) (cx_data_usage_units)' --time now -o json
# Daily plan quota
cx metrics query 'cx_data_plan_units_per_day' --time now -o json
# Plan consumption percentage
cx metrics query '100 * sum(cx_data_usage_units) / cx_data_plan_units_per_day' --time now -o json
# Units by feature group
cx metrics query 'sum by (feature_group_id) (cx_data_usage_units)' --time now -o json
# PAYG overage (if any)
cx metrics query 'cx_data_usage_payg_units' --time now -o json
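For trends rather than instants, `cx metrics query-range` takes the documented `--start`/`--end` flags; a minimal sketch:

```bash
# Billable units over the past week as a range query
cx metrics query-range 'sum(cx_data_usage_units)' --start now-7d --end now -o json
```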
UTC-Day Bucketing Rules
All usage metrics accumulate from UTC midnight and reset at 00:00 UTC:
- An instant query during the day returns "today so far"
- For completed-day totals, use the last sample before midnight
- Never subtract values across a UTC midnight boundary
- For weekly/monthly analysis, derive completed daily totals first, then roll up
- Exclude the current partial UTC day when computing trends or averages
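For example, to read a completed day's total, evaluate an instant query just before that day's UTC midnight rollover (assuming `--time` also accepts an absolute timestamp -- the accepted format is not documented here):

```bash
# Hypothetical: total billable units for 2025-01-14, i.e. the last sample before the UTC reset
cx metrics query 'sum(cx_data_usage_units)' --time 2025-01-14T23:59:00Z -o json
```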
Anomaly Detection
When investigating usage anomalies:
- Compare completed UTC days (exclude current partial day)
- Break down by: `measurement_type` -> `pillar` -> `entity_type` -> `priority` -> `feature_group_id` -> `application_name` -> `subsystem_name`
- Prefer same-weekday comparisons for seasonal traffic
- Use `cx_data_usage_units` for billing anomalies, `cx_data_usage_total` for volume anomalies
Breakdown Labels
Usage metrics support these grouping dimensions: pillar, entity_type, priority, measurement_type, feature_group_id, feature_id, application_name, subsystem_name.
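For example, slicing today's billable units by application and subsystem uses labels from the list above:

```bash
# Today's units per application/subsystem pair
cx metrics query 'sum by (application_name, subsystem_name) (cx_data_usage_units)' --time now -o json
```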
Key Principles
- Measure before changing - always run usage/summary commands before modifying policies
- Use `-o json` with jq - structured output enables precise analysis
- Verify changes - re-query after every modification to confirm it took effect
- Multi-profile awareness - use `-p <profile>` or `--all-profiles` to compare costs across environments (see the sketch below)
- Template from existing - get current config as JSON before creating or updating
- TCO is the biggest lever - moving logs from Frequent Search to Archive tier has the largest cost impact
Related Skills
- `cx-telemetry-querying` - investigate what data is being ingested (query logs, metrics, and spans to identify high-volume sources)