data-health-monitor
Data Health Monitor
Purpose
Answers "is my data flowing correctly?" with a single command. Aggregates health signals across streams, jobs, schema, and event quota into a unified, actionable report. Purely read-only.
Environment
Requires authenticated API access. See ../references/auth.md for credential resolution.
Flow
Run all four health checks, then present a unified report. If the user asks about a specific area, focus on that dimension but still show a summary of others.
Check 1: Stream Health
curl -s "${LYTICS_API_URL:-https://api.lytics.io}/v2/stream" \
-H "Authorization: ${LYTICS_API_TOKEN}"
For each stream, evaluate:
| Signal | How to Detect | Severity |
|---|---|---|
| Active | last_msg_ts within last hour | HEALTHY |
| Stale (continuous) | last_msg_ts 1-24 hours ago | WARNING |
| Stale (batch) | last_msg_ts 2-7 days ago | WARNING |
| Dead | last_msg_ts > 24h ago (continuous) or > 7d (batch) | ERROR |
| Never received | ct == 0 | ERROR |
Distinguish batch from continuous streams by checking for associated jobs: a stream whose jobs run on a periodic schedule is batch; otherwise treat it as continuous.
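The severity table above can be sketched as a small shell function. This is a minimal illustration, not part of the skill itself: the function name `classify_stream` and its arguments are hypothetical, the age is assumed to be already converted to seconds, and treating a batch stream younger than 2 days as HEALTHY is an assumption, since the table leaves that range unspecified.

```bash
# classify_stream AGE_SECONDS KIND EVENT_COUNT
# KIND is "continuous" or "batch". Prints HEALTHY, WARNING, or ERROR.
# (Hypothetical helper; thresholds follow the table above.)
classify_stream() {
  local age="$1" kind="$2" count="$3"

  # ct == 0: the stream has never received an event.
  if [ "$count" -eq 0 ]; then
    echo "ERROR"
    return
  fi

  if [ "$kind" = "batch" ]; then
    if [ "$age" -gt 604800 ]; then      # more than 7 days: dead
      echo "ERROR"
    elif [ "$age" -ge 172800 ]; then    # 2-7 days: stale
      echo "WARNING"
    else                                # assumption: under 2 days is normal for batch
      echo "HEALTHY"
    fi
  else
    if [ "$age" -gt 86400 ]; then       # more than 24 hours: dead
      echo "ERROR"
    elif [ "$age" -gt 3600 ]; then      # 1-24 hours: stale
      echo "WARNING"
    else                                # within the last hour: active
      echo "HEALTHY"
    fi
  fi
}
```

For example, `classify_stream 7200 continuous 10` prints `WARNING`. Deriving the age from last_msg_ts depends on the timestamp format the API returns, which is not specified here.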
For streams with issues, fetch recent stats for more detail:
curl -s "${LYTICS_API_URL:-https://api.lytics.io}/v2/stream/${STREAM_NAME}/stats" \
-H "Authorization: ${LYTICS_API_TOKEN}"
Check 2: Job Health
# Active jobs (default: non-terminal states)
curl -s "${LYTICS_API_URL:-https://api.lytics.io}/v2/job" \
-H "Authorization: ${LYTICS_API_TOKEN}"
# Also check recently failed jobs
curl -s "${LYTICS_API_URL:-https://api.lytics.io}/v2/job?show_completed=true" \
-H "Authorization: ${LYTICS_API_TOKEN}"
Evaluate each job:
| Status | Severity | Action |
|---|---|---|
| runnable | HEALTHY | Running normally |
| sleeping | HEALTHY | Scheduled, waiting for next run |
| paused | WARNING | Intentional but flag for awareness |
| fault | ERROR | Needs investigation -- fetch logs |
| failed | ERROR | Terminal failure -- fetch logs |
| killed | INFO | Manually stopped |
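As a rough sketch, the status-to-severity mapping can be written as a single `case` statement. The function name `job_severity` is hypothetical, and returning UNKNOWN for statuses outside the table is an assumption rather than documented behavior.

```bash
# job_severity STATUS
# Prints the severity for a job status, following the table above.
job_severity() {
  case "$1" in
    runnable|sleeping) echo "HEALTHY" ;;
    paused)            echo "WARNING" ;;
    fault|failed)      echo "ERROR" ;;
    killed)            echo "INFO" ;;
    # Assumption: surface unrecognized statuses instead of guessing.
    *)                 echo "UNKNOWN" ;;
  esac
}
```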
For faulted/failed jobs, fetch logs:
curl -s "${LYTICS_API_URL:-https://api.lytics.io}/v2/job/${JOB_ID}/logs" \
-H "Authorization: ${LYTICS_API_TOKEN}"
Also check for stale jobs: if a runnable job hasn't been updated in over 1 hour, it may be stuck.
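The stuck-job check might look like the following sketch. The name `is_stuck` is hypothetical, and the job's last-updated time is assumed to be available as epoch seconds; how that value is extracted from the job response is not covered here.

```bash
# is_stuck STATUS UPDATED_EPOCH [NOW_EPOCH]
# Succeeds (exit 0) when a runnable job has not been updated for over 1 hour.
is_stuck() {
  local status="$1" updated="$2" now="${3:-$(date +%s)}"
  [ "$status" = "runnable" ] && [ $((now - updated)) -gt 3600 ]
}
```

A caller could use it as `if is_stuck "$status" "$updated"; then echo "possibly stuck"; fi`.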
Check 3: Schema Health
# Get all fields with metadata
curl -s "${LYTICS_API_URL:-https://api.lytics.io}/v2/schema/user/field" \
-H "Authorization: ${LYTICS_API_TOKEN}"
Check:
- Identity fields: Count fields where IsIdentifier == true. Flag if fewer than 2.
- PII fields: Count fields marked IsPII == true for awareness.
- Stale fields: Fields with Modified timestamp older than 30 days that are actively used.
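Two of the checks above reduce to simple threshold tests, sketched below. Both function names are hypothetical, the identity-field count is assumed to have been tallied already from the field list, and the Modified timestamp is assumed to be converted to epoch seconds. Deciding whether a field is "actively used" is left to the caller.

```bash
# identity_flag IDENTITY_FIELD_COUNT
# Prints a warning when fewer than 2 identity fields are configured.
identity_flag() {
  if [ "$1" -lt 2 ]; then
    echo "WARNING: only $1 identity field(s) configured"
  else
    echo "OK"
  fi
}

# is_field_stale MODIFIED_EPOCH [NOW_EPOCH]
# Succeeds (exit 0) when the field was last modified more than 30 days ago.
is_field_stale() {
  local modified="$1" now="${2:-$(date +%s)}"
  [ $((now - modified)) -gt 2592000 ]   # 30 days in seconds
}
```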
For deeper coverage analysis:
curl -s "${LYTICS_API_URL:-https://api.lytics.io}/api/schema/user/fieldinfo" \
-H "Authorization: ${LYTICS_API_TOKEN}"
Check field presence/absence ratios. Flag fields with very low coverage that appear in segment FilterQL.
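A coverage check along these lines could be sketched as follows. The function names are hypothetical, the inputs are assumed to be a count of profiles where the field is present and a total profile count, and the 5% default threshold is an illustrative choice, since "very low" is not quantified above.

```bash
# coverage_pct PRESENT TOTAL
# Prints coverage as an integer percentage (rounded down).
coverage_pct() {
  local present="$1" total="$2"
  if [ "$total" -eq 0 ]; then
    echo 0
    return
  fi
  echo $((present * 100 / total))
}

# is_low_coverage PRESENT TOTAL [THRESHOLD_PCT]
# Succeeds (exit 0) when coverage is below the threshold.
# Assumption: the default of 5% is a placeholder, not a documented cutoff.
is_low_coverage() {
  local threshold="${3:-5}"
  [ "$(coverage_pct "$1" "$2")" -lt "$threshold" ]
}
```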
Check 4: Event Quota
curl -s "${LYTICS_API_URL:-https://api.lytics.io}/v2/control/eventquota/thresholds" \
-H "Authorization: ${LYTICS_API_TOKEN}"
Report current usage against thresholds (50%, 75%, 100%, 125%).
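One way to report usage against these thresholds is to find the highest one that has been reached. The sketch below assumes usage has already been computed as an integer percentage; the function name `quota_band` is hypothetical.

```bash
# quota_band USAGE_PCT
# Prints the highest threshold (50, 75, 100, 125) that usage has reached,
# or "none" when usage is below 50%.
quota_band() {
  local usage="$1" band="none" t
  for t in 50 75 100 125; do
    if [ "$usage" -ge "$t" ]; then
      band="$t"
    fi
  done
  echo "$band"
}
```

For example, `quota_band 80` prints `75`, meaning the 75% threshold has been crossed but the 100% threshold has not.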
Optional: Metrics Deep Dive
When the user wants trends or deeper analysis:
# Stream throughput over last 24h
curl -s "${LYTICS_API_URL:-https://api.lytics.io}/v2/metric?dimension=stream&range=now-24h" \
-H "Authorization: ${LYTICS_API_TOKEN}"
# Job execution metrics over last 24h
curl -s "${LYTICS_API_URL:-https://api.lytics.io}/v2/metric?dimension=works&range=now-24h" \
-H "Authorization: ${LYTICS_API_TOKEN}"
# Segment size trends
curl -s "${LYTICS_API_URL:-https://api.lytics.io}/v2/metric?dimension=segment&range=now-24h" \
-H "Authorization: ${LYTICS_API_TOKEN}"
Present as trends: "Stream throughput is down 40% vs yesterday" or "Segment sizes are stable."
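Trend phrasing like this comes down to a percentage change between two totals. The following is a minimal sketch using integer arithmetic; the name `describe_trend` and the 5% tolerance for calling a metric stable are assumptions, and the two totals are assumed to have been summed from the metric responses beforehand.

```bash
# describe_trend CURRENT PREVIOUS [STABLE_WITHIN_PCT]
# Prints "down N%", "up N%", or "stable" for the change from PREVIOUS to CURRENT.
describe_trend() {
  local current="$1" previous="$2" tolerance="${3:-5}" change

  # Without a baseline a percentage change is undefined.
  if [ "$previous" -eq 0 ]; then
    echo "no baseline"
    return
  fi

  change=$(( (current - previous) * 100 / previous ))
  if [ "$change" -lt "-$tolerance" ]; then
    echo "down $((0 - change))%"
  elif [ "$change" -gt "$tolerance" ]; then
    echo "up ${change}%"
  else
    echo "stable"
  fi
}
```

With yesterday's total at 1000 events and today's at 600, `describe_trend 600 1000` prints `down 40%`.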
Output Format
Present the report as:
## Data Health Report
### Overall: HEALTHY | NEEDS ATTENTION | UNHEALTHY
### Streams (N total)
HEALTHY: X streams actively receiving data
WARNING: 'stream_name' -- last event 3 days ago
ERROR: 'stream_name' -- never received events
### Jobs (N active)
HEALTHY: X jobs running normally
FAULT: 'job_name' -- error message from logs
PAUSED: 'job_name' -- paused since date
### Schema (user table, N fields)
Identity fields: N configured (field1, field2, ...)
Low coverage: 'field' at X%
Stale: 'field' not updated in N days
### Event Quota
Current usage: X% of monthly quota
### Recommendations
1. Specific actionable recommendation
2. Another recommendation
3. ...
Severity Logic
| Overall Status | Criteria |
|---|---|
| HEALTHY | All streams active, all jobs running, no faults, quota < 75% |
| NEEDS ATTENTION | Any: stale streams, paused jobs, low-coverage fields, quota 75-100% |
| UNHEALTHY | Any: faulted/failed jobs, dead streams, quota > 100% |
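The roll-up in this table can be expressed as a short function that checks the UNHEALTHY criteria first, then NEEDS ATTENTION. The name `overall_status` and its argument order are hypothetical, and the sketch assumes the caller counts never-received streams together with dead streams, since both are ERROR-level.

```bash
# overall_status STALE_STREAMS DEAD_STREAMS PAUSED_JOBS FAILED_JOBS LOW_COVERAGE_FIELDS QUOTA_PCT
# FAILED_JOBS covers both fault and failed statuses.
overall_status() {
  local stale="$1" dead="$2" paused="$3" failed="$4" low_cov="$5" quota="$6"

  # Any ERROR-level finding makes the whole report UNHEALTHY.
  if [ "$failed" -gt 0 ] || [ "$dead" -gt 0 ] || [ "$quota" -gt 100 ]; then
    echo "UNHEALTHY"
  # Otherwise any WARNING-level finding needs attention.
  elif [ "$stale" -gt 0 ] || [ "$paused" -gt 0 ] || [ "$low_cov" -gt 0 ] || [ "$quota" -ge 75 ]; then
    echo "NEEDS ATTENTION"
  else
    echo "HEALTHY"
  fi
}
```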
Recommendations Engine
Generate specific, actionable recommendations based on findings:
- Faulted job → "Investigate 'job_name' fault. Check auth credentials or bounce the job."
- Dead stream → "Stream 'name' hasn't received data in N days. Check the source integration."
- Zero-event stream → "Stream 'name' is configured but has never received data. Verify the integration is set up correctly."
- Low identity fields → "Only N identity fields configured. Consider marking additional fields as identifiers for better profile resolution."
- Quota approaching → "Event quota at X%. Consider reviewing high-volume streams or upgrading your plan."
- Stale field → "Field 'name' hasn't been updated in N days. Check if the source integration is still active."
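The mapping above could be templated as in the sketch below, which covers four of the six cases. The function name `recommend` and the finding keys (`faulted_job`, `dead_stream`, and so on) are hypothetical labels chosen for illustration; the message text is taken from the list above.

```bash
# recommend FINDING SUBJECT [DETAIL]
# Prints the recommendation for one finding; returns 1 for unknown findings.
recommend() {
  local finding="$1" subject="$2" detail="${3:-}"
  case "$finding" in
    faulted_job)
      printf "Investigate '%s' fault. Check auth credentials or bounce the job.\n" "$subject" ;;
    dead_stream)
      printf "Stream '%s' hasn't received data in %s days. Check the source integration.\n" "$subject" "$detail" ;;
    low_identity)
      printf "Only %s identity fields configured. Consider marking additional fields as identifiers for better profile resolution.\n" "$subject" ;;
    quota)
      printf "Event quota at %s%%. Consider reviewing high-volume streams or upgrading your plan.\n" "$subject" ;;
    *)
      return 1 ;;
  esac
}
```

For example, `recommend dead_stream web_events 9` fills in the stream name and the number of days; `web_events` is a made-up stream name.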
Error Handling
- API errors on any check: Report the error for that dimension, continue with other checks. Never let one failed check block the whole report.
- Empty responses: Report "No [streams/jobs/fields] found" -- this may indicate a new or unconfigured account.
- Timeout: If a check takes too long, skip it with a note and proceed.
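These rules amount to isolating each check so that a failure is reported but never aborts the run. A minimal sketch, assuming a hypothetical wrapper named `run_check`; the 15-second limit in the commented example is an illustrative value, not a documented one.

```bash
# run_check LABEL COMMAND [ARGS...]
# Runs one health check. On success prints the command's output; on failure
# prints a note for that dimension. Always returns 0 so later checks still run.
run_check() {
  local label="$1" output
  shift
  if output=$("$@" 2>&1); then
    printf '%s\n' "$output"
  else
    printf '%s: check failed or timed out, skipped (%s)\n' "$label" "${output:-no output}"
  fi
  return 0
}

# Example (not executed here): curl's --fail turns HTTP errors into a non-zero
# exit, and --max-time bounds how long the check may take.
# run_check "Streams" curl -s --fail --max-time 15 \
#   "${LYTICS_API_URL:-https://api.lytics.io}/v2/stream" \
#   -H "Authorization: ${LYTICS_API_TOKEN}"
```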
Dependencies
- Composes: stream-inspector skill, job-manager skill, schema-manager skill
- References: ../references/auth.md, ../references/api-client.md