monitoring
SKILL.md
Monitoring - Complete API Reference
Monitor system health, track errors, and receive alerts when issues occur.
Chat Commands
Service Control
/monitor start # Start monitoring
/monitor stop # Stop monitoring
/monitor status # Check monitoring status
Health Checks
/monitor health # Run health check
/monitor health --verbose # Detailed health info
/monitor providers # Check LLM provider status
Alerts
/monitor alerts # View recent alerts
/monitor alerts --unread # Unread alerts only
/monitor alert-targets # View alert destinations
/monitor alert-targets add email <addr> # Add email target
/monitor alert-targets add webhook <url> # Add webhook target
/monitor alert-targets remove <id> # Remove target
Configuration
/monitor config # View config
/monitor cooldown 300 # Set alert cooldown (seconds)
/monitor threshold cpu 80 # Set CPU alert threshold
/monitor threshold memory 90 # Set memory threshold
TypeScript API Reference
Create Monitoring Service
import { createMonitoringService } from 'clodds/monitoring';
const monitor = createMonitoringService({
// Health check interval
intervalMs: 60000, // 1 minute
// Alert targets
alertTargets: [
{ type: 'email', address: 'alerts@example.com' },
{ type: 'webhook', url: 'https://hooks.example.com/alerts' },
],
// Alert cooldown (prevent spam)
alertCooldownMs: 300000, // 5 minutes
// Thresholds
thresholds: {
cpu: 80, // Alert at 80% CPU
memory: 90, // Alert at 90% memory
errorRate: 10, // Alert at 10% error rate
},
});
Start/Stop Monitoring
// Start monitoring
await monitor.start();
// Check if running
const isRunning = monitor.isRunning();
// Stop monitoring
await monitor.stop();
Health Checks
// Run health check
const health = await monitor.runHealthCheck();
console.log(`Overall: ${health.status}`); // 'healthy' | 'degraded' | 'unhealthy'
console.log('\nSystem:');
console.log(` CPU: ${health.system.cpu}%`);
console.log(` Memory: ${health.system.memory}%`);
console.log(` Disk: ${health.system.disk}%`);
console.log('\nProviders:');
for (const [name, status] of Object.entries(health.providers)) {
console.log(` ${name}: ${status.status} (${status.latencyMs}ms)`);
}
console.log('\nServices:');
for (const [name, status] of Object.entries(health.services)) {
console.log(` ${name}: ${status.status}`);
}
Provider Health
// Check LLM provider status
const providers = await monitor.checkProviders();
for (const provider of providers) {
console.log(`${provider.name}:`);
console.log(` Status: ${provider.status}`);
console.log(` Latency: ${provider.latencyMs}ms`);
console.log(` Last error: ${provider.lastError || 'none'}`);
console.log(` Error rate: ${provider.errorRate}%`);
}
Alert Management
// Get recent alerts
const alerts = await monitor.getAlerts({ limit: 10 });
for (const alert of alerts) {
console.log(`[${alert.severity}] ${alert.title}`);
console.log(` ${alert.message}`);
console.log(` Time: ${alert.timestamp}`);
console.log(` Acknowledged: ${alert.acknowledged}`);
}
// Acknowledge alert
await monitor.acknowledgeAlert(alertId);
// Get unread count
const unread = await monitor.getUnreadAlertCount();
Alert Targets
// Add alert target
await monitor.addAlertTarget({
type: 'email',
address: 'team@example.com',
});
await monitor.addAlertTarget({
type: 'webhook',
url: 'https://hooks.slack.com/...',
});
// List targets
const targets = monitor.getAlertTargets();
// Remove target
await monitor.removeAlertTarget(targetId);
Event Handlers
// Listen for events
monitor.on('alert', (alert) => {
console.log(`🚨 Alert: ${alert.title}`);
});
monitor.on('healthCheck', (health) => {
if (health.status !== 'healthy') {
console.log(`⚠️ System ${health.status}`);
}
});
monitor.on('providerDown', (provider) => {
console.log(`❌ Provider down: ${provider.name}`);
});
monitor.on('providerRecovered', (provider) => {
console.log(`✅ Provider recovered: ${provider.name}`);
});
Manual Alerts
// Send manual alert
await monitor.sendAlert({
severity: 'warning', // 'info' | 'warning' | 'error' | 'critical'
title: 'Custom Alert',
message: 'Something important happened',
metadata: { key: 'value' },
});
Alert Types
| Type | Trigger |
|---|---|
| provider_down | LLM provider not responding |
| high_cpu | CPU usage above threshold |
| high_memory | Memory usage above threshold |
| high_error_rate | Error rate above threshold |
| unhandled_exception | Uncaught exception |
| unhandled_rejection | Unhandled promise rejection |
Configuration
// Update config
monitor.configure({
intervalMs: 30000,
alertCooldownMs: 600000,
thresholds: {
cpu: 85,
memory: 95,
errorRate: 5,
},
});
Best Practices
- Set appropriate thresholds - Avoid alert fatigue
- Use cooldowns - Prevent alert spam
- Multiple targets - Email + webhook for redundancy
- Acknowledge alerts - Track what's been handled
- Monitor providers - Know when APIs are down
- Check health regularly - Don't just rely on alerts
Weekly Installs
6
Repository
alsk1992/cloddsbotGitHub Stars
62
First Seen
Feb 20, 2026
Security Audits
Installed on
opencode6
gemini-cli6
github-copilot6
codex6
kimi-cli6
amp6