aws-cloud-monitoring
AWS Cloud Monitoring
MCP Server
- Command: `uvx awslabs.cloudwatch-mcp-server@latest` (stdio transport)
- Requires: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION` (or `AWS_PROFILE`)
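A preflight check can catch missing credentials before the server is launched. The sketch below reads the requirement as "credentials from either the access-key pair or `AWS_PROFILE`, plus `AWS_REGION`". That reading is an assumption, since a profile can also carry its own region. It then builds a client entry in the common `mcpServers` JSON shape used by many MCP clients. The helper names and the `"cloudwatch"` server key are hypothetical.

```python
from typing import Mapping

PASSTHROUGH = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION", "AWS_PROFILE")


def check_aws_env(env: Mapping[str, str]) -> list[str]:
    """Return a list of problems with the AWS variables (empty list means OK).

    Assumption: credentials come from the key pair OR AWS_PROFILE, and
    AWS_REGION is always required.
    """
    problems = []
    has_keys = bool(env.get("AWS_ACCESS_KEY_ID")) and bool(env.get("AWS_SECRET_ACCESS_KEY"))
    if not (has_keys or env.get("AWS_PROFILE")):
        problems.append("set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, or AWS_PROFILE")
    if not env.get("AWS_REGION"):
        problems.append("set AWS_REGION")
    return problems


def mcp_server_entry(env: Mapping[str, str]) -> dict:
    """Build an mcpServers entry that launches the server over stdio via uvx."""
    return {
        "mcpServers": {
            "cloudwatch": {  # hypothetical key; clients accept any name here
                "command": "uvx",
                "args": ["awslabs.cloudwatch-mcp-server@latest"],
                # Forward only the AWS variables that are actually set.
                "env": {k: env[k] for k in PASSTHROUGH if env.get(k)},
            }
        }
    }
```

For example, call `check_aws_env(os.environ)` and print `json.dumps(mcp_server_entry(os.environ))` only when it returns an empty list. Preferring `AWS_PROFILE` keeps long-lived secret keys out of the client config file.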
Key Capabilities
- Metrics: Query CloudWatch metrics for any AWS service (EC2, ELB, TGW, NAT GW, VPN)
- Alarms: List and inspect CloudWatch alarms and their states
- Logs: Run CloudWatch Logs Insights queries across any log group
- Flow Logs: Analyze VPC and TGW flow logs for traffic patterns and dropped connections
Workflow: Network Monitoring Dashboard
When a user asks "how is our AWS network performing?":
- Check alarms: List CloudWatch alarms in ALARM state
- VPN metrics: Tunnel state, bytes in/out for site-to-site VPNs
- NAT Gateway metrics: Active connections, packets dropped, bytes processed
- Transit Gateway metrics: Bytes in/out, packets dropped per attachment
- ELB metrics: Healthy/unhealthy targets, latency, 5xx errors
- Report: Network health dashboard with any issues flagged
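The dashboard workflow above reduces to two checks: alarms already firing, and the latest value of each key metric compared against a rule. The sketch below consumes a CloudWatch `DescribeAlarms` response (its `MetricAlarms`, `AlarmName`, and `StateValue` keys) plus a hypothetical `latest` mapping of `(resource, metric) -> value` that you would fill from `GetMetricData`. The flagging thresholds in `RULES` are illustrative assumptions, not AWS defaults.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    area: str
    detail: str


# Hypothetical rules: metric name -> (predicate on latest value, message).
RULES = {
    "TunnelState": (lambda v: v < 1, "VPN tunnel down"),  # values below 1 mean a tunnel is not up
    "PacketsDropCount": (lambda v: v > 0, "NAT gateway dropping packets"),
    "PacketDropCountBlackhole": (lambda v: v > 0, "TGW blackhole-route drops"),
    "UnHealthyHostCount": (lambda v: v > 0, "unhealthy ELB targets"),
    "HTTPCode_ELB_5XX_Count": (lambda v: v > 0, "ELB 5xx errors"),
}


def network_health_report(describe_alarms_resp: dict,
                          latest: dict[tuple[str, str], float]) -> list[Finding]:
    findings = []
    # Step 1: alarms currently in the ALARM state.
    for alarm in describe_alarms_resp.get("MetricAlarms", []):
        if alarm.get("StateValue") == "ALARM":
            findings.append(Finding("alarm", f"{alarm['AlarmName']} is in ALARM"))
    # Steps 2-5: check each (resource, metric) value against its rule, if any.
    for (resource, metric), value in sorted(latest.items()):
        rule = RULES.get(metric)
        if rule and rule[0](value):
            findings.append(Finding(metric, f"{resource}: {rule[1]} (value={value:g})"))
    return findings
```

With boto3, `boto3.client("cloudwatch").describe_alarms(StateValue="ALARM")` returns only firing alarms, which keeps step 1 small on accounts with many alarms.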
Workflow: Flow Log Analysis
When investigating traffic patterns or security events:
- Query VPC flow logs: Filter by source IP, destination IP, port, action (ACCEPT/REJECT)
- Identify rejected traffic: Find REJECT entries to see blocked connections
- Top talkers: Aggregate by source/destination to find heaviest traffic flows
- Time correlation: Narrow to specific time windows around incidents
- Report: Traffic analysis with recommendations
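The same steps can be applied locally to raw flow log records, for example files exported to S3. The sketch below assumes the default version 2 record format (14 space-separated fields, from `version` through `log-status`). It skips header lines, custom-format lines, and `NODATA`/`SKIPDATA` records. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional

# Field order of the default (version 2) VPC flow log format.
FIELDS = ["version", "account_id", "interface_id", "srcaddr", "dstaddr", "srcport",
          "dstport", "protocol", "packets", "bytes", "start", "end", "action", "log_status"]


def parse_records(lines: Iterable[str]) -> list[dict]:
    """Parse default-format records; other fields stay as strings."""
    records = []
    for line in lines:
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # not a default-format record
        rec = dict(zip(FIELDS, parts))
        if rec["log_status"] != "OK":
            continue  # header line or NODATA/SKIPDATA with "-" placeholders
        for key in ("packets", "bytes", "start", "end"):
            rec[key] = int(rec[key])
        records.append(rec)
    return records


def matching(records: list[dict], **criteria: str) -> list[dict]:
    """Step 1: exact-match filter, e.g. srcaddr="10.0.1.50", dstport="443"."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


def rejected(records: list[dict], since: Optional[int] = None,
             until: Optional[int] = None) -> list[dict]:
    """Steps 2 and 4: REJECT records whose start time (epoch s) is in [since, until]."""
    return [
        r for r in records
        if r["action"] == "REJECT"
        and (since is None or r["start"] >= since)
        and (until is None or r["start"] <= until)
    ]


def top_talkers(records: list[dict], n: int = 10) -> list[tuple[tuple[str, str], int]]:
    """Step 3: (srcaddr, dstaddr) pairs ranked by total bytes."""
    totals: Counter = Counter()
    for r in records:
        totals[(r["srcaddr"], r["dstaddr"])] += r["bytes"]
    return totals.most_common(n)
```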
Common CloudWatch Network Metrics
| Service | Metric | What It Tells You |
|---|---|---|
| VPN | TunnelState | 0 = down, 1 = up for each tunnel |
| VPN | TunnelDataIn/Out | Bytes through each VPN tunnel |
| NAT GW | ActiveConnectionCount | Active NAT connections |
| NAT GW | PacketsDropCount | Packets dropped by the NAT gateway (a nonzero value may indicate a transient NAT gateway issue) |
| NAT GW | BytesOutToDestination / BytesInFromDestination | Traffic volume through NAT (outbound / return) |
| TGW | BytesIn/BytesOut | Traffic per TGW attachment |
| TGW | PacketDropCountBlackhole | Blackhole route drops |
| ELB | HealthyHostCount | Healthy targets behind ALB/NLB |
| ELB | TargetResponseTime | Backend latency (ALB only) |
| EC2 | NetworkIn/NetworkOut | Instance network throughput (bytes) |
| EC2 | NetworkPacketsIn/Out | Instance packet rate |
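To pull several of these metrics in one call, each table row becomes one entry in the `MetricDataQueries` list accepted by CloudWatch `GetMetricData`. The sketch below builds those entries. The namespaces and dimension names follow the AWS documentation for each service, while the resource IDs are placeholders and the statistic choices (`Minimum` to catch any down or unhealthy moment, `Sum` for counters) are assumptions. `metric_query` and `EXAMPLE_ROWS` are hypothetical names.

```python
import re

# GetMetricData query Ids must start with a lowercase letter and contain
# only letters, digits, and underscores.
_ID_RE = re.compile(r"^[a-z][a-zA-Z0-9_]*$")


def metric_query(query_id: str, namespace: str, metric: str,
                 dimensions: dict[str, str], stat: str, period: int = 300) -> dict:
    """Build one MetricDataQueries entry for GetMetricData."""
    if not _ID_RE.match(query_id):
        raise ValueError(f"invalid query Id: {query_id!r}")
    return {
        "Id": query_id,
        "MetricStat": {
            "Metric": {
                "Namespace": namespace,
                "MetricName": metric,
                "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
            },
            "Period": period,  # seconds per datapoint
            "Stat": stat,
        },
        "ReturnData": True,
    }


# Placeholder resource IDs; replace with real ones.
EXAMPLE_ROWS = [
    ("vpn_state", "AWS/VPN", "TunnelState", {"VpnId": "vpn-EXAMPLE"}, "Minimum"),
    ("nat_drops", "AWS/NATGateway", "PacketsDropCount", {"NatGatewayId": "nat-EXAMPLE"}, "Sum"),
    ("tgw_blackhole", "AWS/TransitGateway", "PacketDropCountBlackhole",
     {"TransitGateway": "tgw-EXAMPLE", "TransitGatewayAttachment": "tgw-attach-EXAMPLE"}, "Sum"),
    ("alb_healthy", "AWS/ApplicationELB", "HealthyHostCount",
     {"TargetGroup": "targetgroup/web/EXAMPLE", "LoadBalancer": "app/web/EXAMPLE"}, "Minimum"),
    ("ec2_net_in", "AWS/EC2", "NetworkIn", {"InstanceId": "i-EXAMPLE"}, "Sum"),
]

QUERIES = [metric_query(*row) for row in EXAMPLE_ROWS]
```

With boto3, pass the list as `boto3.client("cloudwatch").get_metric_data(MetricDataQueries=QUERIES, StartTime=start, EndTime=end)`. Each entry in the response's `MetricDataResults` carries the matching `Id`.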
Flow Log Query Examples
```
# Top rejected connections in last hour
fields @timestamp, srcAddr, dstAddr, dstPort, action
| filter action = "REJECT"
| stats count() as rejections by srcAddr, dstAddr, dstPort
| sort rejections desc
| limit 20
```

```
# Traffic from specific source
fields @timestamp, srcAddr, dstAddr, dstPort, bytes, action
| filter srcAddr = "10.0.1.50"
| sort @timestamp desc
```

```
# Top talkers by bytes
fields srcAddr, dstAddr, bytes
| stats sum(bytes) as totalBytes by srcAddr, dstAddr
| sort totalBytes desc
| limit 10
```
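The example queries above can also be run programmatically. Logs Insights is asynchronous: `StartQuery` returns a `queryId`, and `GetQueryResults` is polled until the status is terminal. The sketch below takes any client exposing boto3's CloudWatch Logs `start_query`, `get_query_results`, and `stop_query` methods, so it can be tested with a fake. The function name, the default one-hour lookback, and the poll limits are assumptions. A short, explicit time window also keeps the data scanned (and the cost) down.

```python
import time
from typing import Callable, Optional

TERMINAL_FAILURES = {"Failed", "Cancelled", "Timeout", "Unknown"}


def run_insights_query(client, log_group: str, query: str, lookback_s: int = 3600,
                       now: Optional[float] = None, poll_s: float = 1.0, max_polls: int = 60,
                       sleep: Callable[[float], None] = time.sleep) -> list[dict]:
    """Run one Logs Insights query over the last `lookback_s` seconds and return rows as dicts."""
    end = int(now if now is not None else time.time())
    start = end - lookback_s  # startTime/endTime are epoch seconds
    query_id = client.start_query(
        logGroupName=log_group, startTime=start, endTime=end, queryString=query
    )["queryId"]
    for _ in range(max_polls):
        resp = client.get_query_results(queryId=query_id)
        status = resp["status"]
        if status == "Complete":
            # Each row is a list of {"field", "value"} cells; drop the internal @ptr cell.
            return [{c["field"]: c["value"] for c in row if c["field"] != "@ptr"}
                    for row in resp["results"]]
        if status in TERMINAL_FAILURES:
            raise RuntimeError(f"query {query_id} ended with status {status}")
        sleep(poll_s)  # still Scheduled or Running
    client.stop_query(queryId=query_id)  # stop a runaway query instead of letting it keep scanning
    raise TimeoutError(f"query {query_id} still running after {max_polls} polls")
```

In practice the client would be `boto3.client("logs", region_name=...)`. Note that all returned values are strings, so numeric aggregates such as `rejections` need converting before arithmetic.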
Important Rules
- CloudWatch Logs Insights queries cost money: they are billed per GB of log data scanned, so keep the time range and the number of log groups as narrow as the investigation allows
- Region-specific — metrics and logs are scoped to the configured region
- Record in GAIT — log monitoring investigations for audit trail
Environment Variables
`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION` (or `AWS_PROFILE`)