render-monitor

Installation

SKILL.md

Monitor Render Services

Real-time monitoring of Render services including health checks, performance metrics, and logs.

When to Use This Skill

Activate this skill when users want to:

Check if services are healthy
View performance metrics
Monitor logs
Verify a deployment is working
Investigate slow performance
Check database health

Prerequisites

MCP tools (preferred): Test with list_services() - provides structured data

CLI (fallback): render --version - use if MCP tools unavailable

Authentication: For MCP, use an API key (set in the MCP config or via the RENDER_API_KEY env var, depending on tool). For CLI, verify with render whoami -o json.

Workspace: get_selected_workspace() or render workspace current -o json

Note: MCP tools require the Render MCP server. If unavailable, use the CLI for status and logs; metrics and database queries require MCP.

MCP Setup

If list_services() fails, set up the Render MCP server. For detailed per-tool walkthroughs, see render-mcp.

Quick setup: Add the Render MCP server to your AI tool's MCP config:

URL: https://mcp.render.com/mcp
Auth header: Authorization: Bearer <YOUR_API_KEY>
API key: https://dashboard.render.com/u/*/settings#api-keys

After configuring, restart your tool and retry list_services(). Then set your workspace with list_workspaces() / get_selected_workspace().

Quick Health Check

Run these 5 checks to assess service health:

# 1. Check service status
list_services()

# 2. Check latest deploy
list_deploys(serviceId: "<service-id>", limit: 1)

# 3. Check for errors
list_logs(resource: ["<service-id>"], level: ["error"], limit: 20)

# 4. Check resource usage
get_metrics(resourceId: "<service-id>", metricTypes: ["cpu_usage", "memory_usage"])

# 5. Check latency
get_metrics(resourceId: "<service-id>", metricTypes: ["http_latency"], httpLatencyQuantile: 0.95)

Service Health

Check Status

list_services()

get_service(serviceId: "<id>")

Check Deployments

list_deploys(serviceId: "<service-id>", limit: 5)

Status	Meaning
`live`	Deployment successful
`build_in_progress`	Building
`build_failed`	Build failed
`deactivated`	Replaced by newer deploy

Check Errors

list_logs(resource: ["<service-id>"], level: ["error"], limit: 50)

list_logs(resource: ["<service-id>"], statusCode: ["500", "502", "503"], limit: 50)

Performance Metrics

CPU & Memory

get_metrics(
  resourceId: "<service-id>",
  metricTypes: ["cpu_usage", "memory_usage", "cpu_limit", "memory_limit"]
)

Metric	Healthy	Warning	Critical
CPU	<70%	70-85%	>85%
Memory	<80%	80-90%	>90%

HTTP Latency

get_metrics(
  resourceId: "<service-id>",
  metricTypes: ["http_latency"],
  httpLatencyQuantile: 0.95
)

p95 Latency	Status
<200ms	Excellent
200-500ms	Good
500ms-1s	Concerning
>1s	Problem

Request Count

get_metrics(
  resourceId: "<service-id>",
  metricTypes: ["http_request_count"]
)

Filter by Endpoint

get_metrics(
  resourceId: "<service-id>",
  metricTypes: ["http_latency"],
  httpPath: "/api/users"
)

Detailed metrics guide: references/metrics-guide.md

Database Monitoring

PostgreSQL Status

list_postgres_instances()
get_postgres(postgresId: "<postgres-id>")

Connection Count

get_metrics(resourceId: "<postgres-id>", metricTypes: ["active_connections"])

Query Database

query_render_postgres(
  postgresId: "<postgres-id>",
  sql: "SELECT state, count(*) FROM pg_stat_activity GROUP BY state"
)

Find Slow Queries

query_render_postgres(
  postgresId: "<postgres-id>",
  sql: "SELECT query, mean_exec_time FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10"
)

Key-Value Store

list_key_value()
get_key_value(keyValueId: "<kv-id>")

Log Monitoring

Recent Logs

list_logs(resource: ["<service-id>"], limit: 100)

Error Logs

list_logs(resource: ["<service-id>"], level: ["error"], limit: 50)

Search Logs

list_logs(resource: ["<service-id>"], text: ["timeout", "error"], limit: 50)

Filter by Time

list_logs(
  resource: ["<service-id>"],
  startTime: "2024-01-15T10:00:00Z",
  endTime: "2024-01-15T11:00:00Z"
)

Stream Logs (CLI)

render logs -r <service-id> --tail -o text

Quick Reference

MCP Tools

# Services
list_services()
get_service(serviceId: "<id>")
list_deploys(serviceId: "<id>", limit: 5)

# Logs
list_logs(resource: ["<id>"], level: ["error"], limit: 100)
list_logs(resource: ["<id>"], text: ["search"], limit: 50)

# Metrics
get_metrics(resourceId: "<id>", metricTypes: ["cpu_usage", "memory_usage"])
get_metrics(resourceId: "<id>", metricTypes: ["http_latency"], httpLatencyQuantile: 0.95)
get_metrics(resourceId: "<id>", metricTypes: ["http_request_count"])

# Database
list_postgres_instances()
get_postgres(postgresId: "<id>")
query_render_postgres(postgresId: "<id>", sql: "SELECT ...")
get_metrics(resourceId: "<postgres-id>", metricTypes: ["active_connections"])

# Key-Value
list_key_value()
get_key_value(keyValueId: "<id>")

CLI Commands (Fallback)

Use these if MCP tools are unavailable:

# Service status
render services -o json
render services instances <service-id>

# Deployments
render deploys list <service-id> -o json

# Logs
render logs -r <service-id> --tail -o text          # Stream logs
render logs -r <service-id> --level error -o json   # Error logs
render logs -r <service-id> --type deploy -o json   # Build logs

# Database
render psql <database-id>                           # Connect to PostgreSQL

# SSH for live debugging
render ssh <service-id>

Healthy Service Indicators

Indicator	Healthy	Warning	Critical
Deploy Status	`live`	`update_in_progress`	`build_failed`
Error Rate	<0.1%	0.1-1%	>1%
p95 Latency	<500ms	500ms-2s	>2s
CPU Usage	<70%	70-90%	>90%
Memory Usage	<80%	80-95%	>95%

References

Metrics guide: references/metrics-guide.md

Related Skills

render-deploy — Deploy new applications to Render
render-debug — Diagnose and fix deployment failures
render-mcp — MCP server setup and tool catalog

Related skills

More from render-oss/skills

Installs

Repository

render-oss/skills

GitHub Stars

First Seen

Feb 9, 2026

Security Audits

Gen Agent Trust HubPass

SocketPass

SnykFail