node-exporter
Identity
- Unit: `prometheus-node-exporter.service` (Debian/Ubuntu), `node_exporter.service` (RHEL/Fedora, manual install)
- Binary: `/usr/bin/prometheus-node-exporter` (package) or `/usr/local/bin/node_exporter` (manual)
- Config: no config file; all configuration is via command-line flags in the systemd unit
- Flags location: `/etc/default/prometheus-node-exporter` (Debian/Ubuntu) or `ExecStart=` in the unit override
- Metrics endpoint: `http://localhost:9100/metrics`
- Distro install: `apt install prometheus-node-exporter` / `dnf install golang-github-prometheus-node-exporter`
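
As an illustration of the flags location on Debian/Ubuntu, the fragment below sketches what the defaults file might contain. It assumes the packaged unit expands a variable named `ARGS` into `ExecStart=`; check `systemctl cat prometheus-node-exporter` to confirm before relying on it. The flags themselves are ones discussed elsewhere in this document.

```shell
# /etc/default/prometheus-node-exporter (Debian/Ubuntu)
# Assumption: the packaged unit runs the binary with $ARGS appended.
# Bind to loopback only and switch off two collectors that are rarely needed.
ARGS="--web.listen-address=127.0.0.1:9100 --no-collector.arp --no-collector.bcache"
```

After editing the file, restart the service so the new flags take effect.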
Key Operations
| Operation | Command |
|---|---|
| Service status | `systemctl status prometheus-node-exporter` |
| Check metrics endpoint | `curl -s localhost:9100/metrics \| head -50` |
| Count exposed metrics | `curl -s localhost:9100/metrics \| grep -c '^node_'` |
| List collector flags and their defaults | `prometheus-node-exporter --help 2>&1 \| grep 'collector\.'` |
| Filter CPU metrics | `curl -s localhost:9100/metrics \| grep '^node_cpu'` |
| Check filesystem metrics | `curl -s localhost:9100/metrics \| grep '^node_filesystem'` |
| Check disk I/O metrics | `curl -s localhost:9100/metrics \| grep '^node_disk'` |
| Check memory metrics | `curl -s localhost:9100/metrics \| grep '^node_memory'` |
| Check network metrics | `curl -s localhost:9100/metrics \| grep '^node_network'` |
| Check load/CPU pressure | `curl -s localhost:9100/metrics \| grep -e '^node_load' -e '^node_pressure'` |
| Textfile collector output | `curl -s localhost:9100/metrics \| grep -e '^node_textfile' -e '^# HELP node_textfile'` |
| Hardware temperature (hwmon) | `curl -s localhost:9100/metrics \| grep '^node_hwmon'` |
| Systemd unit states | `curl -s localhost:9100/metrics \| grep '^node_systemd'` |
| View active flags | `systemctl cat prometheus-node-exporter \| grep ExecStart` |
Expected Ports
- 9100/tcp: metrics endpoint (HTTP, or HTTPS if TLS is configured)
- Verify: `ss -tlnp | grep 9100`
- Node exporter binds to all interfaces by default; restrict with `--web.listen-address=127.0.0.1:9100` if Prometheus scrapes locally
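
To make the bind-address check less manual, the sketch below classifies the port 9100 listeners reported by `ss`. The function name `classify_9100_listeners` is hypothetical, and the sketch assumes the usual `ss -tln` column layout in which the local address is the fourth field.

```shell
#!/bin/sh
# Reads `ss -tln` output on stdin and prints each local address bound to
# port 9100, tagged "loopback" or "exposed".
# Assumption: the local address:port is the fourth whitespace-separated field.
classify_9100_listeners() {
    awk '$4 ~ /:9100$/ {
        tag = ($4 ~ /^127\./ || $4 ~ /^\[::1\]/) ? "loopback" : "exposed"
        print $4, tag
    }'
}

# Usage on a live host:
#   ss -tln | classify_9100_listeners
```

Any line tagged `exposed` means the exporter is reachable on a non-loopback address and should be covered by a firewall rule.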
Health Checks
- `systemctl is-active prometheus-node-exporter` → `active`
- `curl -sf http://localhost:9100/metrics > /dev/null && echo OK` → `OK`
- `curl -s localhost:9100/metrics | grep -c '^node_' | awk '$1 > 100 {print "metrics present"}'` → `metrics present`
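
The metric-count check prints nothing when it fails, which is easy to miss in a script. A minimal sketch of a variant that reports both outcomes follows; the function names `count_node_metrics` and `classify_count` are hypothetical, and the threshold of 100 is taken from the check above.

```shell
#!/bin/sh
# Count samples whose metric name starts with node_ in exposition text on stdin.
# grep -c exits non-zero when the count is 0, so the exit status is masked.
count_node_metrics() {
    grep -c '^node_' || true
}

# Report whether a sample count clears the >100 threshold.
classify_count() {
    if [ "$1" -gt 100 ]; then
        echo "metrics present"
    else
        echo "too few metrics: $1"
    fi
}

# Usage against a live exporter (assumes the default endpoint):
#   classify_count "$(curl -sf http://localhost:9100/metrics | count_node_metrics)"
```

If `curl` fails, the count is 0 and the script reports too few metrics rather than staying silent.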
Common Failures
| Symptom | Likely cause | Check/Fix |
|---|---|---|
| `connection refused` on port 9100 | Service not running | `systemctl start prometheus-node-exporter` |
| No `node_filesystem_*` metrics | Mount point excluded or wrong filesystem type | Check the `--collector.filesystem.mount-points-exclude` and `--collector.filesystem.fs-types-exclude` flags; pseudo filesystems such as `proc` and `sysfs` are excluded by default |
| No `node_hwmon_*` metrics | hwmon collector disabled or no sensors detected | Verify `lm_sensors` is installed by running `sensors`; the collector may need `--collector.hwmon` explicitly |
| Textfile metrics not appearing | Wrong directory or file permissions | Confirm the directory matches `--collector.textfile.directory`; the file must be world-readable and end in `.prom` |
| Port 9100 reachable from internet | No firewall rule restricting access | Add a firewall rule or use `--web.listen-address=127.0.0.1:9100`; node exporter has no auth by default |
| Old metric names (`node_cpu` not `node_cpu_seconds_total`) | Pre-v0.16 package installed | v0.16 renamed many metrics; check `prometheus-node-exporter --version` and update scrape queries |
| `too many open files` in logs | System ulimit too low for large number of mounts/disks | Add `LimitNOFILE=65536` to a systemd unit override |
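
For the `too many open files` case, the override can be created as a systemd drop-in. The sketch below writes one; the function name `write_nofile_dropin` and the file name `limits.conf` are hypothetical choices, and the optional second argument exists only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# write_nofile_dropin UNIT [ROOT]
# Creates ROOT/UNIT.d/limits.conf with a raised open-file limit.
# ROOT defaults to /etc/systemd/system, which requires root to write.
write_nofile_dropin() {
    unit=$1
    root=${2:-/etc/systemd/system}
    mkdir -p "$root/$unit.d" || return 1
    printf '[Service]\nLimitNOFILE=65536\n' > "$root/$unit.d/limits.conf"
}

# Typical use as root on Debian/Ubuntu, followed by a reload and restart:
#   write_nofile_dropin prometheus-node-exporter.service
#   systemctl daemon-reload && systemctl restart prometheus-node-exporter
```

`systemctl edit prometheus-node-exporter` achieves the same result interactively.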
Pain Points
- No built-in authentication: Node exporter exposes all system metrics without credentials by default. Bind to localhost (`--web.listen-address=127.0.0.1:9100`) unless Prometheus is on a different host, and use firewall rules or a reverse proxy for remote access. TLS + basic auth via `--web.config.file` requires v1.5+.
- Textfile collector is the extension point: The only supported way to add custom metrics is to write `.prom` files into the textfile directory. Scripts that generate these files must be run separately (e.g., via cron). The collector reads them at scrape time, not on a schedule.
- Collector flags are cumulative: `--collector.disable-defaults` disables everything; then `--collector.cpu`, `--collector.meminfo`, etc. opt in. Without `--collector.disable-defaults`, you get every default-enabled collector and exclude individual ones with `--no-collector.<name>`.
- Metric naming changed in v0.16: `node_cpu` became `node_cpu_seconds_total`, `node_filesystem_free` became `node_filesystem_free_bytes`, etc. Any dashboards or alerts written against the old names will silently show no data after an upgrade.
- Loop devices inflate filesystem metrics: By default, loop devices (`/dev/loop*`) appear in `node_filesystem_*` metrics. Exclude with `--collector.filesystem.mount-points-exclude='^/(dev|proc|run/credentials/.+|sys|var/lib/docker/.+)($|/)' --collector.diskstats.device-exclude='^(loop|ram)\d+$'`.
- High cardinality on busy systems: On servers with many CPUs, disks, or network interfaces, the default collectors produce thousands of time series. Disable unused collectors (e.g., `--no-collector.arp`, `--no-collector.bcache`) to reduce overhead on resource-constrained systems.
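
Because the textfile collector reads `.prom` files at scrape time, a generator script should never leave a half-written file in the directory. A minimal sketch of one way to do that follows: write to a temporary file in the same directory, then rename it into place. The names `write_prom`, `emit_dir_entries`, and the metric `example_dir_entries` are hypothetical and only illustrate the pattern.

```shell
#!/bin/sh
# write_prom DIR NAME
# Reads exposition text on stdin and publishes it as DIR/NAME.prom.
# The temp file lives in DIR so the final mv is a same-filesystem rename,
# and its name does not end in .prom, so a scrape never reads a partial file.
write_prom() {
    dir=$1
    name=$2
    tmp=$(mktemp "$dir/.$name.XXXXXX") || return 1
    cat > "$tmp" || { rm -f "$tmp"; return 1; }
    chmod 644 "$tmp"    # mktemp creates 0600; the exporter must be able to read it
    mv "$tmp" "$dir/$name.prom"
}

# Hypothetical metric: number of entries in a directory.
emit_dir_entries() {
    count=$(ls -A "$1" | wc -l | tr -d ' ')
    echo '# HELP example_dir_entries Number of entries in a directory.'
    echo '# TYPE example_dir_entries gauge'
    echo "example_dir_entries{path=\"$1\"} $count"
}

# Usage from cron; the target must match --collector.textfile.directory:
#   emit_dir_entries /var/spool/cron | write_prom "$TEXTFILE_DIR" dir_entries
```

`TEXTFILE_DIR` in the usage comment is a placeholder for whatever directory the exporter was started with.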
References
See references/ for:
- `common-patterns.md`: install, restrict access, collector configuration, textfile collector, PromQL, Grafana dashboards, TLS auth
- `docs.md`: official documentation and reference links
Related skills