[node_exporter] node_exporter.service was killed with SIGPIPE and not restarted by systemd

hkhl

Jun 27, 2017, 5:24:37 AM
to Prometheus Users
Hi,

This might be a systemd problem, but I am interested in whether anyone has experienced something similar and whether there is a recommended workaround.
I am running a test setup with node_exporter and this service configuration:

-- 8< --

[Unit]
Description=Prometheus Node Exporter
After=network-online.target

[Service]
User=root
Restart=on-failure
WorkingDirectory=/var/local/prometheus/node_exporter
ExecStart=/usr/local/bin/prometheus_exporter_pack/node_exporter/node_exporter -web.listen-address 127.0.0.1:9100 \
-collector.filesystem.ignored-fs-types "^(sys|proc|auto)fs$" \
-collector.filesystem.ignored-mount-points "^/(sys|proc|dev)($|/)" \
-log.level warn

[Install]
WantedBy=default.target

-- >8 --

This is very close to https://github.com/prometheus/node_exporter/tree/master/examples/systemd (I still need to convert this to run as an unprivileged user).

Last night the service died on one server with:

-- 8< --

node_exporter.service - Prometheus Node Exporter
   Loaded: loaded (/usr/local/bin/prometheus_exporter_pack/node_exporter/init/node_exporter.service; enabled)
   Active: inactive (dead) since Mo 2017-06-26 23:22:19 CEST; 10h ago
  Process: 20079 ExecStart=/usr/local/bin/prometheus_exporter_pack/node_exporter/node_exporter -web.listen-address 127.0.0.1:9100 -collector.filesystem.ignored-fs-types ^(sys|proc|auto)fs$ -collector.filesystem.ignored-mount-points ^/(sys|proc|dev)($|/) -log.level warn (code=killed, signal=PIPE)
 Main PID: 20079 (code=killed, signal=PIPE)

-- >8 --

The service was not restarted and an alert fired, although there was and is no problem with the server itself.

For several hours before the crash, this node_exporter had been logging problems with diskstats (node_exporter[20079]: time="2017-06-26T23:21:04+02:00" level=error msg="ERROR: diskstats collector failed after 0.002437s: couldn't get diskstats: open /proc/diskstats: no such file or directory" source="node_exporter.go:95"), but this did not cause any problem (no crash, all other metrics were recorded).

Apparently (https://github.com/hashicorp/consul/issues/1688) systemd does not consider SIGPIPE a "failure", hence no restart with Restart=on-failure.
If I understand this correctly, SIGPIPE could occur when journald closes node_exporter's stdout. This would correlate with node_exporter constantly logging on this machine, which may increase the likelihood.

I am not sure how to make this more robust. Should I configure Restart=always or RestartForceExitStatus=SIGPIPE in systemd to force a restart, or should node_exporter handle SIGPIPE differently?

Thanks!
Henrik

Ben Kochie

Jun 27, 2017, 5:51:31 AM
to hkhl, Prometheus Users
We have an issue open for the node_exporter about this.  We haven't decided if there is anything we can/want to do inside the node_exporter to handle SIGPIPE.


The current recommended workaround is to set Restart=always or RestartForceExitStatus=SIGPIPE in the systemd unit.
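
A minimal sketch of what that workaround could look like as a systemd drop-in, assuming the unit is called node_exporter.service and the override is saved as /etc/systemd/system/node_exporter.service.d/restart.conf (the file name and the RestartSec value are illustrative assumptions, not taken from the issue):

-- 8< --

[Service]
# Option 1: restart however the main process ended, including SIGPIPE.
Restart=always
# Assumed delay between restarts; pick a value that suits your setup.
RestartSec=5

# Option 2 (alternative to option 1): keep Restart=on-failure from the
# original unit and additionally force a restart on SIGPIPE.
#Restart=on-failure
#RestartForceExitStatus=SIGPIPE

-- >8 --

After adding the file, a "systemctl daemon-reload" followed by a restart of the service should pick it up. Note that Restart=always also restarts the service after a clean exit, though not after an explicit "systemctl stop", so option 2 is the narrower change.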

Karin Herwerth

Nov 22, 2019, 5:20:32 AM
to Prometheus Users
Hi,

We still have this problem with the node_exporter.
Is there any news or a fix for this issue?

Thanks in advance.