[node_exporter] node_exporter.service was killed with SIGPIPE and not restarted by systemd

hkhl

Jun 27, 2017, 5:24:37 AM

to Prometheus Users

Hi,

This might be a systemd problem, but I would like to know whether anyone has experienced something similar and whether there is a recommended workaround.

I am running a test setup with node_exporter and this service configuration:

-- 8< --
[Unit]
Description=Prometheus Node Exporter
After=network-online.target

[Service]
User=root
Restart=on-failure
WorkingDirectory=/var/local/prometheus/node_exporter
ExecStart=/usr/local/bin/prometheus_exporter_pack/node_exporter/node_exporter -web.listen-address 127.0.0.1:9100 \
    -collector.filesystem.ignored-fs-types "^(sys|proc|auto)fs$" \
    -collector.filesystem.ignored-mount-points "^/(sys|proc|dev)($|/)" \
    -log.level warn

[Install]
WantedBy=default.target
-- >8 --

This is very close to https://github.com/prometheus/node_exporter/tree/master/examples/systemd (I still need to convert it to run as an unprivileged user).

Last night the service died on one server with:

-- 8< --
node_exporter.service - Prometheus Node Exporter
   Loaded: loaded (/usr/local/bin/prometheus_exporter_pack/node_exporter/init/node_exporter.service; enabled)
   Active: inactive (dead) since Mo 2017-06-26 23:22:19 CEST; 10h ago
  Process: 20079 ExecStart=/usr/local/bin/prometheus_exporter_pack/node_exporter/node_exporter -web.listen-address 127.0.0.1:9100 -collector.filesystem.ignored-fs-types ^(sys|proc|auto)fs$ -collector.filesystem.ignored-mount-points ^/(sys|proc|dev)($|/) -log.level warn (code=killed, signal=PIPE)
 Main PID: 20079 (code=killed, signal=PIPE)
-- >8 --

The service was not restarted and an alert fired, although there was (and is) no problem with the server itself.

For several hours before the crash, this node_exporter had been logging problems with the diskstats collector (node_exporter[20079]: time="2017-06-26T23:21:04+02:00" level=error msg="ERROR: diskstats collector failed after 0.002437s: couldn't get diskstats: open /proc/diskstats: no such file or directory" source="node_exporter.go:95"), but this did not cause any trouble (no crash, all other metrics recorded).

Apparently (https://github.com/hashicorp/consul/issues/1688) systemd does not consider SIGPIPE a "failure": it counts a process killed by SIGPIPE as a clean exit, so Restart=on-failure does not trigger a restart.

If I understand this correctly, SIGPIPE could be raised when journald closes the stdout of node_exporter. That would fit with node_exporter logging constantly on this machine, which may increase the likelihood.

I am not sure how to make this more robust. Should I configure Restart=always or RestartForceExitStatus=SIGPIPE in systemd to force a restart, or should node_exporter handle SIGPIPE differently?

Thanks!

Henrik

Ben Kochie

Jun 27, 2017, 5:51:31 AM

to hkhl, Prometheus Users

We have an open issue for node_exporter about this. We haven't decided whether there is anything we can or want to do inside node_exporter to handle SIGPIPE.

https://github.com/prometheus/node_exporter/issues/587

The current recommended workaround is to set Restart=always or RestartForceExitStatus=SIGPIPE in the systemd unit.
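
As a rough sketch of that workaround, the [Service] section of the unit could be changed along these lines. The RestartSec value is only an illustrative assumption and does not come from this thread; only one of the two restart options is needed.

```ini
[Service]
# Option 1: restart whenever the process ends, including on signals
# that systemd counts as a clean exit (such as SIGPIPE).
Restart=always

# Option 2 (alternative): keep Restart=on-failure and additionally
# force a restart when the main process is killed by SIGPIPE.
#Restart=on-failure
#RestartForceExitStatus=SIGPIPE

# Assumed value for illustration: wait a few seconds between restarts.
RestartSec=5
```

After editing the unit, systemd has to reload it (systemctl daemon-reload) and the service has to be restarted for the change to take effect. A service stopped explicitly with systemctl stop is not restarted by either option.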


Karin Herwerth

Nov 22, 2019, 5:20:32 AM

to Prometheus Users

Hi,

We still have this problem with node_exporter.

Is there any news or a fix for this issue?

Thanks in advance.
