Finding the error that causes node_filesystem_device_error to become 1


Yagyansh S. Kumar

Apr 18, 2020, 12:41:17 PM
to Prometheus Users
Hi. After a lot of discussion on this forum about monitoring NFS hang issues, I am using node_filesystem_device_error to see whether my NFS mount is hanging.
Now, since node_filesystem_device_error is basically the result of a statfs call, there can be more than one reason for the statfs call to fail. So, if node_filesystem_device_error == 1, is there any way to find out which error caused the statfs call to fail?

E.g., node_filesystem_device_error is currently 1 on two different NFS mounts for two different reasons: one of the mounts is hung, and for the other one, when I checked, I found that the statfs call is failing with "Stale file handle".

So, is there any way to get these errors without logging into the machine?

Thanks in advance!

Yagyansh S. Kumar

Apr 18, 2020, 12:46:17 PM
to Prometheus Users
Adding to this, I noticed something very strange. node_filesystem_device_error reports 1, but when I log in to the servers, everything seems to be fine with the NFS mount. Even the statfs call is successful.
Is this a bug? If not, how can I find out why node_filesystem_device_error == 1 even though the statfs call is successful?

Julius Volz

Apr 18, 2020, 2:23:17 PM
to Yagyansh S. Kumar, Prometheus Users
Just looking at the source of the filesystem collector in the Node Exporter, the detailed errors from a failed statfs() are logged at debug level, so you'd have to set --log.level=debug to see them.
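
For illustration only, here is a minimal Python sketch of how the two failure modes mentioned in this thread (a hung mount versus an error such as "Stale file handle") could be told apart when probing a mount point by hand. This is not the Node Exporter's code (the exporter is written in Go); the function name probe_mount and the 5-second default timeout are assumptions made up for this example, and the script has to run on the host itself.

```python
import errno
import os
import threading


def probe_mount(path, timeout=5.0):
    """Describe the outcome of statvfs() on `path`.

    probe_mount and its timeout default are hypothetical names/values
    chosen for this sketch; they do not come from the Node Exporter.
    """
    result = {}

    def worker():
        try:
            os.statvfs(path)  # Python wrapper around statvfs(3)
            result["status"] = "ok"
        except OSError as exc:
            # Map the errno to its symbolic name, e.g. ESTALE or ENOENT.
            name = errno.errorcode.get(exc.errno, "UNKNOWN")
            result["status"] = f"{name}: {os.strerror(exc.errno)}"

    # A daemon thread is used so a call stuck on a hung mount does not
    # keep the script waiting at exit.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        return "timeout (mount may be hung)"
    return result["status"]


if __name__ == "__main__":
    print(probe_mount("/"))
```

Under these assumptions, a stale NFS handle would be expected to surface as an ESTALE error, while a hung mount would show up as the timeout case, since the call never returns.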



