Checking whether NFS is hung using node_exporter


Yagyansh S. Kumar

Mar 3, 2020, 10:54:15 AM
to Prometheus Developers
Hello experts.
I want to check whether an NFS mount is hung (i.e., whether it is accessible from the server and, if so, what response time it is getting). I have already enabled the nfs and nfsd collectors, but I haven't found any metric that reliably tells me every time the NFS mount hangs. Any help would be appreciated.
Thanks in advance.

Matthias Rampke

Mar 3, 2020, 11:53:10 AM
to Yagyansh S. Kumar, Prometheus Developers
The trouble is that the only sure way to know whether NFS is hung is to try to use it. For one, the node exporter is not a health prober per se. More importantly, when NFS does hang, the attempt leaves a thread permanently stuck in an uninterruptible syscall. Very soon the exporter would run out of threads and stop working altogether. For the same reason, do not put textfile metrics files on NFS.
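
As an illustration of the last point (not part of the original reply), here is a minimal sketch of a check that a textfile directory is not on NFS. It only inspects mount-table text in the format of /proc/self/mounts and never touches the path itself, so it cannot block on a hung mount. The function names are hypothetical, and the sketch assumes an absolute path without symlinks and mount points without escaped characters.

```python
def fs_type_for(path, mounts_text):
    """Return the filesystem type of the longest mount point containing path."""
    best_len, best_type = -1, None
    wanted = path.rstrip("/") + "/"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # not a "device mountpoint fstype ..." line
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # ">=" lets a later mount stacked on the same mount point win.
        if wanted.startswith(prefix) and len(prefix) >= best_len:
            best_len, best_type = len(prefix), fs_type
    return best_type


def is_on_nfs(path, mounts_text):
    """True if path lives on an NFS client mount (types "nfs" or "nfs4")."""
    return fs_type_for(path, mounts_text) in ("nfs", "nfs4")
```

On a Linux host, the mount table text could be read with `open("/proc/self/mounts").read()` and passed in together with the directory configured for the textfile collector.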

I believe the only way to do this safely is a separate check script/loop. Don't start the script from cron: you need to make sure that once it is stuck, it stops trying until it is unstuck again, or you will fill up your process table with stuck processes that cannot be killed.
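
For illustration only, here is a rough sketch of what such a check loop might look like. It is not taken from this thread. A long-running process probes the mount through a child process and, once a child has timed out, starts no new one until that child has exited. The class name `MountProbe`, the metric names `nfs_probe_up` and `nfs_probe_duration_seconds`, and the use of GNU `stat -f` as the probe are all assumptions. The output file is meant to live on local disk where the node exporter's textfile collector can read it.

```python
import os
import subprocess
import time


class MountProbe:
    """Probe one mount point, keeping at most one child process in flight."""

    def __init__(self, path, timeout=5.0, command=None):
        self.path = path
        self.timeout = timeout
        # Assumption: GNU `stat -f` (statfs) is a good enough "use the mount" test.
        self.command = command or ["stat", "-f", "--", path]
        self.child = None  # a probe child that timed out and has not exited yet
        self.started = 0.0

    def check(self):
        """Return (up, seconds): up is 1 on success, 0 on failure or hang."""
        if self.child is not None:
            if self.child.poll() is None:
                # Still stuck: report down and do not spawn another process.
                return 0, time.monotonic() - self.started
            self.child = None  # the stuck child finally exited; resume probing
        self.started = time.monotonic()
        child = subprocess.Popen(
            self.command,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        try:
            rc = child.wait(timeout=self.timeout)
        except subprocess.TimeoutExpired:
            self.child = child  # remember it; a stuck process cannot be killed
            return 0, time.monotonic() - self.started
        return (1 if rc == 0 else 0), time.monotonic() - self.started


def write_metrics(out_file, mountpoint, up, seconds):
    """Atomically write the result in Prometheus text format (local disk only)."""
    label = mountpoint.replace("\\", "\\\\").replace('"', '\\"')
    body = (
        f'nfs_probe_up{{mountpoint="{label}"}} {up}\n'
        f'nfs_probe_duration_seconds{{mountpoint="{label}"}} {seconds:.6f}\n'
    )
    tmp = out_file + ".tmp"
    with open(tmp, "w") as fh:
        fh.write(body)
    os.replace(tmp, out_file)  # rename so a scrape never sees a partial file


def run_forever(probe, out_file, interval=15.0):
    """Probe in a loop; intended to run as a long-lived service, not from cron."""
    while True:
        up, seconds = probe.check()
        write_metrics(out_file, probe.path, up, seconds)
        time.sleep(interval)


# Example (hypothetical paths):
# run_forever(MountProbe("/mnt/data"), "/var/lib/node_exporter/textfile/nfs_probe.prom")
```

While a probe is stuck, the reported duration keeps growing from the moment that probe started, which gives a rough idea of how long the mount has been unresponsive. Whether a statfs call actually reaches the NFS server can depend on client caching, so a different probe command may be more appropriate for a given setup.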

/MR


--
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-devel...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-developers/1b166f35-fdb0-44fe-9fda-294fcde864f5%40googlegroups.com.

Yagyansh S. Kumar

Mar 3, 2020, 12:22:50 PM
to Prometheus Developers
What happens to the metrics when an NFS mount is in a hung state? I believe that if an NFS mount is hung, node_exporter will stop scraping metrics for it until the mount becomes normal again, because if node_exporter kept trying to scrape metrics from a hung mount, it would eventually bring node_exporter itself down. Am I right here?
And if I am, can we safely say that whenever node_exporter stops giving metrics for an NFS mount, it is necessarily because the mount is hung?
