Exposing total thread count in node-exporter on Linux systems


ari.a...@gmail.com

Jan 6, 2017, 1:51:15 PM
to Prometheus Developers
Hello -

I'd like to track the total thread count on our systems. This has been useful in identifying cases where threads stack up due to timeouts, etc.

This gives a sample of the metric I'm looking for:
grep -s '^Threads' /proc/[0-9]*/status | awk '{ sum += $2; } END { print "threads.value", sum; }'

I didn't see anything like this in the existing collectors. Did I miss it? Should I submit a PR for it? If so, where would it fit best? stat_linux.go?

Ben Kochie

Jan 6, 2017, 2:17:47 PM
to ari.a...@gmail.com, Prometheus Developers
This would be useful, but the kind of file reading you're talking about would be quite expensive.  I don't think this is something we would include.

You could collect this using the textfile interface.
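A minimal sketch of what such a textfile script might look like, assuming a POSIX shell and awk. The function names and the metric name node_system_threads are made up for illustration and are not part of node_exporter. It sums the Threads: field as in the grep/awk one-liner above, then writes the result to a temporary file and renames it so a half-written file is never read.

```shell
# count_threads [PROC_ROOT]
# Prints the sum of the "Threads:" field over PROC_ROOT/<pid>/status.
count_threads() {
    proc_root="${1:-/proc}"
    # Processes can exit while we read, so errors from vanished files are dropped.
    cat "$proc_root"/[0-9]*/status 2>/dev/null |
        awk '/^Threads:/ { sum += $2 } END { print sum + 0 }'
}

# write_threads_metric OUTPUT_FILE [PROC_ROOT]
# Writes the sample to a temporary file, then renames it over OUTPUT_FILE.
# node_system_threads is a hypothetical metric name.
write_threads_metric() {
    out="$1"
    tmp="$out.$$"
    {
        echo '# HELP node_system_threads Threads summed over /proc/<pid>/status.'
        echo '# TYPE node_system_threads gauge'
        echo "node_system_threads $(count_threads "${2:-/proc}")"
    } > "$tmp" && mv "$tmp" "$out"
}
```

A cron job could then call write_threads_metric with a .prom file inside whatever directory the textfile collector is configured to read. The cost of walking /proc is still paid, only outside the scrape.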

--
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-developers+unsub...@googlegroups.com.
To post to this group, send email to prometheus-developers@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-developers/0903919a-61b4-4363-8475-0a1fa3b7e7ea%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Marcus Franke

Jan 9, 2017, 2:26:55 AM
to ari.a...@gmail.com, Prometheus Developers
Hi,

I think that grep command is quite complex. There is a simpler way to get the thread count:

~ % cd /proc/2113/task
/proc/2113/task(:|✔) % ls
2113  2124  2128  2129  2130  2131  2132  2159  2763  2927  2928  2930  2931  2932

There is one directory per thread, including the main thread. You can stat the task directory
and count its subdirectories, and you will have your thread count. For some internal tooling
I use mitchellh's go-ps library to traverse the /proc/[0-9]+/ directories and find "my" threads,
but I guess that would be overkill for the node_exporter.
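As a rough sketch of the directory-counting idea in plain POSIX shell (the function names count_tasks and count_all_tasks are invented here, and the optional PROC_ROOT argument exists only so the functions can be pointed at a test directory):

```shell
# count_tasks PID [PROC_ROOT]
# Prints the number of thread directories under PROC_ROOT/PID/task.
count_tasks() {
    task_dir="${2:-/proc}/$1/task"
    n=0
    for t in "$task_dir"/[0-9]*; do
        # An unmatched glob stays literal, so check the entry really exists.
        if [ -d "$t" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# count_all_tasks [PROC_ROOT]
# Sums count_tasks over every numeric (per-process) directory.
count_all_tasks() {
    proc_root="${1:-/proc}"
    total=0
    for p in "$proc_root"/[0-9]*; do
        if [ -d "$p/task" ]; then
            # ${p##*/} strips the leading path and leaves the PID.
            total=$((total + $(count_tasks "${p##*/}" "$proc_root")))
        fi
    done
    echo "$total"
}
```

For the process listed above, count_tasks 2113 would print 14 as long as the thread set has not changed. Note that count_all_tasks still visits every process, so the work grows with the number of processes.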



Matthias Rampke

Jan 9, 2017, 3:45:57 AM
to Marcus Franke, ari.a...@gmail.com, Prometheus Developers

This is still O(n) in the number of processes, which can also get very high. In a very loaded system, we'd be adding even more load trying to monitor it, and get slower at monitoring.

I think it's better to solve this asynchronously (i.e. by feeding the textfile collector), but the best way by far would be to instrument the application (runtime) itself. A process should know its own thread count anyway?

/MR


Marcus Franke

Jan 9, 2017, 4:04:02 AM
to Matthias Rampke, ari.a...@gmail.com, Prometheus Developers
Sure, this is O(n), but it stays that way even when you use a cron job that writes to the textfile collector. And it's even worse: you have a second process started by cron, it needs to write the results to a file, and the node_exporter must then read the .prom file. That way even more CPU cycles are burnt on your loaded server than with a native implementation in the node_exporter.

This will only work out if you collect the thread count with a lower frequency than your scrape interval.

We do this for our internal monitoring, as we are only interested in the threads of our own application, and we collect additional stats from each thread's sched file. But I read Ari's request as being more global than a single application.

The feature could be switched off by default, like the textfile collector, so everyone could decide whether to use it.

Marcus

Ben Kochie

Jan 22, 2017, 10:13:06 AM
to Marcus Franke, Matthias Rampke, ari.a...@gmail.com, Prometheus Developers
If you're using cgroups (for example, with systemd), you can get task counts from the cgroup tasks files.

# wc -l /sys/fs/cgroup/systemd/system.slice/*.service/tasks
16 /sys/fs/cgroup/systemd/system.slice/node_exporter.service/tasks
17 /sys/fs/cgroup/systemd/system.slice/prometheus.service/tasks
...

A task count could be added to the systemd collector.
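A small sketch of how the wc output above could be turned into per-unit samples, assuming the cgroup v1 layout shown in the command, where each line of a tasks file is one task ID. The function name and the metric name systemd_unit_tasks are hypothetical, and unit names containing backslashes or quotes would need escaping before being used as label values.

```shell
# cgroup_task_counts [SLICE_DIR]
# Prints one sample per *.service directory found under SLICE_DIR.
# systemd_unit_tasks is a hypothetical metric name.
cgroup_task_counts() {
    slice_dir="${1:-/sys/fs/cgroup/systemd/system.slice}"
    for f in "$slice_dir"/*.service/tasks; do
        # Skip the literal pattern left behind when nothing matches.
        [ -r "$f" ] || continue
        unit="${f%/tasks}"    # drop the trailing /tasks
        unit="${unit##*/}"    # keep only the unit name
        # One line per task ID, so the line count is the task count.
        count=$(wc -l < "$f" | tr -d ' ')
        echo "systemd_unit_tasks{name=\"$unit\"} $count"
    done
}
```

The cost here scales with the number of service cgroups rather than the number of processes, but only tasks that live in a service cgroup are counted.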


Marcus Franke

Jan 23, 2017, 10:06:41 AM
to Ben Kochie, Matthias Rampke, Prometheus Developers, ari.a...@gmail.com
Hi,

I would argue against putting this in the systemd collector, as there are still many users on non-systemd distributions, for example long-term enterprise systems like RHEL 6. I don't know about SUSE SLES, but I guess it's the same situation. Additionally, looking at one of my servers (RHEL 6), this approach depends on active cgroup usage for your services. My desktop (Arch Linux) uses systemd, but I find no cgroup directory structure in /sys there either. This sounds like a very specialized configuration for any kind of exporter; as I understand it, the original poster wanted a more general solution.

On the other hand, I like the cgroup approach, as you would only monitor your own applications and not the whole system, provided you can manage to put all your applications in cgroups, which takes quite some work. ;)

In my opinion the node_exporter is still the best fit for this.

Regards,
Marcus


Brian Brazil

Jan 23, 2017, 10:12:31 AM
to Marcus Franke, Ben Kochie, Matthias Rampke, Prometheus Developers, ari.a...@gmail.com
The issue is more that no efficient way to obtain this data has been proposed. Anything that walks all the PIDs in /proc can be problematic, and that's before considering the possibility of it being called once a second.

Brian