Exposing total thread count in node_exporter on Linux systems

ari.a...@gmail.com

Jan 6, 2017, 1:51:15 PM

to Prometheus Developers

Hello -

I'd like to track the total thread count on our systems. This has been useful in identifying cases where threads stack up due to timeouts, etc.

This gives a sample of the metric I'm looking for:
grep -s '^Threads' /proc/[0-9]*/status | awk '{ sum += $2; } END { print "threads.value", sum; }'

I didn't see anything like this in the existing collectors. Did I miss it? Should I submit a PR for it? If so, where would it fit best? stat_linux.go?

Ben Kochie

Jan 6, 2017, 2:17:47 PM

to ari.a...@gmail.com, Prometheus Developers

This would be useful, but reading one status file per process on every scrape would be quite expensive. I don't think this is something we would include.

You could collect this using the textfile collector.
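
A minimal sketch of the textfile approach, assuming a cron job runs it and node_exporter is configured to read *.prom files from the output directory. The function names, the metric name example_node_threads, and any paths are illustrative assumptions, not existing node_exporter names. The file is written under a temporary name and then renamed, so the collector should never see a half-written file.

```shell
#!/bin/sh
# Sketch only: function names and the metric name are illustrative.

# Sum the "Threads:" field of every <proc_root>/<pid>/status file.
# proc_root is a parameter (default /proc) so a fixture tree can be used.
sum_threads() {
    proc_root="${1:-/proc}"
    # Errors for processes that exit mid-scan are discarded;
    # awk prints 0 if nothing matched.
    cat "$proc_root"/[0-9]*/status 2>/dev/null |
        awk '/^Threads:/ { sum += $2 } END { print sum + 0 }'
}

# Write the total as a gauge in the Prometheus text format.
write_thread_metric() {
    proc_root="$1"
    out_file="$2"
    tmp_file="$out_file.$$"   # temporary name does not end in .prom
    {
        echo '# HELP example_node_threads Total threads across all processes.'
        echo '# TYPE example_node_threads gauge'
        echo "example_node_threads $(sum_threads "$proc_root")"
    } > "$tmp_file" && mv "$tmp_file" "$out_file"
}
```

A cron entry could then call write_thread_metric with /proc and a threads.prom file inside whatever directory the textfile collector is configured to read.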

--
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-developers+unsub...@googlegroups.com.
To post to this group, send email to prometheus-developers@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-developers/0903919a-61b4-4363-8475-0a1fa3b7e7ea%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Marcus Franke

Jan 9, 2017, 2:26:55 AM

to ari.a...@gmail.com, Prometheus Developers

Hi,

I think that grep command is quite complex. There is a simpler way to get the thread count:

~ % cd /proc/2113/task
/proc/2113/task % ls
2113 2124 2128 2129 2130 2131 2132 2159 2763 2927 2928 2930 2931 2932

There is one directory per thread, including the main thread. You can list the task directory and count the subdirectories, and you will have your thread count. For some internal stuff I use mitchellh's go-ps library to traverse the /proc/[0-9]+/ directories to find "my" threads, but I guess that could be overkill for the node_exporter.
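
The directory-counting idea can be sketched as below. This is an illustration rather than proposed node_exporter code: the function names are made up here, and the proc root is a parameter only so the functions can be run against a test tree.

```shell
#!/bin/sh
# Count the threads of one process: <proc_root>/<pid>/task holds one
# numeric subdirectory per thread, including the main thread.
count_tasks_of_pid() {
    pid="$1"
    proc_root="${2:-/proc}"
    total=0
    for task_dir in "$proc_root/$pid"/task/[0-9]*; do
        # An unmatched glob stays literal and fails the -d test.
        if [ -d "$task_dir" ]; then
            total=$((total + 1))
        fi
    done
    echo "$total"
}

# Count the threads of all processes. This still visits every process,
# so the cost grows with the number of processes.
count_all_tasks() {
    proc_root="${1:-/proc}"
    total=0
    for task_dir in "$proc_root"/[0-9]*/task/[0-9]*; do
        if [ -d "$task_dir" ]; then
            total=$((total + 1))
        fi
    done
    echo "$total"
}
```

For the process in the listing above, count_tasks_of_pid 2113 would print 14, one for each directory shown by ls.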

Matthias Rampke

Jan 9, 2017, 3:45:57 AM

to Marcus Franke, ari.a...@gmail.com, Prometheus Developers

This is still O(n) in the number of processes, which can also get very high. In a very loaded system, we'd be adding even more load trying to monitor it, and get slower at monitoring.

I think it's better to solve this asynchronously (i.e. by feeding the textfile collector), but the best way by far would be to instrument the application (runtime) itself. A process should know its own thread count anyway?
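
For a single process the lookup is one file read, independent of how many other processes exist. A rough sketch, assuming the application or a wrapper around it knows its PID. threads_of_pid is a hypothetical helper name, and a real application would more likely expose the value through its own metrics using its runtime's facilities.

```shell
#!/bin/sh
# Print the "Threads:" value from <proc_root>/<pid>/status.
# Returns non-zero if the file is unreadable or has no Threads line.
threads_of_pid() {
    pid="$1"
    proc_root="${2:-/proc}"
    status_file="$proc_root/$pid/status"
    [ -r "$status_file" ] || return 1
    awk '/^Threads:/ { print $2; found = 1 } END { exit !found }' "$status_file"
}
```

Calling threads_of_pid "$$" reports on the calling shell itself; a service wrapper would pass the PID of the service instead.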

/MR

Marcus Franke

Jan 9, 2017, 4:04:02 AM

to Matthias Rampke, ari.a...@gmail.com, Prometheus Developers

Sure, this is O(n), but it stays that way even when you use a cron job that writes to the textfile collector. And it's even worse: a second process gets started by cron, it needs to write the results to a file, and the node_exporter must then read the .prom file. That way even more CPU cycles are burnt on the loaded server in your scenario than with a native implementation in the node_exporter.

This only pays off if you collect the thread count less often than you scrape.

We do this for our internal monitoring, as we are only interested in the threads of our own application, and we collect additional stats from each thread's sched file. But I read Ari's request as being more global than a single application.

The feature could be switched off by default, like the textfile collector, so everyone could decide whether to use it.

Marcus

Ben Kochie

Jan 22, 2017, 10:13:06 AM

to Marcus Franke, Matthias Rampke, ari.a...@gmail.com, Prometheus Developers

If you're using cgroups (for example, with systemd), you can get task counts from each cgroup's tasks file.

# wc -l /sys/fs/cgroup/systemd/system.slice/*.service/tasks
16 /sys/fs/cgroup/systemd/system.slice/node_exporter.service/tasks
17 /sys/fs/cgroup/systemd/system.slice/prometheus.service/tasks
...

A task count could be added to the systemd collector.
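
A sketch of how such per-service counts could be gathered from the shell, assuming the cgroup v1 layout used in the command above (one tasks file per service, one task ID per line). The function name is illustrative, and the directory is a parameter because the mount point and hierarchy differ between systems.

```shell
#!/bin/sh
# Print "<unit> <task count>" for every *.service cgroup under slice_dir.
cgroup_task_counts() {
    slice_dir="${1:-/sys/fs/cgroup/systemd/system.slice}"
    for tasks_file in "$slice_dir"/*.service/tasks; do
        # Skip the literal pattern left behind when nothing matches.
        [ -r "$tasks_file" ] || continue
        unit=$(basename "$(dirname "$tasks_file")")
        count=$(wc -l < "$tasks_file")
        # Arithmetic expansion normalizes any padding wc may add.
        echo "$unit $((count))"
    done
}
```

Each output line pairs a unit name with its task count, which could then be turned into one labeled sample per service.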

Marcus Franke

Jan 23, 2017, 10:06:41 AM

to Ben Kochie, Matthias Rampke, Prometheus Developers, ari.a...@gmail.com

Hi,

I would speak against using the systemd collector for this, as there are still many users on non-systemd distributions, for example long-term enterprise systems like RHEL 6. I don't know about SUSE SLES, but I guess it's the same situation. Additionally, looking at one of my servers (RHEL 6), this approach depends on active cgroup usage for your services. My desktop (Arch Linux) uses systemd, but I find no cgroup directory structure in /sys there either. This sounds like a very specialized configuration for any kind of exporter; as I understand it, the original poster wanted a more general solution.

On the other hand, I like the cgroup approach, as you would only monitor your own applications and not the whole system, provided you manage to put all your applications into cgroups, which takes quite some work. ;)

In my opinion, the node_exporter is still the best fit for this.

Regards,

Marcus

Brian Brazil

Jan 23, 2017, 10:12:31 AM

to Marcus Franke, Ben Kochie, Matthias Rampke, Prometheus Developers, ari.a...@gmail.com

On 23 January 2017 at 15:06, Marcus Franke <marcus...@gmail.com> wrote:

> Hi,
>
> I would speak against using the systemd collector for this, as there are still many users on non-systemd distributions, for example long-term enterprise systems like RHEL 6. I don't know about SUSE SLES, but I guess it's the same situation. Additionally, looking at one of my servers (RHEL 6), this approach depends on active cgroup usage for your services. My desktop (Arch Linux) uses systemd, but I find no cgroup directory structure in /sys there either. This sounds like a very specialized configuration for any kind of exporter; as I understand it, the original poster wanted a more general solution.
>
> On the other hand, I like the cgroup approach, as you would only monitor your own applications and not the whole system, provided you manage to put all your applications into cgroups, which takes quite some work. ;)
>
> In my opinion, the node_exporter is still the best fit for this.

The issue is more that no efficient way to obtain this data has been proposed. Anything that walks all the PIDs in /proc can be problematic, and that's before considering that it might be called once a second.

Brian

--

Brian Brazil

www.robustperception.io
