Maximum targets for exporter

Elliott Balsley

Jan 11, 2024, 8:32:17 PM
to Prometheus Users
I'm curious if anyone has experimented to find out how many targets can reasonably be scraped by a single instance of the blackbox and snmp exporters. I know Prometheus itself can handle tens of thousands of targets, but I'm wondering at what point it becomes necessary to split up the scraping. I'll find out for myself soon enough, but I wanted to check whether anyone has tested this already. I expect to have around 10K targets for blackbox and 1K for snmp.

I'm using http_sd_config with a 15 second refresh interval, so that's another potential bottleneck I'll have to test.

Brian Candler

Jan 12, 2024, 6:43:51 AM
to Prometheus Users
The http_sd_config refresh is going to be a very tiny part of the resource utilisation of Prometheus, although 15 seconds is quite aggressive.

As for the exporters, it depends very much on the scrape interval and the duration of each probe, the type of probe, and number of cores you have.

For example: let's say you have a 15 second scrape interval and 10K targets. That works out to a new scrape starting every 1.5ms on average (Prometheus spreads them out over the interval).

If each blackbox or snmp probe takes 150ms to complete, then you are processing 100 probes concurrently on average (150ms / 1.5ms).

If you have 4 cores, then each core is handling 25 probe goroutines. Most of the time each goroutine will be waiting for network response from the target system.  But some probes may be more computationally expensive, e.g. those which involve setting up TLS connections, or SNMP privacy/authentication modes.
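
The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the estimate, using the same example numbers (15 second interval, 10K targets, 150ms per probe, 4 cores); the function and variable names are made up for illustration, and real probe durations will vary.

```python
def average_concurrency(targets: int, scrape_interval_s: float, probe_duration_s: float) -> float:
    """Average number of probes in flight: arrival rate times probe duration."""
    probes_per_second = targets / scrape_interval_s
    return probes_per_second * probe_duration_s


# Example figures from the discussion, not measurements.
targets = 10_000
scrape_interval_s = 15.0
probe_duration_s = 0.150
cores = 4

gap_ms = scrape_interval_s / targets * 1000
concurrent = average_concurrency(targets, scrape_interval_s, probe_duration_s)
per_core = concurrent / cores

print(f"new scrape every {gap_ms:.1f} ms on average")
print(f"about {concurrent:.0f} probes in flight")
print(f"about {per_core:.0f} probe goroutines per core")
```

Plugging in your own scrape interval and measured probe duration gives a rough idea of how many probes an exporter has in flight at once.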

In short, it sounds to me like it should be fine, but monitor it to be sure.

Before doing any sort of sharding, I'd first put blackbox/snmp exporters into separate VMs (i.e. separate from Prometheus itself). That's very simple to implement, and gives you a clearer picture of the resource utilisation of each.

Ben Kochie

Jan 12, 2024, 3:50:57 PM
to Elliott Balsley, Prometheus Users
Those sound like reasonable amounts for those exporters.

I've heard of people hitting thousands of SNMP devices from the snmp_exporter.

Since the exporters are written in Go, they scale well. But if that's not enough, an advantage of their design is that they can be deployed horizontally. You could run several exporters in parallel and put a simple HTTP load balancer like Envoy or HAProxy in front of them.
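
To illustrate the idea (not a replacement for Envoy or HAProxy), here is a small Python sketch of how probe requests would be spread round-robin across several blackbox exporter instances. The exporter hostnames are hypothetical; `/probe` with the `target` and `module` parameters is the blackbox exporter's probe endpoint, and `http_2xx` is assumed to be a module defined in your exporter config.

```python
from itertools import cycle
from urllib.parse import urlencode

# Hypothetical exporter instances; in practice these sit behind the load balancer.
EXPORTERS = [
    "blackbox-1.example.internal:9115",
    "blackbox-2.example.internal:9115",
]


def probe_urls(targets, exporters, module="http_2xx"):
    """Yield one probe URL per target, rotating through the exporters."""
    backends = cycle(exporters)
    for target in targets:
        query = urlencode({"target": target, "module": module})
        yield f"http://{next(backends)}/probe?{query}"


urls = list(probe_urls(
    ["https://a.example.com", "https://b.example.com", "https://c.example.com"],
    EXPORTERS,
))
for url in urls:
    print(url)
```

Because each probe is an independent HTTP request, any instance can serve any target, which is what makes the horizontal approach simple.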

Alexander Wilke

Jan 12, 2024, 7:20:10 PM
to Prometheus Users
Hello,
sorry to hijack this thread a little bit but Brian talks about "4 CPU cores" and Ben says "scale horizontally".

Just out of interest: why not just use 8, 16, or 32 CPU cores? Is Go limited to a specific number of CPUs, or is there a disadvantage to having too many cores?
I think if someone is monitoring this many devices, it is an enterprise network, where servers/VMs with more CPUs are no problem.

Brian Candler

Jan 13, 2024, 4:34:21 AM
to Prometheus Users
One reason is you may already have eight 4-core servers lying around.

If it's a VM then of course you can just scale up to the largest instance size available, before you need to go to multiple instances.

Brian Candler

Jan 13, 2024, 4:35:34 AM
to Prometheus Users
Just to clarify: I picked "4 cores" out of thin air just as an example to work through, same as I picked 15 second scrape interval and 150ms per scrape.

Ben Kochie

Jan 13, 2024, 4:51:49 AM
to Alexander Wilke, Prometheus Users
No, Go is not specifically limited to a number of cores. For the exporters, they should scale vertically just fine as well as horizontally.

The only limit I've seen is how well the SNMP exporter's UDP packet handling works. IIRC you may run into UDP packets per second limits before you run into actual CPU limits.

It's not something a lot of people have tested or used in production at that scale. At least not enough that I've gotten any good feedback.

Alexander Wilke

Jan 13, 2024, 6:35:17 AM
to Prometheus Users
Thank you for the clarification. I was interested in whether there are any disadvantages if the number of CPU cores is too high, maybe because of the overhead of sharing the load.

Good to know I can scale it easily if I run it on VMs.

Elliott Balsley

Jan 16, 2024, 9:34:49 AM
to Prometheus Users
Thanks, it sounds promising on the Prometheus side. 
I’ve actually found a performance issue with the Prometheus plugin for Netbox that I was using to provide the HTTP discovery. It hammers the Netbox database with excessive queries and takes over 30 seconds to respond with just 2000 targets. So, if the refresh interval is less than 30 seconds, the whole Netbox instance grinds to a halt. Brian, have you encountered this issue, since I think you use Netbox too?

Elliott Balsley

Jan 16, 2024, 9:45:15 AM
to Prometheus Users
Even if that plugin can be optimized for performance, it still feels like an inefficient approach. Users will add and modify devices in Netbox, for example marking them offline to remove them from monitoring. These changes are infrequent, and I want Prometheus to respond to them as fast as possible. So I’m thinking a better design would be to use a Netbox event trigger to generate the new YAML file, and somehow transfer that file to Prometheus for file-based service discovery (file_sd), perhaps using SCP.
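
A minimal sketch of the file-generation step, in Python. Several things here are assumptions for illustration: the device dictionaries and their `name`, `address` and `status` fields are hypothetical (they are not the Netbox API), and the output is JSON rather than YAML, since file_sd accepts both and JSON needs only the standard library. The file is written to a temporary name and then renamed, so Prometheus never reads a half-written file.

```python
import json
import os
import tempfile


def build_file_sd(devices):
    """Turn active devices into file_sd target groups (hypothetical device fields)."""
    groups = []
    for dev in devices:
        if dev.get("status") != "active":
            continue  # e.g. devices marked offline are dropped from monitoring
        groups.append({
            "targets": [dev["address"]],
            "labels": {"device": dev["name"]},
        })
    return groups


def write_atomically(path, groups):
    """Write the target file via a temp file in the same directory, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(groups, fh, indent=2)
        os.chmod(tmp_path, 0o644)  # mkstemp creates the file as 0600
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise


if __name__ == "__main__":
    devices = [
        {"name": "sw1", "address": "192.0.2.10", "status": "active"},
        {"name": "sw2", "address": "192.0.2.11", "status": "offline"},
    ]
    with tempfile.TemporaryDirectory() as out_dir:
        target_file = os.path.join(out_dir, "snmp_targets.json")
        write_atomically(target_file, build_file_sd(devices))
        with open(target_file) as fh:
            print(fh.read())
```

If the file is generated on another host and copied over, the same temp-then-rename step on the Prometheus side avoids partial reads during the transfer.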