snmp_exporter a lot slower than snmpbulkwalk

Wilhelm Wijkander

Oct 13, 2025, 10:31:55 AM
to Prometheus Users
Hello!
I have snmp_exporter set up to scrape a number of network switches. I have a weird issue where one specific device takes a lot longer to scrape than all the others of the same model, but only when using snmp_exporter!

The time snmp_exporter needs to scrape the device (snmp_scrape_duration_seconds) ranges from 40 to 90+ seconds (the current timeout).

However, using snmpbulkwalk from the same server results in a much faster scrape: 4-10 seconds in my tests. I tried both with and without the -Cr flag set to 10 (the same value I use for max_repetitions in snmp_exporter), with no change.

I have also tried scraping the resolved IP directly to rule out DNS.

My current theory is that snmp_exporter/gosnmp behaves differently from snmpbulkwalk in some way, and that this difference tickles the network switch the wrong way.

Has anyone experienced something similar? Does anyone have ideas about other snmpbulkwalk flags I could try in order to replicate the issue?

Best regards,
Wilhelm

Wilhelm Wijkander

Oct 14, 2025, 3:15:38 AM
to Jseb Tarot, Prometheus Users
Hello,

On Mon, 13 Oct 2025 at 18:48, Jseb Tarot <tarotcrcu...@gmail.com> wrote:
>
> Why scrape the whole MIB? Scrape only the values you need! Personally, I select only the branches I need, and that's it.
>
> I recommend doing that.

To be clear, I'm only scraping a few (30) OIDs, something like this:

time for i in 1.3.6.1.2.1.25.1.1 1.3.6.1.2.1.2.2.1.8 \
    1.3.6.1.2.1.2.2.1.1 1.3.6.1.2.1.31.1.1.1.1 1.3.6.1.2.1.31.1.1.1.18 \
    1.3.6.1.2.1.2.2.1.2; do  # (and so on)
  snmpbulkwalk -v2c -Cr10 -On -c community 192.0.2.2 "$i"
done

takes about 4-10s, while snmp_exporter with the following
generator.yml takes at least 40 and up to 90+ seconds:

my-switch:
  walk:
    - hrSystemUptime # 1.3.6.1.2.1.25.1.1
    - ifOperStatus # 1.3.6.1.2.1.2.2.1.8
    - ifIndex # 1.3.6.1.2.1.2.2.1.1
    - ifName # 1.3.6.1.2.1.31.1.1.1.1
    - ifAlias # 1.3.6.1.2.1.31.1.1.1.18
    - ifDescr # 1.3.6.1.2.1.2.2.1.2
    # (and so on...)

  timeout: 30s
  max_repetitions: 10
  lookups:
    - source_indexes: [ifIndex]
      lookup: ifDescr
    - source_indexes: [ifIndex]
      lookup: ifAlias
    - source_indexes: [ifIndex]
      lookup: ifName

Regards,
Wilhelm

Brian Candler

Oct 14, 2025, 6:24:45 AM
to Prometheus Users
tcpdump (comparing the snmpbulkwalk and snmp_exporter traffic) may give some clues, as may snmp_exporter's snmp.debug-packets option.
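A minimal sketch of the tcpdump comparison, assuming a Linux host where tcpdump is installed, can capture on the `any` interface, and is run with capture privileges. The switch address 192.0.2.2 is the placeholder used earlier in the thread; the helper names `capture_snmp` and `count_requests` and the capture file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for comparing the SNMP traffic of two tools.
# Assumes tcpdump is installed and has capture privileges (e.g. root).

# Capture all SNMP traffic to/from one agent into a pcap file.
# Usage: capture_snmp TARGET_IP OUTPUT.pcap   (stop with Ctrl-C)
capture_snmp() {
  tcpdump -ni any -w "$2" "udp port 161 and host $1"
}

# Count request packets sent to the agent (UDP destination port 161)
# in a previously written capture file.
# Usage: count_requests CAPTURE.pcap
count_requests() {
  tcpdump -nr "$1" 'udp dst port 161' 2>/dev/null | wc -l
}
```

For example, run `capture_snmp 192.0.2.2 bulkwalk.pcap` during one snmpbulkwalk loop and `capture_snmp 192.0.2.2 exporter.pcap` during one scrape, then compare `count_requests` on both files. Since both tools request 10 repetitions per GetBulk here, a noticeably higher request count for the exporter would hint at retransmissions, though it is only a rough signal.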

For snmpbulkwalk, the defaults are 5 retries with a 1-second timeout. For snmp_exporter, the defaults are 3 retries with a 5-second timeout; this means that each lost packet could add 5 seconds rather than 1 second to the total time.
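To make the arithmetic concrete, here is a rough shell sketch. It assumes a constant per-attempt timeout with no backoff (real implementations may differ), so a request that is never answered costs timeout * (retries + 1) seconds, and each lost packet recovered on a retry costs one full timeout. The function names are hypothetical, and the 3 retries used for the 30s config quoted earlier in the thread is the snmp_exporter default, assumed because that config does not set retries.

```shell
#!/bin/sh
# Rough worst-case model (assumption): every attempt waits the full
# timeout before retrying, with no exponential backoff.

# Seconds spent on ONE request that never gets an answer:
# one initial attempt plus `retries` retries, each waiting `timeout`.
# Usage: worst_case_request TIMEOUT RETRIES
worst_case_request() {
  echo $(( $1 * ($2 + 1) ))
}

# Extra seconds added to a walk when LOST packets each cost one
# full timeout before a successful retry.
# Usage: added_delay TIMEOUT LOST
added_delay() {
  echo $(( $1 * $2 ))
}

echo "net-snmp defaults (1s, 5 retries):        $(worst_case_request 1 5)s"
echo "snmp_exporter defaults (5s, 3 retries):   $(worst_case_request 5 3)s"
echo "thread's config (30s, assumed 3 retries): $(worst_case_request 30 3)s"
echo "3 lost packets at a 30s timeout add:      $(added_delay 30 3)s"
```

Under these assumptions the four lines report 6s, 20s, 120s and 90s. A single unanswered request at a 30s timeout could outlast a 90s scrape timeout by itself, and even a few recovered losses add tens of seconds, which fits the 40-90+ second scrapes described in the thread.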

Wilhelm Wijkander

Oct 14, 2025, 12:16:09 PM
to Brian Candler, Prometheus Users
Hey Brian,

On Tue, 14 Oct 2025 at 12:24, 'Brian Candler' via Prometheus
Users <promethe...@googlegroups.com> wrote:
> For snmpbulkwalk, the defaults are 5 retries with a 1-second timeout. For snmp_exporter, the defaults are 3 retries with a 5-second timeout; this means that each lost packet could add 5 seconds rather than 1 second to the total time.

Thanks for that hint! I was able to replicate the issue with
snmpbulkwalk by changing the timeout value. As you can see in the
config I quoted earlier, someone had raised the snmp_exporter timeout
to quite a high value (30s) a long time ago, for unknown reasons.
Turning it down to 15s alleviated the issue.
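One way to make snmpbulkwalk mimic the exporter's timing is to pass the per-request timeout and retry count explicitly with net-snmp's standard -t (timeout in seconds) and -r (retries) options. This is a sketch assuming the `community` string and 192.0.2.2 placeholders from the loop earlier in the thread; `walk_oids` is a hypothetical helper, and 3 retries is assumed because the quoted config leaves retries at snmp_exporter's default.

```shell
#!/bin/sh
# Hypothetical helper: walk each OID with explicit timeout/retry settings.
# Usage: walk_oids TIMEOUT RETRIES OID...
walk_oids() {
  t=$1 r=$2
  shift 2
  for oid in "$@"; do
    # -t: per-request timeout (s), -r: retries,
    # -Cr10: max-repetitions, matching max_repetitions: 10
    snmpbulkwalk -v2c -Cr10 -t "$t" -r "$r" -On -c community 192.0.2.2 "$oid"
  done
}

# Exporter-like settings (30s timeout, assumed 3 retries) versus
# net-snmp's defaults (1s timeout, 5 retries):
#   time walk_oids 30 3 1.3.6.1.2.1.2.2.1.8 1.3.6.1.2.1.31.1.1.1.1
#   time walk_oids 1 5 1.3.6.1.2.1.2.2.1.8 1.3.6.1.2.1.31.1.1.1.1
```

If only the 30s run shows long stalls, packet loss combined with the long per-request timeout is the likely explanation rather than a protocol difference between the two tools.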

Why it has to retry so much for that specific device remains a
mystery, but that's off topic for this list... :)

Regards,
Wilhelm

Ben Kochie

Oct 14, 2025, 1:55:02 PM
to Brian Candler, Prometheus Users
I wonder if we should change those defaults to match net-snmp.

Also, snmp_exporter has built-in SNMP packet debug logging, which makes tcpdump less necessary. It also means you can get packet logs for SNMPv3 encrypted traffic.
