Snmp Exporter - scrape timeout for Huawei Core Router which has more than 3000 interfaces

Umut Cokbilir

Nov 26, 2022, 3:01:44 AM
to Prometheus Users
Hi All,

I've tried different timeout values (10s, 1m, 5m) and different max_repetitions values (25, 20, 10), but I couldn't solve the problem. What values should I use for the Prometheus scrape interval, the snmp_exporter timeout and max_repetitions?

Debug:
ts=2022-11-26T07:53:11.587Z caller=scrape.go:1343 level=debug component="scrape manager" scrape_pool=mtx target="http://10.86.35.25:30020/snmp?module=huawei&target=10.85.12.1" msg="Scrape failed" err="Get \"http://10.86.35.25:30020/snmp?module=huawei&target=10.85.12.1\": context deadline exceeded"
level=info ts=2022-11-26T07:53:15.747Z caller=collector.go:224 module=huawei target=10.85.12.1 msg="Error scraping target" err="scrape canceled (possible timeout) walking target 10.85.12.1"

snmp_scrape_duration_seconds 564.382868812

OUTPUT:
# HELP snmp_scrape_duration_seconds Total SNMP time scrape took (walk and processing).
# TYPE snmp_scrape_duration_seconds gauge
snmp_scrape_duration_seconds 564.382868812
# HELP snmp_scrape_pdus_returned PDUs returned from walk.
# TYPE snmp_scrape_pdus_returned gauge
snmp_scrape_pdus_returned 13715
# HELP snmp_scrape_walk_duration_seconds Time SNMP walk/bulkwalk took.
# TYPE snmp_scrape_walk_duration_seconds gauge
snmp_scrape_walk_duration_seconds 564.285584201
# HELP sysName An administratively-assigned name for this managed node - 1.3.6.1.2.1.1.5
# TYPE sysName gauge
sysName{sysName="2886(1)-PTN3221412_MEDYAPARK"} 1
# HELP sysUpTime The time (in hundredths of a second) since the network management portion of the system was last re-initialized. - 1.3.6.1.2.1.1.3
# TYPE sysUpTime gauge
sysUpTime 2.512855182e+09

Thanks

Brian Candler

Nov 26, 2022, 5:05:30 AM
to Prometheus Users
Most likely this is a bug in your device. A scrape duration of 564 seconds is bad, and getting 13715 SNMP PDUs in a single scrape is bad. Maybe there is some sort of loop in its responses.

Can you do the same walks using snmpbulkwalk?  You can find the OIDs to walk in snmp.yml under the "huawei" module.  Then if you can find the particular subtree causing the problem, you can disable it.
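
One way to narrow this down is to time each subtree separately. The script below is only a rough sketch: it assumes SNMPv2c, a placeholder community string of "public", and net-snmp's snmpbulkwalk on the PATH. The function names walk_subtree and time_subtrees are made up for illustration, and the OIDs you pass in should come from the "walk:" list of your huawei module, not from this example.

```shell
#!/usr/bin/env bash
# Sketch: walk each subtree separately and report how many lines came
# back and how long the walk took. HOST and COMMUNITY are placeholders.

HOST="${HOST:-10.85.12.1}"
COMMUNITY="${COMMUNITY:-public}"   # assumption: SNMPv2c with this community

walk_subtree() {
    # $1 = OID of the subtree to walk
    local oid="$1" start end count
    start=$(date +%s)
    # -Cr25 sets max-repetitions to 25; -On prints numeric OIDs
    count=$(snmpbulkwalk -v2c -c "$COMMUNITY" -Cr25 -On "$HOST" "$oid" | wc -l | tr -d ' ')
    end=$(date +%s)
    echo "$oid lines=$count seconds=$((end - start))"
}

time_subtrees() {
    # Walk every OID given on the command line, one after another
    local oid
    for oid in "$@"; do
        walk_subtree "$oid"
    done
}
```

For example, `time_subtrees 1.3.6.1.2.1.2.2 1.3.6.1.2.1.31.1.1` would time ifTable and ifXTable (used here only as illustrations). A subtree that takes far longer than the others, or returns far more lines than the interface count would suggest, is the one to look at first.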

If you are not using SNMPv3 with privacy (authPriv), then you can also use tcpdump to decode the packets and see what's going on:
tcpdump -i eth0 -nn -s0 -v host 10.85.12.1 and udp port 161