Error 500 when scraping a slow APC UPS

Erwin

Aug 9, 2021, 5:23:18 AM
to Prometheus Users
Hello everyone,

I'm trying to monitor the UPS we have in our datacenter, and I use the "apcups" module to do that.
However, snmp_exporter seems to struggle with one particular OID: 1.3.6.1.4.1.318.1.1.1.2

An snmpwalk on this OID takes 14.8 seconds.

If I measure the time snmp_exporter takes for this OID alone (with everything else commented out):
$ time wget "snmp_exporter:9116/snmp?module=apcups&target=<my_UPS_IP>"
wget: server returned error: HTTP/1.1 500 Internal Server Error
Command exited with non-zero status 1
real    0m 14.91s

snmp_exporter has no problem with the rest of the OIDs configured for this module; it takes 3.6 seconds to get all of them.

I don't know how to get more logs out of it. Is the latency the problem?
A check every 30 or 60 seconds would be okay; heck, even every 5 minutes would do.
But does snmp_exporter have a 15s timeout that I can't configure?
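A minimal sketch for getting more detail out of a failing scrape, assuming bash and curl are available: wget discards the body of an error response, while snmp_exporter's 500 response should carry the error text, so fetching the same URL with curl prints it along with the status and elapsed time. The function name and the 60s client-side limit are hypothetical choices; the exporter itself can also be started with `--log.level=debug` for more verbose logging.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fetch one snmp_exporter scrape with curl so the body of
# a 500 response (the exporter's error text) is printed instead of discarded,
# followed by the HTTP status code and the total request time.
probe_snmp_exporter() {
  local exporter="$1"          # exporter host:port, e.g. snmp_exporter:9116
  local target="$2"            # UPS address
  local module="${3:-apcups}"  # module name from snmp.yml
  curl -sS --max-time 60 \
    -w '\nHTTP %{http_code} after %{time_total}s\n' \
    "http://${exporter}/snmp?module=${module}&target=${target}"
}

# Usage, with the placeholders from the post above:
#   probe_snmp_exporter snmp_exporter:9116 '<my_UPS_IP>'
```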

Is there any solution to fix that?

Thanks!

Brian Candler

Aug 9, 2021, 8:45:00 AM
to Prometheus Users
On Monday, 9 August 2021 at 10:23:18 UTC+1 Erwin wrote:
> But snmp_exporter has a timeout of 15s that I can't configure?

You can set "timeout:" in the source generator.yml, or update snmp.yml by hand.

Reducing max_repetitions (below the default of 25) helps with some devices too.
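A minimal sketch of those generator.yml settings, assuming the per-module `version`, `timeout`, `retries` and `max_repetitions` keys of the generator.yml layout current at the time of this thread (newer snmp_exporter releases may arrange these differently). It writes a hypothetical scratch file, apcups-fragment.yml, rather than overwriting generator.yml, and the values are illustrations, not recommendations.

```shell
#!/usr/bin/env bash
# Write an apcups module fragment to a scratch file (hypothetical name) so it
# can be merged into generator.yml by hand. The quoted 'EOF' keeps the YAML
# exactly as written, including its leading spaces.
cat > apcups-fragment.yml <<'EOF'
modules:
  apcups:
    version: 2              # assumed; GETBULK (and so max_repetitions) needs SNMP v2c or v3
    walk:
      - 1.3.6.1.4.1.318.1.1.1.2   # upsBattery
    timeout: 20s            # per SNMP request; the generator default is 5s
    retries: 3              # default is 3
    max_repetitions: 10     # default is 25; smaller GETBULK batches help some devices
EOF
```

After merging, `./generator generate` rebuilds snmp.yml, and snmp_exporter has to be restarted (or otherwise reloaded) to pick it up. Note that `timeout` applies to each SNMP request, not to the whole walk.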

Check with snmpbulkwalk and/or tcpdump whether the delay is due to the UPS being slow to respond, or to some other problem. Taking more than 15s is unusual, but not entirely unheard of: old Dell 5524 switches, for example, can take longer than that, but they are returning a whole load of interfaces.
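The two checks suggested above might look like this, assuming Net-SNMP's `snmpbulkwalk`, tcpdump on Linux (for `-i any`) and bash. The UPS address, the `public` community and the function names are hypothetical. `-Cr` sets the GETBULK max-repetitions, so comparing a couple of values shows whether smaller batches help, and tcpdump's `-ttt` prints the time since the previous packet, so long gaps before each reply point at the UPS itself.

```shell
#!/usr/bin/env bash
UPS="${UPS:-192.0.2.10}"          # hypothetical UPS address (documentation range)
COMMUNITY="${COMMUNITY:-public}"  # assumed SNMP v2c community

# Time a bulk walk of upsBattery (1.3.6.1.4.1.318.1.1.1.2) using the
# max-repetitions value given as the first argument; output is discarded.
time_battery_walk() {
  time snmpbulkwalk -v2c -c "$COMMUNITY" -Cr"$1" "$UPS" 1.3.6.1.4.1.318.1.1.1.2 > /dev/null
}

# Show SNMP packets to and from the UPS; -ttt prints the delay since the
# previous packet, -n skips DNS lookups.
watch_snmp_traffic() {
  sudo tcpdump -n -ttt -i any udp port 161 and host "$UPS"
}

# Usage: run watch_snmp_traffic in one terminal, then in another:
#   time_battery_walk 25
#   time_battery_walk 5
```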

Erwin

Aug 10, 2021, 6:23:17 AM
to Prometheus Users
Sadly, changing timeout: and max_repetitions in generator.yml didn't do anything... I'm still getting 500 errors from snmp_exporter after 19-20s.
I've played a bit with the data I got from the UPS and decided NOT to poll 1.3.6.1.4.1.318.1.1.1.2.3.10 (upsHighPrecBatteryPacks), as that subtree took far too long to collect.
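One way to leave upsHighPrecBatteryPacks out in generator.yml is to walk the children of upsBattery individually instead of the whole 1.3.6.1.4.1.318.1.1.1.2 subtree. This is a sketch, not the poster's actual configuration: the child OIDs are assumed from the APC PowerNet MIB (.2.1 upsBasicBattery, .2.2 upsAdvBattery, .2.3 upsHighPrecBattery), the chosen .2.3.x objects are a hypothetical selection, and the file name is made up.

```shell
#!/usr/bin/env bash
# Hypothetical scratch file: an apcups walk list that skips
# 1.3.6.1.4.1.318.1.1.1.2.3.10 (upsHighPrecBatteryPacks). Merge these entries
# with the module's other walk entries (upsInput, upsOutput, ...).
cat > apcups-walk-fragment.yml <<'EOF'
modules:
  apcups:
    walk:
      - 1.3.6.1.4.1.318.1.1.1.2.1     # upsBasicBattery (assumed name)
      - 1.3.6.1.4.1.318.1.1.1.2.2     # upsAdvBattery (assumed name)
      # Individual upsHighPrecBattery objects instead of the whole .2.3 subtree;
      # this selection is only an example, so check the PowerNet MIB.
      - 1.3.6.1.4.1.318.1.1.1.2.3.1
      - 1.3.6.1.4.1.318.1.1.1.2.3.2
      - 1.3.6.1.4.1.318.1.1.1.2.3.3
      - 1.3.6.1.4.1.318.1.1.1.2.3.4
EOF

# Sanity check: warn if the slow upsHighPrecBatteryPacks subtree is listed.
if grep -q '1\.3\.6\.1\.4\.1\.318\.1\.1\.1\.2\.3\.10' apcups-walk-fragment.yml; then
  echo "upsHighPrecBatteryPacks is still in the walk list" >&2
fi
```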

And... so far so good!

Prometheus needs 4.3s to get everything now, but that's fine for me.
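For the relaxed polling the original post said would be acceptable (every 30 or 60 seconds), a Prometheus scrape job might look like the sketch below, which follows the relabelling pattern from the snmp_exporter README. The file name, job name and UPS address are hypothetical. Prometheus's default scrape_timeout is 10s, so setting it explicitly (it must not exceed scrape_interval) leaves headroom if a scrape ever creeps back towards the earlier 15 to 20s.

```shell
#!/usr/bin/env bash
# Hypothetical scratch file holding a scrape job to merge into prometheus.yml.
cat > snmp-ups-job.yml <<'EOF'
scrape_configs:
  - job_name: 'snmp_apcups'
    scrape_interval: 60s
    scrape_timeout: 30s          # must not exceed scrape_interval
    metrics_path: /snmp
    params:
      module: [apcups]
    static_configs:
      - targets:
          - 192.0.2.10           # hypothetical UPS address
    relabel_configs:
      # Pass the UPS address to the exporter as ?target=...
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the UPS address as the instance label.
      - source_labels: [__param_target]
        target_label: instance
      # Actually scrape the exporter (host:port as in the original post).
      - target_label: __address__
        replacement: snmp_exporter:9116
EOF
```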

Thanks for the help :)
