snmp exporter - snmp v3 error - wrong digest

Artyom Ivanov

May 31, 2023, 6:39:49 AM

to Prometheus Users

Hello everyone!

We have some SNMP devices that are monitored via SNMPv3.

Info:

snmp exporter version v0.21.0

snmp.yml

=====

...

version: 3
max_repetitions: 5
retries: 3
timeout: 5s
auth:
  security_level: authPriv
  username: monitor
  password: password
  auth_protocol: SHA
  priv_protocol: DES
  priv_password: password

...

=====

Almost all the time the scrape works, but occasionally some devices (not at the same time) register an SNMP authentication failure with this message: "Failed to authenticate SNMP message".

With debug logging enabled, snmp exporter shows this error: "...caller=collector.go:282 level=info module=if_mib target=some_target msg="Error scraping target" err="error getting target some_target: wrong digest"".

Does anyone know how to debug this further, to work out which side the error is on?

Brian Candler

May 31, 2023, 1:50:23 PM

to Prometheus Users

If these problems are with the same target device type, but intermittent, I suspect it's a bug with the firmware on that device type.

See if a firmware upgrade is available, or report it to the vendor. If you can replicate the problem using the net-snmp command line tools (e.g. snmpget or snmpbulkwalk) then so much the better.

Here's the relevant bit of code from snmp_exporter:

	oids := len(getOids)
	if oids > maxOids {
		oids = maxOids
	}

	level.Debug(logger).Log("msg", "Getting OIDs", "oids", oids)
	getStart := time.Now()
	packet, err := snmp.Get(getOids[:oids])
	if err != nil {
		if err == context.Canceled {
			return results, fmt.Errorf("scrape cancelled after %s (possible timeout) getting target %s",
				time.Since(getInitialStart), snmp.Target)
		}
		return results, fmt.Errorf("error getting target %s: %s", snmp.Target, err)
	}

If you set the logging level to "debug", you should also get some more logs, although the value "oids" is the number of oids, not the oids themselves. So a slight modification to log "getOids" may confirm which oids are being fetched - although it should be just the ones labelled "get:" in the YAML.
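The slight modification could look roughly like the sketch below. It is a standalone illustration, not the exporter's code: it assumes getOids is a slice of OID strings, it uses the standard log package in place of the exporter's own logger, and the helper name oidBatch and the limit of 5 are made up for the example.

```go
package main

import (
	"log"
	"strings"
)

// oidBatch returns the OIDs that a single Get would request: at most
// maxOids entries from the front of getOids. Hypothetical helper that
// mirrors the clamping in the snippet above.
func oidBatch(getOids []string, maxOids int) []string {
	n := len(getOids)
	if n > maxOids {
		n = maxOids
	}
	return getOids[:n]
}

func main() {
	// The single "get:" entry of the if_mib module.
	getOids := []string{"1.3.6.1.2.1.1.3.0"}
	batch := oidBatch(getOids, 5) // 5 is only an example limit

	// Log the OIDs themselves, not just how many there are.
	log.Printf("msg=%q count=%d oids=%s",
		"Getting OIDs", len(batch), strings.Join(batch, ","))
}
```

In the exporter itself the equivalent change would presumably be to pass getOids[:oids] as an extra value in the existing debug log call.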

In the case of if_mib, I see:

if_mib:
  walk:
  - 1.3.6.1.2.1.2
  - 1.3.6.1.2.1.31.1.1
  get:
  - 1.3.6.1.2.1.1.3.0
Hence it seems likely the problem could be replicated using snmpget ...... 1.3.6.1.2.1.1.3.0

(which is sysUpTime.0). You'll need to sort out all the flags, e.g. (completely untested)

snmpget -v3 -l authPriv -a SHA -x DES -u monitor -A password -X password ip.add.re.ss 1.3.6.1.2.1.1.3.0
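Since the failure is intermittent, a single snmpget run may well succeed. As a rough sketch, the small Go program below builds the same snmpget arguments from the snmp.yml auth settings and runs the command repeatedly, counting failures. The type and function names, the attempt count and the one-second pause are all made up for illustration. It assumes net-snmp's snmpget is on the PATH and, like the command above, it has not been tried against a real device.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"time"
)

// v3Auth mirrors the auth fields from snmp.yml (hypothetical type).
type v3Auth struct {
	SecurityLevel, Username, Password string
	AuthProtocol, PrivProtocol        string
	PrivPassword                      string
}

// snmpgetArgs maps the settings onto net-snmp's command line flags.
func snmpgetArgs(a v3Auth, target, oid string) []string {
	return []string{
		"-v3",
		"-l", a.SecurityLevel,
		"-a", a.AuthProtocol,
		"-x", a.PrivProtocol,
		"-u", a.Username,
		"-A", a.Password,
		"-X", a.PrivPassword,
		target, oid,
	}
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: snmpcheck <target address>")
		return
	}
	auth := v3Auth{
		SecurityLevel: "authPriv", Username: "monitor", Password: "password",
		AuthProtocol: "SHA", PrivProtocol: "DES", PrivPassword: "password",
	}
	args := snmpgetArgs(auth, os.Args[1], "1.3.6.1.2.1.1.3.0")

	const attempts = 100 // arbitrary; raise it if failures are rare
	failures := 0
	for i := 0; i < attempts; i++ {
		out, err := exec.Command("snmpget", args...).CombinedOutput()
		if err != nil {
			failures++
			fmt.Printf("%s attempt %d failed: %v\n%s",
				time.Now().Format(time.RFC3339), i+1, err, out)
		}
		time.Sleep(time.Second)
	}
	fmt.Printf("%d of %d attempts failed\n", failures, attempts)
}
```

If snmpget shows the same occasional authentication failure, that points at the device rather than at snmp_exporter.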