snmp exporter - snmp v3 error - wrong digest

Artyom Ivanov

May 31, 2023, 6:39:49 AM
to Prometheus Users
Hello everyone!
We have some SNMP devices monitored via SNMP v3.
Infra:
snmp_exporter version: v0.21.0
snmp.yml
=====
...
  version: 3
  max_repetitions: 5
  retries: 3
  timeout: 5s
  auth:
    security_level: authPriv
    username: monitor
    password: password
    auth_protocol: SHA
    priv_protocol: DES
    priv_password: password
...
=====
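For background (general SNMPv3 behaviour from RFC 3414, not something stated in the thread): with auth_protocol: SHA the configured password is not used directly as the HMAC key. It is first stretched into a 20-byte key Ku, which is then "localized" to each agent by mixing in that agent's snmpEngineID. A minimal Go sketch of that derivation follows; the engine ID is a made-up value, since a real client learns it from the agent during discovery.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// passwordToKeySHA follows the RFC 3414 password-to-key algorithm for SHA:
// the password is repeated to fill 1,048,576 bytes, which are hashed with SHA-1.
func passwordToKeySHA(password string) []byte {
	if len(password) == 0 {
		return nil // RFC 3414 requires a non-empty (at least 8 character) password
	}
	pw := []byte(password)
	h := sha1.New()
	buf := make([]byte, 64)
	idx := 0
	for count := 0; count < 1048576; count += len(buf) {
		for i := range buf {
			buf[i] = pw[idx%len(pw)]
			idx++
		}
		h.Write(buf)
	}
	return h.Sum(nil)
}

// localizeKeySHA derives the per-agent key: SHA1(Ku || snmpEngineID || Ku).
func localizeKeySHA(ku, engineID []byte) []byte {
	h := sha1.New()
	h.Write(ku)
	h.Write(engineID)
	h.Write(ku)
	return h.Sum(nil)
}

func main() {
	// Hypothetical engine ID, for illustration only.
	engineID, err := hex.DecodeString("000000000000000000000002")
	if err != nil {
		panic(err)
	}
	ku := passwordToKeySHA("password")
	fmt.Printf("localized key: %x\n", localizeKeySHA(ku, engineID))
}
```

Because the localized key depends on the engine ID as well as the password, the same password yields a different key for every agent.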
Almost all scrapes succeed, but occasionally some devices (not at the same time) register an SNMP authentication failure with the message "Failed to authenticate SNMP message".
On the snmp_exporter side, the log shows this error:
...caller=collector.go:282 level=info module=if_mib target=some_target msg="Error scraping target" err="error getting target some_target: wrong digest"
Does anyone know how to debug this further, to find out on which side (exporter or device) the problem lies?
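As background on what "wrong digest" most likely refers to (an interpretation, not confirmed in the thread): with SHA authentication (HMAC-SHA-96, RFC 3414) every authenticated message carries a 12-octet digest in msgAuthenticationParameters. The receiver zeroes that field, recomputes HMAC-SHA-1 over the whole message with the localized key, keeps the first 12 octets and compares them with the received value. Presumably the device log means the device rejected a request, and "wrong digest" means the exporter rejected a response. A minimal sketch, assuming the offset of the field is already known (a real implementation finds it by decoding the BER-encoded message) and using a hypothetical key:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha1"
	"fmt"
)

const authParamLen = 12 // HMAC-SHA-96 truncates the MAC to 12 octets

// computeDigest returns the HMAC-SHA-96 of msg, computed with the 12-octet
// msgAuthenticationParameters field at authOffset set to zeros.
// The caller must ensure the field lies entirely within msg.
func computeDigest(msg []byte, authOffset int, key []byte) []byte {
	zeroed := make([]byte, len(msg))
	copy(zeroed, msg)
	for i := authOffset; i < authOffset+authParamLen; i++ {
		zeroed[i] = 0
	}
	mac := hmac.New(sha1.New, key)
	mac.Write(zeroed)
	return mac.Sum(nil)[:authParamLen]
}

// verifyDigest recomputes the digest and compares it, in constant time,
// with the one carried in the message.
func verifyDigest(msg []byte, authOffset int, key []byte) bool {
	received := msg[authOffset : authOffset+authParamLen]
	return hmac.Equal(received, computeDigest(msg, authOffset, key))
}

func main() {
	key := []byte("hypothetical-localized-key") // illustration only
	msg := make([]byte, 64)                    // stand-in for an encoded SNMPv3 message
	offset := 20                               // hypothetical position of the auth field
	copy(msg[offset:], computeDigest(msg, offset, key))
	fmt.Println("valid:", verifyDigest(msg, offset, key))
	msg[50] ^= 0xff // corrupt one byte outside the auth field
	fmt.Println("after corruption:", verifyDigest(msg, offset, key))
}
```

A different key and a single corrupted or mis-encoded byte both make the check fail, so from the outside a key mismatch and a malformed message look the same.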

Brian Candler

May 31, 2023, 1:50:23 PM
to Prometheus Users
If these problems are with the same target device type, but intermittent, I suspect it's a bug in the firmware of that device type.

See if a firmware upgrade is available, or report it to the vendor.  If you can replicate the problem using the net-snmp command line tools (e.g. snmpget or snmpbulkwalk) then so much the better.

Here's the relevant bit of code from snmp_exporter:

                oids := len(getOids)
                if oids > maxOids {
                        oids = maxOids
                }

                level.Debug(logger).Log("msg", "Getting OIDs", "oids", oids)
                getStart := time.Now()
                packet, err := snmp.Get(getOids[:oids])
                if err != nil {
                        if err == context.Canceled {
                                return results, fmt.Errorf("scrape cancelled after %s (possible timeout) getting target %s",
                                        time.Since(getInitialStart), snmp.Target)
                        }
                        return results, fmt.Errorf("error getting target %s: %s", snmp.Target, err)
                }

If you set the log level to "debug", you should also get some more logs, although the logged "oids" value is the number of OIDs, not the OIDs themselves. A slight modification to also log "getOids" would confirm which OIDs are being fetched, although they should be just the ones listed under "get:" in the YAML.

In the case of if_mib, I see:

if_mib:
  walk:
  - 1.3.6.1.2.1.2
  - 1.3.6.1.2.1.31.1.1
  get:
  - 1.3.6.1.2.1.1.3.0

Hence it seems likely the problem could be replicated using snmpget ...... 1.3.6.1.2.1.1.3.0
(which is sysUpTime.0).  You'll need to sort out all the flags, e.g. (completely untested)

snmpget -v3 -l authPriv -a SHA -x DES -u monitor -A password -X password ip.add.re.ss 1.3.6.1.2.1.1.3.0

and then repeat it periodically to see if it fails from time to time.