If the failures are intermittent and always involve the same type of target device, I suspect a bug in the firmware for that device type.
See if a firmware upgrade is available, or report it to the vendor. If you can replicate the problem using the net-snmp command line tools (e.g. snmpget or snmpbulkwalk) then so much the better.
Here's the relevant bit of code from snmp_exporter:
    oids := len(getOids)
    if oids > maxOids {
        oids = maxOids
    }
    level.Debug(logger).Log("msg", "Getting OIDs", "oids", oids)
    getStart := time.Now()
    packet, err := snmp.Get(getOids[:oids])
    if err != nil {
        if err == context.Canceled {
            return results, fmt.Errorf("scrape cancelled after %s (possible timeout) getting target %s",
                time.Since(getInitialStart), snmp.Target)
        }
        return results, fmt.Errorf("error getting target %s: %s", snmp.Target, err)
    }
If you set the logging level to "debug", you should also get some more logs, although the value "oids" logged here is the number of OIDs, not the OIDs themselves. A slight modification to log "getOids" instead may confirm which OIDs are being fetched, although it should be just the ones listed under "get:" in the YAML.
In the case of if_mib, I see:
    if_mib:
      walk:
      - 1.3.6.1.2.1.2
      - 1.3.6.1.2.1.31.1.1
      get:
      - 1.3.6.1.2.1.1.3.0
Hence it seems likely that the problem could be replicated using snmpget ...... 1.3.6.1.2.1.1.3.0 (which is sysUpTime.0). You'll need to sort out all the flags, e.g. (completely untested):
    snmpget -v3 -l authPriv -a SHA -x DES -u monitor -A password -X password ip.add.re.ss 1.3.6.1.2.1.1.3.0
and then repeat it periodically to see if it fails from time to time.
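For the "repeat it periodically" part, a small shell wrapper along these lines might do. Treat it as a sketch rather than a tested tool: "poll" is just a name I made up, and it assumes that the command you give it (snmpget here) exits with a non-zero status when the request times out or fails, which is how I'd expect the net-snmp tools to behave.

```shell
#!/bin/sh
# poll COUNT INTERVAL COMMAND [ARGS...]
# (hypothetical helper, not part of net-snmp or snmp_exporter)
# Runs COMMAND COUNT times, INTERVAL seconds apart. Prints a timestamped
# line for each failed attempt and a one-line summary at the end.
# Returns 0 if every attempt succeeded, 1 otherwise.
poll() {
    count=$1
    interval=$2
    shift 2
    failures=0
    i=1
    while [ "$i" -le "$count" ]; do
        # Capture stdout and stderr so the error text appears in the log line.
        if ! output=$("$@" 2>&1); then
            failures=$((failures + 1))
            echo "$(date '+%Y-%m-%d %H:%M:%S') attempt $i FAILED: $output"
        fi
        i=$((i + 1))
        # Only sleep between attempts, not after the last one.
        if [ "$i" -le "$count" ]; then
            sleep "$interval"
        fi
    done
    echo "$failures of $count attempts failed"
    [ "$failures" -eq 0 ]
}

# Example (same untested placeholder flags as above): 360 attempts, 10s apart.
# poll 360 10 snmpget -v3 -l authPriv -a SHA -x DES -u monitor -A password -X password ip.add.re.ss 1.3.6.1.2.1.1.3.0
```

If you leave that running for an hour or so and the timestamps of the failures line up with the gaps in your scrapes, that points at the device rather than at snmp_exporter. You may also want to add snmpget's -r (retries) and -t (timeout) options, e.g. -r 0, so that a single lost response shows up as a failure instead of being hidden by a retry.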