Config from snmp_exporter generator returns error 500 in Prometheus

Nicolas

Apr 15, 2024, 4:38:50 PM
to Prometheus Users
Hello,
I have a strange error and I hope you can help me, or maybe there is a problem with the snmp_exporter generator in the latest version.

I'm using snmp_exporter version 0.25.0 and prometheus 
My generator.yml file looks like this:
auths:
  cisco_v3:
    security_level: authPriv
    username: secret_name
    password: secret
    auth_protocol: SHA
    priv_protocol: AES
    priv_password: secret
    version: 3
modules:
  arte_mib:
    walk:
    - 1.3.6.1.2.1.1
    - 1.3.6.1.2.1.2.2
    - 1.3.6.1.2.1.31.1.1
    - 1.3.6.1.4.1.9.9.109.1.1.1.1.8
    - 1.3.6.1.4.1.9.9.13.1.4
    - 1.3.6.1.4.1.9.9.13.1.5
    - 1.3.6.1.4.1.9.9.48.1.1.1.5
    - 1.3.6.1.4.1.9.9.48.1.1.1.6
    - 1.3.6.1.4.1.9.9.68.1.2.2.1.2

I generate my snmp.yml file without error:
ts=2024-04-15T20:29:38.537Z caller=net_snmp.go:175 level=info msg="Loading MIBs" from=/usr/share/snmp/mibs/
ts=2024-04-15T20:29:38.577Z caller=main.go:53 level=info msg="Generating config for module" module=arte_mib
ts=2024-04-15T20:29:38.580Z caller=main.go:68 level=info msg="Generated metrics" module=arte_mib metrics=64
ts=2024-04-15T20:29:38.585Z caller=main.go:93 level=info msg="Config written" file=/etc/prometheus/snmp_generator/snmp_exporter-0.25.0/generator/snmp.yml

And when I use it with my Prometheus (scrape config below):
  - job_name: 'arte_snmp'
    scrape_interval: 1m
    scrape_timeout: 50s
    file_sd_configs:
      - files:
        - '/etc/prometheus/targets/arte.json'
    metrics_path: /snmp
    params:
      auth: [cisco_v3]
      module: [arte_mib]
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [module]
        target_label: __param_module
      - target_label: __address__
        replacement: 127.0.0.1:9116  

I get "server returned HTTP status 500 Internal Server Error" (screenshot 500.PNG attached).


Note that if, for example, I use the if_mib module and take the snmp.yml provided in the snmp_exporter, I get no error and my servers appear to be UP.

Thanks for your help

Ben Kochie

Apr 15, 2024, 4:41:24 PM
to Nicolas, Prometheus Users
If you use `snmp_exporter --log.level=debug`, what do the logs say?

Nicolas

Apr 15, 2024, 5:12:18 PM
to Prometheus Users
Hi Ben,
Here is the debug log. It is strange too, because with snmpwalk everything works fine.

debug log 
Apr 15 22:56:20 prometheus01 snmp_exporter[16444]: ts=2024-04-15T20:56:20.905Z caller=collector.go:393 level=info auth=cisco_v3 target=xx.xx.xx.xx module=arte_mib msg="Error scraping target" err="error walking target xx.xx.xx.xx: request timeout (after 3 retries)"
Apr 15 22:56:20 prometheus01 snmp_exporter[16444]: ts=2024-04-15T20:56:20.905Z caller=collector.go:464 level=debug auth=cisco_v3 target=xx.xx.xx.xx module=arte_mib msg="Finished scrape" duration_seconds=20.03805335
Apr 15 22:56:21 prometheus01 snmp_exporter[16444]: ts=2024-04-15T20:56:21.907Z caller=collector.go:393 level=info auth=cisco_v3 target=yy.yy.yy.yy module=arte_mib msg="Error scraping target" err="error walking target yy.yy.yy.yy: request timeout (after 3 retries)"
Apr 15 22:56:21 prometheus01 snmp_exporter[16444]: ts=2024-04-15T20:56:21.907Z caller=collector.go:464 level=debug auth=cisco_v3 target=yy.yy.yy.yy module=arte_mib msg="Finished scrape" duration_seconds=20.083473958
Apr 15 22:56:22 prometheus01 snmp_exporter[16444]: ts=2024-04-15T20:56:22.869Z caller=collector.go:460 level=debug auth=cisco_v3 target=ww.ww.ww.ww module=arte_mib msg="Starting scrape"
Apr 15 22:56:22 prometheus01 snmp_exporter[16444]: ts=2024-04-15T20:56:22.869Z caller=collector.go:214 level=debug auth=cisco_v3 target=ww.ww.ww.ww module=arte_mib msg="Walking subtree" oid=1.3.6.1.2.1.1
Apr 15 22:56:27 prometheus01 snmp_exporter[16444]: ts=2024-04-15T20:56:27.142Z caller=collector.go:393 level=info auth=cisco_v3 target=zz.zz.zz.zz module=arte_mib msg="Error scraping target" err="error walking target zz.zz.zz.zz: request timeout (after 3 retries)"
Apr 15 22:56:27 prometheus01 snmp_exporter[16444]: ts=2024-04-15T20:56:27.142Z caller=collector.go:464 level=debug auth=cisco_v3 target=zz.zz.zz.zz module=arte_mib msg="Finished scrape" duration_seconds=20.075515731

snmpwalk : 
SNMPv2-MIB::sysDescr.0 = STRING: Cisco IOS Software [Cupertino], Catalyst L3 Switch Software (CAT9K_IOSXE), Version 17.9.4, RELEASE SOFTWARE (fc5)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2023 by Cisco Systems, Inc.
Compiled Wed 26-Jul-23 10:26 by mcpre
SNMPv2-MIB::sysObjectID.0 = OID: SNMPv2-SMI::enterprises.9.1.2494
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (235735378) 27 days, 6:49:13.78
SNMPv2-MIB::sysContact.0 = STRING:
SNMPv2-MIB::sysName.0 = STRING: swro-arte-strg-data-1.net.com
SNMPv2-MIB::sysLocation.0 = STRING: STRG
SNMPv2-MIB::sysServices.0 = INTEGER: 6
SNMPv2-MIB::sysORLastChange.0 = Timeticks: (0) 0:00:00.00
SNMPv2-MIB::sysORID.1 = OID: SNMPv2-SMI::enterprises.9.7.129
SNMPv2-MIB::sysORID.2 = OID: SNMPv2-SMI::enterprises.9.7.115
SNMPv2-MIB::sysORID.3 = OID: SNMPv2-SMI::enterprises.9.7.265
SNMPv2-MIB::sysORID.4 = OID: SNMPv2-SMI::enterprises.9.7.112
SNMPv2-MIB::sysORID.5 = OID: SNMPv2-SMI::enterprises.9.7.106
SNMPv2-MIB::sysORID.6 = OID: SNMPv2-SMI::enterprises.9.7.582

...

The snmpwalk response is instantaneous, so I don't think it's a scrape_interval or timeout problem. In fact, I even increased the interval to 1 minute and it didn't change a thing.

I also used tcpdump to look at the communication between my devices and Prometheus, and everything seems fine:
[~]# tcpdump -i ens192 port 161
22:43:47.138346 IP prometheus01.com.44170 > swro-arte-strg-data-2.net.com.snmp:  F=r U="" E= C="" GetRequest(14)
22:43:47.154121 IP swro-arte-strg-data-2.net.com.snmp > prometheus01.com.44170:  F= U="" E=_80_00_00_09_03_00_68_e5_9e_a4_00_00 C="" Report(33)  S:snmpUsmMIB.usmMIBObjects.usmStats.usmStatsUnknownEngineIDs.0=716692
22:43:47.204598 IP prometheus01.com.44170 > swro-arte-strg-data-2.net.com.snmp:  F=apr U="supserver" [!scoped PDU]9d_5d_4e_0c_1f_26_92_e6_a6_65_0c_3a_04_b0_8e_f7_66_7d_18_97_3f_5d_84_e2_04....
......
80 packets captured
85 packets received by filter
0 packets dropped by kernel

Nicolas

Apr 15, 2024, 6:24:54 PM
to Prometheus Users
I think I found something. The problem is the SNMPv2-MIB subtree 1.3.6.1.2.1.1, which doesn't work properly with the generated config.
If I'm more specific and give an OID such as 1.3.6.1.2.1.1.1, it works.
The other OIDs I listed after it:
    - 1.3.6.1.2.1.2.2
    - 1.3.6.1.2.1.31.1.1
    - 1.3.6.1.4.1.9.9.109.1.1.1.1.8
    - 1.3.6.1.4.1.9.9.13.1.4
    - 1.3.6.1.4.1.9.9.13.1.5
    - 1.3.6.1.4.1.9.9.48.1.1.1.5
    - 1.3.6.1.4.1.9.9.48.1.1.1.6
    - 1.3.6.1.4.1.9.9.68.1.2.2.1.2
work without a hitch.
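
For anyone trying the same workaround, a generator.yml walk list along these lines might be a starting point. This is only a sketch: which objects to pick under 1.3.6.1.2.1.1 is an assumption for illustration (the OID-to-name mapping follows the SNMPv2-MIB system group), and the auths section is unchanged and omitted here.

```yaml
modules:
  arte_mib:
    walk:
    # Specific objects under 1.3.6.1.2.1.1 instead of the whole subtree
    # (the selection below is illustrative, not a tested list)
    - 1.3.6.1.2.1.1.1   # sysDescr
    - 1.3.6.1.2.1.1.3   # sysUpTime
    - 1.3.6.1.2.1.1.5   # sysName
    # Remaining OIDs as before
    - 1.3.6.1.2.1.2.2
    - 1.3.6.1.2.1.31.1.1
    - 1.3.6.1.4.1.9.9.109.1.1.1.1.8
    - 1.3.6.1.4.1.9.9.13.1.4
    - 1.3.6.1.4.1.9.9.13.1.5
    - 1.3.6.1.4.1.9.9.48.1.1.1.5
    - 1.3.6.1.4.1.9.9.48.1.1.1.6
    - 1.3.6.1.4.1.9.9.68.1.2.2.1.2
```

snmp.yml has to be regenerated after changing the walk list. This avoids the failing walk but does not explain why walking the full subtree times out.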



Nicolas

Apr 16, 2024, 4:56:32 AM
to Prometheus Users
Hi again,

So I can confirm that walking 1.3.6.1.2.1.1 doesn't work, but I don't know why...

debug.log :
214 level=debug auth=cisco_v3 target=xx.xx.xx.xx module=arte_mib msg="Walking subtree" oid=1.3.6.1.2.1.1

393 level=info auth=cisco_v3 target=xx.xx.xx.xx module=arte_mib msg="Error scraping target" err="error walking target xx.xx.xx.xx: request timeout (after 3 retries)"
464 level=debug auth=cisco_v3 target=xx.xx.xx.xx module=arte_mib msg="Finished scrape" duration_seconds=20.048886702

$ snmpwalk -v3 -l authPriv -u user -a SHA -A secret -x AES -X secret xx.xx.xx.xx 1.3.6.1.2.1.1
SNMPv2-MIB::sysDescr.0 = STRING: Cisco IOS Software [Cupertino], Catalyst L3 Switch Software (CAT9K_IOSXE), Version 17.9.4, RELEASE SOFTWARE (fc5)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2023 by Cisco Systems, Inc.
Compiled Wed 26-Jul-23 10:26 by mcpre
SNMPv2-MIB::sysObjectID.0 = OID: SNMPv2-SMI::enterprises.9.1.2494
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (725228220) 83 days, 22:31:22.20
SNMPv2-MIB::sysContact.0 = STRING:
SNMPv2-MIB::sysName.0 = STRING: xx.xx.xx.xx
SNMPv2-MIB::sysLocation.0 = STRING: SITE
SNMPv2-MIB::sysServices.0 = INTEGER: 6
SNMPv2-MIB::sysORLastChange.0 = Timeticks: (0) 0:00:00.00
SNMPv2-MIB::sysORID.1 = OID: SNMPv2-SMI::enterprises.9.7.129
SNMPv2-MIB::sysORID.2 = OID: SNMPv2-SMI::enterprises.9.7.115
SNMPv2-MIB::sysORID.3 = OID: SNMPv2-SMI::enterprises.9.7.265


generator.yml
auths:
  cisco_v3:
    security_level: authPriv
    username: user
    password: secret
    auth_protocol: SHA
    priv_protocol: AES
    priv_password: secret
    version: 3
modules:
  arte_mib:
    walk:
    - 1.3.6.1.2.1.1


./generator generate -m /usr/share/snmp/mibs/  -g generator.yml -o snmp.yml
ts=2024-04-16T08:34:31.972Z caller=net_snmp.go:175 level=info msg="Loading MIBs" from=/usr/share/snmp/mibs/
ts=2024-04-16T08:34:32.016Z caller=main.go:53 level=info msg="Generating config for module" module=arte_mib
ts=2024-04-16T08:34:32.018Z caller=main.go:68 level=info msg="Generated metrics" module=arte_mib metrics=12
ts=2024-04-16T08:34:32.019Z caller=main.go:93 level=info msg="Config written" file=/etc/prometheus/snmp_generator/snmp_exporter-0.25.0/generator/snmp.yml


snmp.yml (generated by generator) 
# WARNING: This file was auto-generated using snmp_exporter generator, manual changes will be lost.
auths:
  cisco_v3:
    community: public
    security_level: authPriv
    username: user
    password: secret
    auth_protocol: SHA
    priv_protocol: AES
    priv_password: secret
    version: 3
modules:
  arte_mib:
    walk:
    - 1.3.6.1.2.1.1
    metrics:
    - name: sysDescr
      oid: 1.3.6.1.2.1.1.1
      type: DisplayString
      help: A textual description of the entity - 1.3.6.1.2.1.1.1
    - name: sysObjectID
      oid: 1.3.6.1.2.1.1.2
      type: OctetString
      help: The vendor's authoritative identification of the network management subsystem
        contained in the entity - 1.3.6.1.2.1.1.2
    - name: sysUpTime
      oid: 1.3.6.1.2.1.1.3
      type: gauge
      help: The time (in hundredths of a second) since the network management portion
        of the system was last re-initialized. - 1.3.6.1.2.1.1.3
    - name: sysContact
      oid: 1.3.6.1.2.1.1.4
      type: DisplayString
      help: The textual identification of the contact person for this managed node,
        together with information on how to contact this person - 1.3.6.1.2.1.1.4
    - name: sysName
      oid: 1.3.6.1.2.1.1.5
      type: DisplayString
      help: An administratively-assigned name for this managed node - 1.3.6.1.2.1.1.5
    - name: sysLocation
      oid: 1.3.6.1.2.1.1.6
      type: DisplayString
      help: The physical location of this node (e.g., 'telephone closet, 3rd floor')
        - 1.3.6.1.2.1.1.6
    - name: sysServices
      oid: 1.3.6.1.2.1.1.7
      type: gauge
      help: A value which indicates the set of services that this entity may potentially
        offer - 1.3.6.1.2.1.1.7
    - name: sysORLastChange
      oid: 1.3.6.1.2.1.1.8
      type: gauge
      help: The value of sysUpTime at the time of the most recent change in state
        or value of any instance of sysORID. - 1.3.6.1.2.1.1.8
    - name: sysORIndex
      oid: 1.3.6.1.2.1.1.9.1.1
      type: gauge
      help: The auxiliary variable used for identifying instances of the columnar
        objects in the sysORTable. - 1.3.6.1.2.1.1.9.1.1
      indexes:
      - labelname: sysORIndex
        type: gauge
    - name: sysORID
      oid: 1.3.6.1.2.1.1.9.1.2
      type: OctetString
      help: An authoritative identification of a capabilities statement with respect
        to various MIB modules supported by the local SNMP application acting as a
        command responder. - 1.3.6.1.2.1.1.9.1.2
      indexes:
      - labelname: sysORIndex
        type: gauge
    - name: sysORDescr
      oid: 1.3.6.1.2.1.1.9.1.3
      type: DisplayString
      help: A textual description of the capabilities identified by the corresponding
        instance of sysORID. - 1.3.6.1.2.1.1.9.1.3
      indexes:
      - labelname: sysORIndex
        type: gauge
    - name: sysORUpTime
      oid: 1.3.6.1.2.1.1.9.1.4
      type: gauge
      help: The value of sysUpTime at the time this conceptual row was last instantiated.
        - 1.3.6.1.2.1.1.9.1.4
      indexes:
      - labelname: sysORIndex
        type: gauge




And tcpdump shows the answer being received, so why does snmp_exporter say "request timeout (after 3 retries)"?
tcpdump -i ens192 port 161 -l | grep   xx.xx.xx.xx
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on ens192, link-type EN10MB (Ethernet), snapshot length 262144 bytes
10:48:02.349225 IP prometheus01.com.54725 > xx.xx.xx.xx.snmp:  F=r U="" E= C="" GetRequest(14)
10:48:02.356737 IP xx.xx.xx.xx.snmp > prometheus01.com.54725:  F= U="" E=_80_00_00_09_03_00_68_e5_9e_b2_94_80 C="" Report(33)  S:snmpUsmMIB.usmMIBObjects.usmStats.usmStatsUnknownEngineIDs.0=1140991
10:48:02.356932 IP prometheus01.com.54725 > xx.xx.xx.xx.snmp:  F=apr U="user" [!scoped PDU]a3_f0_93_62_b2_02_47_2b_41_f5_a2_bc_16_97_ef_9f_2b_1c_bd_2e_5c_7b_d8_75_84_0a_de_bd_ff_02_49_dc_7c_2e_2d_3b_13_0f_fb_ea_e5_98_76_5e_b8_fb
10:48:02.365440 IP xx.xx.xx.xx.snmp > prometheus01.com.54725:  F=ap U="user" [!scoped PDU]0e_b5_6c_00_23_61_81_8c_82_2e_44_75_78_1f_54_5b_22_dc_fe_51_19_2e_1b_40_62_16_41_05_89_f0_d6_43_5d_af_97_25_1e_66_60_6b_bc_61_55_8f_bd_31_76_4b_62_ca_a8_77_26_fc_f8_e8_35_4f_77_9f_b7_25_6e_1d_7f_06_27_07_3e_bc_04_ee_13_d2_d3_43_ef_6f_04_fd_88_03_bb_37_f9_fe_75_9e_0b_c5_99_38_5d_64_af_d1_67_35_3c_99_ff_7b_cc_92_2d_67_10_13_11_3b_f5_85_bd_77_f2_7d_19_67_e8_7d_92_e5_9c_cd_ba_86_eb_d3_23_81_d0_09_e9_be_82_58_6a_e9_b3_c4_1a_78_09_1e_44_bb_f7_fd_23_ed_4c_14_23_f1_83_63_55_aa_b1_c7_ce_7d_b5_94_c9_e4_1a_f3_d9_42_dd_19_24_69_13_

Ben Kochie

Apr 16, 2024, 5:20:05 AM
to Nicolas, Prometheus Users
I've got a new packet debugging option that I've been working on:
https://github.com/prometheus/snmp_exporter/pull/1157

Alexander Wilke

Apr 22, 2024, 1:15:20 PM
to Prometheus Users
Is it possible that the timeout in snmp.yml is too low?
The scrape_config has 1m, but the logs say 5s. Maybe you should try setting timeout: 50s in snmp.yml too, with retries: 0.
Or retries: 3 and timeout: 15s.
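
As a rough sketch of what that could look like in generator.yml, assuming the module-level `timeout` and `retries` options that the generator copies into snmp.yml (values are the ones suggested above, not tested against these devices):

```yaml
modules:
  arte_mib:
    walk:
    - 1.3.6.1.2.1.1
    # Variant 1: one long attempt, no retries
    timeout: 50s
    retries: 0
    # Variant 2: use these two lines instead of variant 1
    # timeout: 15s
    # retries: 3
```

The roughly 20 s scrape durations in the debug log would be consistent with one initial attempt plus 3 retries at 5 s each, if the exporter is running with a 5 s timeout and 3 retries. By the same reasoning, the worst case for a failing walk is about (retries + 1) × timeout, so variant 2 could take up to about 60 s and would run past the 50 s scrape_timeout in the Prometheus job.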
