Issues with snmp_exporter 0.1.0

Christina Carkner

Dec 15, 2016, 7:30:02 PM
to Prometheus Users
Hi all -

I'm using snmp_exporter fine in one environment, but in another I'm having issues and I'm not clear on what is causing them.

I'll preface this with: I have successfully used snmpwalk and snmpget to return values to me at the command line for this OID from the prometheus server.

If I start the snmp_exporter and view the metrics page, I see snmp_collection_duration_seconds being populated for the stats module I created. However, in /var/log/syslog (with debug level logging turned on) I get this:

Dec 16 00:15:04 prometheus01 snmp_exporter[36383]: time="2016-12-16T00:15:04Z" level=debug msg="Scraping target '192.168.1.45' with module 'default'" source="main.go:73"
Dec 16 00:15:04 prometheus01 snmp_exporter[36383]: time="2016-12-16T00:15:04Z" level=debug msg="Scrape of target '192.168.1.45' with module 'default' took 0.000257 seconds" source="main.go:84"
Dec 16 00:15:10 prometheus01 snmp_exporter[36383]: time="2016-12-16T00:15:10Z" level=debug msg="Scraping target '192.168.1.45' with module 'stats'" source="main.go:73"
Dec 16 00:15:10 prometheus01 snmp_exporter[36383]: time="2016-12-16T00:15:10Z" level=error msg="Error scraping target 192.168.1.45: Error walking target 192.168.1.45: Request timeout (after 3 retries)" source="collector.go:119"
Dec 16 00:15:10 prometheus01 snmp_exporter[36383]: time="2016-12-16T00:15:10Z" level=debug msg="Scrape of target '192.168.1.45' with module 'stats' took 60.000952 seconds" source="main.go:84"

Looking at the snmp_exporter page I see the same error that's being reported in syslog.

I cut the snmp.yml to just one node and one OID to try to figure this out:

# Default module: interface stats and uptime.
default:
  version: 2
  auth:
    community: foo

stats:
  walk:
    - 1.3.6.1.2.1.31.1.1.1
  metrics:
    - name: ifHCInOctets
      oid: 1.3.6.1.2.1.31.1.1.1.6.1

The snmp section of the prometheus.yml is similarly uncomplicated:

scrape_configs:
  - job_name: 'snmpstats'
    scrape_interval: 1m
    static_configs:
      - targets:
         - 192.168.1.45
    metrics_path: /snmp
    params:
     module: [stats]
    relabel_configs:
     - source_labels: [__address__]
       target_label: __param_target
     - source_labels: [__param_target]
       target_label: instance
     - target_label: __address__
       replacement: 127.0.0.1:9116

The same query returns instantly at the command line, so I'm not sure why it takes 60 seconds when the exporter makes it, or what I can do to resolve this.

Any help or ideas would be appreciated.

Thanks!

Brian Brazil

Dec 15, 2016, 7:40:53 PM
to Christina Carkner, Prometheus Users
On 16 December 2016 at 00:30, Christina Carkner <ccar...@gmail.com> wrote:
> I have successfully used snmpwalk and snmpget to return values to me at the command line for this OID from the prometheus server.

What about snmpbulkwalk with -Cr25? That's what the snmp_exporter does by default, and if that doesn't work, either the device only supports v1 or it's buggy.

Brian
 

--

Christina Carkner

Dec 15, 2016, 9:17:22 PM
to Prometheus Users, ccar...@gmail.com
snmpbulkwalk -Cr25 returned quickly from the target.

--

Brian Brazil

Dec 16, 2016, 4:46:57 AM
to Christina Carkner, Prometheus Users
On 16 December 2016 at 02:17, Christina Carkner <ccar...@gmail.com> wrote:
> snmpbulkwalk -Cr25 returned quickly from the target.

Are you sure you have all the credentials correct?

If that still all looks okay, try a tcpdump.

Brian
 

--

Christina Carkner

Dec 17, 2016, 6:03:17 PM
to Prometheus Users, ccar...@gmail.com
It turns out with this config:

# Default module: interface stats and uptime.
default:
  version: 2
  auth:
    community: foo

stats:
  walk:
    - 1.3.6.1.2.1.31.1.1.1
  metrics:
    - name: ifHCInOctets
      oid: 1.3.6.1.2.1.31.1.1.1.6.1

The stats module wasn't inheriting the community string from the default module; it was trying public, which is the real default. I'd presumed that default meant "default values for these fields for everything else in the file", so that's on me.
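
For anyone hitting the same thing, here is a minimal sketch of the stats module carrying its own credentials. It assumes the fix is simply to repeat the version and auth keys inside the stats module, using the same layout as the default module in the config above; foo is the placeholder community string from that config, not a real value.

```yaml
# Sketch only: each module is assumed to need its own version/auth block.
# Nothing is inherited from the module that happens to be named "default".
stats:
  version: 2
  auth:
    community: foo   # placeholder community string from the config above
  walk:
    - 1.3.6.1.2.1.31.1.1.1
  metrics:
    - name: ifHCInOctets
      oid: 1.3.6.1.2.1.31.1.1.1.6.1
```

This only addresses the community string; the walk and metrics entries are unchanged from the original config.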

I also noticed that in prometheus.yml, when I added a scrape_interval to the snmp job, it didn't honor it. The plugin was using the value of 5m I had set in the global block at the top.

It's now fetching data, but the Prometheus Targets page shows:
http://127.0.0.1:9116/snmp -> DOWN -> (no labels) -> 12.204s ago -> server returned HTTP status 500 Internal Server Error

The SNMP exporter page shows this for every metric configured when I hit it with Chrome:

An error has occurred during metrics gathering:

102 error(s) occurred:

* collected metric ifStatus untyped:<value:1 > was collected before with the same name and label values


Which is confusing, because when I reloaded it a little later, the value had changed to 6.

For those who are SNMP-impaired like me: to make it work, I basically had to follow the pattern given in the default snmp.yml for my stats module, adding indexes and lookups so that each interface had a unique name.

  metrics:
    - name: ifNumber
      oid: 1.3.6.1.2.1.2.1
      indexes:
        - labelname: ifDescr
          type: Integer32
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2

Until I did that, absolutely no results from any scrape made it into Prometheus, even if the scrape itself worked correctly.



--

Brian Brazil

Dec 17, 2016, 6:11:59 PM
to Christina Carkner, Prometheus Users
On 17 December 2016 at 23:03, Christina Carkner <ccar...@gmail.com> wrote:
> I also noticed that in prometheus.yml, when I added a scrape_interval to the snmp job, it didn't honor it. The plugin was using the value of 5m I had set in the global block at the top.

I'm not able to reproduce this. Did you ask Prometheus to reload its config?

Brian
 

--

Christina Carkner

Dec 17, 2016, 6:24:27 PM
to Prometheus Users, ccar...@gmail.com
I changed settings and restarted both snmp_exporter and prometheus repeatedly over the course of more than an hour. Then when I merged everything into one block it worked.

I finally caught it with this:

17:12:54.711120 IP 192.168.123.17.58767 > dest-server.snmp:  GetBulk(26)  N=0 M=25 interfaces
17:12:57.936334 IP 192.168.123.17.55764 > dest-server.snmp:  C=foo GetBulk(29)  N=0 M=25 31.1.1.1

The first request is from snmp_exporter, with no community. The second is from me, run manually from the command line with the community defined.


