Prometheus and number of targets for snmp exporter


luce...@gmail.com

Mar 21, 2017, 2:41:15 PM
to Prometheus Users

Targets are Cisco routers.

Prometheus.yml

scrape_configs:
  - job_name: 'snmp'
    scrape_interval: 60s
    scrape_timeout: 60s
    file_sd_configs:
        - files:
          - /etc/prometheus/targets.yml
    metrics_path: /snmp
    params:
      module: [default] # which OIDs we will query

    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 195.x.x.x.x:9117

At most 115 targets can be put in the target file.
The problem is not memory but CPU, which becomes very high.

If we put in 500 targets, then snmp_exporter blocks.

Question:

Prometheus' design is based on polling (right?), which can be heavy if there are a lot of devices.
 

We are using Grafana as the dashboard.
Prometheus, snmp_exporter and Grafana are running in three separate Docker containers.
Server: Ubuntu, 250 GB memory, 55 CPUs.

Julius Volz

Mar 21, 2017, 3:33:23 PM
to luce...@gmail.com, Prometheus Users
On Tue, Mar 21, 2017 at 7:41 PM, <luce...@gmail.com> wrote:

Targets are Cisco routers.

Prometheus.yml

scrape_configs:
  - job_name: 'snmp'
    scrape_interval: 60s
    scrape_timeout: 60s
    file_sd_configs:
        - files:
          - /etc/prometheus/targets.yml
    metrics_path: /snmp
    params:
      module: [default] # which OIDs we will query

    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 195.x.x.x.x:9117

At most 115 targets can be put in the target file.
The problem is not memory but CPU, which becomes very high.

If we put in 500 targets, then snmp_exporter blocks.

So you mean the CPU becomes high in the SNMP exporter, not in Prometheus? Note that the SNMP exporter is completely stateless, so you can horizontally shard it easily with a load balancer in front if that really becomes a problem.
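A minimal sketch of how such sharding could also be done without a load balancer, using hashmod relabeling on the Prometheus side; the shard count and exporter addresses below are placeholders, not part of the setup discussed here:

scrape_configs:
  - job_name: 'snmp-shard-0'
    metrics_path: /snmp
    params:
      module: [default]
    file_sd_configs:
      - files:
        - /etc/prometheus/targets.yml
    relabel_configs:
      # hash every target address into one of two buckets
      - source_labels: [__address__]
        modulus: 2
        target_label: __tmp_shard
        action: hashmod
      # this job keeps only bucket 0; a second job 'snmp-shard-1' would keep regex '1'
      - source_labels: [__tmp_shard]
        regex: '0'
        action: keep
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      # each bucket points at its own snmp_exporter instance (placeholder address)
      - target_label: __address__
        replacement: snmp-exporter-0:9116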
 

Question:

Prometheus' design is based on polling (right?), which can be heavy if there are a lot of devices.
 

We are using Grafana as the dashboard.
Prometheus, snmp_exporter and Grafana are running in three separate Docker containers.
Server: Ubuntu, 250 GB memory, 55 CPUs.

Yes, Prometheus is based on pulling data instead of getting data pushed to it. Scalability-wise, that makes little difference though (see also https://prometheus.io/blog/2016/07/23/pull-does-not-scale-or-does-it/).

SNMP in particular is poll-based anyways, even if you used a push-based monitoring system.

Cheers,
Julius 

luce...@gmail.com

Mar 22, 2017, 4:51:00 AM
to Prometheus Users, luce...@gmail.com
  
--->   So you mean the CPU becomes high in the SNMP exporter, not in Prometheus?
We used three Docker containers on the same machine (a Prometheus container, an SNMP exporter container and a Grafana container).
The SNMP exporter caused the high CPU load.

I'm still disappointed in the low number of SNMP targets that can be handled per SNMP exporter, because Go is a very powerful language.
What if we want to poll 500,000 routers?





On Tuesday, 21 March 2017 at 19:41:15 UTC+1, luce...@gmail.com wrote:

Brian Brazil

Mar 22, 2017, 5:11:33 AM
to luce...@gmail.com, Prometheus Users
On 22 March 2017 at 08:50, <luce...@gmail.com> wrote:
  
--->   So you mean the CPU becomes high in the SNMP exporter, not in Prometheus?
We used three Docker containers on the same machine (a Prometheus container, an SNMP exporter container and a Grafana container).
The SNMP exporter caused the high CPU load.

What version of the snmp exporter are you using?
 

I'm still disappointed in the low number of SNMP targets that can be handled per SNMP exporter, because Go is a very powerful language.
What if we want to poll 500,000 routers?

A single Prometheus wouldn't be able to handle that.

Brian
 


Luc Evers

Mar 22, 2017, 9:24:17 AM
to Prometheus Users
  Brian,

    snmp_exporter version 0.2.0




Luc Evers

Mar 22, 2017, 9:26:06 AM
to Prometheus Users
snmp_exporter 0.3.0 is not working with Prometheus. (The snmp_exporter 0.3.0 container crashed when we reloaded the Prometheus container.)

Brian Brazil

Mar 22, 2017, 9:29:52 AM
to Luc Evers, Prometheus Users
On 22 March 2017 at 13:23, Luc Evers <luce...@gmail.com> wrote:
  Brian,

    snmp_exporter version 0.2.0


There are no known performance issues with that version. How many OIDs are you trying to scrape?

Brian
 




luce...@gmail.com

Mar 23, 2017, 5:16:25 AM
to Prometheus Users, luce...@gmail.com
  Brian,

    We also thought in this direction and cleaned the file down to a minimum.
    So we have 74 OIDs, which we cannot reduce.

    Messages when the list is too long:

time="2017-03-23T08:36:36Z" level=error msg="Error scraping target a.b.c.d: Error walking target a.b.c.d: Request timeout (after 1 retries)" source="collector.go:125"

We also tested the targets that showed errors using a small list, and then there are no errors.


With snmp_exporter v0.3.0 we have problems:

docker run --name snmp-switch -p 9117:9116 -v /docker/docker-volumes/snmp/snmp_exporter/:/etc/snmp_exporter/ a81b7148413b
time="2017-03-23T07:56:05Z" level=info msg="Starting snmp exporter (version=0.3.0, branch=master, revision=6f8aa8a24d720b36991f29ffb179b2896e92090b)" source="main.go:99"
time="2017-03-23T07:56:05Z" level=info msg="Build context (go=go1.7.5, user=x@y, date=20170315-16:01:54)" source="main.go:100"
time="2017-03-23T07:56:05Z" level=info msg="Listening on :9116" source="main.go:114"
time="2017-03-23T07:57:14Z" level=fatal msg="Unknown index type string" source="collector.go:355"
 



   When we use about 150 CPEs and start extending the list with more CPEs, we see that we lose results for other CPEs.
   What I mean is that some CPEs which were working before are not working any more after adding additional CPEs.

 I expected 10,000 CPEs per snmp_exporter.



On Wednesday, 22 March 2017 at 14:29:52 UTC+1, Brian Brazil wrote:

Brian Brazil

Mar 23, 2017, 6:44:38 AM
to Luc Evers, Prometheus Users
On 23 March 2017 at 09:16, <luce...@gmail.com> wrote:
  Brian,

    We also thought in this direction and cleaned the file down to a minimum.
    So we have 74 OIDs, which we cannot reduce.

    Messages when the list is too long:

time="2017-03-23T08:36:36Z" level=error msg="Error scraping target a.b.c.d: Error walking target a.b.c.d: Request timeout (after 1 retries)" source="collector.go:125"

We also tested the targets that showed errors using a small list, and then there are no errors.

That's a timeout, you need to increase it.
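As a sketch of where those knobs live: the SNMP-side walk timeout and retries sit in the module config, while the Prometheus-side scrape_timeout must stay at or below scrape_interval. The exact module key names are an assumption for this exporter version:

# snmp.yml, module section (key names assumed)
default:
  version: 3
  timeout: 20s   # per-walk SNMP timeout, must fit inside the Prometheus scrape_timeout
  retries: 3     # SNMP request retries before giving up

# prometheus.yml (already 60s in the config earlier in this thread)
    scrape_interval: 60s
    scrape_timeout: 60s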
 


With snmp_exporter v0.3.0 we have problems:

docker run --name snmp-switch -p 9117:9116 -v /docker/docker-volumes/snmp/snmp_exporter/:/etc/snmp_exporter/ a81b7148413b
time="2017-03-23T07:56:05Z" level=info msg="Starting snmp exporter (version=0.3.0, branch=master, revision=6f8aa8a24d720b36991f29ffb179b2896e92090b)" source="main.go:99"
time="2017-03-23T07:56:05Z" level=info msg="Build context (go=go1.7.5, user=x@y, date=20170315-16:01:54)" source="main.go:100"
time="2017-03-23T07:56:05Z" level=info msg="Listening on :9116" source="main.go:114"
time="2017-03-23T07:57:14Z" level=fatal msg="Unknown index type string" source="collector.go:355"

You need to re-run the config generator.
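For reference, a rough generator.yml sketch; the object names and the exact generator syntax for this version are assumptions. The generator reads this file plus the relevant MIBs and writes the snmp.yml that the exporter loads:

modules:
  default:
    version: 3
    auth:
      username: xxxxxxxxx
      password: yyyyyyyyyyyyyyyyy
      auth_protocol: SHA        # placeholder
      priv_protocol: AES        # placeholder
      priv_password: zzzzzzzzzzzz
      security_level: authPriv
    walk:
      - sysUpTime
      - interfaces                      # IF-MIB ifTable
      - ifXTable                        # IF-MIB 64-bit counters
      - 1.3.6.1.4.1.9.9.109.1.1.1.1     # Cisco CPU table
      - 1.3.6.1.4.1.9.9.48.1.1.1        # Cisco memory pool table

Then run the generator (e.g. ./generator generate with the Cisco MIBs on its MIB path) and reload the exporter with the generated snmp.yml.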


   When we use about 150 CPEs and start extending the list with more CPEs, we see that we lose results for other CPEs.
   What I mean is that some CPEs which were working before are not working any more after adding additional CPEs.

 I expected 10,000 CPEs per snmp_exporter.

Based on a 60s scrape interval, that's 185 OIDs/s (74 OIDs × 150 targets / 60 s). You should not be having problems at that level, as that's a normal level of data from a single device.

I suspect something else is going on. Have you checked for anything odd with tcpdump?
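For example (illustrative; SNMP runs over 161/udp by default, so this captures the walk traffic to and from the routers):

tcpdump -n -i any -c 500 udp port 161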

Brian

 

luce...@gmail.com

Mar 28, 2017, 4:07:38 AM
to Prometheus Users, luce...@gmail.com
       Actions and results

1) Upgraded to snmp_exporter v0.3 by changing the snmp.yml file. (The variables were different!)
2) Test with 1 OID, scrape time: 4 minutes.
     We see no problems with 6300 targets (routers); it can be even more.
3) Test with 79 OIDs and 500 targets (routers), scrape time: 4 minutes. Problem: snmp_exporter still works but becomes very slow. The snmp_exporter response duration is > 1 minute, which means that Prometheus runs into its timeout.
snmp_exporter also consumes a lot of CPU.

A router can return about 52 lines of text!

Remarks:
    We are using SNMPv3.
    A test with tcpdump gives no real results; no problems seen.


On Thursday, 23 March 2017 at 11:44:38 UTC+1, Brian Brazil wrote:




Brian Brazil

Mar 28, 2017, 7:08:11 AM
to Luc Evers, Prometheus Users
On 28 March 2017 at 09:07, <luce...@gmail.com> wrote:
       Actions and results

1) Upgraded to snmp_exporter v0.3 by changing the snmp.yml file. (The variables were different!)
2) Test with 1 OID, scrape time: 4 minutes.
     We see no problems with 6300 targets (routers); it can be even more.
3) Test with 79 OIDs and 500 targets (routers), scrape time: 4 minutes. Problem: snmp_exporter still works but becomes very slow. The snmp_exporter response duration is > 1 minute, which means that Prometheus runs into its timeout.
snmp_exporter also consumes a lot of CPU.

How much CPU exactly? What type of processor are you using?

These numbers aren't really adding up in terms of how we know the snmp exporter performs.
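One way to put a number on it is to scrape the exporter's own /metrics endpoint and graph its process CPU; the job name and exporter address below are placeholders:

scrape_configs:
  - job_name: 'snmp_exporter'
    static_configs:
      - targets: ['snmp-exporter:9116']

# then, in Prometheus or Grafana:
# rate(process_cpu_seconds_total{job="snmp_exporter"}[5m])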
 

A router can return about 52 lines of text!

That's small in Prometheus terms. Hundreds to thousands of time series per target is usual.

Brian
 

luce...@gmail.com

Mar 28, 2017, 9:32:23 AM
to Prometheus Users, luce...@gmail.com
  The tests are done on a CoreOS Linux machine with 16 processors:
  Intel(R) Xeon(R) CPU E5620 @ 2.40GHz

   But the same test with 500 routers and 79 OIDs, with the same results, was also done on:

   Linux Ubuntu: 56 processors, Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
                           


  


On Tuesday, 28 March 2017 at 13:08:11 UTC+2, Brian Brazil wrote:




Luc Evers

Apr 11, 2017, 4:55:58 AM
to Prometheus Users, luce...@gmail.com
   Info

     With only 175 targets we already see timeouts in the snmp_exporter logs:
     Error scraping target ...
     Source: collector.go:126

     Memory is OK, but CPU usage is too high.




On Tuesday, 28 March 2017 at 15:32:23 UTC+2, Luc Evers wrote:

janni...@gmail.com

Apr 12, 2017, 5:48:13 AM
to Prometheus Users
Hi,

I'm new to the topic of monitoring.
Can you send me your configs (prometheus.yml, snmp.yml, /etc/prometheus/targets.yml)?
Then I can orient myself using your configs.

Thank you in advance.
Best regards,
Jannik

Luc Evers

Apr 14, 2017, 4:52:32 AM
to Prometheus Users, janni...@gmail.com
The requested information.

We are using Grafana as the dashboard.
Targets can be added via Grafana.
Target IPs are not sent because of our company policy.
Of course you are welcome to test with our team.
 


cat targets.yml 
- targets:
    - a1.b1.c1.d1
    - a2.b2.c2.d2
  labels:
    job: snmp

cat prometheus.yml
scrape_configs:
  - job_name: 'snmp'
    scrape_interval: 60s
    scrape_timeout: 60s
    file_sd_configs:
        - files:
          - /etc/prometheus/targets.yml
    metrics_path: /snmp
    params:
      module: [default] # which OIDs we will query

    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 1.2.3.4:9117


cat snmp.yml
# Default module: interface stats and uptime.
default:
  version: 3
  auth:
    username: xxxxxxxxx
    password: yyyyyyyyyyyyyyyyy
    auth_protocol: qsdf
    priv_protocol: abc
    security_level: authPriv
    priv_password: zzzzzzzzzzzz
    
  walk:
    - 1.3.6.1.2.1.1.3
    - 1.3.6.1.2.1.1.5
    - 1.3.6.1.2.1.1.6
    - 1.3.6.1.6.3.10.2.1
    - 1.3.6.1.2.1.2
    - 1.3.6.1.2.1.31.1.1.1
    - 1.3.6.1.4.1.9.2.1
    - 1.3.6.1.4.1.9.9.109.1.1.1.1
    - 1.3.6.1.4.1.9.9.48.1.1.1
    - 1.3.6.1.4.1.9.9.166.1.6
    - 1.3.6.1.2.1.4.31.3  
  metrics:
    - name: mem5mimUsed
      oid: 1.3.6.1.4.1.9.9.48.1.1.1.5.1
      type: gauge
    - name: memFree
      oid: 1.3.6.1.4.1.9.9.48.1.1.1.6.1
      type: gauge
#    - name: ipIfStatsIfIndex
#      oid: 1.3.6.1.2.1.4.31.3.1.2
#    - name: hrSystem
#      oid: 1.3.6.1.2.1.25.1
#    - name: sysDescr
#      oid: 1.3.6.1.6.3.10.2.1.1
#      type: gauge
#    - name: sysObjectID
#      oid: 1.3.6.1.2.1.1.2
    - name: sysUpTime
      oid: 1.3.6.1.2.1.1.3
      type: gauge
#    - name: sysContact
#      oid: 1.3.6.1.2.1.1.4
#      type: string
    - name: sysName
      oid: 1.3.6.1.2.1.1.5
      type: DisplayString
    - name: sysLocation
      oid: 1.3.6.1.2.1.1.6
      type: DisplayString
#    - name: sysServices
#      oid: 1.3.6.1.2.1.1.7
#      type: displaystring 
    - name: ifNumber
      oid: 1.3.6.1.2.1.2.1
      type: gauge 
    - name: ifEntry
      oid: 1.3.6.1.2.2.2.1
      type: DisplayString
    - name: ifIndex
      oid: 1.3.6.1.2.1.2.2.1.1
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifDescr
      oid: 1.3.6.1.2.1.2.2.1.2
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifType
      oid: 1.3.6.1.2.1.2.2.1.3
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifMtu
      oid: 1.3.6.1.2.1.2.2.1.4
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifSpeed
      oid: 1.3.6.1.2.1.2.2.1.5
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
#    - name: ifPhysAddress
#      oid: 1.3.6.1.2.1.2.2.1.6
#      indexes:
#        - labelname: ifDescr
#          type: gauge
#      lookups:
#        - labels: [ifDescr]
#          labelname: ifDescr
#          oid: 1.3.6.1.2.1.2.2.1.2
    - name: ifAdminStatus
      oid: 1.3.6.1.2.1.2.2.1.7
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifOperStatus
      oid: 1.3.6.1.2.1.2.2.1.8
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifLastChange
      oid: 1.3.6.1.2.1.2.2.1.9
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifInOctets
      oid: 1.3.6.1.2.1.2.2.1.10
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifInUcastPkts
      oid: 1.3.6.1.2.1.2.2.1.11
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifInNUcastPkts
      oid: 1.3.6.1.2.1.2.2.1.12
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifInDiscards
      oid: 1.3.6.1.2.1.2.2.1.13
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifInErrors
      oid: 1.3.6.1.2.1.2.2.1.14
      type: gauge
      indexes:
        - labelname: ifDescr
          type: Integer
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifInUnknownProtos
      oid: 1.3.6.1.2.1.2.2.1.15
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifOutOctets
      oid: 1.3.6.1.2.1.2.2.1.16
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifOutUcastPkts
      oid: 1.3.6.1.2.1.2.2.1.17
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifOutNUcastPkts
      oid: 1.3.6.1.2.1.2.2.1.18
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifOutDiscards
      oid: 1.3.6.1.2.1.2.2.1.19
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifOutErrors
      oid: 1.3.6.1.2.1.2.2.1.20
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifOutQLen
      oid: 1.3.6.1.2.1.2.2.1.21
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifSpecific
      oid: 1.3.6.1.2.1.2.2.1.22
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
#    - name: ciscoMemoryPoolType
#      oid: 1.3.6.1.4.1.9.9.48.1.1.1.1
#    - name: ciscoMemoryPoolName
#      oid: 1.3.6.1.4.1.9.9.48.1.1.1.2
#    - name: ciscoMemoryPoolAlternate
#      oid: 1.3.6.1.4.1.9.9.48.1.1.1.3
#    - name: ciscoMemoryPoolValid
#      oid: 1.3.6.1.4.1.9.9.48.1.1.1.4
#    - name: ciscoMemoryPoolUsed
#      oid: 1.3.6.1.4.1.9.9.48.1.1.1.5
#    - name: ciscoMemoryPoolFree
#      oid: 1.3.6.1.4.1.9.9.48.1.1.1.6
#    - name: ciscoMemoryPoolLargestFree
#      oid: 1.3.6.1.4.1.9.9.48.1.1.1.7
    - name: ciscoMemoryPoolUtilization1Min
      oid: 1.3.6.1.4.1.9.9.48.1.2.1.1
      type: gauge
#    - name: ciscoMemoryPoolUtilization5Min
#      oid: 1.3.6.1.4.1.9.9.48.1.2.1.2
#    - name: ciscoMemoryPoolUtilization10Min
#      oid: 1.3.6.1.4.1.9.9.48.1.2.1.3
    - name: cpmCPUTotal5minRev
      oid: 1.3.6.1.4.1.9.9.109.1.1.1.1.8
      type: gauge
    - name: cpmCPUTotal1minRev
      oid: 1.3.6.1.4.1.9.9.109.1.1.1.1.7
      type: gauge
    - name: cpmCPUTotal5secRev
      oid: 1.3.6.1.4.1.9.9.109.1.1.1.1.6
      type: gauge
    - name: cpmCPUTotal5min
      oid: 1.3.6.1.4.1.9.9.109.1.1.1.1.5
      type: gauge
    - name: cpmCPUTotal1min
      oid: 1.3.6.1.4.1.9.9.109.1.1.1.1.4
      type: gauge
    - name: cpmCPUTotal5sec
      oid: 1.3.6.1.4.1.9.9.109.1.1.1.1.3
      type: gauge
    - name: avgBusy5
      oid: 1.3.6.1.4.1.9.2.1.58
      type: gauge
    - name: avgBusy1
      oid: 1.3.6.1.4.1.9.2.1.57
      type: gauge
    - name: busyPer
      oid: 1.3.6.1.4.1.9.2.1.56
      type: gauge
    - name: ifInMulticastPkts
      oid: 1.3.6.1.2.1.31.1.1.1.2
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifInBroadCastPkts
      oid: 1.3.6.1.2.1.31.1.1.1.3
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifOutMulticastPkts
      oid: 1.3.6.1.2.1.31.1.1.1.4
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifOutBroadCastPkts
      oid: 1.3.6.1.2.1.31.1.1.1.5
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifHCInOctets
      oid: 1.3.6.1.2.1.31.1.1.1.6
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifHCInUcastPkts
      oid: 1.3.6.1.2.1.31.1.1.1.7
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifHCInBroadcastPkts
      oid: 1.3.6.1.2.1.31.1.1.1.8
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifHCInMulticastPkts
      oid: 1.3.6.1.2.1.31.1.1.1.9
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifHCOutOctets
      oid: 1.3.6.1.2.1.31.1.1.1.10
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifHCOutUcastPkts
      oid: 1.3.6.1.2.1.31.1.1.1.11
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifHCOutMulticastPkts
      oid: .1.3.6.1.2.1.31.1.1.1.12
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifHCOutBroadcastPkts
      oid: 1.3.6.1.2.1.31.1.1.1.13
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifLinkUpDownTrapEnable
      oid: 1.3.6.1.2.1.31.1.1.1.14
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifHighSpeed
      oid: 1.3.6.1.2.1.31.1.1.1.15
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifPromiscuousMode
      oid: 1.3.6.1.2.1.31.1.1.1.16
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifConnectorPresent
      oid: 1.3.6.1.2.1.31.1.1.1.17
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifAlias
      oid: 1.3.6.1.2.1.31.1.1.1.18
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifCounterDiscontinuityTime
      oid: 1.3.6.1.2.1.31.1.1.1.19
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString



On Wednesday, 12 April 2017 at 11:48:13 UTC+2, janni...@gmail.com wrote:

janni...@gmail.com

Apr 17, 2017, 4:24:57 PM
to Prometheus Users
Hi,


Can you answer the following questions for me:
Which OIDs do I have to put in the generator.yml?
Do I have to create one for each type of device?
I have two Zyxel switches. Is it normal that I can't get a "current link speed" and instead just get a counter of bits?
How do I get kbit/s to display in Grafana?

Luc Evers

Apr 19, 2017, 3:57:20 AM
to Jannik Tom Züllig, Prometheus Users
  Which OIDs do I have to put in the generator.yml?  The ones I sent before == default.
Do I have to create one for each type of device?  The default module is used only for Cisco routers.
I have two Zyxel switches. Is it normal that I can't get a "current link speed" and instead just get a counter of bits?  I don't know this type of device, but you cannot simulate the problem with one OID. We think that the high CPU load is caused by the number of OIDs.
How do I get kbit/s to display in Grafana?  We are sending you our Grafana dashboard, see the attachment.
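A minimal sketch of the kbit/s calculation in PromQL, assuming the ifHCInOctets metric and ifDescr label from the snmp.yml posted above; the interface name and the $router template variable are placeholders:

rate(ifHCInOctets{job="snmp", instance="$router", ifDescr="GigabitEthernet0/1"}[5m]) * 8 / 1000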

Dashboard RPC Repair-1492585507679.json

Luc Evers

Apr 28, 2017, 7:44:24 AM
to Prometheus Users, janni...@gmail.com
     Hi,

        What is the progress on this problem? It seems to be a bug.



On Wednesday, 19 April 2017 at 09:57:20 UTC+2, Luc Evers wrote:

janni...@gmail.com

Apr 28, 2017, 11:24:36 AM
to Prometheus Users, janni...@gmail.com
Hi,

Thanks for your help. I've now understood it.

I just have to create a personal dashboard with Grafana.

Have a nice weekend.

On Friday, 28 April 2017 at 13:44:24 UTC+2, Luc Evers wrote:

Luc Evers

May 8, 2017, 3:50:13 AM
to Prometheus Users, janni...@gmail.com
  Jannik,

      What is the result of your tests?

   Luc.

On Friday, 28 April 2017 at 17:24:36 UTC+2, janni...@gmail.com wrote:

Luc Evers

May 28, 2017, 8:26:39 AM
to Prometheus Users, Jannik Tom Züllig
Progress?


jeden...@gmail.com

Jul 13, 2017, 9:05:10 AM
to Prometheus Users, janni...@gmail.com
I am also interested in the scalability of Prometheus for SNMP hosts that are network devices.  Has more information been learned on this topic?  The ratio of the number of OIDs to the number of devices could prove to be very important to us.

Ben Kochie

Jul 13, 2017, 9:22:07 AM
to jeden...@gmail.com, Prometheus Users, janni...@gmail.com
We recently added SIGHUP support to the snmp_exporter.  This reduces the overhead of having to re-load the config for every request.
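For a containerized exporter that means the config can be reloaded in place, for example (using the container name from earlier in the thread):

docker kill --signal=HUP snmp-switch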

A recent test showed you could easily handle over 5000 targets with a single CPU core dedicated to the snmp_exporter.

Most of the overhead is simply dealing with SNMP's protocol.


jeden...@gmail.com

Jul 13, 2017, 9:48:07 AM
to Prometheus Users, jeden...@gmail.com, janni...@gmail.com
How many OIDs can be collected for each of the 5000 targets?  What is the recommendation for 50,000 targets distributed around the globe, which is my environment?

Brian Brazil

Jul 13, 2017, 9:50:49 AM
to jeden...@gmail.com, Prometheus Users, janni...@gmail.com
On 13 July 2017 at 14:48, <jeden...@gmail.com> wrote:
How many OIDs can be collected for each of the 5000 targets?  What is the recommendation for 50,000 targets distributed around the globe, which is my environment?

SNMP is a very chatty protocol. You'll want to run an snmp exporter close to each set of targets (and probably a Prometheus close to those too).

Brian
 

Ben Kochie

Jul 13, 2017, 9:56:23 AM
to jeden...@gmail.com, Prometheus Users, janni...@gmail.com
Generally we recommend running a Prometheus server close to or inside each "failure domain".  It's designed to be run in a distributed manner, forwarding alerts to central location(s).  Grafana allows templating which Prometheus server to talk to, so you can have a central dashboard location talk to many different globally distributed Prometheus servers.

For something like a typical set of globally distributed datacenters (tens to hundreds globally) this works out pretty well.  You can have each location also roll up site-wide health metrics, and a few "central" Prometheus servers can use federation to meta-monitor all of the sites globally.

A lot of it depends on how your targets are distributed, how they're connected, etc.  This kind of design is totally possible, but requires a lot more information about your environment to make specific recommendations.
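A sketch of the central, meta-monitoring side of that layout; the per-site Prometheus addresses and the match[] selector are placeholders:

scrape_configs:
  - job_name: 'federate'
    honor_labels: true
    metrics_path: /federate
    params:
      'match[]':
        - '{job="snmp"}'
    static_configs:
      - targets:
        - 'prometheus-site-a:9090'
        - 'prometheus-site-b:9090'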
