Prometheus and number of targets for snmp exporter


luce...@gmail.com

Mar 21, 2017, 2:41:15 PM
to Prometheus Users

Targets are Cisco routers.

Prometheus.yml

scrape_configs:
  - job_name: 'snmp'
    scrape_interval: 60s
    scrape_timeout: 60s
    file_sd_configs:
        - files:
          - /etc/prometheus/targets.yml
    metrics_path: /snmp
    params:
      module: [default] # which OIDs we will query

    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 195.x.x.x.x:9117

At most 115 targets can be put in the target file.
The problem is not memory but CPU, which becomes very high.

If we put in 500 targets, then snmp_exporter blocks.

Question:

Prometheus' design is based on polling (right?), which can be heavy if there are a lot of devices.
 

We are using Grafana as the dashboard.
Prometheus, snmp_exporter and Grafana are running in three separate Docker containers.
Server: Ubuntu, 250 GB memory, 55 CPUs.

Julius Volz

Mar 21, 2017, 3:33:23 PM
to luce...@gmail.com, Prometheus Users
On Tue, Mar 21, 2017 at 7:41 PM, <luce...@gmail.com> wrote:

Targets are Cisco routers.

Prometheus.yml

scrape_configs:
  - job_name: 'snmp'
    scrape_interval: 60s
    scrape_timeout: 60s
    file_sd_configs:
        - files:
          - /etc/prometheus/targets.yml
    metrics_path: /snmp
    params:
      module: [default] # which OIDs we will query

    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 195.x.x.x.x:9117

At most 115 targets can be put in the target file.
The problem is not memory but CPU, which becomes very high.

If we put in 500 targets, then snmp_exporter blocks.

So you mean the CPU becomes high in the SNMP exporter, not in Prometheus? Note that the SNMP exporter is completely stateless, so you can horizontally shard it easily with a load balancer in front if that really becomes a problem.
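A minimal sketch of how such sharding could also be done without a load balancer, using hashmod relabeling on the Prometheus side; the shard count and exporter addresses below are placeholders, not part of the setup discussed here:

scrape_configs:
  - job_name: 'snmp-shard-0'
    metrics_path: /snmp
    params:
      module: [default]
    file_sd_configs:
      - files:
        - /etc/prometheus/targets.yml
    relabel_configs:
      # hash every target address into one of two buckets
      - source_labels: [__address__]
        modulus: 2
        target_label: __tmp_shard
        action: hashmod
      # this job keeps only bucket 0; a second job 'snmp-shard-1' would keep regex '1'
      - source_labels: [__tmp_shard]
        regex: '0'
        action: keep
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      # each bucket points at its own snmp_exporter instance (placeholder address)
      - target_label: __address__
        replacement: snmp-exporter-0:9116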
 

Question:

Prometheus' design is based on polling (right?), which can be heavy if there are a lot of devices.
 

We are using Grafana as the dashboard.
Prometheus, snmp_exporter and Grafana are running in three separate Docker containers.
Server: Ubuntu, 250 GB memory, 55 CPUs.

Yes, Prometheus is based on pulling data instead of getting data pushed to it. Scalability-wise, that makes little difference though (see also https://prometheus.io/blog/2016/07/23/pull-does-not-scale-or-does-it/).

SNMP in particular is poll-based anyways, even if you used a push-based monitoring system.

Cheers,
Julius 

luce...@gmail.com

Mar 22, 2017, 4:51:00 AM
to Prometheus Users, luce...@gmail.com
  
--->   So you mean the CPU becomes high in the SNMP exporter, not in Prometheus?
We used three Docker containers on the same machine (a Prometheus container, an SNMP exporter container and a Grafana container).
The SNMP exporter caused the high CPU load.

I'm still disappointed in the low number of SNMP targets that can be handled per SNMP exporter, because Go is a very powerful language.
What if we want to poll 500,000 routers?





On Tuesday, 21 March 2017 at 19:41:15 UTC+1, luce...@gmail.com wrote:

Brian Brazil

Mar 22, 2017, 5:11:33 AM
to luce...@gmail.com, Prometheus Users
On 22 March 2017 at 08:50, <luce...@gmail.com> wrote:
  
--->   So you mean the CPU becomes high in the SNMP exporter, not in Prometheus?
We used three Docker containers on the same machine (a Prometheus container, an SNMP exporter container and a Grafana container).
The SNMP exporter caused the high CPU load.

What version of the snmp exporter are you using?
 

I'm still disappointed in the low number of SNMP targets that can be handled per SNMP exporter, because Go is a very powerful language.
What if we want to poll 500,000 routers?

A single Prometheus wouldn't be able to handle that.

Brian
 


Luc Evers

Mar 22, 2017, 9:24:17 AM
to Prometheus Users
  Brian,

    snmp_exporter version 0.2.0




Luc Evers

Mar 22, 2017, 9:26:06 AM
to Prometheus Users
snmp_exporter 0.3.0 is not working with Prometheus. (The snmp_exporter 0.3.0 container crashed when we reloaded the Prometheus container.)

Brian Brazil

Mar 22, 2017, 9:29:52 AM
to Luc Evers, Prometheus Users
On 22 March 2017 at 13:23, Luc Evers <luce...@gmail.com> wrote:
  Brian,

    snmp_exporter version 0.2.0


There are no known performance issues with that version. How many OIDs are you trying to scrape?

Brian
 




luce...@gmail.com

Mar 23, 2017, 5:16:25 AM
to Prometheus Users, luce...@gmail.com
  Brian,

    We also thought in this direction and cleaned the file down to a minimum.
    So we have 74 OIDs, which we cannot reduce.

    Messages when the list is too long:

time="2017-03-23T08:36:36Z" level=error msg="Error scraping target a.b.c.d: Error walking target a.b.c.d: Request timeout (after 1 retries)" source="collector.go:125"

We also tested the targets that showed errors using a small list, and then there are no errors.


With snmp_exporter v0.3.0 we have problems:

docker run --name snmp-switch -p 9117:9116 -v /docker/docker-volumes/snmp/snmp_exporter/:/etc/snmp_exporter/ a81b7148413b
time="2017-03-23T07:56:05Z" level=info msg="Starting snmp exporter (version=0.3.0, branch=master, revision=6f8aa8a24d720b36991f29ffb179b2896e92090b)" source="main.go:99"
time="2017-03-23T07:56:05Z" level=info msg="Build context (go=go1.7.5, user=x@y, date=20170315-16:01:54)" source="main.go:100"
time="2017-03-23T07:56:05Z" level=info msg="Listening on :9116" source="main.go:114"
time="2017-03-23T07:57:14Z" level=fatal msg="Unknown index type string" source="collector.go:355"
 



   When we use about 150 CPEs and start extending the list with more CPEs, we see that we lose results for other CPEs.
   What I mean is that some CPEs which were working before are not working any more after adding additional CPEs.

 I expected 10,000 CPEs per snmp_exporter.



On Wednesday, 22 March 2017 at 14:29:52 UTC+1, Brian Brazil wrote:

Brian Brazil

Mar 23, 2017, 6:44:38 AM
to Luc Evers, Prometheus Users
On 23 March 2017 at 09:16, <luce...@gmail.com> wrote:
  Brian,

    We also thought in this direction and cleaned the file down to a minimum.
    So we have 74 OIDs, which we cannot reduce.

    Messages when the list is too long:

time="2017-03-23T08:36:36Z" level=error msg="Error scraping target a.b.c.d: Error walking target a.b.c.d: Request timeout (after 1 retries)" source="collector.go:125"

We also tested the targets that showed errors using a small list, and then there are no errors.

That's a timeout, you need to increase it.
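As a sketch of where those knobs live: the SNMP-side walk timeout and retries sit in the module config, while the Prometheus-side scrape_timeout must stay at or below scrape_interval. The exact module key names are an assumption for this exporter version:

# snmp.yml, module section (key names assumed)
default:
  version: 3
  timeout: 20s   # per-walk SNMP timeout, must fit inside the Prometheus scrape_timeout
  retries: 3     # SNMP request retries before giving up

# prometheus.yml (already 60s in the config earlier in this thread)
    scrape_interval: 60s
    scrape_timeout: 60s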
 


With snmp_exporter v0.3.0 we have problems:

docker run --name snmp-switch -p 9117:9116 -v /docker/docker-volumes/snmp/snmp_exporter/:/etc/snmp_exporter/ a81b7148413b
time="2017-03-23T07:56:05Z" level=info msg="Starting snmp exporter (version=0.3.0, branch=master, revision=6f8aa8a24d720b36991f29ffb179b2896e92090b)" source="main.go:99"
time="2017-03-23T07:56:05Z" level=info msg="Build context (go=go1.7.5, user=x@y, date=20170315-16:01:54)" source="main.go:100"
time="2017-03-23T07:56:05Z" level=info msg="Listening on :9116" source="main.go:114"
time="2017-03-23T07:57:14Z" level=fatal msg="Unknown index type string" source="collector.go:355"

You need to re-run the config generator.
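For reference, a rough generator.yml sketch; the object names and the exact generator syntax for this version are assumptions. The generator reads this file plus the relevant MIBs and writes the snmp.yml that the exporter loads:

modules:
  default:
    version: 3
    auth:
      username: xxxxxxxxx
      password: yyyyyyyyyyyyyyyyy
      auth_protocol: SHA        # placeholder
      priv_protocol: AES        # placeholder
      priv_password: zzzzzzzzzzzz
      security_level: authPriv
    walk:
      - sysUpTime
      - interfaces                      # IF-MIB ifTable
      - ifXTable                        # IF-MIB 64-bit counters
      - 1.3.6.1.4.1.9.9.109.1.1.1.1     # Cisco CPU table
      - 1.3.6.1.4.1.9.9.48.1.1.1        # Cisco memory pool table

Then run the generator (e.g. ./generator generate with the Cisco MIBs on its MIB path) and reload the exporter with the generated snmp.yml.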


   When we use about 150 CPEs and start extending the list with more CPEs, we see that we lose results for other CPEs.
   What I mean is that some CPEs which were working before are not working any more after adding additional CPEs.

 I expected 10,000 CPEs per snmp_exporter.

Based on a 60s scrape interval, that's 185 OIDs/s (74 OIDs × 150 targets / 60 s). You should not be having problems at that level, as that's a normal level of data from a single device.

I suspect something else is going on. Have you checked for anything odd with tcpdump?
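For example (illustrative; SNMP runs over 161/udp by default, so this captures the walk traffic to and from the routers):

tcpdump -n -i any -c 500 udp port 161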

Brian

 

luce...@gmail.com

Mar 28, 2017, 4:07:38 AM
to Prometheus Users, luce...@gmail.com
       Actions and results

1) Upgraded to snmp_exporter v0.3 by changing the snmp.yml file. (The variables were different!)
2) Test with 1 OID, scrape time: 4 minutes.
     We see no problems with 6300 targets (routers); it can be even more.
3) Test with 79 OIDs and 500 targets (routers), scrape time: 4 minutes. Problem: snmp_exporter still works but becomes very slow. The snmp_exporter response duration is > 1 minute, which means that Prometheus runs into its timeout.
snmp_exporter also consumes a lot of CPU.

A router can return about 52 lines of text!

Remarks:
    We are using SNMPv3.
    A test with tcpdump gives no real results; no problems seen.


On Thursday, 23 March 2017 at 11:44:38 UTC+1, Brian Brazil wrote:




Brian Brazil

Mar 28, 2017, 7:08:11 AM
to Luc Evers, Prometheus Users
On 28 March 2017 at 09:07, <luce...@gmail.com> wrote:
       Actions and results

1) Upgraded to snmp_exporter v0.3 by changing the snmp.yml file. (The variables were different!)
2) Test with 1 OID, scrape time: 4 minutes.
     We see no problems with 6300 targets (routers); it can be even more.
3) Test with 79 OIDs and 500 targets (routers), scrape time: 4 minutes. Problem: snmp_exporter still works but becomes very slow. The snmp_exporter response duration is > 1 minute, which means that Prometheus runs into its timeout.
snmp_exporter also consumes a lot of CPU.

How much CPU exactly? What type of processor are you using?

These numbers aren't really adding up in terms of how we know the snmp exporter performs.
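One way to put a number on it is to scrape the exporter's own /metrics endpoint and graph its process CPU; the job name and exporter address below are placeholders:

scrape_configs:
  - job_name: 'snmp_exporter'
    static_configs:
      - targets: ['snmp-exporter:9116']

# then, in Prometheus or Grafana:
# rate(process_cpu_seconds_total{job="snmp_exporter"}[5m])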
 

A router can return about 52 lines of text!

That's small in Prometheus terms. Hundreds to thousands of time series per target is usual.

Brian
 

luce...@gmail.com

Mar 28, 2017, 9:32:23 AM
to Prometheus Users, luce...@gmail.com
  The tests are done on a CoreOS Linux machine with 16 processors:
  Intel(R) Xeon(R) CPU E5620 @ 2.40GHz

   But the same test with 500 routers and 79 OIDs, with the same results, was also done on:

   Linux Ubuntu: 56 processors, Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
                           


  


On Tuesday, 28 March 2017 at 13:08:11 UTC+2, Brian Brazil wrote:




Luc Evers

Apr 11, 2017, 4:55:58 AM
to Prometheus Users, luce...@gmail.com
   Info

     With only 175 targets we already see timeouts in the snmp_exporter logs:
     Error scraping target ...
     Source: collector.go:126

     Memory is OK, but CPU usage is too high.




On Tuesday, 28 March 2017 at 15:32:23 UTC+2, Luc Evers wrote:

janni...@gmail.com

Apr 12, 2017, 5:48:13 AM
to Prometheus Users
Hi,

I'm new to the topic of monitoring.
Can you send me your configs (prometheus.yml, snmp.yml, /etc/prometheus/targets.yml)?
Then I can orient myself using your configs.

Thank you in advance.
Best regards,
Jannik

Luc Evers

Apr 14, 2017, 4:52:32 AM
to Prometheus Users, janni...@gmail.com
The requested information.

We are using Grafana as the dashboard.
Targets can be added via Grafana.
Target IPs are not sent because of our company policy.
Of course you are welcome to test with our team.
 


cat targets.yml 
- targets:
    - a1.b1.c1.d1
    - a2.b2.c2.d2
  labels:
    job: snmp

cat prometheus.yml
scrape_configs:
  - job_name: 'snmp'
    scrape_interval: 60s
    scrape_timeout: 60s
    file_sd_configs:
        - files:
          - /etc/prometheus/targets.yml
    metrics_path: /snmp
    params:
      module: [default] # which OIDs we will query

    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 1.2.3.4:9117


cat snmp.yml
# Default module: interface stats and uptime.
default:
  version: 3
  auth:
    username: xxxxxxxxx
    password: yyyyyyyyyyyyyyyyy
    auth_protocol: qsdf
    priv_protocol: abc
    security_level: authPriv
    priv_password: zzzzzzzzzzzz
    
  walk:
    - 1.3.6.1.2.1.1.3
    - 1.3.6.1.2.1.1.5
    - 1.3.6.1.2.1.1.6
    - 1.3.6.1.6.3.10.2.1
    - 1.3.6.1.2.1.2
    - 1.3.6.1.2.1.31.1.1.1
    - 1.3.6.1.4.1.9.2.1
    - 1.3.6.1.4.1.9.9.109.1.1.1.1
    - 1.3.6.1.4.1.9.9.48.1.1.1
    - 1.3.6.1.4.1.9.9.166.1.6
    - 1.3.6.1.2.1.4.31.3  
  metrics:
    - name: mem5mimUsed
      oid: 1.3.6.1.4.1.9.9.48.1.1.1.5.1
      type: gauge
    - name: memFree
      oid: 1.3.6.1.4.1.9.9.48.1.1.1.6.1
      type: gauge
#    - name: ipIfStatsIfIndex
#      oid: 1.3.6.1.2.1.4.31.3.1.2
#    - name: hrSystem
#      oid: 1.3.6.1.2.1.25.1
#    - name: sysDescr
#      oid: 1.3.6.1.6.3.10.2.1.1
#      type: gauge
#    - name: sysObjectID
#      oid: 1.3.6.1.2.1.1.2
    - name: sysUpTime
      oid: 1.3.6.1.2.1.1.3
      type: gauge
#    - name: sysContact
#      oid: 1.3.6.1.2.1.1.4
#      type: string
    - name: sysName
      oid: 1.3.6.1.2.1.1.5
      type: DisplayString
    - name: sysLocation
      oid: 1.3.6.1.2.1.1.6
      type: DisplayString
#    - name: sysServices
#      oid: 1.3.6.1.2.1.1.7
#      type: displaystring 
    - name: ifNumber
      oid: 1.3.6.1.2.1.2.1
      type: gauge 
    - name: ifEntry
      oid: 1.3.6.1.2.2.2.1
      type: DisplayString
    - name: ifIndex
      oid: 1.3.6.1.2.1.2.2.1.1
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifDescr
      oid: 1.3.6.1.2.1.2.2.1.2
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifType
      oid: 1.3.6.1.2.1.2.2.1.3
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifMtu
      oid: 1.3.6.1.2.1.2.2.1.4
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifSpeed
      oid: 1.3.6.1.2.1.2.2.1.5
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
#    - name: ifPhysAddress
#      oid: 1.3.6.1.2.1.2.2.1.6
#      indexes:
#        - labelname: ifDescr
#          type: gauge
#      lookups:
#        - labels: [ifDescr]
#          labelname: ifDescr
#          oid: 1.3.6.1.2.1.2.2.1.2
    - name: ifAdminStatus
      oid: 1.3.6.1.2.1.2.2.1.7
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifOperStatus
      oid: 1.3.6.1.2.1.2.2.1.8
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifLastChange
      oid: 1.3.6.1.2.1.2.2.1.9
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifInOctets
      oid: 1.3.6.1.2.1.2.2.1.10
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifInUcastPkts
      oid: 1.3.6.1.2.1.2.2.1.11
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifInNUcastPkts
      oid: 1.3.6.1.2.1.2.2.1.12
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifInDiscards
      oid: 1.3.6.1.2.1.2.2.1.13
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifInErrors
      oid: 1.3.6.1.2.1.2.2.1.14
      type: gauge
      indexes:
        - labelname: ifDescr
          type: Integer
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifInUnknownProtos
      oid: 1.3.6.1.2.1.2.2.1.15
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifOutOctets
      oid: 1.3.6.1.2.1.2.2.1.16
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifOutUcastPkts
      oid: 1.3.6.1.2.1.2.2.1.17
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifOutNUcastPkts
      oid: 1.3.6.1.2.1.2.2.1.18
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifOutDiscards
      oid: 1.3.6.1.2.1.2.2.1.19
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifOutErrors
      oid: 1.3.6.1.2.1.2.2.1.20
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifOutQLen
      oid: 1.3.6.1.2.1.2.2.1.21
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifSpecific
      oid: 1.3.6.1.2.1.2.2.1.22
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
#    - name: ciscoMemoryPoolType
#      oid: 1.3.6.1.4.1.9.9.48.1.1.1.1
#    - name: ciscoMemoryPoolName
#      oid: 1.3.6.1.4.1.9.9.48.1.1.1.2
#    - name: ciscoMemoryPoolAlternate
#      oid: 1.3.6.1.4.1.9.9.48.1.1.1.3
#    - name: ciscoMemoryPoolValid
#      oid: 1.3.6.1.4.1.9.9.48.1.1.1.4
#    - name: ciscoMemoryPoolUsed
#      oid: 1.3.6.1.4.1.9.9.48.1.1.1.5
#    - name: ciscoMemoryPoolFree
#      oid: 1.3.6.1.4.1.9.9.48.1.1.1.6
#    - name: ciscoMemoryPoolLargestFree
#      oid: 1.3.6.1.4.1.9.9.48.1.1.1.7
    - name: ciscoMemoryPoolUtilization1Min
      oid: 1.3.6.1.4.1.9.9.48.1.2.1.1
      type: gauge
#    - name: ciscoMemoryPoolUtilization5Min
#      oid: 1.3.6.1.4.1.9.9.48.1.2.1.2
#    - name: ciscoMemoryPoolUtilization10Min
#      oid: 1.3.6.1.4.1.9.9.48.1.2.1.3
    - name: cpmCPUTotal5minRev
      oid: 1.3.6.1.4.1.9.9.109.1.1.1.1.8
      type: gauge
    - name: cpmCPUTotal1minRev
      oid: 1.3.6.1.4.1.9.9.109.1.1.1.1.7
      type: gauge
    - name: cpmCPUTotal5secRev
      oid: 1.3.6.1.4.1.9.9.109.1.1.1.1.6
      type: gauge
    - name: cpmCPUTotal5min
      oid: 1.3.6.1.4.1.9.9.109.1.1.1.1.5
      type: gauge
    - name: cpmCPUTotal1min
      oid: 1.3.6.1.4.1.9.9.109.1.1.1.1.4
      type: gauge
    - name: cpmCPUTotal5sec
      oid: 1.3.6.1.4.1.9.9.109.1.1.1.1.3
      type: gauge
    - name: avgBusy5
      oid: 1.3.6.1.4.1.9.2.1.58
      type: gauge
    - name: avgBusy1
      oid: 1.3.6.1.4.1.9.2.1.57
      type: gauge
    - name: busyPer
      oid: 1.3.6.1.4.1.9.2.1.56
      type: gauge
    - name: ifInMulticastPkts
      oid: 1.3.6.1.2.1.31.1.1.1.2
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifInBroadCastPkts
      oid: 1.3.6.1.2.1.31.1.1.1.3
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifOutMulticastPkts
      oid: 1.3.6.1.2.1.31.1.1.1.4
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifOutBroadCastPkts
      oid: 1.3.6.1.2.1.31.1.1.1.5
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifHCInOctets
      oid: 1.3.6.1.2.1.31.1.1.1.6
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifHCInUcastPkts
      oid: 1.3.6.1.2.1.31.1.1.1.7
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifHCInBroadcastPkts
      oid: 1.3.6.1.2.1.31.1.1.1.8
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifHCInMulticastPkts
      oid: 1.3.6.1.2.1.31.1.1.1.9
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifHCOutOctets
      oid: 1.3.6.1.2.1.31.1.1.1.10
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifHCOutUcastPkts
      oid: 1.3.6.1.2.1.31.1.1.1.11
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifHCOutMulticastPkts
      oid: .1.3.6.1.2.1.31.1.1.1.12
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifHCOutBroadcastPkts
      oid: 1.3.6.1.2.1.31.1.1.1.13
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifLinkUpDownTrapEnable
      oid: 1.3.6.1.2.1.31.1.1.1.14
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifHighSpeed
      oid: 1.3.6.1.2.1.31.1.1.1.15
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifPromiscuousMode
      oid: 1.3.6.1.2.1.31.1.1.1.16
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifConnectorPresent
      oid: 1.3.6.1.2.1.31.1.1.1.17
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifAlias
      oid: 1.3.6.1.2.1.31.1.1.1.18
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString
    - name: ifCounterDiscontinuityTime
      oid: 1.3.6.1.2.1.31.1.1.1.19
      type: gauge
      indexes:
        - labelname: ifDescr
          type: gauge
      lookups:
        - labels: [ifDescr]
          labelname: ifDescr
          oid: 1.3.6.1.2.1.2.2.1.2
          type: DisplayString



On Wednesday, 12 April 2017 at 11:48:13 UTC+2, janni...@gmail.com wrote:

janni...@gmail.com

Apr 17, 2017, 4:24:57 PM
to Prometheus Users
Hi,


Can you answer the following questions for me:
Which OIDs do I have to put in the generator.yml?
Do I have to create one for each type of device?
I have two Zyxel switches. Is it normal that I can't get a "current link speed" and instead just get a counter of bits?
How do I get kbit/s to display in Grafana?

Luc Evers

Apr 19, 2017, 3:57:20 AM
to Jannik Tom Züllig, Prometheus Users
  Which OIDs do I have to put in the generator.yml?  The ones I sent before == default.
Do I have to create one for each type of device?  The default module is used only for Cisco routers.
I have two Zyxel switches. Is it normal that I can't get a "current link speed" and instead just get a counter of bits?  I don't know this type of device, but you cannot simulate the problem with one OID. We think that the high CPU load is caused by the number of OIDs.
How do I get kbit/s to display in Grafana?  We are sending you our Grafana dashboard, see the attachment.
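A minimal sketch of the kbit/s calculation in PromQL, assuming the ifHCInOctets metric and ifDescr label from the snmp.yml posted above; the interface name and the $router template variable are placeholders:

rate(ifHCInOctets{job="snmp", instance="$router", ifDescr="GigabitEthernet0/1"}[5m]) * 8 / 1000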

Dashboard RPC Repair-1492585507679.json

Luc Evers

Apr 28, 2017, 7:44:24 AM
to Prometheus Users, janni...@gmail.com
     Hi,

        What is the progress on this problem? It seems to be a bug.



On Wednesday, 19 April 2017 at 09:57:20 UTC+2, Luc Evers wrote:

janni...@gmail.com

Apr 28, 2017, 11:24:36 AM
to Prometheus Users, janni...@gmail.com
Hi,

Thanks for your help. I've now understood it.

I just have to create a personal dashboard with Grafana.

Have a nice weekend.

On Friday, 28 April 2017 at 13:44:24 UTC+2, Luc Evers wrote:

Luc Evers

May 8, 2017, 3:50:13 AM
to Prometheus Users, janni...@gmail.com
  Jannik,

      What is the result of your tests?

   Luc.

On Friday, 28 April 2017 at 17:24:36 UTC+2, janni...@gmail.com wrote:

Luc Evers

May 28, 2017, 8:26:39 AM
to Prometheus Users, Jannik Tom Züllig
Progress?


jeden...@gmail.com

Jul 13, 2017, 9:05:10 AM
to Prometheus Users, janni...@gmail.com
I am also interested in the scalability of Prometheus for SNMP hosts that are network devices.  Has more information been learned on this topic?  The ratio of the number of OIDs to the number of devices could prove to be very important to us.

Ben Kochie

Jul 13, 2017, 9:22:07 AM
to jeden...@gmail.com, Prometheus Users, janni...@gmail.com
We recently added SIGHUP support to the snmp_exporter.  This reduces the overhead of having to re-load the config for every request.
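For a containerized exporter that means the config can be reloaded in place, for example (using the container name from earlier in the thread):

docker kill --signal=HUP snmp-switch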

A recent test showed you could easily handle over 5000 targets with a single CPU core dedicated to the snmp_exporter.

Most of the overhead is simply dealing with SNMP's protocol.


jeden...@gmail.com

Jul 13, 2017, 9:48:07 AM
to Prometheus Users, jeden...@gmail.com, janni...@gmail.com
How many OIDs can be collected for each of the 5000 targets?  What is the recommendation for 50,000 targets distributed around the globe, which is my environment?

Brian Brazil

Jul 13, 2017, 9:50:49 AM
to jeden...@gmail.com, Prometheus Users, janni...@gmail.com
On 13 July 2017 at 14:48, <jeden...@gmail.com> wrote:
How many OIDs can be collected for each of the 5000 targets?  What is the recommendation for 50,000 targets distributed around the globe, which is my environment?

SNMP is a very chatty protocol. You'll want to run an snmp exporter close to each set of targets (and probably a Prometheus close to those too).

Brian
 

Ben Kochie

Jul 13, 2017, 9:56:23 AM
to jeden...@gmail.com, Prometheus Users, janni...@gmail.com
Generally we recommend running a Prometheus server close to or inside each "failure domain".  It's designed to be run in a distributed manner, forwarding alerts to central location(s).  Grafana allows templating which Prometheus server to talk to, so you can have a central dashboard location talk to many different globally distributed Prometheus servers.

For something like a typical set of globally distributed datacenters (tens to hundreds globally) this works out pretty well.  You can have each location also roll up site-wide health metrics, and a few "central" Prometheus servers can use federation to meta-monitor all of the sites globally.

A lot of it depends on how your targets are distributed, how they're connected, etc.  This kind of design is totally possible, but requires a lot more information about your environment to make specific recommendations.
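A sketch of the central, meta-monitoring side of that layout; the per-site Prometheus addresses and the match[] selector are placeholders:

scrape_configs:
  - job_name: 'federate'
    honor_labels: true
    metrics_path: /federate
    params:
      'match[]':
        - '{job="snmp"}'
    static_configs:
      - targets:
        - 'prometheus-site-a:9090'
        - 'prometheus-site-b:9090'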
