Targets are Cisco routers.

prometheus.yml:

scrape_configs:
  - job_name: 'snmp'
    scrape_interval: 60s
    scrape_timeout: 60s
    file_sd_configs:
      - files:
          - /etc/prometheus/targets.yml
    metrics_path: /snmp
    params:
      module: [default]   # which OIDs we will be querying
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 195.x.x.x.x:9117
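For reference, the file that file_sd_configs points at follows the standard Prometheus file_sd format; a minimal sketch (the addresses and the extra label are placeholder examples, not taken from this thread):

# /etc/prometheus/targets.yml -- file_sd format; addresses are hypothetical examples
- targets:
    - 10.0.0.1
    - 10.0.0.2
  labels:
    site: lab    # optional: extra labels attached to every target in this group

Each __address__ read from this file is rewritten by the relabel_configs above into the target parameter of the exporter's /snmp endpoint, while the scrape itself is sent to the exporter at 195.x.x.x.x:9117.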
At most 115 targets can be put in the targets file.
The problem is not memory but CPU, which becomes very high.
If we put in 500 targets, then snmp_exporter blocks.
Question: Prometheus's design is based on polling (right?), which can be heavy if there are a lot of devices.
We are using Grafana as the dashboard.
Prometheus, snmp_exporter and Grafana are running in three separate Docker containers.
Server: Ubuntu, 250 GB memory, 55 CPUs.
---> So you mean the CPU becomes high in the SNMP exporter, not in Prometheus?
We used three Docker containers on the same machine (a Prometheus container, an SNMP exporter container and a Grafana container). The SNMP exporter caused the high CPU load.
I'm still disappointed by the low number of SNMP targets that can be handled per SNMP exporter, because Go is a very powerful language. What if we want to poll 500,000 routers?
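One common way to spread that load, sketched here as an assumption rather than something suggested in this thread, is to run several snmp_exporter instances and shard the targets across them with hashmod relabelling. The shard count, job name and exporter address below are illustrative only:

# Hypothetical sharding sketch: one scrape job per snmp_exporter instance.
# Shard 0 of 2; a second job would be identical except for regex: "1" and a
# replacement pointing at the second exporter instance.
- job_name: 'snmp_shard_0'
  scrape_interval: 60s
  scrape_timeout: 60s
  file_sd_configs:
    - files:
        - /etc/prometheus/targets.yml
  metrics_path: /snmp
  params:
    module: [default]
  relabel_configs:
    - source_labels: [__address__]
      modulus: 2                      # total number of exporter instances
      target_label: __tmp_shard
      action: hashmod
    - source_labels: [__tmp_shard]
      regex: "0"                      # keep only this shard's targets
      action: keep
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: 195.x.x.x.x:9117   # exporter instance serving shard 0

Each exporter instance then only walks its share of the routers, so the per-process CPU stays bounded; very large fleets are usually split further across multiple Prometheus servers in the same way.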
Brian,
We also thought in this direction and cleaned the file down to a minimum, so we have 74 OIDs, which we cannot reduce.
Message when the list is too long:
time="2017-03-23T08:36:36Z" level=error msg="Error scraping target a.b.c.d: Error walking target a.b.c.d: Request timeout (after 1 retries)" source="collector.go:125"
We also tested the targets that reported errors against a small list, and then there are no errors.
With snmp_exporter v0.3.0 we have problems:
docker run --name snmp-switch -p 9117:9116 -v /docker/docker-volumes/snmp/snmp_exporter/:/etc/snmp_exporter/ a81b7148413b
time="2017-03-23T07:56:05Z" level=info msg="Starting snmp exporter (version=0.3.0, branch=master, revision=6f8aa8a24d720b36991f29ffb179b2896e92090b)" source="main.go:99"
time="2017-03-23T07:56:05Z" level=info msg="Build context (go=go1.7.5, user=x@y, date=20170315-16:01:54)" source="main.go:100"
time="2017-03-23T07:56:05Z" level=info msg="Listening on :9116" source="main.go:114"
time="2017-03-23T07:57:14Z" level=fatal msg="Unknown index type string" source="collector.go:355"
When we use about 150 CPEs and start extending the list, we see that we lose results from other CPEs. What I mean is that some CPEs which were working before are no longer working after adding additional CPEs. I expected 10,000 CPEs per snmp_exporter.
Actions and results:
1) Upgraded to snmp_exporter v0.3 by changing the snmp.yml file (the variables were different!).
2) Test with 1 OID, scrape time 4 minutes: we see no problems with 6,300 targets (routers); it can even be more.
3) Test with 79 OIDs and 500 targets (routers), scrape time 4 minutes. Problem: snmp_exporter still works but becomes very slow. The snmp_exporter answer duration is > 1 minute, which means that Prometheus times out. snmp_exporter also consumes a lot of CPU.
A router can return about 52 lines of text!
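If the walks themselves are the bottleneck, the per-module SNMP settings in the snmp_exporter generator are worth checking. A minimal generator.yml sketch, assuming SNMPv2c with a community string; the OID names and tuning values are examples, not taken from this thread:

# Hypothetical generator.yml module; OID names, community and tuning values
# are assumptions for illustration.
modules:
  default:
    walk:
      - ifHCInOctets        # example interface counters
      - ifHCOutOctets
    version: 2
    auth:
      community: public
    max_repetitions: 25     # rows fetched per GETBULK request
    retries: 3              # retries before a "Request timeout" error
    timeout: 10s            # per-request timeout during the walk

The generator turns this into the snmp.yml that the exporter reads. Walking fewer subtrees means less data to fetch and decode per target, and a larger max_repetitions reduces the number of round trips per walk, which mainly shortens the scrape duration.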
Hi, what is the progress on this problem? It seems to be a bug.
On Wednesday, 19 April 2017 at 09:57:20 UTC+2, Luc Evers wrote:
> Which OIDs do I have to put in the generator.yml?
The one I sent before == default.
> Do I have to create one for each type of device?
The module "default" is used only for Cisco routers.
> I have two Zyxel switches. Is it normal that I can't get a "Current link Speed", and instead I just get a "Counter" of bits?
I don't know this type of device, but you cannot simulate the problem with one OID. We think that the high CPU load is caused by the number of OIDs.
> How do I get the kbit/s to present in Grafana?
We are sending you our Grafana dashboard, see the attachment.
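On the kbit/s question, the usual approach is to take the rate of the interface octet counters and convert octets to bits; a sketch, assuming an if_mib-style ifHCInOctets metric is being exported (on Prometheus 2.x this could be a recording rule, or the expression can be pasted straight into a Grafana graph panel):

# Hypothetical rule file; the metric and record names are assumptions.
groups:
  - name: interface_throughput
    rules:
      - record: instance_ifDescr:ifHCInOctets:kbits_rate5m
        expr: rate(ifHCInOctets[5m]) * 8 / 1000   # octets/s -> kbit/s

For the "Current link Speed" question, IF-MIB reports the nominal link speed in separate gauges (ifSpeed / ifHighSpeed); throughput only appears once you take the rate of the octet counters as above.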
On Mon, Apr 17, 2017 at 10:24 PM, <janni...@gmail.com> wrote:
Hi,
Can you answer the following questions for me:
Which OIDs do I have to put in the generator.yml?
Do I have to create one for each type of device?
I have two Zyxel switches. Is it normal that I can't get a "Current link Speed", and instead I just get a "Counter" of bits?
How do I get the kbit/s to present in Grafana?
Thank you in advance.
Best regards,
Jannik
Progress?
How many OIDs can be collected for each of the 5000 targets? What is the recommendation for 50,000 targets distributed around the globe, which is my environment?