Speeding up snmp exporter metrics. snmpwalk taking very long


wcooley...@gmail.com

Sep 21, 2018, 5:26:30 PM
to Prometheus Users
I am trying to speed up the SNMP scrapes against a Cisco switch stack.

The SNMP walk against 1.3.6.1.2.1.2 and 1.3.6.1.2.1.31.1.1 is taking over 10 seconds and returning over 10K PDUs.
This is from a stack of two 24 port switches.
I'm wondering if I can limit the number of metrics that it's trying to gather over SNMP.
Is this something I would be able to do using get: instead of walk:?
There doesn't seem to be much documentation on how or when to use get:

Thanks
William

Ben Kochie

Sep 21, 2018, 6:35:12 PM
to wcooley...@gmail.com, Prometheus Users
Get will likely be slower than walk, but there are some things you can do.

Instead of walking the whole interfaces table and ifXTable, you could walk just the metrics you need. Here's a simple generator.yml:

modules:
  if_mib:
    walk:
    - ifDescr
    - ifHCOutOctets
    - ifHCInOctets


That will reduce the excess data you're not using.
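As a rough sketch of why trimming the walk list helps so much: a table walk returns about one value per interface per walked column. The column counts below are the standard IF-MIB ifTable and ifXTable columns and assume every column is populated for every interface, which real devices don't always do. The interface count of 208 (4 x 52) is borrowed from later in this thread purely for illustration.

```python
# Rough estimate of how many values (varbinds) a walk returns.
# Assumption: every walked column has a value for every interface index.
IF_TABLE_COLUMNS = 22    # ifTable (1.3.6.1.2.1.2.2.1): ifIndex .. ifSpecific
IF_X_TABLE_COLUMNS = 19  # ifXTable (1.3.6.1.2.1.31.1.1.1): ifName .. ifCounterDiscontinuityTime


def estimated_varbinds(interfaces: int, columns: int, scalars: int = 0) -> int:
    """One value per (interface, column) pair, plus any scalar objects in the subtree."""
    return interfaces * columns + scalars


interfaces = 208  # 4 x 52 ports, the count reported later in the thread

# Walking 1.3.6.1.2.1.2 also returns the ifNumber scalar, hence scalars=1.
full = estimated_varbinds(interfaces, IF_TABLE_COLUMNS + IF_X_TABLE_COLUMNS, scalars=1)
# Walking only ifDescr, ifHCOutOctets and ifHCInOctets.
trimmed = estimated_varbinds(interfaces, 3)

print(f"full tables: ~{full} values, selected columns: ~{trimmed} values")
```

The absolute numbers depend on what the device actually populates; the point is that the cost scales with interfaces x columns, so dropping unused columns shrinks the walk roughly in proportion.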

There are also a couple of performance things to check.
* Are you using SNMP v2/3? They are much more efficient than v1, because they can use GETBULK instead of one GETNEXT per value.
* What is the latency between your device(s) and your exporter? SNMP is very sensitive to latency and packet loss.
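A back-of-the-envelope sketch of why both version and latency matter: a walk is a sequence of request/response round trips, so a network-only lower bound on its duration is round trips x RTT. This assumes sequential requests and ignores device processing time; the max_repetitions default of 25 is an assumption based on the snmp_exporter generator's documented default.

```python
import math


def walk_round_trips(varbinds: int, version: int, max_repetitions: int = 25) -> int:
    """Approximate request/response pairs needed to walk `varbinds` values."""
    if version == 1:
        # SNMPv1 has no GETBULK, so a walk is one GETNEXT per value.
        return varbinds
    # SNMPv2c/v3 GETBULK returns up to max_repetitions values per response.
    return math.ceil(varbinds / max_repetitions)


def min_walk_seconds(varbinds: int, version: int, rtt_ms: float, max_repetitions: int = 25) -> float:
    """Network-only lower bound: round trips x RTT, ignoring device processing time."""
    return walk_round_trips(varbinds, version, max_repetitions) * rtt_ms / 1000


for version in (1, 2):
    print(f"v{version}: >= {min_walk_seconds(10_000, version, rtt_ms=3):.1f} s")
```

Under these assumptions, 10,000 values at 3 ms RTT need at least 30 s with v1 but only about 1.2 s with GETBULK, so a walk that takes much longer than the network floor suggests time is being spent on the device itself.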


wcooley...@gmail.com

Sep 24, 2018, 3:45:15 PM
to Prometheus Users
Ben,
I didn't have a version set, so based on what I see in the documentation it should default to SNMP v2. The latency from the exporter to the switch is about 3 ms on average.

I used your suggested walk: configuration to generate a new config and it made a huge difference. Walk duration is now 0.9 seconds and it only returns 670 PDUs.
Not great considering how few metrics are being collected, but I can live with this.

Thanks so much for your help!

William

Ben Kochie

Sep 24, 2018, 3:58:52 PM
to wcooley...@gmail.com, Prometheus Users
v2 and 3 ms both sound good. I just wanted to make sure they were in a reasonable range.

You can always expand the list of metrics until you find the right balance of performance and detail.

With a stack of only 2 switches, it sounds like you might have a lot of virtual interfaces in your tree. There are some open issues about controlling the index range that gets walked, but we haven't had the resources to implement it.

wcooley...@gmail.com

Sep 24, 2018, 4:06:41 PM
to Prometheus Users
I think there's actually a bug in the SNMP implementation of this firmware.
It's returning metrics for more physical interfaces than the switches actually have.
It's showing 4 x 52 physical interfaces which is the maximum stack configuration for this hardware family.
Cisco has confirmed the behavior but I'm still waiting to hear if they will do anything about it.

Richard Hartmann

Sep 25, 2018, 12:35:51 AM
to Ben Kochie, wcooley...@gmail.com, Prometheus Users
On Sat, Sep 22, 2018 at 12:35 AM Ben Kochie <sup...@gmail.com> wrote:
>
> Get will likely be slower than walk, but there are some things you can do.

Interesting data point: We are currently wrangling with a bunch of
Huawei storage systems which are

* quick with walk
* fast with get
* <pause for dramatic effect>
* <wait for it>
* <why can't we use Ceph and local storage instead of Huawei storage systems
for that use case>
* painfully slow with bulkget

They promised a firmware fix, we will see...


Richard

Ben Kochie

Sep 25, 2018, 3:28:01 AM
to Richard Hartmann, wcooley...@gmail.com, Prometheus Users
If only there was a protocol that wasn't so complicated to implement. /s