SNMP MIB-2 and HRStorage Exporter?


dnee...@redhat.com

Oct 27, 2017, 1:24:09 PM
to Prometheus Users
I was curious: has anyone built a MIB-2 exporter? That is, one that collects CPU, memory, and disk information via the generic MIB-2 and HRStorage branches of SNMP rather than co-located Node_Exporters.

Ben Kochie

Oct 27, 2017, 2:52:19 PM
to dnee...@redhat.com, Prometheus Users
This is possible with the snmp_exporter.  There is an automatic generator that will take an arbitrary walk and generate a metrics translation from the MIB file.

The downside to this is that SNMP is an _awful_ protocol: extremely chatty, UDP-based, and difficult to map onto the Prometheus data model.
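As a rough illustration of that data-model mismatch: SNMP tables such as hrStorageTable (HOST-RESOURCES-MIB, 1.3.6.1.2.1.25.2.3) are keyed by an opaque integer index and report usage in allocation units, so turning them into Prometheus samples means joining columns, converting units, and lifting a descriptive column into a label. The Python below is a minimal sketch, assuming the walk results have already been grouped per index into dicts; the metric name `hr_storage_used_bytes`, the helper names, and the sample data are hypothetical and not necessarily what snmp_exporter's generator would produce.

```python
from typing import Any


def escape_label(value: str) -> str:
    """Escape a label value per the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def hr_storage_to_exposition(rows: dict[int, dict[str, Any]]) -> str:
    """Render hrStorageTable rows, keyed by hrStorageIndex, as exposition text."""
    out = [
        "# HELP hr_storage_used_bytes hrStorageUsed times hrStorageAllocationUnits.",
        "# TYPE hr_storage_used_bytes gauge",
    ]
    for index in sorted(rows):
        row = rows[index]
        # SNMP reports usage in allocation units, not bytes, so convert here.
        used_bytes = int(row["hrStorageUsed"]) * int(row["hrStorageAllocationUnits"])
        # The numeric index alone means little to a human; hrStorageDescr
        # (e.g. "/" or "Physical memory") becomes a label alongside it.
        descr = escape_label(str(row["hrStorageDescr"]))
        out.append(
            f'hr_storage_used_bytes{{hrStorageIndex="{index}",hrStorageDescr="{descr}"}} {used_bytes}'
        )
    return "\n".join(out) + "\n"


# Hypothetical walk results, already grouped by hrStorageIndex.
walk = {
    1: {"hrStorageDescr": "Physical memory", "hrStorageAllocationUnits": 1024, "hrStorageUsed": 2048},
    31: {"hrStorageDescr": "/", "hrStorageAllocationUnits": 4096, "hrStorageUsed": 1000},
}
print(hr_storage_to_exposition(walk), end="")
```

The join and label lookup are the part that does not fall out of SNMP naturally; the generator exists so that nobody has to hand-write this per MIB.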

On the other hand, the node_exporter provides a lot more information than anything you can get from your typical MIBs and SNMP implementations.  We can also move a lot more quickly with the node_exporter code than we could possibly do with SNMP.

For example, we added direct metric access to adjtimex in less than a month[0], which included a long discussion about naming, which metric we wanted, and onboarding a new contributor to the project.



--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-users+unsubscribe@googlegroups.com.
To post to this group, send email to prometheus-users@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/5b961ef7-1032-4cca-b504-53bc1e18f36a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Daniel Needles

Oct 27, 2017, 4:55:24 PM
to Prometheus Users
Agreed on SNMP. This is sort of what you get when the military designs the Internet: everything sort of looks like a hammer (aka security), unlike every other complex system in nature. Discovery by pounding the heck out of things was a horrid idea, compared with simply asking devices to register and then having them tell you what they want. Not to mention that sending passwords in the clear isn't a great idea either 8-)

That said, it did make sense given the problem of agent management. Can you spot-check my thinking here? Specifically, I'm guessing that dealing with Node_Exporter distribution is not as big a deal as it was in traditional client/server. Given Puppet/Chef, orchestration, etc., it's much easier to bake in a copy of Node_Exporter, and tracking "agents" via manual adds, changes, and deletes is much more trivial now. Is that a correct interpretation from what you have seen, or is there a different read?



Ben Kochie

Oct 27, 2017, 5:22:09 PM
to Daniel Needles, Prometheus Users
It's not just SNMP; it's enterprise design thinking. One of the biggest design-pattern failures I've discovered for myself recently is the enterprise idea that "Monitoring" and "Management" need to be in the same system. This is how we ended up with systems like SNMP, where metrics, system state tables, and configuration updates all live in the same protocol. This has been around so long that enterprise thinkers assume it's supposed to be this way, and expect it.

This is an unnecessary, and detrimental, design pattern.

Prometheus presents an extremely efficient, single-purpose, metrics-based monitoring system. This follows the *NIX design philosophy of "do one thing well." We stick to this so closely that we completely reject any kind of templating for our own configuration and leave that to external configuration management software.




dnee...@redhat.com

Oct 27, 2017, 5:36:57 PM
to Prometheus Users
Ah, that is a good point. To be fair, things were quite different pre-cloud. It is similar to the evolution that occurred when creatures moved from the sea to land (which is part of the reason the salinity of blood matches that of the ocean a couple billion years ago). With clouds, orchestration, etc., it makes sense to separate these two concerns. So I think we might be in violent agreement here. 8-) The one bit I am still fuzzy on is discovery/registry of elements, and discovery/registry of the properties of those elements, within Prometheus. This is where there is some overlap, since monitoring becomes a customer of configuration. That said, the docs (RTFM) look pretty good here. Thanks for all the help!
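On the discovery/registry question, one common pattern is Prometheus's file-based service discovery (`file_sd_configs`): configuration management writes a JSON list of target groups, and Prometheus picks up changes to the file on its own. Below is a minimal Python sketch, assuming a hypothetical inventory of hostnames and node_exporter's default port 9100; the function name `write_file_sd`, the file path, and the labels are illustrative only.

```python
import json
import os
import tempfile


def write_file_sd(path, hosts, port=9100, labels=None):
    """Write one file_sd target group for `hosts`, replacing `path` atomically."""
    groups = [{
        "targets": [f"{host}:{port}" for host in sorted(set(hosts))],
        "labels": dict(labels or {}),
    }]
    # Write to a temp file in the same directory, then rename it over the
    # target, so Prometheus never reads a half-written file. The ".tmp"
    # suffix keeps it from matching a "*.json" glob in file_sd_configs.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(groups, f, indent=2)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return groups
```

For example, a Puppet/Chef run could call `write_file_sd("targets/nodes.json", ["web1.example.com", "web2.example.com"], labels={"env": "prod"})`, with Prometheus pointed at `targets/*.json`. Registration then stays in configuration management, and Prometheus only consumes the result.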

Daniel Needles

Oct 27, 2017, 6:36:00 PM
to Prometheus Users
Another minor nuance: SNMP isn't necessarily inefficient. However, it is almost always implemented that way, AND sometimes you need knowledge of the underlying transport when things are done in bulk. In particular, you don't have to snmp-walk the table or make iterative requests; that is only necessary when you do not know which properties (OIDs) to ask for. If you know them ahead of time, you can simply do a getbulk (or a single GET carrying all the OIDs as varbinds) and get everything in a single packet. The caveat is that with some implementations you can overrun the UDP packet. Worse, even when the OIDs are known, almost every SNMP polling implementation I saw would send an individual packet for every property/OID request.
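To make the "everything in a single packet" point concrete, here is a minimal Python sketch that BER-encodes an SNMPv2c GetRequest carrying many OIDs as varbinds, plus a greedy batcher that splits a known OID list into as few requests as fit under a byte budget. It is a hand-rolled illustration, not any SNMP library's API; `get_request`, `batch_oids`, the `max_bytes` default of 1400, and the "public" community are assumptions. Note that the response is larger than the request (each NULL is replaced by a value), which is usually where the UDP overrun bites, so the budget should leave headroom.

```python
def encode_length(n: int) -> bytes:
    """BER definite length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def tlv(tag: int, value: bytes) -> bytes:
    return bytes([tag]) + encode_length(len(value)) + value


def encode_int(n: int) -> bytes:
    """BER INTEGER for n >= 0 (a leading zero byte keeps the sign bit clear)."""
    return tlv(0x02, n.to_bytes(n.bit_length() // 8 + 1, "big"))


def encode_oid(oid: str) -> bytes:
    arcs = [int(a) for a in oid.strip(".").split(".")]
    body = bytearray([40 * arcs[0] + arcs[1]])  # first two arcs share one byte
    for arc in arcs[2:]:
        chunk = [arc & 0x7F]  # base-128; high bit set on all but the last byte
        arc >>= 7
        while arc:
            chunk.append(0x80 | (arc & 0x7F))
            arc >>= 7
        body.extend(reversed(chunk))
    return tlv(0x06, bytes(body))


def get_request(community: str, request_id: int, oids: list[str]) -> bytes:
    """SNMPv2c GetRequest with one (OID, NULL) varbind per requested OID."""
    varbinds = b"".join(tlv(0x30, encode_oid(o) + b"\x05\x00") for o in oids)
    pdu = tlv(0xA0, encode_int(request_id) + encode_int(0) + encode_int(0) + tlv(0x30, varbinds))
    return tlv(0x30, encode_int(1) + tlv(0x04, community.encode()) + pdu)  # version 1 == v2c


def batch_oids(community: str, oids: list[str], max_bytes: int = 1400) -> list[list[str]]:
    """Greedily pack OIDs into as few requests as fit under max_bytes each."""
    batches: list[list[str]] = []
    current: list[str] = []
    for oid in oids:
        candidate = current + [oid]
        # Size with the largest request-id we might use, so real requests never exceed it.
        if current and len(get_request(community, 2**31 - 1, candidate)) > max_bytes:
            batches.append(current)
            current = [oid]
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches


# hrStorageUsed for indexes 1..100 (HOST-RESOURCES-MIB column 6).
oids = [f"1.3.6.1.2.1.25.2.3.1.6.{i}" for i in range(1, 101)]
for batch in batch_oids("public", oids, max_bytes=500):
    packet = get_request("public", 42, batch)
    print(len(batch), len(packet))  # each packet could go out via socket.sendto(packet, (host, 161))
```

The contrast with the per-OID style of polling is that 100 OIDs become a handful of packets instead of 100.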
 
As I understand it so far, node_exporter bypasses that issue by not allowing the "pick and choose" option, and as a result it can abstract the reference to the node level (i.e., the RESTful call gets everything as-is). That saves folks from having to know a particular property/OID AND bars the user from shooting themselves in the foot, since they cannot make 100 separate requests for each property of the node when they could just make one call.
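To illustrate the "one call gets everything" point: a node_exporter scrape is a single HTTP GET of `/metrics`, and any picking and choosing happens afterward on the client side. Below is a minimal Python sketch, assuming a node_exporter on `localhost:9100` (its default port) and a deliberately simplified parser that skips comment lines and assumes samples carry no timestamps; `fetch` and `parse_samples` are illustrative names, not part of any library.

```python
import urllib.request


def fetch(url: str = "http://localhost:9100/metrics", timeout: float = 5.0) -> str:
    """One HTTP GET returns every metric the exporter exposes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_samples(body: str) -> dict[str, float]:
    """Map each series (name plus labels) to its value; HELP/TYPE lines are skipped."""
    samples = {}
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Label values may contain spaces, so split on the last space only.
        series, value = line.rsplit(" ", 1)
        samples[series] = float(value)  # float() also accepts NaN and +Inf
    return samples
```

Usage would be something like `samples = parse_samples(fetch())` followed by `samples.get("node_load1")`; the selection happens locally after the single round trip.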

Daniel Needles

Oct 27, 2017, 6:45:31 PM
to Prometheus Users
I forgot to include the link to an example that does this, in case anyone is curious: http://www.nmsguru.com/2013/01/19/poller-show-me-the-code/

It handled up to 10K packets, each stuffed with 50 OIDs/properties, in under one minute (albeit that was the limit given the bandwidth available in 2000, compounded by its use of "synchronous collection": blast packets for 20 seconds, then wait 40 seconds to tally responses before "calling it" at a minute).

Ben Kochie

Oct 27, 2017, 8:17:39 PM
to Daniel Needles, Prometheus Users
Yup, SNMP was designed in an era when devices had kilobytes to megabytes of memory and CPU speeds were measured in single-digit MHz, so the efficiency trade-offs were different.

Now we're talking about a Prometheus wire format that takes advantage of native zlib compression instructions in modern CPU cores, just to reduce network bandwidth requirements. A typical node_exporter scrape with 800 metrics is 15 kB over the wire and takes 30 ms, and this certainly isn't the most optimal code. We trade off a lot of byte bandwidth on every scrape in order to eliminate the need for complex MIB-style context tables: every scrape includes the human-readable name of each metric, and even help annotations.
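To get a feel for why repeating full metric names and HELP text on every scrape is cheap once compressed, here is a small Python sketch that gzips a synthetic exposition-format payload. The metric `node_cpu_seconds_total`, its values, and the CPU count are made up for illustration (modelled loosely on node_exporter's per-CPU counters), so the sizes it prints will not match the 800-metric/15 kB figure above. Prometheus asks for gzip via the HTTP `Accept-Encoding` header; here Python's standard `gzip` module stands in for that.

```python
import gzip


def synthetic_scrape(n_cpus: int = 32) -> str:
    """Build a fake exposition body: highly repetitive names and label sets."""
    modes = ["idle", "iowait", "irq", "nice", "softirq", "steal", "system", "user"]
    lines = [
        "# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.",
        "# TYPE node_cpu_seconds_total counter",
    ]
    for cpu in range(n_cpus):
        for mode in modes:
            value = cpu * 1000 + len(mode)  # arbitrary but varying sample values
            lines.append(f'node_cpu_seconds_total{{cpu="{cpu}",mode="{mode}"}} {value}.25')
    return "\n".join(lines) + "\n"


body = synthetic_scrape().encode("utf-8")
compressed = gzip.compress(body)
# The repeated names and labels are exactly what LZ77-style compression removes.
print(len(body), len(compressed), f"{len(body) / len(compressed):.1f}x")
```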

Your example of 25k samples per second for SNMP isn't bad, but even in 2013 (if my memory serves), Prometheus had broken 100k samples per second a couple of years into development. Now we're fast approaching 1M samples/second. Transit and ingestion are not much of a bottleneck anymore.
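One reading of the poller figures that reproduces the 25k number: 10K packets of 50 OIDs each, sent during the 20-second blast window and averaged over that window. Averaged over the full one-minute cycle, the rate is lower:

```latex
\[
10{,}000 \times 50 = 500{,}000\ \text{samples}, \qquad
\frac{500{,}000}{20\,\mathrm{s}} = 25{,}000\ \text{samples/s}, \qquad
\frac{500{,}000}{60\,\mathrm{s}} \approx 8{,}333\ \text{samples/s}
\]
```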

We have recently[0] introduced selective collection as a design pattern, but most people abuse it to turn metrics-based monitoring into performance profiling.
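Assuming the selective collection referred to here is node_exporter's `collect[]` URL parameter (which restricts a scrape to the named collectors), a scrape URL can be built as in the Python sketch below. `scrape_url` is a hypothetical helper and `localhost:9100` an assumed target; in a Prometheus scrape config the equivalent is a `params` entry mapping `collect[]` to a list of collector names.

```python
from urllib.parse import urlencode


def scrape_url(host: str, collectors: list[str], port: int = 9100) -> str:
    """Build a /metrics URL limited to the given node_exporter collectors."""
    # Repeating the same key once per collector yields collect[]=a&collect[]=b;
    # urlencode percent-encodes the brackets as %5B%5D.
    query = urlencode([("collect[]", c) for c in collectors])
    return f"http://{host}:{port}/metrics" + (f"?{query}" if query else "")
```

For example, `scrape_url("localhost", ["cpu", "meminfo"])` asks for only the CPU and memory collectors; an empty list falls back to the plain `/metrics` URL, i.e. everything.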



Daniel Needles

Oct 27, 2017, 8:38:53 PM
to Prometheus Users
Ben,
  Excellent points.  Thanks for the insights!  8-)