Re: Just read through esxsnmp

Jon M. Dugan

Jan 15, 2008, 4:42:07 AM

to Patrick Haller, esx...@googlegroups.com

[Note: I Cc'd esx...@googlegroups.com to archive my answer. I like the idea
of archiving our discussions there.]

On Mon, Jan 14, 2008 at 03:57:08AM +0000, Patrick Haller wrote:
> Heya,
>
> I just read through esxsnmp and had the following questions:
>
> OID correlators -- Why not postpone that work till presentation to the
> user? Just store the numeric OIDs, their values, and timestamp; then the
> front-end can pick out what it needs to provide information? It looks
> like the poller wants to run fast and tight, so getting the data to the
> disk as fast as possible seems right?

I was initially just saving things to disk using the ifIndex. Lots of other
systems do this (MRTG does for sure). However after a long discussion with
one of my coworkers I changed it to correlate things to interface name. In
short, interface name is more stable than ifIndex.

It can be kind of painful to do the interface name to ifIndex correlation in
post processing because ifIndex is _relatively_ stable but not guaranteed to
be stable. It pretty much will only change when the device reboots, which is
rare for our devices but it still can change.

You could track when the ifIndex changes and have a lookup function in
postprocessing but that's kind of messy. Depending on your level of paranoia
you might still need to check that ifIndex is correct before/after each poll
anyway, so doing a quick correlation is pretty cheap. (An optimization would
be to keep the ifIndex to interface name correlation map in memory and check
sysUptime and only rebuild the mapping if sysUptime decreases, but this hasn't
been necessary so far.)

Also, one of the most common queries will be to look at stats for a given
interface. Using this setup saves doing the correlation at that time. The
other queries will be using the ifRef table in the SQL database and it's just
as easy to return the interface name as it is to return the ifIndex.

The other reason for correlating to name rather than ifIndex is that the name
has meaning whereas the ifIndex is abstract. This is helpful when trolling
through the data store by hand. It also provides some survivability if the
metadata about which ifIndex maps to which interface name gets lost somehow.
I don't consider these last two very important but they are nice side effects.

OK, that's a very long answer to a short question.

I think what would really help make the polling look tighter would be to
replace the process per collection group model. I'm thinking either a couple
of polling processes or at most a process per device with a select loop to
handle the async IO. However I haven't looked into how you'd get ahold of the
sockets in question from yapsnmp...

> Config file -- I guess ConfigParser had an issue?

Meh, I didn't want to take the time to learn another module at that moment and
the config file had a very simple syntax. I just wanted to get the thing
running. I would definitely like to go back and improve that. ConfigParser
would be just fine!

Jon
--
Jon M. Dugan <jdu...@es.net>
ESnet Network Engineering Group
Lawrence Berkeley National Laboratory

Patrick

Jan 16, 2008, 1:49:56 AM

to esxsnmp

On Jan 15, 9:42 am, "Jon M. Dugan" <jdu...@es.net> wrote:
> OK, that's a very long answer to a short question.

Seems like people must have run into the ifIndex instability issue
before; looking at Cacti, it checks the uptime OID for the host and,
if it's less than the last value, re-caches the ifIndex list.

I agree that people will most likely track by the interface name; we
might get some benefit by ID'ing data streams by an internal
identifier that then links to the interface metadata. That would make it
easier to link data streams as a connection grows or moves.

> Meh, I didn't want to take the time to learn another module at that moment and
> the config file had a very simple syntax. I just wanted to get the thing
> running. I would definitely like to go back and improve that. ConfigParser
> would be just fine!

Cool. I just committed the change to poll.py to use ConfigParser.

Patrick.

Jon M. Dugan

Jan 16, 2008, 8:11:46 PM

to esx...@googlegroups.com

On Tue, Jan 15, 2008 at 10:49:56PM -0800, Patrick wrote:
>
> On Jan 15, 9:42 am, "Jon M. Dugan" <jdu...@es.net> wrote:
> > OK, that's a very long answer to a short question.
>
> Seems like peeps have to have run into the IfIndex instability issue
> before; looking at Cacti, they check the uptime OID for the host and
> if it's less than last, it re-caches the IfIndex list.

Eventually it is probably worth doing the same caching trick as Cacti. (I
did mention it as an optimization in my first email, but I guess I wasn't
very clear...)

> I agree that people will most likely track by the interface name; we
> might get some benefit by ID'ing data streams by an internal
> identifier that then links to the interface metadata. It'll make
> easier the linking of data streams as a connection grows or moves.
>
> > Meh, I didn't want to take the time to learn another module at that moment and
> > the config file had a very simple syntax. I just wanted to get the thing
> > running. I would definitely like to go back and improve that. ConfigParser
> > would be just fine!
>
> Cool. I just commit'd the change to poll.py to use ConfigParser.

Great! Thanks. That wasn't too painful... I tweaked it with the last commit
to look in the espolld section rather than the main section.
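
For reference, a minimal sketch of reading poller settings from an espolld
section. It uses Python 3's configparser (the module was named ConfigParser in
Python 2). The option names db_root and poll_interval and the sample values are
made up for illustration; they are not taken from poll.py.

```python
import configparser

# Hypothetical config text; a real deployment would read this from a file.
SAMPLE = """
[main]
db_root = /data/esxsnmp

[espolld]
poll_interval = 30
"""


def load_poller_settings(text):
    """Parse config text and return the settings the poller needs."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "db_root": parser.get("main", "db_root"),
        # Poller-specific options live in [espolld], not [main].
        "poll_interval": parser.getint("espolld", "poll_interval", fallback=30),
    }


settings = load_poller_settings(SAMPLE)
```

The fallback argument keeps the poller running with a default interval if the
option is missing from the file.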

I also committed some other changes, see the commit log for details. Should I
have commits posted to this list or is that too much noise?

Patrick

Jan 17, 2008, 1:03:00 AM

to esxsnmp

On Jan 17, 1:11 am, "Jon M. Dugan" <jdu...@es.net> wrote:
> Eventually it is probably worth doing the same caching trick as Cacti. (I
> did mention it as an optimization in my first email, but I guess I wasn't
> very clear...)

The database tables' idea of an external object that has traffic
should not be tied to the interface name, descr, address, etc. because
we don't control those names (in esxsnmp) and any of that metadata can
change without changing the traffic, which is what we care about when
graphing over time.

> Great! Thanks. That wasn't too painful... I tweaked it with the last commit
> to look in the espolld section rather than the main section.
>
> I also committed some other changes, see the commit log for details. Should I
> have commits posted to this list or is that too much noise?

Good deal. I don't think duplicating commit logs via the group makes
much sense either. What direction are you working towards currently?

Patrick.

Jon M. Dugan

Jan 17, 2008, 6:06:13 PM

to esx...@googlegroups.com

On Wed, Jan 16, 2008 at 10:03:00PM -0800, Patrick wrote:
>
> On Jan 17, 1:11 am, "Jon M. Dugan" <jdu...@es.net> wrote:
> > Eventually it is probably worth doing the same caching trick as Cacti. (I
> > did mention it as an optimization in my first email, but I guess I wasn't
> > very clear...)
>
> The database tables' idea of an external object that has traffic
> should not be tied to the interface name, descr, address, etc. because
> we don't control those names (in esxsnmp) and any of that metadata can
> change without changing the traffic, which is what we care about when
> graphing over time.

I am not sure I am following you here... I think you are saying we don't want to
make the storage of the traffic data for a managed device dependent on some
data source other than the device itself. Is that correct?

I was thinking that similar to cacti we could keep a copy of the mapping
between ifIndex and ifAlias in memory and only update it if sysUptime
decreases or there is an ifIndex that does not have a mapping. Each time the
process starts it will fetch an up to date mapping from the device and cache
it in memory until it detects that sysUptime has decreased or it finds an
ifIndex it doesn't have a mapping for (eg. a new interface appears on the
box).
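
A minimal sketch of the caching rule described above, assuming a hypothetical
fetch_mapping callable that walks the device and returns a dict of ifIndex to
name. The class and method names are illustrative and are not part of esxsnmp.

```python
class IfNameCache:
    """In-memory ifIndex -> interface name map for a single device."""

    def __init__(self, fetch_mapping):
        # fetch_mapping is a hypothetical callable returning {ifIndex: name}.
        self._fetch = fetch_mapping
        self._map = fetch_mapping()   # fetched once when the process starts
        self._last_uptime = None
        self.rebuilds = 1             # counts how often the device was walked

    def lookup(self, if_index, sys_uptime):
        """Return the name for if_index, rebuilding the map only if needed."""
        rebooted = (self._last_uptime is not None
                    and sys_uptime < self._last_uptime)
        self._last_uptime = sys_uptime
        # Rebuild if sysUptime went backwards or the ifIndex is unknown.
        if rebooted or if_index not in self._map:
            self._map = self._fetch()
            self.rebuilds += 1
        return self._map.get(if_index)
```

In the common case (no reboot, known ifIndex) a lookup costs one dictionary
access and no SNMP traffic.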

That's an optimization for the future, since rebuilding the mapping each poll
apparently isn't causing any issues right now.

> > Great! Thanks. That wasn't too painful... I tweaked it with the last commit
> > to look in the espolld section rather than the main section.
> >
> > I also committed some other changes, see the commit log for details. Should I
> > have commits posted to this list or is that too much noise?
>
> Good deal. I don't think reduplicating commit logs via the group makes
> much sense either. What direction are you working towards currently?

Presently I am working on finishing the tool to export data in a format
comparable to 'rrdtool fetch'. This way I can feed it into perfSONAR [1].
There's hopefully going to be a demo of this next week at JointTechs [2].

[1] Some more info on perfSONAR: http://www.perfsonar.net/
[2] JointTechs: http://jointtechs.es.net/

Current things I would like to work on are:

Right now I am doing a lot of processing to generate rates and missing
datapoint detection in the esfetch script. This should be pushed into esdb
and/or TSDB as appropriate. This is probably what I will work on later today
after I get my current script deployed for demo testing. This is related to
the next task.

I need to add something similar to consolidation functions in RRD speak. I
think this should go in TSDB rather than in ESxSNMP. I have two thoughts on
how to proceed with this. The first is to have a process that goes through
and creates these aggregates on a periodic basis. For example once an hour
scan the 30 second data and create a min, max and average data point in the
hourly aggregate. Once every 24 hours scan the hourly data and create a data
point in the daily aggregate. The other option is to emulate these aggregates
by skipping through the data and cherry picking values. This will work for
averages over relatively short time lines. Hmm, I think the first option is
considerably more general. The reason for this is to reduce the number of
datapoints needed to plot long term trends.
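
A rough sketch of the first option (a periodic pass that builds min, max and
average per period), assuming the input is already a list of (timestamp, rate)
pairs. The function name and data layout are illustrative and say nothing about
how TSDB actually stores rows.

```python
from collections import defaultdict


def consolidate(points, period):
    """Bucket (timestamp, value) pairs into `period`-second windows.

    Returns {window_start: (min, max, average)} for every non-empty window.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % period].append(value)
    return {
        start: (min(vals), max(vals), sum(vals) / len(vals))
        for start, vals in sorted(buckets.items())
    }


# 30 second samples rolled up into hourly aggregates.
hourly = consolidate([(0, 10.0), (30, 20.0), (3600, 5.0)], 3600)
```

The same function could build the daily aggregate from raw data by passing
period=86400; building it from hourly averages would run into the
averages-of-averages problem.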

I think I want to create an aggregate called AVERAGE. This will take the 30
second raw counters and convert them to rates. I think this looks something
like:

    average:    n bits wide (32 or 64? or maybe a float or double?)
    sum:        total for this averaging period
    datapoints: number of datapoints that make up the sum

    optionally: min, max (those are really their own aggregates, but it
                would be convenient to store them here...)

If we have sum and datapoints we can derive average with a single division, so
maybe it is excessive to store the actual average. The reason for keeping
these is so that when we calculate averages for a longer aggregate we don't
make averages of averages. Hmm, need to think about this more. Any thoughts
you might have are most welcome. Note that I intend to store aggregate
information in a separate file from the raw data.
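
To illustrate why keeping sum and datapoints avoids averages of averages, here
is a small sketch. The class name and fields are assumptions for illustration,
not the TSDB on-disk format.

```python
from dataclasses import dataclass


@dataclass
class AverageAggregate:
    total: float = 0.0   # "sum" in the layout above
    datapoints: int = 0  # number of datapoints that make up the total

    def add(self, value):
        self.total += value
        self.datapoints += 1

    def merge(self, other):
        """Fold a shorter-period aggregate into a longer-period one."""
        self.total += other.total
        self.datapoints += other.datapoints

    @property
    def average(self):
        # Derived on demand with a single division.
        return self.total / self.datapoints if self.datapoints else None


def summarize(values):
    agg = AverageAggregate()
    for v in values:
        agg.add(v)
    return agg


busy = summarize([10.0, 10.0, 10.0])  # period with three datapoints
sparse = summarize([40.0])            # period where most polls were missed
combined = AverageAggregate()
combined.merge(busy)
combined.merge(sparse)
```

Here combined.average is 17.5, the true mean of the four datapoints, whereas
averaging the two period averages (10 and 40) would give 25.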

At the present time I don't do rollover or reset detection. There is a 32 bit
flags field in each TSDB row that could be used to store a flag if we detect a
rollover or a reset. Initially I was planning to do this detection as a post
processing operation but I am not so sure that is the best approach. If it is
done in postprocessing it can be done while creating the aggregates above.
However, to do it in real time only requires that we cache the previous
datapoint which isn't a huge overhead. Reset detection can be done by
monitoring sysUptime at each poll. My plan is to leave the lower 16 bits for
flags defined by TSDB and the upper 16 bits for application specific flags.
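
A sketch of how real-time detection and the proposed flag layout might fit
together. The flag values and function names are hypothetical; only the split
into lower 16 bits (TSDB) and upper 16 bits (application) comes from the plan
above.

```python
TSDB_FLAG_MASK = 0x0000FFFF  # lower 16 bits: flags defined by TSDB
APP_FLAG_MASK = 0xFFFF0000   # upper 16 bits: application specific flags

# Hypothetical TSDB flag values, for illustration only.
FLAG_ROLLOVER = 0x0001
FLAG_RESET = 0x0002


def counter_delta(prev, cur, prev_uptime, cur_uptime, bits=32):
    """Compare a counter with the cached previous datapoint.

    Returns (delta, flags). On a reset the delta is unknown, so it is None.
    """
    if cur_uptime < prev_uptime:
        return None, FLAG_RESET                      # sysUptime went backwards
    if cur < prev:
        return cur - prev + (1 << bits), FLAG_ROLLOVER  # counter wrapped
    return cur - prev, 0


def set_app_flag(flags, app_bits):
    """Store application specific bits in the upper half of the flags field."""
    return (flags & TSDB_FLAG_MASK) | ((app_bits & 0xFFFF) << 16)


def get_app_flags(flags):
    return (flags & APP_FLAG_MASK) >> 16
```

Only the previous counter value and previous sysUptime need to be cached per
interface for this check.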

Preallocating new chunks for data stored in TSDB. Presently I am chunking the
data for each polled interface into day sized chunks. (TSDB allows you to
decide how you want to chunk things and perhaps day sized chunks are too
small, but that's a different question.) The thing is that every day at
midnight I need to create 7300 new files. This causes a large IO spike, which
at present is causing most polls during the first few minutes of the day to run
longer than the 30 second polling interval I am using. So if I spread out the
creation of these new files over the course of the last hour of each day I
think it would help a lot. This is in some ways a kludge to get around the
fact that the polling and the storage are tightly coupled.
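
One simple way to spread the work, sketched under the assumption that a
scheduler can be told "create file i at this many seconds into the last hour".
The function name is made up for illustration.

```python
def creation_offsets(n_files, window=3600):
    """Return, for each file, the second within the window to create it.

    Files are spaced evenly so no single second sees a burst of creations.
    """
    return [i * window // n_files for i in range(n_files)]


# 7300 chunk files spread across the last hour of the day.
offsets = creation_offsets(7300)
```

With 7300 files over 3600 seconds this works out to two or three file
creations per second instead of 7300 at midnight.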

It might be good to decouple the polling and the storage. I didn't do this
initially because I wanted to get some kind of polling going right away and
doing the decoupling properly was taking more time than I wanted. The
original approach I was thinking of was to funnel all writes through the esdbd
daemon. The potential problem with this is that it focuses a lot of traffic
through that daemon, so I'm not sure that's the right way to go. I avoided
having multithreaded processes due to the potential for subtle and annoying
bugs, but it might be a reasonable approach to have a poller and a writer
thread for each polling process. This would skirt the issue caused by the
creation of all the new chunks since polling would continue but the writer
thread would be behind for a few minutes.
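
A minimal sketch of the poller plus writer thread idea using the standard
library's queue and threading modules. The samples iterable and the write
callable stand in for the real SNMP polling and TSDB writes, which are not
shown here.

```python
import queue
import threading


def run_poller_with_writer(samples, write):
    """Hand each polled sample to a writer thread through a queue."""
    pending = queue.Queue()

    def writer():
        while True:
            item = pending.get()
            if item is None:      # sentinel: the poller is done
                break
            write(item)           # slow disk IO happens only in this thread

    thread = threading.Thread(target=writer)
    thread.start()
    for sample in samples:
        pending.put(sample)       # the poller never waits on the disk
    pending.put(None)
    thread.join()


written = []
run_poller_with_writer([("ge-0/0/0", 1), ("ge-0/0/0", 2)], written.append)
```

If the writer falls behind (for example while new chunks are created), samples
simply accumulate in the queue and are written in order once it catches up.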

Rename esfetch to esxsnmp and add commands in addition to fetch. Commands that
spring to mind are: add-device, add-oid, add-oidset, retire-device.
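
The command layout could be sketched with argparse subcommands as below. The
command names come from the list above; the positional arguments (device, name)
are assumptions for illustration.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical `esxsnmp <command>` tool."""
    parser = argparse.ArgumentParser(prog="esxsnmp")
    commands = parser.add_subparsers(dest="command", required=True)

    # Commands that act on a device.
    for name in ("fetch", "add-device", "retire-device"):
        commands.add_parser(name).add_argument("device")

    # Commands that act on an OID or OID set.
    for name in ("add-oid", "add-oidset"):
        commands.add_parser(name).add_argument("name")

    return parser


args = build_parser().parse_args(["add-device", "router1"])
```

After parsing, args.command holds the chosen subcommand, so a dispatch table
keyed on that string is enough to route to the right handler.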

The ESDB RPC API (wow that's 3 acronyms in a row) is still very much in flux.
It needs to be refined and stabilized.

There are a few things I'd like to work on, so I probably should create a wiki
page to collect these.

Hopefully this is interesting and informative for you. It is a helpful
exercise for me in that it gets ideas out of my head and into a more concrete
form.

I need to get back to getting the demo setup.

If you think it would be more productive to talk in person I could meet up
sometime.

Cheers,

Patrick

Jan 17, 2008, 11:13:32 PM

to esxsnmp

On Jan 17, 11:06 pm, "Jon M. Dugan" <jdu...@es.net> wrote:
> I am not sure I am following you here... I think you are saying we don't want to
> make the storage of the traffic data for a managed device dependent on some
> data source other than the device itself. Is that correct?

Well, identification of a given data flow should be internal to
esxsnmp, like:

    create table flows (
        id    bigint primary key not null auto_increment,
        bytes bigint not null default 0,
        stamp timestamp default now()
    );

    create table interfaces (
        -- recache this when sysUptime resets
        id      bigint primary key not null auto_increment,
        host    bigint not null,       -- IP as decimal
        `index` bigint not null,       -- ifIndex (INDEX is reserved in MySQL, hence the backticks)
        name    varchar(255) not null, -- ifName
        descr   varchar(255),          -- ifDescr
        flow    bigint references flows(id)
    );

So if any metadata in interfaces changes, we just update that and the
flow data remains independent.

> I was thinking that similar to cacti we could keep a copy of the mapping
> between ifIndex and ifAlias in memory and only update it if sysUptime
> decreases or there is an ifIndex that does not have a mapping. Each time the
> process starts it will fetch an up to date mapping from the device and cache
> it in memory until it detects that sysUptime has decreased or it finds an
> ifIndex it doesn't have a mapping for (eg. a new interface appears on the
> box).
>
> That's an optimization for the future, since rebuilding the mapping each poll
> isn't apparently causing any issues right now.

True. I'm just pointing at the mapping and saying it shouldn't map to
an interface name, but rather to flows.id from above.

> Presently I am working on finishing the tool to export data in a format
> comparable to 'rrdtool fetch'. This way I can feed it into perfSONAR [1].
> There's hopefully going to be a demo of this next week at JointTechs [2].
>
> [1] Some more info on perfSONAR: http://www.perfsonar.net/
> [2] JointTechs: http://jointtechs.es.net/

Nice!

> Right now I am doing a lot of processing to generate rates and missing
> datapoint detection in the esfetch script. This should be pushed into esdb
> and/or TSDB as appropriate. This is probably what I will work on later today
> after I get my current script deployed for demo testing. This is related to
> the next task.

Why don't we put missing datapoints into the tables as NULLs? We can
differentiate between NULLs and 0s when we do any work on the data.
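
A quick illustration of the NULL versus 0 distinction, using an in-memory
SQLite database so it runs anywhere. The samples table and its columns are
hypothetical; the point is only that SQL aggregates such as avg() and
count(column) skip NULLs but include zeros.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table samples (stamp integer primary key, bytes integer)")
conn.executemany(
    "insert into samples (stamp, bytes) values (?, ?)",
    [
        (0, 100),
        (30, None),  # missed poll: stored as NULL
        (60, 0),     # successful poll that saw no traffic
        (90, 200),
    ],
)

# avg() and count(bytes) ignore NULL rows; count(*) does not.
average, polled, missing = conn.execute(
    "select avg(bytes), count(bytes), count(*) - count(bytes) from samples"
).fetchone()
```

The zero sample pulls the average down as it should, while the missed poll is
left out of it and can still be counted separately.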

> If we have sum and datapoints we can derive average with a single division, so
> maybe it is excessive to store the actual average. The reason for keeping
> these is so that when we calculate averages for a longer aggregate we don't
> make averages of averages. Hmm, need to think about this more. Any thoughts
> you might have are most welcome. Note that I intend to store aggregate
> information in a separate file from the raw data.

We can do most of this in SQL. In the past I've hacked around a bit
with transforms like this:
http://haller.ws/logs/view.cgi/CalculatingDerivativesWithinSQL

> At the present time I don't do rollover or reset detection. There is a 32 bit
> flags field in each TSDB row that could be used to store a flag if we detect a
> rollover or a reset. Initially I was planning to do this detection as a post
> processing operation but I am not so sure that is the best approach. If it is
> done in postprocessing it can be done while creating the aggregates above.
> However, to do it in real time only requires that we cache the previous
> datapoint which isn't a huge overhead. Reset detection can be done by
> monitoring sysUptime at each poll. My plan is to leave the lower 16 bits for
> flags defined by TSDB and the upper 16 bits for application specific flags.

I vote for post-processing detection of any issues. We may get smarter
in the future about detection and we'd still have the data if we post-
processed. The poller should just get data to disk, while the
extraction routines should turn that data into information for the
user.

> Preallocating new chunks for data stored in TSDB. Presently I am chunking the
> data for each polled interface into day sized chunks. (TSDB allows you to
> decide how you want to chunk things and perhaps day sized chunks are too
> small, but that's a different question.) The thing is that every day at
> midnight I need to create 7300 new files. This causes a large IO spike, which
> at present is causing most polls during the first few minutes of the day to run
> longer than the 30 second polling interval I am using. So if I spread out the
> creation of these new files over the course of the last hour of each day I
> think it would help a lot. This is in some ways a kludge to get around the
> fact that the polling and the storage are tightly coupled.
>
> It might be good to decouple the polling and the storage. I didn't do this
> initially because I wanted to get some kind of polling going right away and
> doing the decoupling properly was taking more time than I wanted. The
> original approach I was thinking of was to funnel all writes through the esdbd
> daemon. The potential problem with this is that it focuses a lot of traffic
> through that daemon, so I'm not sure that's the right way to go. I avoided
> having multithreaded processes due to the potential for subtle and annoying
> bugs, but it might be a reasonable approach to have a poller and a writer
> thread for each polling process. This would skirt the issue caused by the
> creation of all the new chunks since polling would continue but the writer
> thread would be behind for a few minutes.

Well, one writer process works. I vote for letting it worry about
directing data traffic to a distributed system if needed.

> Rename esfetch to esxsnmp and add commands in addition to fetch. Commands that
> spring to mind are: add-device, add-oid, add-oidset, retire-device.
>
> The ESDB RPC API (wow that's 3 acronyms in a row) is still very much in flux.
> It needs to be refined and stabilized.
>
> There's a few things I'd like to work on, I probably should create a wiki page
> to encapsulate these.
>
> Hopefully this is interesting and informative for you. It is a helpful
> exercise for me in that it gets ideas out of my head and into a more concrete
> form.
>
> I need to get back to getting the demo setup.
>
> If you think it would be more productive to talk in person I could meet up
> sometime.