On Mon, Jan 14, 2008 at 03:57:08AM +0000, Patrick Haller wrote:
> Heya,
>
> I just read through esxsnmp and had the following questions:
>
> OID correlators -- Why not postpone that work till presentation to the
> user? Just store the numeric OIDs, their values, and timestamp; then the
> front-end can pick out what it needs to provide information? It looks
> like the poller wants to run fast and tight, so getting the data to the
> disk as fast as possible seems right?
I was initially just saving things to disk using the ifIndex. Lots of other
systems do this (MRTG does for sure). However after a long discussion with
one of my coworkers I changed it to correlate things to interface name. In
short, interface name is more stable than ifIndex.
It can be kind of painful to do the interface name to ifIndex correlation in
post processing because ifIndex is _relatively_ stable but not guaranteed to
be stable. It pretty much will only change when the device reboots, which is
rare for our devices but it still can change.
You could track when the ifIndex changes and have a lookup function in
postprocessing but that's kind of messy. Depending on your level of paranoia
you might still need to check that ifIndex is correct before/after each poll
anyway, so doing a quick correlation is pretty cheap. (An optimization would
be to keep the ifIndex to interface name correlation map in memory and check
sysUptime and only rebuild the mapping if sysUptime decreases, but this hasn't
been necessary so far.)
Also, one of the most common queries will be to look at stats for a given
interface. Using this setup saves doing the correlation at that time. The
other queries will be using the ifRef table in the SQL database and it's just
as easy to return the interface name as it is to return the ifIndex.
The other reason for correlating to name rather than ifIndex is that the name
has meaning whereas the ifIndex is abstract. This is helpful when trolling
through the data store by hand. It also provides some survivability if the
metadata about which ifIndex maps to which interface name gets lost somehow.
I don't consider these last two very important but they are nice side effects.
OK, that's a very long answer to a short question.
I think what would really help make the polling look tighter would be to
replace the process per collection group model. I'm thinking either a couple
of polling processes or at most a process per device with a select loop to
handle the async IO. However I haven't looked into how you'd get ahold of the
sockets in question from yapsnmp...
> Config file -- I guess ConfigParser had an issue?
Meh, I didn't want to take the time to learn another module at that moment and
the config file had a very simple syntax. I just wanted to get the thing
running. I would definitely like to go back and improve that. ConfigParser
would be just fine!
Jon
--
Jon M. Dugan <jdu...@es.net>
ESnet Network Engineering Group
Lawrence Berkeley National Laboratory
Eventually it is probably worth doing the same caching trick as Cacti. (I
didn't mention it as an optimization in my first email, but I guess I wasn't
very clear...)
> I agree that people will most likely track by the interface name; we
> might get some benefit by ID'ing data streams by an internal
> identifier that then links to the interface metadata. It'll make
> easier the linking of data streams as a connection grows or moves.
>
> > Meh, I didn't want to take the time to learn another module at that moment and
> > the config file had a very simple syntax. I just wanted to get the thing
> > running. I would definitely like to go back and improve that. ConfigParser
> > would be just fine!
>
> Cool. I just commit'd the change to poll.py to use ConfigParser.
Great! Thanks. That wasn't too painful... I tweaked it with the last commit
to look in the espolld section rather than the main section.
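A minimal sketch of reading poller settings from an espolld section, using the Python 3 spelling of the module (configparser). The option names (poll_interval, tsdb_root) and the sample file contents are made up for illustration and are not taken from poll.py:

```python
import configparser

# Hypothetical config text; section contents are illustrative only.
SAMPLE = """
[main]
db_uri = sqlite:///esxsnmp.db

[espolld]
poll_interval = 30
tsdb_root = /data/tsdb
"""


def load_poller_config(text):
    """Return the options from the [espolld] section as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["espolld"]
    return {
        "poll_interval": section.getint("poll_interval"),
        "tsdb_root": section.get("tsdb_root"),
    }


config = load_poller_config(SAMPLE)
```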
I also committed some other changes, see the commit log for details. Should I
have commits posted to this list or is that too much noise?
I am not sure I am following you here... I think you are saying we don't want to
make the storage of the traffic data for a managed device dependent on some
data source other than the device itself. Is that correct?
I was thinking that similar to cacti we could keep a copy of the mapping
between ifIndex and ifAlias in memory and only update it if sysUptime
decreases or there is an ifIndex that does not have a mapping. Each time the
process starts it will fetch an up to date mapping from the device and cache
it in memory until it detects that sysUptime has decreased or it finds an
ifIndex it doesn't have a mapping for (eg. a new interface appears on the
box).
That's an optimization for the future, since rebuilding the mapping on each poll
doesn't appear to be causing any issues right now.
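A rough sketch of the cached mapping described above, assuming a hypothetical fetch_mapping callable that walks the device and returns {ifIndex: name}. The class and method names are invented for illustration; this is not how the poller currently works:

```python
class IfNameCache:
    """Cache ifIndex -> interface name, rebuilding only when needed."""

    def __init__(self, fetch_mapping):
        # fetch_mapping is a hypothetical callable returning {ifIndex: name}.
        self._fetch_mapping = fetch_mapping
        self._mapping = {}
        self._last_uptime = None
        self.rebuilds = 0

    def _rebuild(self):
        self._mapping = dict(self._fetch_mapping())
        self.rebuilds += 1

    def lookup(self, if_index, sys_uptime):
        # A decrease in sysUptime means the device restarted (or the
        # counter wrapped); either way a rebuild is harmless.
        restarted = self._last_uptime is None or sys_uptime < self._last_uptime
        self._last_uptime = sys_uptime
        if restarted or if_index not in self._mapping:
            self._rebuild()
        return self._mapping.get(if_index)


# Simulated device table, mutated in place to mimic a reboot.
device = {1: "ge-0/0/0", 2: "ge-0/0/1"}
cache = IfNameCache(lambda: device)

first = cache.lookup(1, sys_uptime=1000)   # first call builds the map
second = cache.lookup(2, sys_uptime=1030)  # served from the cache
rebuilds_before_reboot = cache.rebuilds

device.clear()
device.update({1: "ge-0/0/1", 2: "ge-0/0/0"})  # indexes swapped after reboot
third = cache.lookup(1, sys_uptime=5)      # uptime decreased, so rebuild
```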
> > Great! Thanks. That wasn't too painful... I tweaked it with the last commit
> > to look in the espolld section rather than the main section.
> >
> > I also committed some other changes, see the commit log for details. Should I
> > have commits posted to this list or is that too much noise?
>
> Good deal. I don't think reduplicating commit logs via the group makes
> much sense either. What direction are you working towards currently?
Presently I am working on finishing the tool to export data in a format
comparable to 'rrdtool fetch'. This way I can feed it into perfSONAR [1].
There's hopefully going to be a demo of this next week at JointTechs [2].
[1] Some more info on perfSONAR: http://www.perfsonar.net/
[2] JointTechs: http://jointtechs.es.net/
Current things I would like to work on are:
Right now I am doing a lot of processing to generate rates and missing
datapoint detection in the esfetch script. This should be pushed into esdb
and/or TSDB as appropriate. This is probably what I will work on later today
after I get my current script deployed for demo testing. This is related to
the next task.
I need to add something similar to consolidation functions in RRD speak. I
think this should go in TSDB rather than in ESxSNMP. I have two thoughts on
how to proceed with this. The first is to have a process that goes through
and creates these aggregates on a periodic basis. For example once an hour
scan the 30 second data and create a min, max and average data point in the
hourly aggregate. Once every 24 hours scan the hourly data and create a data
point in the daily aggregate. The other option is to emulate these aggregates
by skipping through the data and cherry picking values. This will work for
averages over relatively short time lines. Hmm, I think the first option is
considerably more general. The reason for this is to reduce the number of
datapoints needed to plot long term trends.
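To make the first option concrete, here is a small sketch of a periodic consolidation pass. It assumes the input is already a list of (timestamp, rate) pairs; the function name and data layout are illustrative and do not reflect the TSDB API:

```python
from collections import defaultdict


def consolidate(points, period):
    """Group (timestamp, rate) points into buckets of `period` seconds.

    Returns {bucket_start: (minimum, maximum, average)}.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % period].append(value)
    return {
        start: (min(vals), max(vals), sum(vals) / len(vals))
        for start, vals in sorted(buckets.items())
    }


# Three samples fall in the first hour, one in the second.
raw = [(0, 10.0), (30, 20.0), (3570, 30.0), (3600, 40.0)]
hourly = consolidate(raw, 3600)
```

The same function could be run over the hourly output with period=86400 to build a daily aggregate of minimums and maximums, although the averages would then need the sum and datapoint count to stay exact.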
I think I want to create an aggregate called AVERAGE. This will take the 30
second raw counters and convert them to rates. I think this looks something
like:
    average:    n bits wide (32 or 64? or maybe a float or double?)
    sum:        total for this averaging period
    datapoints: number of datapoints that make up the sum
    optionally: min, max (those are really their own aggregate, but it
                would be convenient to store them here...)
If we have sum and datapoints we can derive average with a single division, so
maybe it is excessive to store the actual average. The reason for keeping
these is so that when we calculate averages for a longer aggregate we don't
make averages of averages. Hmm, need to think about this more. Any thoughts
you might have are most welcome. Note that I intend to store aggregate
information in a separate file from the raw data.
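A short sketch of why keeping sum and datapoints matters when building longer aggregates. The Aggregate class is hypothetical (the field is called total here to avoid shadowing Python's sum builtin) and is only meant to show the arithmetic:

```python
from dataclasses import dataclass


@dataclass
class Aggregate:
    total: float = 0.0   # "sum" in the description above
    datapoints: int = 0

    def add(self, value):
        self.total += value
        self.datapoints += 1

    def merge(self, other):
        """Combine two periods without losing the datapoint weighting."""
        return Aggregate(self.total + other.total,
                         self.datapoints + other.datapoints)

    @property
    def average(self):
        return self.total / self.datapoints if self.datapoints else None


busy_hour = Aggregate()
for rate in (10.0, 20.0, 30.0):
    busy_hour.add(rate)

sparse_hour = Aggregate()   # an hour with missing datapoints
sparse_hour.add(100.0)

combined = busy_hour.merge(sparse_hour)
naive = (busy_hour.average + sparse_hour.average) / 2
```

Here combined.average weights every datapoint equally, while naive gives the single datapoint in the sparse hour as much weight as the three in the busy hour.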
At the present time I don't do rollover or reset detection. There is a 32 bit
flags field in each TSDB row that could be used to store a flag if we detect a
rollover or a reset. Initially I was planning to do this detection as a post
processing operation but I am not so sure that is the best approach. If it is
done in postprocessing it can be done while creating the aggregates above.
However, to do it in real time only requires that we cache the previous
datapoint which isn't a huge overhead. Reset detection can be done by
monitoring sysUptime at each poll. My plan is to leave the lower 16 bits for
flags defined by TSDB and the upper 16 bits for application specific flags.
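A sketch of what real-time detection might look like when the previous datapoint is cached. The flag values and function names are assumptions for illustration; TSDB does not define them yet:

```python
# Hypothetical flag bits: the lower 16 bits are reserved for TSDB, the
# upper 16 bits for application specific flags.
FLAG_ROLLOVER = 1 << 0
FLAG_RESET = 1 << 1
APP_FLAG_SHIFT = 16


def counter_delta(prev, cur, prev_uptime, cur_uptime, bits=32):
    """Return (delta, flags) for two consecutive counter samples."""
    if cur_uptime < prev_uptime:
        # The device restarted, so the delta cannot be trusted.
        return None, FLAG_RESET
    if cur < prev:
        # The counter wrapped past 2**bits between the two polls.
        return cur + (1 << bits) - prev, FLAG_ROLLOVER
    return cur - prev, 0


def split_flags(flags):
    """Split a 32 bit flags field into (tsdb_flags, app_flags)."""
    return flags & 0xFFFF, flags >> APP_FLAG_SHIFT


normal = counter_delta(100, 250, prev_uptime=10, cur_uptime=40)
wrapped = counter_delta(2**32 - 10, 5, prev_uptime=10, cur_uptime=40)
reset = counter_delta(500, 20, prev_uptime=9000, cur_uptime=30)
```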
Preallocating new chunks for data stored in TSDB. Presently I am chunking the
data for each polled interface into day sized chunks. (TSDB allows you to
decide how you want to chunk things and perhaps day sized chunks are too
small, but that's a different question.) The thing is that every day at
midnight I need to create 7300 new files. This causes a large IO spike which at
present is causing most polls during the first few minutes of the day to run
longer than the 30 second polling interval I am using. So if I spread out the
creation of these new files over the course of the last hour of each day I
think it would help a lot. This is in some ways a kludge to get around the
fact that the polling and the storage are tightly coupled.
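One way to spread the file creation out, sketched under the assumption that each chunk can be identified by a name and created ahead of time. The function and the naming scheme are hypothetical:

```python
from collections import Counter


def creation_schedule(names, window=3600):
    """Map each chunk name to an offset (in seconds) within the window.

    Offsets are spread evenly so that no single second gets a burst of
    file creations.
    """
    ordered = sorted(names)
    count = len(ordered)
    return {name: (i * window) // count for i, name in enumerate(ordered)}


# 7300 made-up chunk names spread over the last hour of the day.
chunks = ["chunk%04d" % i for i in range(7300)]
schedule = creation_schedule(chunks)
busiest_second = max(Counter(schedule.values()).values())
```

With 7300 files over 3600 seconds this works out to two or three creations per second rather than 7300 at midnight.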
It might be good to decouple the polling and the storage. I didn't do this
initially because I wanted to get some kind of polling going right away and
doing the decoupling properly was taking more time than I wanted. The
original approach I was thinking of was to funnel all writes through the esdbd
daemon. The potential problem with this is that it focuses a lot of traffic
through that daemon, so I'm not sure that's the right way to go. I avoided
having multithreaded processes due to the potential for subtle and annoying
bugs, but it might be a reasonable approach to have a poller and a writer
thread for each polling process. This would skirt the issue caused by the
creation of all the new chunks since polling would continue but the writer
thread would be behind for a few minutes.
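A bare-bones sketch of the poller plus writer thread idea using the standard queue and threading modules. The tuple layout and the list standing in for the data store are placeholders, not the actual TSDB write path:

```python
import queue
import threading


def writer(work_queue, store):
    """Drain datapoints from the queue until a None sentinel arrives."""
    while True:
        item = work_queue.get()
        if item is None:
            break
        # A real writer would append to the chunk on disk here; a slow
        # write only delays this thread, not the poller.
        store.append(item)


work_queue = queue.Queue()
stored = []
writer_thread = threading.Thread(target=writer, args=(work_queue, stored))
writer_thread.start()

# The poller side: hand off each datapoint and carry on immediately.
for poll in range(3):
    work_queue.put(("ge-0/0/0", poll * 30, 1000 + poll))

work_queue.put(None)   # tell the writer to finish
writer_thread.join()
```

An unbounded queue lets the writer fall behind for a few minutes, as described above, at the cost of holding the backlog in memory.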
Rename esfetch to esxsnmp and add commands in addition to fetch. Commands that
spring to mind are: add-device, add-oid, add-oidset, retire-device.
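A possible shape for the command line, sketched with argparse subcommands. Only the command names come from the list above; the positional arguments are guesses for illustration:

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="esxsnmp")
    commands = parser.add_subparsers(dest="command", required=True)

    # The arguments below are hypothetical placeholders.
    fetch = commands.add_parser("fetch", help="export stored data")
    fetch.add_argument("device")
    fetch.add_argument("interface")

    for name in ("add-device", "retire-device"):
        sub = commands.add_parser(name)
        sub.add_argument("device")

    for name in ("add-oid", "add-oidset"):
        sub = commands.add_parser(name)
        sub.add_argument("name")

    return parser


args = build_parser().parse_args(["add-device", "router1"])
```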
The ESDB RPC API (wow that's 3 acronyms in a row) is still very much in flux.
It needs to be refined and stabilized.
There are a few things I'd like to work on; I should probably create a wiki
page to capture these.
Hopefully this is interesting and informative for you. It is a helpful
exercise for me in that it gets ideas out of my head and into a more concrete
form.
I need to get back to getting the demo set up.
If you think it would be more productive to talk in person I could meet up
sometime.
Cheers,