Feature: Database Storage of Results


Brian Knox

Feb 24, 2010, 8:33:32 AM
to multi-mechanize
Your multi-mechanize release came just at the right time. I am in the
middle of load testing and optimization of a large web application. I
had just started writing a tool very similar in concept to multi-
mechanize, and you have saved me a great deal of time! So first of
all, thank you.

I am most of the way done adding database storage of result
sets to multi-mechanize (I should have it completed later today). I
have a stats db where I aggregate the logs from our web servers and
application for analysis, and sending the results straight to this
database will let me perform some deeper analysis of our
application.

I have made the database storage configurable through the multi-
mechanize config file. The code allows storing in the database
while still writing to the normal multi-mechanize results file. I
would be more than glad to contribute this code back if anyone
is interested in this capability; it would save you the
trouble of writing it yourself.

With much appreciation,
Brian

corey goldberg

Feb 24, 2010, 9:23:35 AM
to multi-mechanize
Hi Brian,
thanks for your post.

> The code allows for storing in the database
> while still writing to the normal multi-mechanize results file.  

> would be more than glad to contribute this code back

what type of database did you use? this might be a feature that
others would want as well. perhaps you can explain a little more
about what data is stored, and then how you are using the database
afterwards. did you build any reporting?

regards,
-Corey

Brian Knox

Feb 24, 2010, 11:20:29 AM
to multi-m...@googlegroups.com
At the moment, I am just storing the same results that go into the CSV file in a table in my stats db.  I'm using a PostgreSQL database myself.  The additions I've made to multi-mechanize use SQLAlchemy on the back end and therefore aren't dependent on Postgres... you could just as easily configure it for MySQL, SQL Server, or whatever your preference is.
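
For context, switching SQLAlchemy back ends is just a matter of changing the connection URL. The forms below are illustrative only; exact dialect names and ports depend on your SQLAlchemy version and installed drivers:

```
postgres://user:password@host:5432/dbname
mysql://user:password@host:3306/dbname
mssql://user:password@host/dbname
sqlite:////absolute/path/to/results.db
```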

In my specific case, outside of multi-mechanize, I have Python scripts I've written for importing our Apache logs.  I have one that understands Common Log Format and one for our specific custom logging, which includes a few extras such as response time in microseconds, and I'm also importing our Rails application log files (yes, I'm using Python tools to load test Rails applications :) ).

I don't have any nice reporting features yet...  I query our db manually to look for what is happening in the application logs and the Apache logs around any time periods from the load tests that look "interesting", to give myself a clearer understanding of what is occurring.

I plan to make a little web.py interface that, at the very least, can use the code you've written for generating the HTML reports and graphs, so that something like http://myreportingserver/results/2010_02_24_07_40_14 would bring up the corresponding report page + graphs.  After that, I'll look at what sort of useful reports I can write using the multi-mechanize results, the Apache logs, etc. together.

My immediate plans are:

1. finish the code for storing results in the DB.  This is pretty simple.  In my config.cfg file, I have:

db_logging: postgres://user:password@host:port/dbname

I modified multi_mechanize.py slightly so that if there is a db_logging option in the config file, it starts up a sqlalchemy session using the connection information.
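
Something like the following is one way that config handling could look (a sketch only; aside from the `db_logging` option name above, the section name and function are hypothetical):

```python
# Sketch: read an optional db_logging connection URL from a
# multi-mechanize-style config file; return None when not configured.
# Only the "db_logging" option name comes from the thread.
from configparser import ConfigParser

def get_db_url(cfg_text, section='global'):
    """Return the db_logging connection URL, or None if not set."""
    parser = ConfigParser()
    parser.read_string(cfg_text)
    if parser.has_option(section, 'db_logging'):
        return parser.get(section, 'db_logging')
    return None

cfg = """\
[global]
run_time: 60
db_logging: postgres://user:password@host:5432/dbname
"""
print(get_db_url(cfg))  # -> postgres://user:password@host:5432/dbname
```

When the URL is present, the startup code would hand it to sqlalchemy.create_engine() and open a session; when absent, the db code path is skipped entirely.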

In the multi-mechanize/lib directory, I've added a lib, "storedresult.py", which contains the SQLAlchemy table definition and a "StoredResult" class that accepts the data used for generating the csv files (trans_count, elapsed, epoch, user_group_name, scriptrun_time, error, custom_timers) for storage in the db.  This will be stored along with the run number.

Next I'll modify multi_mechanize.py to use that class to write the results to a 'multimech_stored_results' table in the configured database. 
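
To make the schema concrete, here is a back-end-neutral sketch of that table, using the stdlib sqlite3 module instead of SQLAlchemy (the column names come from the description above; the types and the run_id format are assumptions):

```python
import sqlite3

# Sketch of a multimech_stored_results table; types are guesses.
conn = sqlite3.connect(':memory:')
conn.execute("""
    CREATE TABLE multimech_stored_results (
        id INTEGER PRIMARY KEY,
        run_id TEXT,            -- which test run the row belongs to
        trans_count INTEGER,
        elapsed REAL,
        epoch REAL,
        user_group_name TEXT,
        scriptrun_time REAL,
        error TEXT,
        custom_timers TEXT      -- serialized timer dict
    )""")

# one StoredResult-style row per completed transaction
row = ('2010_02_24_07_40_14', 1, 0.52, 1267018412.0,
       'user_group-1', 0.48, '', '{}')
conn.execute("""
    INSERT INTO multimech_stored_results
        (run_id, trans_count, elapsed, epoch, user_group_name,
         scriptrun_time, error, custom_timers)
    VALUES (?, ?, ?, ?, ?, ?, ?, ?)""", row)
conn.commit()
```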

I figure that once I have that done, it could be useful to others as well.  I'd be glad to shoot the code your way so you can take a look at it once the db storage is working (hopefully later today if I can squeeze the time out of my day).

Brian

Corey Goldberg

Feb 24, 2010, 11:55:25 AM
to multi-m...@googlegroups.com
> I figure that once I have that done, that could be useful to others as
> well.  I'd be glad to shoot the code your way to take a look at


that would be great. I'll gladly take a look. this could be really
useful. I especially like that it's not tied to a specific database
type.

I just did a release a few mins ago, so you might want to take a look
at the new directory structure and base your code off that.

-Corey

Brian Knox

Feb 24, 2010, 1:02:23 PM
to multi-m...@googlegroups.com
Up and running on it now.  The new projects directory structure is definitely handy.  I'll try to get something working your way by this evening, tomorrow at the latest, for the db storage.

Brian

rstens

Feb 24, 2010, 11:24:04 PM
to multi-mechanize
I'd like to suggest waiting to write to the database until after the test
has run. The intermediate results can be written to the fastest, lowest-latency
target, and after the test has finished they can be uploaded to the database.

This will make sure the solution stays scalable instead of bringing
database latency into the game.
Granted, database latency will not show up for trivial tests, but for
anything beyond that you'll see an impact.

Roland

Brian Knox

Feb 25, 2010, 9:40:05 AM
to multi-m...@googlegroups.com
This is an excellent point.  I'm sending Corey my prototype this morning just so he can begin experimenting with it; when I get a little time later today, I'll look at working up a version that loads the CSV data after the tests are completed.

Brian

Corey Goldberg

Feb 25, 2010, 10:30:46 AM
to multi-m...@googlegroups.com
> I'd like to suggest waiting to write to the database until after the test
> has run. The intermediate results can be written to the fastest, lowest-latency
> target, and after the test has finished they can be uploaded to the database.

Roland,
that's a great point. All of the output is written asynchronously and
from a different OS process than the load generator, so doing db
writes during the test isn't a huge concern... But during large
tests, we would definitely take a hit in disk i/o and it would make
multi-mechanize less scalable.


> when I get a little time later today I'll look at working up a version
> that loads the CSV data up after the tests are completed

Brian,
that would be great. so basically, once the test completes, the db
would get populated by parsing the results.csv file and inserting all
of the data into a db. This can be done during the results analysis
phase, so we don't interfere with load generation by additional i/o.
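
That post-processing step might look roughly like this (a sketch using stdlib csv and sqlite3 in place of SQLAlchemy; the column order here is an assumption, not the actual results.csv layout):

```python
import csv
import io
import sqlite3

def load_results_csv(csv_file, conn, run_id):
    """Post-test step: bulk-load a results.csv into the database.

    Assumed column order: trans_count, epoch, elapsed,
    user_group_name, scriptrun_time, error, custom_timers.
    """
    rows = [[run_id] + r for r in csv.reader(csv_file)]
    conn.executemany(
        'INSERT INTO results (run_id, trans_count, epoch, elapsed, '
        'user_group_name, scriptrun_time, error, custom_timers) '
        'VALUES (?, ?, ?, ?, ?, ?, ?, ?)', rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE results (run_id, trans_count, epoch, elapsed, '
             'user_group_name, scriptrun_time, error, custom_timers)')
sample = io.StringIO(
    '1,1267018412.1,0.52,user_group-1,0.48,,{}\n'
    '2,1267018413.2,0.61,user_group-1,0.55,,{}\n')
print(load_results_csv(sample, conn, '2010_02_24_07_40_14'))  # -> 2
```

Since this runs after load generation has stopped, insert latency no longer matters, and executemany keeps it to a single batch of inserts.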

I received your initial code and will start playing with it.

thanks!

-Corey

Brian Knox

Feb 25, 2010, 11:22:30 AM
to multi-m...@googlegroups.com
Exactly my thought Corey.  I will take a look at moving the database population code from ResultsWriter to ReportWriter.  Which actually also opens up the possibility of storing the statistics calculated for the reports in the database as well.

Brian

Grig Gheorghiu

Feb 25, 2010, 11:28:00 AM
to multi-m...@googlegroups.com
On Thu, Feb 25, 2010 at 7:30 AM, Corey Goldberg <cgol...@gmail.com> wrote:
>> I'd like to suggest waiting to write to the database until after the test
>> has run. The intermediate results can be written to the fastest, lowest-latency
>> target, and after the test has finished they can be uploaded to the database.
>
> Roland,
> that's a great point.  All of the output is written asynchronously and
> from a different OS process than the load generator, so doing db
> writes during the test isn't a huge concern...  But during large
> tests, we would definitely take a hit in disk i/o and it would make
> multi-mechanize less scalable.

My 2 cents: how about using memcached as an in-memory store for the
intermediate results? Then we flush it to a DB when the test run is
over. I've seen this done successfully in other projects.

Grig

Corey Goldberg

Feb 25, 2010, 11:37:48 AM
to multi-m...@googlegroups.com
> My 2 cents: how about using memcached as an in-memory store for the
> intermediate results? Then we flush it to a DB when the test run is over

since results are already written to disk as the test runs (in csv),
we don't need to store all intermediate results in memory. I think it
makes more sense to decouple the db totally from a test run and have
it just populated as a post-processing step.

however, memcached is an interesting idea and would open up the
possibility of not doing *any* disk i/o during the test. Everything
could be kept in memory and then later flushed to a db, and the
results.csv file could also be generated from the memcached data (or
from the db). This would eliminate all disk i/o, but add some additional
memory/processing overhead. I don't think we need to make a change
like this yet, though.

thanks for the input,

-Corey

Brian Knox

Feb 25, 2010, 11:48:47 AM
to multi-m...@googlegroups.com
So my thought as a multi-mechanize user is, if multi-mechanize were extended to allow things like memcached or redis for intermediate results, I personally would like to see those as options and not as requirements.

That being said I think it's an interesting idea, and my mind fills with visions of multi-mechanize client processes running across multiple servers all happily sending their results to a central memcache server *laughs*.

I just think there is great value in the simplicity of the tool and its lack of infrastructure requirements.

For now my personal goal is to continue with the database code, trying to keep it as unobtrusive as possible (making it an option that does not depend on additional requirements unless you are using it), and to continue sending updates to Corey to incorporate or use as he wishes if he likes them.

Brian

Grig Gheorghiu

Feb 25, 2010, 11:59:41 AM
to multi-m...@googlegroups.com
I agree that simple is good. Maybe have a back-end storage plugin that defaults to storing CSV flat files on disk, but where the storage mechanism could be replaced with a database, or memcached + database, or whatever the end user wants?
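
That plugin could be sketched roughly as follows (a sketch only; all class and method names here are hypothetical):

```python
import csv

class ResultsStore(object):
    """Minimal back-end interface for test results."""
    def write(self, row):
        raise NotImplementedError
    def finish(self):
        pass  # hook for post-test work, e.g. flushing to a database

class CsvStore(ResultsStore):
    """Default back end: append rows to a CSV file-like object."""
    def __init__(self, fileobj):
        self.writer = csv.writer(fileobj)
    def write(self, row):
        self.writer.writerow(row)

class BufferedDbStore(ResultsStore):
    """Alternative: buffer rows in memory, hand them off at finish()."""
    def __init__(self, insert_all):
        self.rows = []
        self.insert_all = insert_all  # called once with every row
    def write(self, row):
        self.rows.append(row)
    def finish(self):
        self.insert_all(self.rows)

# usage sketch: the default CSV behavior
import io
buf = io.StringIO()
store = CsvStore(buf)
store.write([1, 0.52, 'user_group-1'])
store.finish()
print(buf.getvalue().strip())  # -> 1,0.52,user_group-1
```

The runner would only ever talk to the ResultsStore interface, so swapping CSV for a database (or memcached + database) becomes a config choice rather than a code change.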

Grig


Jeffrey Wong

Feb 25, 2010, 11:58:34 AM
to multi-m...@googlegroups.com
Brian brings up an interesting point: being able to run multi-mechanize on multiple servers and have the results aggregated into one single report.  I can see this being useful when trying to test an application hosted in a production environment or on a cloud service.  A single load generator may not be enough to fully saturate the application, and may exhaust the node's resources (CPU, memory, bandwidth) before reaching the full potential of the system under test.  Having multiple generating nodes would more closely resemble a real-world environment, where client resources are effectively infinite.

Having these results aggregated into a single report, but possibly also broken down by node, would be very useful.  The single report shows the performance of the system as a whole, while the per-node reports could indicate geographic or network issues.

Jeffrey

rstens

Feb 25, 2010, 12:03:53 PM
to multi-mechanize
I like this plugin suggestion.

BTW, unless memcached runs on your own server, you will incur network
latency again. It's small, but small things will bite when volume hits.
Network access is not a zero-latency proposition.


Corey Goldberg

Feb 25, 2010, 12:27:17 PM
to multi-m...@googlegroups.com
> So my thought as a multi-mechanize user is, if multi-mechanize were extended
> to allow things like memcached or redis for intermediate results, I
> personally would like to see those as options and not as requirements.

I absolutely agree. I want to keep external dependencies as minimal
as possible... so everything we are talking about here would be off by
default, letting you run multi-mechanize with just a standard python
setup (+ matplotlib). I see multi-mechanize working "out of the box"
just like it does now, with an optional switch you can configure to
also send results to a db.


> For now my personal goal is to continue with the database code, trying to
> keep it as unobtrusive as possible (making it an option that does not depend
> on additional requirements unless you are using it), and to continue sending
> updates to Corey to incorporate or use as he wishes if he likes them.

sounds good. keep em coming :)

-Corey

Corey Goldberg

Feb 25, 2010, 12:32:24 PM
to multi-m...@googlegroups.com
> Brian brings up an interesting point, being able to run multi-mechanize on
> multiple servers and having the results aggregated into one single report.

I also have some thoughts about turning it into a distributed system
that uses multiple load generating nodes. I don't wanna get too far
ahead of myself, so I haven't started working in that area yet. feel
free to start a new thread with ideas for how to get there.

-Corey
