Benchmarking


Ian Clarke

Jun 27, 2010, 10:06:19 AM
to athena-...@googlegroups.com
I created a simple class, ShortcutNumberMeasurement.java, in src/test/java/athena/.  Its goal is to examine the impact of various shortcut limits on performance.

In this run I created 100,000 items, each with about 10 random tags, and 100 queries (with a maximum depth of 7).  For each shortcut value I cycle through the queries twice.

Note that because the queries are randomly generated, this is a rather nasty situation for Athena, since one imagines that in most real-world situations queries will tend to have common sub-parts that Athena can take advantage of.
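For anyone who wants to reproduce the general shape of this run, here is a rough, self-contained sketch. It is not ShortcutNumberMeasurement.java and does not use Athena's API: the class and method names, the boolean AND/OR/NOT query model, the vocabulary of 1,000 tags and the random seed are all assumptions made for illustration. It only covers the exhaustive-scan baseline (the 0-shortcut case).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical harness: mirrors the setup described above, not Athena's real code.
final class RandomTagBenchmarkSketch {

    /** Builds `count` items, each a set of up to `tagsPerItem` tags drawn from `vocabulary` tags. */
    static List<Set<Integer>> randomItems(Random rnd, int count, int tagsPerItem, int vocabulary) {
        List<Set<Integer>> items = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            Set<Integer> tags = new HashSet<>();
            for (int t = 0; t < tagsPerItem; t++) {
                tags.add(rnd.nextInt(vocabulary)); // duplicates collapse, hence "about" tagsPerItem
            }
            items.add(tags);
        }
        return items;
    }

    /** Builds a random query tree of depth at most `maxDepth` (assumed model: AND/OR/NOT over tags). */
    static Predicate<Set<Integer>> randomQuery(Random rnd, int maxDepth, int vocabulary) {
        if (maxDepth <= 1 || rnd.nextInt(4) == 0) {
            int tag = rnd.nextInt(vocabulary);
            return tags -> tags.contains(tag); // leaf: "item has this tag"
        }
        Predicate<Set<Integer>> left = randomQuery(rnd, maxDepth - 1, vocabulary);
        switch (rnd.nextInt(3)) {
            case 0:  return left.and(randomQuery(rnd, maxDepth - 1, vocabulary));
            case 1:  return left.or(randomQuery(rnd, maxDepth - 1, vocabulary));
            default: return left.negate();
        }
    }

    /** Tests every item against the query; returns {results, tests}. */
    static long[] exhaustiveScan(List<Set<Integer>> items, Predicate<Set<Integer>> query) {
        long results = 0, tests = 0;
        for (Set<Integer> item : items) {
            tests++;
            if (query.test(item)) results++;
        }
        return new long[] {results, tests};
    }

    public static void main(String[] args) {
        Random rnd = new Random(42); // fixed seed so repeated runs are comparable
        List<Set<Integer>> items = randomItems(rnd, 100_000, 10, 1_000);
        List<Predicate<Set<Integer>>> queries = new ArrayList<>();
        for (int i = 0; i < 100; i++) queries.add(randomQuery(rnd, 7, 1_000));

        long results = 0, tests = 0;
        long start = System.nanoTime();
        for (int pass = 0; pass < 2; pass++) { // cycle through the queries twice
            for (Predicate<Set<Integer>> q : queries) {
                long[] r = exhaustiveScan(items, q);
                results += r[0];
                tests += r[1];
            }
        }
        double ms = (System.nanoTime() - start) / 1e6;
        System.out.printf("results=%d tests=%d ms=%.1f%n", results, tests, ms);
    }
}
```

Using a fixed seed keeps the items and queries identical across shortcut values, which is what makes the "results" column a useful sanity check.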

The columns are:

shortct: The number of shortcuts; 0 essentially means it will always do an exhaustive scan
results: The total number of items found (should be the same every time, and it is)
tests: The number of elements that had to be tested to return these results
ms: The time in milliseconds for each test run

shortct  results     tests        ms
      0   20,803   100,000  65,367.6
      1   20,803  81,600.6  40,356.6
      3   20,803  69,737.3  38,774.9
      7   20,803  51,754.5    26,307
     15   20,803  45,138.8  27,042.9
     31   20,803  44,998.1    28,575
     63   20,803  44,489.6  29,776.1

As can be seen, the number of tests drops as the number of shortcuts increases, as we would expect.  However, the benefit tails off once we hit around 15 shortcuts.  Interestingly, the actual time required starts to increase somewhere between 7 and 15 shortcuts.
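To make the trade-off concrete, each row can be reduced to the fraction of tests saved and the speed-up relative to the 0-shortcut row. A small sketch follows; the figures are copied from the table above, while the class and method names are made up for illustration.

```java
import java.util.Locale;

// Illustrative only: summarises the table in this post, nothing more.
final class ShortcutBenchmarkSummary {

    // Columns: shortcut limit, tests, ms ("results" is 20,803 in every row, so it is omitted).
    static final double[][] ROWS = {
        {0, 100_000.0, 65_367.6},
        {1, 81_600.6, 40_356.6},
        {3, 69_737.3, 38_774.9},
        {7, 51_754.5, 26_307.0},
        {15, 45_138.8, 27_042.9},
        {31, 44_998.1, 28_575.0},
        {63, 44_489.6, 29_776.1},
    };

    /** Returns the shortcut limit with the lowest measured time. */
    static int fastestShortcutLimit() {
        double[] best = ROWS[0];
        for (double[] row : ROWS) {
            if (row[2] < best[2]) best = row;
        }
        return (int) best[0];
    }

    /** Fraction of the exhaustive-scan tests that this row avoided. */
    static double testsSaved(double[] row) {
        return 1.0 - row[1] / ROWS[0][1];
    }

    /** How many times faster this row ran than the exhaustive scan. */
    static double speedup(double[] row) {
        return ROWS[0][2] / row[2];
    }

    public static void main(String[] args) {
        for (double[] row : ROWS) {
            System.out.printf(Locale.ROOT, "%2d shortcuts: %4.1f%% fewer tests, %.2fx faster%n",
                (int) row[0], 100 * testsSaved(row), speedup(row));
        }
        System.out.println("fastest: " + fastestShortcutLimit() + " shortcuts");
    }
}
```

Going from 15 to 63 shortcuts saves well under one percentage point of additional tests while the time goes up, which is the tail-off described above.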

Unfortunately this test is very artificial.  Next I want to create a test that tries to mirror more real-world usage, such as by simulating how an ad network would need to select ads appropriate to particular publishers and visitors.

Ian.

--
Ian Clarke
CEO, SenseArray
Email: i...@sensearray.com
Ph: +1 512 422 3588

Ian Clarke

Jun 27, 2010, 10:11:32 AM
to athena-...@googlegroups.com
Doh, formatting may be screwed up on that table of results, here are the results in a Gist:


Ian.

Kushal Pisavadia

Jun 29, 2010, 11:49:32 AM
to athena-...@googlegroups.com
Okay, so I've had a look at the benchmarks.

It looks interesting. I can see why you've looked at the 7 shortcut mark. Maybe we have a generalised config file where users can control this limit for efficiency?
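As a rough idea of what that could look like, here is a minimal sketch that reads the limit from a standard Java properties file. The key name athena.shortcutLimit and the fallback of 7 are hypothetical; nothing like this exists in Athena yet.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical config reader; the key and default are assumptions, not Athena settings.
final class ShortcutConfigSketch {
    static final String KEY = "athena.shortcutLimit"; // made-up property name
    static final int DEFAULT_LIMIT = 7;               // assumption: the fastest value in the benchmark run

    /** Reads the shortcut limit from properties text, falling back to the default if the key is absent. */
    static int shortcutLimit(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String raw = props.getProperty(KEY);
        if (raw == null) return DEFAULT_LIMIT;
        int limit = Integer.parseInt(raw.trim());
        if (limit < 0) throw new IllegalArgumentException(KEY + " must be >= 0, got " + limit);
        return limit;
    }

    public static void main(String[] args) {
        // In practice the Reader would come from a file; a string keeps the sketch self-contained.
        System.out.println(shortcutLimit(new StringReader("athena.shortcutLimit=15\n")));
    }
}
```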

Also, as the benchmark isn't exactly a unit test, shouldn't it go into a contrib folder or somewhere separate so it doesn't get confused with the Maven unit tests?

Kind Regards,

Kushal Pisavadia
E: kus...@gmail.com

Ian Clarke

Jun 29, 2010, 2:42:27 PM
to athena-...@googlegroups.com
On Tue, Jun 29, 2010 at 10:49 AM, Kushal Pisavadia <kus...@gmail.com> wrote:
> It looks interesting. I can see why you've looked at the 7 shortcut mark. Maybe we have a generalised config file where users can control this limit for efficiency?

Yes, although I think first we need a better understanding of how this limit impacts performance.

> Also, as the benchmark isn't exactly a unit test, shouldn't it go into a contrib folder or somewhere separate so it doesn't get confused with the Maven unit tests?

Good point, I've moved them into the athena.benchmarks package.

Ian.
 