Benchmarking

Ian Clarke

Jun 27, 2010, 10:06:19 AM

to athena-...@googlegroups.com

I created a simple class, ShortcutNumberMeasurement.java in src/test/java/athena/. Its goal is to examine the impact of various shortcut limits on performance.

In this run I created 100,000 items, each with about 10 random tags, and 100 queries (with a maximum depth of 7). For each shortcut value I cycle through the queries twice.

Note that because the queries are randomly generated, this is a rather nasty situation for Athena, since one imagines that in most real-world situations queries will tend to have common sub-parts that Athena can take advantage of.
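For readers who want to picture the measurement loop, here is a minimal sketch of how such a run could be structured. It is not the code in ShortcutNumberMeasurement.java and it does not use Athena's API: the `TagStore` interface, the AND/OR query shape, the 50-tag vocabulary, and the smaller item count are all assumptions made for illustration. The stand-in store always scans exhaustively, so it only mimics the 0-shortcut case, and it reports totals rather than whatever averaging the original class uses.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.function.Predicate;

final class ShortcutMeasurementSketch {
    // Shortcut limits taken from the results table in this post.
    static final int[] SHORTCUT_LIMITS = {0, 1, 3, 7, 15, 31, 63};

    /** Hypothetical stand-in for the store under test; not Athena's API. */
    interface TagStore {
        // Returns {itemsFound, itemsTested} for one query.
        long[] run(Predicate<Set<Integer>> query);
    }

    /** Baseline store that tests every item for every query. */
    static TagStore exhaustiveStore(List<Set<Integer>> items) {
        return query -> {
            long found = 0;
            for (Set<Integer> tags : items) {
                if (query.test(tags)) {
                    found++;
                }
            }
            return new long[] {found, items.size()};
        };
    }

    /** Builds items holding up to tagsPerItem tags (duplicates collapse in the set). */
    static List<Set<Integer>> randomItems(int count, int tagsPerItem, int vocabulary, Random rnd) {
        List<Set<Integer>> items = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            Set<Integer> tags = new HashSet<>();
            for (int t = 0; t < tagsPerItem; t++) {
                tags.add(rnd.nextInt(vocabulary));
            }
            items.add(tags);
        }
        return items;
    }

    /** Random AND/OR tree over single-tag tests, nested at most maxDepth levels (assumed query shape). */
    static Predicate<Set<Integer>> randomQuery(int maxDepth, int vocabulary, Random rnd) {
        if (maxDepth <= 1 || rnd.nextInt(4) == 0) {
            int tag = rnd.nextInt(vocabulary);
            return tags -> tags.contains(tag);
        }
        Predicate<Set<Integer>> left = randomQuery(maxDepth - 1, vocabulary, rnd);
        Predicate<Set<Integer>> right = randomQuery(maxDepth - 1, vocabulary, rnd);
        return rnd.nextBoolean() ? left.and(right) : left.or(right);
    }

    /** Cycles through the queries `passes` times; returns {found, tested, elapsedMillis}. */
    static double[] measure(TagStore store, List<Predicate<Set<Integer>>> queries, int passes) {
        long found = 0;
        long tested = 0;
        long start = System.nanoTime();
        for (int p = 0; p < passes; p++) {
            for (Predicate<Set<Integer>> query : queries) {
                long[] r = store.run(query);
                found += r[0];
                tested += r[1];
            }
        }
        double millis = (System.nanoTime() - start) / 1e6;
        return new double[] {found, tested, millis};
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        // The run described above used 100,000 items; fewer are used here to keep the sketch quick.
        List<Set<Integer>> items = randomItems(10_000, 10, 50, rnd);
        List<Predicate<Set<Integer>>> queries = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            queries.add(randomQuery(7, 50, rnd));
        }
        System.out.println("shortct results tests ms");
        for (int limit : SHORTCUT_LIMITS) {
            // A real run would configure the store with `limit`; this stand-in ignores it.
            TagStore store = exhaustiveStore(items);
            double[] r = measure(store, queries, 2);
            System.out.printf("%7d %,d %,d %,.1f%n", limit, (long) r[0], (long) r[1], r[2]);
        }
    }
}
```

Because the stand-in ignores the shortcut limit, every row it prints has the same result and test counts; only a store that actually honours the limit would show the drop in tests reported below.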

The columns are:

shortct: The number of shortcuts; 0 means it will essentially always do an exhaustive scan.

results: The total number of items found (should be the same every time, and it is)

tests: The number of elements that had to be tested to return these results

ms: The time in milliseconds for each test run

shortct   results       tests         ms
      0    20,803     100,000   65,367.6
      1    20,803    81,600.6   40,356.6
      3    20,803    69,737.3   38,774.9
      7    20,803    51,754.5     26,307
     15    20,803    45,138.8   27,042.9
     31    20,803    44,998.1     28,575
     63    20,803    44,489.6   29,776.1

As can be seen, the number of tests drops as the number of shortcuts increases, as we would expect. However, the benefit tails off once we hit around 15 shortcuts (45,138.8 tests at 15 versus 44,489.6 at 63). Interestingly, the actual time required starts to increase somewhere between 7 and 15 shortcuts, from 26,307 ms at 7 to 27,042.9 ms at 15.
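To make the trade-off easier to see, the figures can be expressed as savings relative to the 0-shortcut baseline. The sketch below simply re-derives percentages from the numbers in the table; the class and method names are made up for this illustration.

```java
final class ShortcutTableSummary {
    // Figures copied from the results table above.
    static final int[] SHORTCUTS = {0, 1, 3, 7, 15, 31, 63};
    static final double[] TESTS = {100000.0, 81600.6, 69737.3, 51754.5, 45138.8, 44998.1, 44489.6};
    static final double[] MILLIS = {65367.6, 40356.6, 38774.9, 26307.0, 27042.9, 28575.0, 29776.1};

    /** Percentage saved relative to the baseline value. */
    static double percentSaved(double baseline, double value) {
        return 100.0 * (baseline - value) / baseline;
    }

    /** Index of the smallest entry in a non-empty array. */
    static int indexOfMin(double[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] < values[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println("shortct  tests saved %  time saved %");
        for (int i = 0; i < SHORTCUTS.length; i++) {
            System.out.printf("%7d  %13.1f  %12.1f%n",
                    SHORTCUTS[i],
                    percentSaved(TESTS[0], TESTS[i]),
                    percentSaved(MILLIS[0], MILLIS[i]));
        }
        System.out.println("fastest at shortct = " + SHORTCUTS[indexOfMin(MILLIS)]);
    }
}
```

On these numbers, tests fall by about 55% at 15 shortcuts and barely move after that, while elapsed time is lowest at 7 shortcuts, roughly 60% below the baseline.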

Unfortunately this test is very artificial. Next I want to create a test that tries to mirror more real-world usage, such as by simulating how an ad network would need to select ads appropriate to particular publishers and visitors.

Ian.

--
Ian Clarke
CEO, SenseArray
Email: i...@sensearray.com
Ph: +1 512 422 3588

Ian Clarke

Jun 27, 2010, 10:11:32 AM

to athena-...@googlegroups.com

Doh, the formatting may be screwed up on that table of results; here are the results in a Gist:

http://gist.github.com/454939

Ian.

Kushal Pisavadia

Jun 29, 2010, 11:49:32 AM

to athena-...@googlegroups.com

Okay, so I've had a look at the benchmarks.

It looks interesting. I can see why you've looked at the 7 shortcut mark. Maybe we have a generalised config file where users can control this limit for efficiency?
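As a rough idea of what a user-controlled limit could look like, here is a sketch that reads it from a standard Java properties file. Nothing like this exists in Athena as far as this thread shows: the key name `athena.shortcut.limit` and the default of 7 (the fastest setting in the run above) are purely hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class ShortcutLimitConfigSketch {
    // Hypothetical key name; not an existing Athena setting.
    static final String KEY = "athena.shortcut.limit";
    // Assumed default: the fastest setting in the benchmark run above.
    static final int DEFAULT_LIMIT = 7;

    /** Parses properties-file text into a Properties object. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns the configured limit, or the default when the key is absent. */
    static int readShortcutLimit(Properties props) {
        String raw = props.getProperty(KEY);
        if (raw == null) {
            return DEFAULT_LIMIT;
        }
        int limit = Integer.parseInt(raw.trim());
        if (limit < 0) {
            throw new IllegalArgumentException(KEY + " must be >= 0, got " + limit);
        }
        return limit;
    }

    public static void main(String[] args) {
        Properties props = parse("athena.shortcut.limit=15\n");
        System.out.println("shortcut limit = " + readShortcutLimit(props));
    }
}
```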

Also, as the benchmark isn't exactly a unit test, shouldn't it go into a contrib folder or somewhere separate so it doesn't get confused with the Maven unit tests?

Kind Regards,

Kushal Pisavadia
E: kus...@gmail.com

Ian Clarke

Jun 29, 2010, 2:42:27 PM

to athena-...@googlegroups.com

On Tue, Jun 29, 2010 at 10:49 AM, Kushal Pisavadia <kus...@gmail.com> wrote:

> It looks interesting. I can see why you've looked at the 7 shortcut mark. Maybe we have a generalised config file where users can control this limit for efficiency?

Yes, although I think first we need a better understanding of how this limit impacts performance.

> Also, as the benchmark isn't exactly a unit test, shouldn't it go into a contrib folder or somewhere separate so it doesn't get confused with the Maven unit tests?