I created a simple class, ShortcutNumberMeasurement.java, in src/test/java/athena/ to examine how different shortcut limits affect performance.
In this run I created 100,000 items, each with about 10 random tags, and 100 random queries (with a maximum depth of 7). For each shortcut value I cycle through the queries twice.
Note that because the queries are randomly generated, this is a rather nasty situation for Athena, since one would expect most real-world queries to share common sub-parts that Athena can take advantage of.
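For anyone who wants to try something similar, here is a rough, self-contained Java sketch of this kind of random workload. It is not the code in ShortcutNumberMeasurement.java and does not use Athena's API; it only models the exhaustive scan (the 0-shortcut case). The class and method names, the AND/OR/NOT query grammar, the 1,000-tag vocabulary, exactly 10 tags per item and the fixed random seed are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Hypothetical sketch of a random tag workload; not Athena's API. */
public class RandomWorkloadSketch {

    // Assumed query grammar: tag leaves combined with AND, OR and NOT.
    interface Query {
        boolean matches(Set<Integer> tags);
    }

    record Tag(int tag) implements Query {
        public boolean matches(Set<Integer> tags) { return tags.contains(tag); }
    }

    record And(Query left, Query right) implements Query {
        public boolean matches(Set<Integer> tags) { return left.matches(tags) && right.matches(tags); }
    }

    record Or(Query left, Query right) implements Query {
        public boolean matches(Set<Integer> tags) { return left.matches(tags) || right.matches(tags); }
    }

    record Not(Query inner) implements Query {
        public boolean matches(Set<Integer> tags) { return !inner.matches(tags); }
    }

    // Assumed number of distinct tags; the original run does not state it.
    static final int VOCABULARY = 1000;

    /** Creates `count` items, each holding exactly `tagsPerItem` distinct random tags. */
    static List<Set<Integer>> randomItems(int count, int tagsPerItem, Random rnd) {
        List<Set<Integer>> items = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            Set<Integer> tags = new HashSet<>();
            while (tags.size() < tagsPerItem) {
                tags.add(rnd.nextInt(VOCABULARY));
            }
            items.add(tags);
        }
        return items;
    }

    /** Builds a random query tree at most `maxDepth` levels deep (a lone tag has depth 1). */
    static Query randomQuery(int maxDepth, Random rnd) {
        if (maxDepth <= 1 || rnd.nextInt(3) == 0) {
            return new Tag(rnd.nextInt(VOCABULARY));
        }
        return switch (rnd.nextInt(3)) {
            case 0 -> new And(randomQuery(maxDepth - 1, rnd), randomQuery(maxDepth - 1, rnd));
            case 1 -> new Or(randomQuery(maxDepth - 1, rnd), randomQuery(maxDepth - 1, rnd));
            default -> new Not(randomQuery(maxDepth - 1, rnd));
        };
    }

    /** Exhaustive scan: every item is tested. Returns {results, tests}. */
    static long[] exhaustiveScan(Query query, List<Set<Integer>> items) {
        long results = 0;
        long tests = 0;
        for (Set<Integer> item : items) {
            tests++;
            if (query.matches(item)) {
                results++;
            }
        }
        return new long[] {results, tests};
    }

    public static void main(String[] args) {
        Random rnd = new Random(42); // assumed fixed seed so repeated runs generate the same data
        List<Set<Integer>> items = randomItems(100_000, 10, rnd);
        List<Query> queries = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            queries.add(randomQuery(7, rnd));
        }
        // Cycle through the queries twice, timing each pass.
        for (int pass = 1; pass <= 2; pass++) {
            long start = System.nanoTime();
            long totalResults = 0;
            long totalTests = 0;
            for (Query q : queries) {
                long[] rt = exhaustiveScan(q, items);
                totalResults += rt[0];
                totalTests += rt[1];
            }
            double ms = (System.nanoTime() - start) / 1e6;
            System.out.printf("pass %d: results=%,d tests=%,d ms=%.1f%n", pass, totalResults, totalTests, ms);
        }
    }
}
```

Because this baseline tests every item for every query, its test count is fixed by the number of items; that fixed count is what the shortcut settings in the table reduce.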
The columns are:
shortct: The number of shortcuts; 0 means it will essentially always do an exhaustive scan.
results: The total number of items found (this should be the same every time, and it is).
tests: The number of elements that had to be tested to return these results.
ms: The time in milliseconds for each test run.
shortct  results      tests        ms
      0   20,803  100,000    65,367.6
      1   20,803   81,600.6  40,356.6
      3   20,803   69,737.3  38,774.9
      7   20,803   51,754.5  26,307
     15   20,803   45,138.8  27,042.9
     31   20,803   44,998.1  28,575
     63   20,803   44,489.6  29,776.1
As can be seen, the number of tests drops as the number of shortcuts increases, as we would expect; however, the benefit tails off at around 15 shortcuts. Interestingly, the actual time required starts to increase somewhere between 7 and 15 shortcuts, so the fastest run in this test was with 7 shortcuts.
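The trend is a little easier to see as ratios. This small Java sketch only recomputes, from the figures in the table above, the fraction of the exhaustive scan's tests that each setting still needs and the speed-up in run time over 0 shortcuts; the class name is made up and nothing new is measured.

```java
/** Hypothetical helper: recomputes ratios from the published table; no new measurements. */
public class ShortcutTableSummary {
    // Figures copied from the table.
    static final int[] SHORTCUTS = {0, 1, 3, 7, 15, 31, 63};
    static final double[] TESTS = {100_000, 81_600.6, 69_737.3, 51_754.5, 45_138.8, 44_998.1, 44_489.6};
    static final double[] MS = {65_367.6, 40_356.6, 38_774.9, 26_307, 27_042.9, 28_575, 29_776.1};

    /** Index of the row with the lowest run time. */
    static int fastestRow() {
        int best = 0;
        for (int i = 1; i < MS.length; i++) {
            if (MS[i] < MS[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println("shortct  tests%  speedup");
        for (int i = 0; i < SHORTCUTS.length; i++) {
            double testsPct = 100.0 * TESTS[i] / TESTS[0]; // share of exhaustive-scan tests still needed
            double speedup = MS[0] / MS[i];                // run time relative to 0 shortcuts
            System.out.printf("%7d  %5.1f  %6.2fx%n", SHORTCUTS[i], testsPct, speedup);
        }
        System.out.println("fastest: " + SHORTCUTS[fastestRow()] + " shortcuts");
    }
}
```

With these figures, 7 shortcuts comes out roughly 2.5 times faster than the exhaustive scan, while even 63 shortcuts still needs about 44.5% of the exhaustive scan's tests.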
Unfortunately, this test is very artificial. Next, I want to create a test that more closely mirrors real-world usage, for example by simulating how an ad network selects ads appropriate to particular publishers and visitors.
Ian.
--
Ian Clarke
CEO, SenseArray
Email: i...@sensearray.com
Ph: +1 512 422 3588