Nimrod Defrag time looks to be based on DB size :(

ad...@metabroadcast.com

Apr 24, 2013, 9:31:10 AM
to nimro...@googlegroups.com
Hi Sergio,

I've done a quick analysis of the time spent on each defrag under the new method of defragging every x inserts (the default being 1000000). While this is nice and predictable (we see a defrag event on average every 60 minutes), the time each defrag takes tends to increase with every subsequent run. This is not particularly practical in a production environment, as within a week we would be dropping significant numbers of log records each time a defrag runs:

./defragtime.py 
1: 26.0 seconds
2: 35.0 seconds
3: 35.0 seconds
4: 40.0 seconds
5: 47.0 seconds
6: 53.0 seconds
7: 62.0 seconds
8: 54.0 seconds
9: 103.0 seconds
10: 170.0 seconds
11: 238.0 seconds
12: 260.0 seconds
13: 287.0 seconds
14: 348.0 seconds
15: 344.0 seconds
16: 441.0 seconds
17: 387.0 seconds
18: 407.0 seconds
19: 425.0 seconds
20: 480.0 seconds
21: 522.0 seconds
22: 540.0 seconds
23: 558.0 seconds
24: 561.0 seconds
25: 580.0 seconds
26: 590.0 seconds
27: 611.0 seconds
28: 632.0 seconds
29: 646.0 seconds
30: 653.0 seconds
31: 681.0 seconds
32: 672.0 seconds
33: 699.0 seconds
34: 727.0 seconds
35: 733.0 seconds
36: 905.0 seconds
37: 778.0 seconds
38: 872.0 seconds
39: 859.0 seconds
40: 877.0 seconds
41: 876.0 seconds
42: 915.0 seconds
43: 940.0 seconds
44: 945.0 seconds
45: 960.0 seconds

This is after running from a fresh database for 2 days. While increasing the number of inserts before defragging will cut down on the frequency of the tasks, the time spent on the defrag action will be longer, and still increase over time. This looks to be a limitation of the HSQLDB backend's defrag implementation. Have you considered adapting Nimrod for other storage engines, like Cassandra?
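
To put a rough number on the trend, here is a minimal sketch that fits a straight line to the durations listed above and works out how much of each cycle the latest defrag takes up. It is not the defragtime.py script used to produce the listing (that script is not shown here). The names `linear_fit` and `CYCLE_SECONDS` are made up for illustration, and the 3600 second cycle is an assumption based on the "every 60 minutes" average mentioned earlier.

```python
# Defrag durations in seconds, copied from the listing above (runs 1 to 45).
durations = [
    26, 35, 35, 40, 47, 53, 62, 54, 103, 170, 238, 260, 287, 348, 344,
    441, 387, 407, 425, 480, 522, 540, 558, 561, 580, 590, 611, 632, 646,
    653, 681, 672, 699, 727, 733, 905, 778, 872, 859, 877, 876, 915, 940,
    945, 960,
]

# Assumption: one defrag roughly every 60 minutes, as observed on average.
CYCLE_SECONDS = 3600


def linear_fit(ys):
    """Ordinary least-squares fit of ys against run numbers 1..n.

    Returns (slope, intercept), where slope is in seconds per run.
    """
    n = len(ys)
    xs = list(range(1, n + 1))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = linear_fit(durations)

# Share of a cycle spent inside the most recent defrag.
latest_share = durations[-1] / CYCLE_SECONDS

print(f"runs analysed: {len(durations)}")
print(f"average growth: {slope:.1f} seconds per defrag")
print(f"latest defrag uses {latest_share:.1%} of a {CYCLE_SECONDS} s cycle")
```

A straight line is only a crude model here (the listing jumps sharply around runs 9 to 11), so the slope should be read as an order-of-magnitude indication, not a prediction.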

Regards,

Adam Horwich

Sergio Bossa

Apr 25, 2013, 5:18:09 AM
to nimro...@googlegroups.com
On Wed, Apr 24, 2013 at 2:31 PM, <ad...@metabroadcast.com> wrote:
 
While increasing the number of inserts before defragging will cut down on the frequency of the tasks, the time spent on the defrag action will be longer, and still increase over time. This looks to be a limitation of the HSQLDB backend's defrag implementation.

That sounds bad, actually. I'll have a look and see if I can improve things.
 
Have you considered adapting Nimrod for other storage engines, like Cassandra?
 
Yes, that would be an option, but implementing aggregates on top of a distributed database requires some thinking (and wouldn't be as flexible and fast as with a local-only database obviously).

I'll have a think and come back to you :) 

--
Sergio Bossa
http://www.linkedin.com/in/sergiob