Hi Sergio,
I've done a quick analysis of the time spent on each defrag under the new method of defragging every x inserts (the default being 1000000). While this is nice and predictable (we see a defrag event on average every 60 minutes), the time spent in each defrag tends to increase with each subsequent run. This is not practical in a production environment: within a week we would be dropping significant numbers of log records every time a defrag runs:
./defragtime.py
1: 26.0 seconds
2: 35.0 seconds
3: 35.0 seconds
4: 40.0 seconds
5: 47.0 seconds
6: 53.0 seconds
7: 62.0 seconds
8: 54.0 seconds
9: 103.0 seconds
10: 170.0 seconds
11: 238.0 seconds
12: 260.0 seconds
13: 287.0 seconds
14: 348.0 seconds
15: 344.0 seconds
16: 441.0 seconds
17: 387.0 seconds
18: 407.0 seconds
19: 425.0 seconds
20: 480.0 seconds
21: 522.0 seconds
22: 540.0 seconds
23: 558.0 seconds
24: 561.0 seconds
25: 580.0 seconds
26: 590.0 seconds
27: 611.0 seconds
28: 632.0 seconds
29: 646.0 seconds
30: 653.0 seconds
31: 681.0 seconds
32: 672.0 seconds
33: 699.0 seconds
34: 727.0 seconds
35: 733.0 seconds
36: 905.0 seconds
37: 778.0 seconds
38: 872.0 seconds
39: 859.0 seconds
40: 877.0 seconds
41: 876.0 seconds
42: 915.0 seconds
43: 940.0 seconds
44: 945.0 seconds
45: 960.0 seconds
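
For reference, defragtime.py just measures the gap between each defrag's start and finish as recorded in our server log. A minimal sketch of that measurement is below; note the timestamp format and the "defrag started"/"defrag finished" message text are placeholders I've assumed here, not HSQLDB's actual log output.

    #!/usr/bin/env python
    # Minimal sketch of the measurement defragtime.py makes.
    # ASSUMPTION: the log contains lines like
    #   2011-05-10 09:14:02 defrag started
    #   2011-05-10 09:14:28 defrag finished
    # The timestamp format and message text are placeholders.
    import re
    from datetime import datetime

    PATTERN = re.compile(
        r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) defrag (started|finished)')

    def defrag_durations(path):
        """Yield the duration in seconds of each completed defrag."""
        start = None
        with open(path) as log:
            for line in log:
                m = PATTERN.match(line)
                if not m:
                    continue
                ts = datetime.strptime(m.group(1), '%Y-%m-%d %H:%M:%S')
                if m.group(2) == 'started':
                    start = ts
                elif start is not None:
                    yield (ts - start).total_seconds()
                    start = None

    if __name__ == '__main__':
        for n, secs in enumerate(defrag_durations('server.log'), 1):
            print('%d: %.1f seconds' % (n, secs))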
This is after running from a fresh database for 2 days. While increasing the number of inserts between defrags would cut down on their frequency, each defrag would then take longer, and the durations would still grow over time. This looks to be a limitation of the defrag implementation in the HSQLDB backend. Have you considered adapting Nimrod for other storage engines, such as Cassandra?
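
To put the overhead in perspective, a rough back-of-envelope using the figures above (the 60-minute interval is the observed average, and 960 seconds is the most recent duration):

    # Fraction of wall time spent defragging after 2 days, using the
    # observed ~60-minute interval and the duration of defrag 45 above.
    interval = 60 * 60            # average seconds between defrag events
    duration = 960.0              # seconds spent in the latest defrag
    print('%.0f%% of wall time spent defragging' % (100 * duration / interval))
    # -> 27% of wall time spent defragging

and that fraction is still climbing.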
Regards,
Adam Horwich