*_versions cleanup

Simon J Welsh

Nov 1, 2008, 5:15:06 AM
to SilverStripe Development
As most of you are aware, the _versions tables can get really big.
Thus having a script which tidies up the tables would be nice.
Discussing this with Andrew in IRC (http://logs.silverstripe.com/index.php?date=2008-11-01#21_59), we came up with two ways in which we could do this:

1) Time Machine style:
- Leave the last week as is
- Keep at most one version per day for the past month
- One per month for the past year
- One per year after that

2) Based on amount: when there are 2n versions or so, merge the oldest n.

If necessary, I can create a basic patch for each type.
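
The two approaches above can be sketched roughly as follows. This is Python rather than SilverStripe PHP, purely to illustrate the selection logic. The function names, the (version number, timestamp) tuples and the exact cut-offs of 7, 30 and 365 days are assumptions for illustration, and "merge" in option 2 is read here as simply dropping the oldest n.

```python
from datetime import datetime, timedelta


def time_machine_keep(versions, now):
    """Return the version numbers to keep for ONE record.

    versions: list of (version_number, edited_at) tuples (hypothetical shape).
    """
    keep = set()
    seen_buckets = set()
    # Walk newest first so each bucket keeps its most recent version.
    for number, edited_at in sorted(versions, key=lambda v: v[1], reverse=True):
        age = now - edited_at
        if age <= timedelta(days=7):
            keep.add(number)  # last week: left as is
            continue
        if age <= timedelta(days=30):
            bucket = ("day", edited_at.date())
        elif age <= timedelta(days=365):
            bucket = ("month", edited_at.year, edited_at.month)
        else:
            bucket = ("year", edited_at.year)
        if bucket not in seen_buckets:
            seen_buckets.add(bucket)
            keep.add(number)
    return keep


def count_based_keep(versions, n):
    """Once a record has 2n or more versions, drop the oldest n."""
    ordered = sorted(versions, key=lambda v: v[1])
    if len(ordered) >= 2 * n:
        ordered = ordered[n:]
    return {number for number, _ in ordered}
```

Both functions only decide what to keep; the actual deletion from the _versions table would be a separate step.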
---
Simon Welsh
Admin of http://simon.geek.nz/

Who said Microsoft never created a bug-free program? The blue screen
never, ever crashes!

http://www.thinkgeek.com/brain/gimme.cgi?wid=81d520e5e

Michael Gall

Nov 1, 2008, 6:41:36 AM
to silverst...@googlegroups.com
I looked at this, and the problem with a time-only solution is that you must leave at least the latest version of each record. I couldn't think of a good solution based only on SQL, so I didn't write the script :)
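
To make the constraint concrete, here is one possible shape of a "delete by age, but never the latest version" statement, run against SQLite from Python. It is only a sketch: the function name is made up, and the table and column names (SiteTree_versions, RecordID, Version, LastEdited) are assumed. MySQL has historically refused a subquery that reads from the table being deleted from, so the same statement may well need rewriting there.

```python
import sqlite3


def prune_older_than(conn, table, cutoff):
    """Delete versions edited before `cutoff`, except each record's newest one.

    `table` is interpolated into the SQL, so it must come from trusted code.
    `cutoff` is an ISO-style 'YYYY-MM-DD HH:MM:SS' string (assumed storage format).
    """
    sql = f"""
        DELETE FROM "{table}"
        WHERE LastEdited < ?
          AND Version < (
              SELECT MAX(v2.Version)
              FROM "{table}" AS v2
              WHERE v2.RecordID = "{table}".RecordID
          )
    """
    cur = conn.execute(sql, (cutoff,))
    conn.commit()
    return cur.rowcount  # number of rows removed
```

The highest version of each record never satisfies `Version < MAX(Version)`, so it survives however old it is.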


Michael
--
Checkout my new website: http://myachinghead.net
http://wakeless.net

Ingo Schommer

Nov 1, 2008, 7:01:11 AM
to silverst...@googlegroups.com
Time-based granularities sound good.
As we're basically modifying the most important data
in the system with this script, it would have to be fully unit tested
before we can let it anywhere near a release :)
As Michael stated, a pure SQL solution isn't feasible, but
I think for this kind of script a PHP solution will be more expressive
anyway.
Just keep in mind that this script will be most useful for big
tables, so keep an eye on memory consumption and performance.
For the technical underpinnings, it'd be good to have it as a task
which shows up in dev/tasks and is executable on the commandline
as well (e.g. through a cronjob). Have a look at BuildTask and
i18nTextCollectorTask for example usage (on trunk).
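
On the memory point, one way to keep consumption flat is to stream the rows to delete and issue the deletes in fixed-size batches instead of loading everything up front. The sketch below shows the idea in Python against SQLite; it is not the BuildTask API, and the function name, the default batch size and the RecordID/Version column names are assumptions.

```python
import sqlite3


def delete_versions_in_batches(conn, table, doomed, batch_size=500):
    """Delete (RecordID, Version) pairs from `doomed` in fixed-size batches.

    `doomed` can be any iterable, including a generator, so the full list of
    rows to delete never has to sit in memory at once.
    """
    sql = f'DELETE FROM "{table}" WHERE RecordID = ? AND Version = ?'
    batch, deleted = [], 0
    for pair in doomed:
        batch.append(pair)
        if len(batch) >= batch_size:
            deleted += conn.executemany(sql, batch).rowcount
            conn.commit()  # commit per batch to keep transactions small
            batch = []
    if batch:  # flush the final partial batch
        deleted += conn.executemany(sql, batch).rowcount
        conn.commit()
    return deleted
```

Committing per batch also means an interrupted cron run leaves the table in a consistent, partially cleaned state.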


-------
Ingo Schommer | Senior Developer
SilverStripe
http://silverstripe.com

Phone: +64 4 978 7330 ext 42
Skype: chillu23

Sigurd Magnusson

Nov 1, 2008, 2:18:45 PM
to silverst...@googlegroups.com
Correct me if I'm wrong, but large version tables cause no performance hit and are basically more of a disk-space nuisance?
I think it's useful to have copies going back over time, so something more like (1) than (2), and they don't take up much space.
I do like the idea of being able to keep on top of them, although I'd have thought the Page Views logs were more of a direct issue.

You may as well flush out unpublished versions, keeping only published versions.
From experience on silverstripe.com, I like to see everything from the past year; for anything older, one version per year feels fine.

I'd suggest doing something like the above, plus a separate action where you can just remove _all_ old versions. (Make sure they are locked down!)
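
As a rough illustration of that policy (published versions only, everything from the past year, one per year before that), here is a small Python sketch. The function name and the (version number, timestamp, was-published) tuple shape are assumptions, and the newest version is always kept, per Michael's point earlier in the thread, even if it is an unpublished draft.

```python
from datetime import datetime, timedelta


def published_yearly_keep(versions, now):
    """Return version numbers to keep for ONE record.

    versions: list of (version_number, edited_at, was_published) tuples.
    """
    keep, years_seen = set(), set()
    ordered = sorted(versions, key=lambda v: v[1], reverse=True)
    if ordered:
        keep.add(ordered[0][0])  # never drop the latest version
    for number, edited_at, was_published in ordered:
        if not was_published:
            continue  # flush unpublished versions
        if now - edited_at <= timedelta(days=365):
            keep.add(number)  # past year: keep everything published
        elif edited_at.year not in years_seen:
            years_seen.add(edited_at.year)  # older: newest published per year
            keep.add(number)
    return keep
```

The "remove all old versions" action would then be the degenerate case that keeps only the first entry of `ordered`.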

Sig
Sigurd Magnusson
Sales and Marketing Director
SilverStripe

--
Help us win the global open source CMS awards, and win an iPod Touch!
--
 
Mobile: +64 21 42 12 08
Skype: sigurdmagnusson
 
Level 5, 97-99 Courtenay Place
Wellington, New Zealand

Michael Gall

Nov 1, 2008, 5:36:12 PM
to silverst...@googlegroups.com
There's no actual performance hit, but shared hosting costs for storage and the size of backups start blowing out of control on a bigger site.

Also, downloading or uploading the file to a live server can be a pain if it is huge, especially if you only have phpMyAdmin access and not shell.

Cheers,

Michael