Archiving old reviews

Joshua Olson

Jun 19, 2018, 2:35:20 AM
to Review Board Development
Hi RB devs,

Our database has over a million reviews, and more than half of them are years old. Is there a clean way of archiving them to a file? That is, getting a dump of those rows and then deleting them from the DB, along with full dumps of tables that don't change often, like repositories?

Cheers,
Josh
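
A minimal sketch of the archive-then-delete pattern the question describes, using Python's built-in sqlite3 on a toy table. The `reviews` table, its `timestamp` column, and the `archive_old_rows` helper are hypothetical and are not Review Board's schema. As the reply below explains, Review Board does not support removing data this way, so this only illustrates the general idea: copy the rows out first, then delete them.

```python
import io
import json
import sqlite3


def archive_old_rows(conn, table, date_column, cutoff, out):
    """Copy rows older than `cutoff` to `out` as JSON Lines, then delete them.

    `table` and `date_column` are formatted into the SQL text, so they must be
    trusted identifiers, never user input. Returns the number of rows archived.
    """
    conn.row_factory = sqlite3.Row
    with conn:  # commits the DELETE on success, rolls it back on error
        rows = conn.execute(
            f"SELECT * FROM {table} WHERE {date_column} < ?", (cutoff,)
        ).fetchall()
        # Write every row out before anything is deleted.
        for row in rows:
            out.write(json.dumps(dict(row)) + "\n")
        conn.execute(f"DELETE FROM {table} WHERE {date_column} < ?", (cutoff,))
    return len(rows)


# Hypothetical demo data; ISO-8601 date strings compare correctly as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (id INTEGER PRIMARY KEY, timestamp TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO reviews (timestamp, body) VALUES (?, ?)",
    [("2012-03-01", "old review"), ("2018-05-01", "recent review")],
)
conn.commit()

archive = io.StringIO()  # stands in for an archive file opened for writing
archived = archive_old_rows(conn, "reviews", "timestamp", "2015-01-01", archive)
remaining = conn.execute("SELECT COUNT(*) FROM reviews").fetchone()[0]
```

In a real archive, `archive` would be a file opened for writing, and the archived rows would only be trustworthy alongside full database backups.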

Christian Hammond

Jun 19, 2018, 3:18:28 AM
to reviewb...@googlegroups.com
Hi Joshua,

No, there's no way to do this. Review Board expects data to remain in the database, and things will go wrong if data is suddenly missing.

Are you mainly concerned about storage requirements, or are there other issues you're seeing?

Christian

--
Christian Hammond
President/CEO of Beanbag
Makers of Review Board


--

---
You received this message because you are subscribed to the Google Groups "Review Board Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to reviewboard-d...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Joshua Olson

Jun 20, 2018, 2:40:38 PM
to reviewb...@googlegroups.com
No issues, but our DB is 110 GB, with ~100 GB of that in the diffs tables.

Christian Hammond

Jun 20, 2018, 3:21:46 PM
to reviewb...@googlegroups.com
Remind me what version of Review Board you’re using? 3.0’s diff condensing support will do a lot to reduce storage (it typically shrinks databases by 70-80% in our tests). Condensing happens when older diffs are viewed, or when running:

    rb-site manage /path/to/sitedir condensediffs

This will take a while, and it’s not worth doing on pre-3.0 versions if you’re otherwise going to upgrade.

Christian
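
As rough arithmetic only: if the quoted 70-80% reduction applied to a 110 GB database like the one mentioned earlier in the thread, the condensed size would land somewhere around 22-33 GB. The helper below is a hypothetical illustration, and actual savings depend on how much duplicate diff content a given install has.

```python
def estimate_condensed_size_gb(total_gb, reduction):
    """Return the estimated size after condensing.

    `reduction` is the fractional shrinkage, e.g. 0.70 for 70%.
    """
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return total_gb * (1.0 - reduction)


# The 70-80% range is from the message above; 110 GB is the size quoted earlier.
best_case = estimate_condensed_size_gb(110, 0.80)
worst_case = estimate_condensed_size_gb(110, 0.70)
print(f"Estimated condensed size: {best_case:.0f}-{worst_case:.0f} GB")
```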

Joshua Olson

Jun 20, 2018, 4:12:45 PM
to reviewb...@googlegroups.com
The other reason we’d like to be able to archive reviews is that whole repos (we have 10k+ of them) are regularly “deleted”/archived, so there’s no need to keep their reviews around, and we’d like to move them to an archive RB setup or the like.

If we were careful about all of the tables and FKs, is there any reason why doing that would be an issue? For example, removing all of the review_requests and all associated data for a single repo. We’ve looked, and it seems like an intricate but doable process.

-Josh
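
A minimal sketch of deleting one repository's data in foreign-key order, on a hypothetical three-table SQLite schema (`repositories`, `review_requests`, `reviews`) that is far simpler than Review Board's real one. With FK enforcement on, deleting the parent row first is rejected, so child rows are deleted first, inside one transaction. This only covers declared foreign keys; it does nothing about the internal reference fields and stats mentioned in the reply below.

```python
import sqlite3

# Hypothetical toy schema, NOT Review Board's: each table references its parent.
SCHEMA = """
CREATE TABLE repositories (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE review_requests (
    id INTEGER PRIMARY KEY,
    repository_id INTEGER NOT NULL REFERENCES repositories(id)
);
CREATE TABLE reviews (
    id INTEGER PRIMARY KEY,
    review_request_id INTEGER NOT NULL REFERENCES review_requests(id)
);
"""


def delete_repository_tree(conn, repository_id):
    """Delete one repository and its dependent rows, children before parents."""
    with conn:  # all three deletes commit together or roll back together
        conn.execute(
            "DELETE FROM reviews WHERE review_request_id IN "
            "(SELECT id FROM review_requests WHERE repository_id = ?)",
            (repository_id,),
        )
        conn.execute(
            "DELETE FROM review_requests WHERE repository_id = ?", (repository_id,)
        )
        conn.execute("DELETE FROM repositories WHERE id = ?", (repository_id,))


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript(SCHEMA)
conn.executescript("""
INSERT INTO repositories VALUES (1, 'archived-repo'), (2, 'active-repo');
INSERT INTO review_requests VALUES (10, 1), (20, 2);
INSERT INTO reviews VALUES (100, 10), (200, 20);
""")

# Deleting the parent first is rejected while child rows still reference it.
try:
    with conn:
        conn.execute("DELETE FROM repositories WHERE id = 1")
    parent_first_failed = False
except sqlite3.IntegrityError:
    parent_first_failed = True

delete_repository_tree(conn, 1)
```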

Christian Hammond

Jun 21, 2018, 3:13:31 AM
to reviewb...@googlegroups.com
There are various bits of data we store in internal fields for references and stats that can matter. It's doable, but I can't give you a definitive "it won't break" answer for your setup, since the product doesn't expect to have data pulled out from underneath it. We do delete review request data when a team account is cancelled on RBCommons, but those review requests are distinctly partitioned into Local Sites, so it's a little different. Plus, we have our own logic handling parts of this process.

I'd say if you're going to go ahead and attempt deletion, make sure you're keeping long-term backups, in case things go wrong. I know your install is large and many people depend on it, so for liability reasons, it's just not something I'm able to encourage.

The biggest amount of data in the database is the diff data, and our diff de-duplicating/compression in 3.0 should really do a *lot* to address this. It will combine multiple entries for the same diff content into a single entry, and compress it with bzip2, reducing the size greatly.

Deleting older review requests won't reduce the size of the diff data at all, so that's also worth mentioning.

Christian
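
To illustrate the general idea of de-duplicating and compressing diff content, here is a toy content-addressed store: each distinct diff body is keyed by a hash, stored once, and compressed with bzip2 via Python's `bz2` module. The class name and the choice of SHA-256 are assumptions for illustration; this is not Review Board's implementation.

```python
import bz2
import hashlib


class DedupDiffStore:
    """Toy content-addressed store for diff bodies (illustrative only).

    Identical content is stored once, keyed by its SHA-256 digest, and kept
    bzip2-compressed. Each file diff just records which digest it uses.
    """

    def __init__(self):
        self._blobs = {}  # hex digest -> bzip2-compressed bytes
        self._refs = {}   # file diff id -> hex digest

    def add(self, filediff_id, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        if digest not in self._blobs:  # compress and store only new content
            self._blobs[digest] = bz2.compress(content)
        self._refs[filediff_id] = digest
        return digest

    def get(self, filediff_id) -> bytes:
        return bz2.decompress(self._blobs[self._refs[filediff_id]])

    def unique_entries(self) -> int:
        return len(self._blobs)

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self._blobs.values())


# The same file content attached to three diff revisions is stored only once.
diff = b"--- a/setup.py\n+++ b/setup.py\n@@ -1 +1 @@\n-version = '1.0'\n+version = '1.1'\n" * 50
store = DedupDiffStore()
for filediff_id in (1, 2, 3):
    store.add(filediff_id, diff)
raw_bytes = 3 * len(diff)
print(store.unique_entries(), store.stored_bytes(), raw_bytes)
```

Because every file diff only holds a digest, deleting one of them leaves the shared blob in place, which is one general reason removing rows in such a design frees less space than expected.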

Joshua Olson

Jun 21, 2018, 7:43:53 PM
to reviewb...@googlegroups.com
Thanks for the feedback. I’ll check the size of our staging server (just a prod dump from a couple of months ago), since it’s on RB 3. That’s where we’ll be doing the archiving experiments.

-Josh