H2 vs PostgreSQL

2,049 views
Skip to first unread message

Patrick Renaud

unread,
Jan 4, 2011, 10:35:29 AM1/4/11
to Repo and Gerrit Discussion
At the last GitTogether10 event, Shawn mentioned something that caught
my attention about H2. Shawn mentioned something like H2 was good for
playing around with Gerrit but that we should use MySQL or PostgreSQL
instead in a real production environment.

We started with Gerrit using H2 at first and so far we are still using
it. Since then our environment moved from test bed to production and
H2 does not appear to have caused issues (yet). But I am afraid it may
become an issue based on Shawn's comment and this is why I am sending
out this mail.

I really like the fact that H2 ships and runs embedded with Gerrit as
it greatly reduces the administrative overhead of managing an external
db server. But obviously I don't want to sacrifice the robustness,
reliability and scalability of our setup for simplicity reasons;
especially now that our Git user base will grow to 3000+ users very
quickly in Q1-Q2. Would it be possible, for someone who knows, to
drill down into Shawn's comment and explain why we should avoid H2 in
a real production environment? That would really help us.

Tx.

-Patrick
Ericsson

Shawn Pearce

unread,
Jan 4, 2011, 12:02:00 PM1/4/11
to Patrick Renaud, Repo and Gerrit Discussion

I have a few issues with H2 as a production database for Gerrit,
related to management of the database:

* No easy way to backup H2 while Gerrit is running. To make a backup,
a Gerrit admin needs to login over SSH and use 'gerrit gsql' to
execute an H2 database backup command[1], which constructs a ZIP file
holding the backup data. Restoring this backup is probably somewhat
difficult, you need an H2 console to an empty database, but Gerrit
doesn't exactly come with one that will run without creating a
database first. So restoration isn't well planned. :-(

* No warm-spare configuration. Setting up a warm-spare server to keep
a reasonably current copy of the metadata improves reliability of the
entire installation. MySQL and PostgreSQL both have warm-spare (or
better master/master) configurations that are documented, and some
companies even sell support to your DBA team to help make these
systems really work. The H2 warm-spare configuration *might* work if
you custom-configure the JDBC connection options in gerrit.config, but
I've never tried this.


Given the limitations of H2 [2]. you won't run into issues with the
Gerrit metadata storage. 2+ years of metadata for the Google Android
team is still only a few hundred MiBs of data in the database. H2
scales up to 4 TB.

However the 2 billion rows/table limit might be an issue with
patch_set_approvals table. Assuming 2 reviewers per change and 4
patch sets before submission, with the default Code Review and
Verified categories, that is about 16 rows/change. That is only
125,000 changes before you bump into this limit in H2.


[1] http://www.h2database.com/html/grammar.html#backup
[2] http://www.h2database.com/html/advanced.html#limits_limitations

Swindells, Thomas

unread,
Jan 4, 2011, 12:10:14 PM1/4/11
to repo-d...@googlegroups.com
> However the 2 billion rows/table limit might be an issue with
> patch_set_approvals table. Assuming 2 reviewers per change and 4
> patch sets before submission, with the default Code Review and
> Verified categories, that is about 16 rows/change. That is only
> 125,000 changes before you bump into this limit in H2.

I'm not sure your maths is right here - I think you've confused million and billion. I make it 125,000,000 changes which by my calculations gives you enough changes to have 1000 developers making 34 changes a day for 10 years ... which should be plenty!


**************************************************************************************
This message is confidential and intended only for the addressee. If you have received this message in error, please immediately notify the postm...@nds.com and delete it from your system as well as any copies. The content of e-mails as well as traffic data may be monitored by NDS for employment and security purposes. To protect the environment please do not print this e-mail unless necessary.

NDS Limited. Registered Office: One London Road, Staines, Middlesex, TW18 4EX, United Kingdom. A company registered in England and Wales. Registered no. 3080780. VAT no. GB 603 8808 40-00
**************************************************************************************

Shawn Pearce

unread,
Jan 4, 2011, 12:35:34 PM1/4/11
to Swindells, Thomas, repo-d...@googlegroups.com
On Tue, Jan 4, 2011 at 09:10, Swindells, Thomas <TSwin...@nds.com> wrote:
>> However the 2 billion rows/table limit might be an issue with
>> patch_set_approvals table.  Assuming 2 reviewers per change and 4
>> patch sets before submission, with the default Code Review and
>> Verified categories, that is about 16 rows/change.  That is only
>> 125,000 changes before you bump into this limit in H2.
>
> I'm not sure your maths is right here - I think you've confused million and billion. I make it 125,000,000 changes which by my calculations gives you enough changes to have 1000 developers making 34 changes a day for 10 years ... which should be plenty!

Uhm, yes. Thanks for that correction. :-)


I completely blame the bagel I was eating during breakfast while
drafting the above quoted email. It was large, and looked like a
zero. It used up 3 zeros from my estimate... :-)

Rob Heittman

unread,
Jan 4, 2011, 1:32:09 PM1/4/11
to Shawn Pearce, Swindells, Thomas, repo-d...@googlegroups.com
I'll report something horrid but pragmatic.  Snapshotting and restoring the filesystem containing a H2 database does seem to work fine, even with embedded applications running and connected to it.  We do this frequently, because we manage cloud systems where users run H2 databases, and there's no way to coordinate application shutdown with them.  While it is absolutely likely that the H2 files will be snapped in some inconsistent state, H2 is very good at auto-recovering this kind of inconsistency when it restarts.  Anecdotally, only once has an H2 database snapped in this way ever failed to auto-recover for us, and in that case the files were recovered OK with the command line recovery tool.  Caveat: the H2 databases we're dealing with here are in the smallish range -- biggest one is something like 2 GB.

So if "recovery" consists of rewinding the whole Gerrit system to a known working state in the recent past, you'll probably be able to do this with hot filesystem-based backups.  This "probably" is in the same category as your doctor telling you something is "probably not terminal," but -- for some folks, including me in a number of cases -- this actually is enough.  For example, with a couple dozen users sharing an inexpensive cloud instance, I wouldn't recommend going much further than this.  If you can coordinate a known clean, Gerrit-shut-down backup in non-working hours, and then have some hot snapshots going on during working hours, that may be sufficient paranoia.

For your use, with 3000 users, I would definitely say to seriously look at the other database alternatives as Shawn recommends.  Not because H2 will get outgrown, fall over, or suffer bad performance at scale ... but because I think only a more established database ecosystem will give you the enterprise backup and management features you really should have.  For a similar target load system we recently provisioned, we used PostgreSQL 9, which provides as many of these facilities as we could find in the FOSS world.

Tariq Mehmood

unread,
Aug 20, 2012, 9:28:15 AM8/20/12
to repo-d...@googlegroups.com
No easy way to backup H2 while Gerrit is running.  To make a backup,
a Gerrit admin needs to login over SSH and use 'gerrit gsql' to
execute an H2 database backup command[1], which constructs a ZIP file
holding the backup data.  Restoring this backup is probably somewhat
difficult, you need an H2 console to an empty database, but Gerrit
doesn't exactly come with one that will run without creating a
database first.  So restoration isn't well planned.  :-( 

Hi Shawn, 
Just wondering did this BACKUP TO 'backup.zip' ever worked for you?
If yes where it will store backup.zip file, can i make it backup.sql instead?
I tried to give it my local VM path but does not work........................
any suggestions are welcome

Thanks, 
Tariq

Tariq Mehmood

unread,
Aug 20, 2012, 9:29:01 AM8/20/12
to repo-d...@googlegroups.com

Chris Barrera

unread,
Aug 20, 2012, 10:34:45 AM8/20/12
to repo-d...@googlegroups.com


The "backup to" H2 command works reasonably but i recommend using the "drop" command which includes commands to drop and then rebuild tables and other DB meta-data from scratch, prior to data loads.

However, another way to do backups is to run the H2 database in "Automatic mixed server mode" which allows external H2 commands to do either binary "backup" or "script'ed" text backups while keeping gerrit running (handles lock issues). In fact, we use this to copy the database to a remote gerrit gerrit cache instance so that the remote has an up-to-date copy of the database for performance and DR purposes.

We use this method to
To do this we configure in gerrit.config:
[database]
        type = jdbc
        driver = org.h2.Driver
        url = "jdbc:h2:db/ReviewDB;AUTO_SERVER=TRUE"
        database = db/ReviewDB
[container]
        javaOptions = -Dh2.bindAddress=localhost

And then run:
java -cp h2-1.2.147.jar org.h2.tools.Script -url "jdbc:h2:/gerrit/db/ReviewDB;AUTO_SERVER=TRUE" -script backup.sql  -user \"\" -options drop

You can do the reverse using h2 "runscript" to load a running gerrit instance - live. Or can initialize an empty database.

Using this to copy databases on a 5 minute schedule we can use H2 as backend for gerrit cache servers, maintain "warm" DR copies of the databases, and don't have to migrate to mysql or postgres for the benefit of gerrit replication.



Rick Balwani

unread,
Feb 23, 2015, 2:49:49 PM2/23/15
to repo-d...@googlegroups.com
Hey Chris,

I am really interested in learning more about how you take live backup of gerrit. Have you written a script to accomplish this? Having a little trouble understanding your guidance here.

You help is really appreciated,

Rick
Reply all
Reply to author
Forward
0 new messages