Database Microbenchmarks


Howard Chu

Jul 18, 2012, 9:30:53 AM
to lev...@googlegroups.com
FYI, comparisons of Google LevelDB 1.5, SQLite 3.7.7.1, Kyoto Cabinet 1.2.76,
BerkeleyDB 5.3.21, and OpenLDAP MDB are available along with source code of
the modified benchmark programs and raw output from the tests.

-------- Original Message --------
Subject: Re: MDB microbenchmark
Date: Wed, 18 Jul 2012 06:17:13 -0700
From: Howard Chu <h...@symas.com>
To: OpenLDA...@openldap.org <OpenLDA...@openldap.org>

Howard Chu wrote:
> Howard Chu wrote:
>> I was reading through Google's leveldb stuff and found their benchmark page:
>>
>> http://leveldb.googlecode.com/svn/trunk/doc/benchmark.html
> 
>> I haven't duplicated all of the test scenarios described on the web page yet;
>> you can do that yourself with the attached code. It's pretty clear that
>> nothing else even begins to approach MDB's read speed.
> 
> The results for large data values are even more dramatic:
I've expanded the tests, added BerkeleyDB 5.3.21 to the mix, and summarized
the results here:
   http://highlandsun.com/hyc/mdb/microbench/

--
Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/

Howard Chu

Jul 20, 2012, 7:17:49 AM
to lev...@googlegroups.com


On Wednesday, July 18, 2012 6:30:53 AM UTC-7, Howard Chu wrote:
> FYI, comparisons of Google LevelDB 1.5, SQLite 3.7.7.1, Kyoto Cabinet 1.2.76,
> BerkeleyDB 5.3.21, and OpenLDAP MDB are available along with source code of
> the modified benchmark programs and raw output from the tests.

By the way, examining the raw output shows a discrepancy in the database sizes for the Random Write tests. Most likely the random number generator isn't returning completely unique keys, so fewer than the specified number of records actually get stored. The benchmark program ought to use a shuffle approach instead, to guarantee that the specified number of records is stored; see http://benpfaff.org/writings/clc/shuffle.html and the sketch below.
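
A minimal sketch of the idea in C (my own illustration, not code from the benchmark or from Ben Pfaff's page): build an array of key indices and Fisher-Yates shuffle it, so a benchmark walking the array stores every key in [0, n) exactly once, in random order.

    #include <stdlib.h>

    /* Fill keys[0..n-1] with 0..n-1, then Fisher-Yates shuffle in place,
     * so a benchmark walking the array stores every key exactly once,
     * in random order. */
    static void shuffled_keys(unsigned long *keys, size_t n)
    {
        size_t i;
        for (i = 0; i < n; i++)
            keys[i] = i;
        for (i = n - 1; i > 0; i--) {
            /* scale rand() into [0, i]; avoids the worst of the
             * rand() % (i+1) modulo bias */
            size_t j = (size_t)((double)rand() / ((double)RAND_MAX + 1) * (i + 1));
            unsigned long tmp = keys[i];
            keys[i] = keys[j];
            keys[j] = tmp;
        }
    }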

I'd also consider printing the random number seed and allowing it to be passed as a command-line argument, so that the same random ordering is used by each test invocation. Testing is all about getting repeatable results; as it stands, there is no way to verify that the Random Write tests do anything consistent.
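
A rough sketch of what that seed handling could look like; the --seed= flag and the surrounding main() here are hypothetical, not an existing option of the benchmark programs:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    int main(int argc, char **argv)
    {
        unsigned seed = (unsigned)time(NULL);   /* default: differs every run */
        int i;
        for (i = 1; i < argc; i++) {
            if (strncmp(argv[i], "--seed=", 7) == 0)
                seed = (unsigned)strtoul(argv[i] + 7, NULL, 10);
        }
        printf("random seed: %u\n", seed);      /* log it so a run can be replayed */
        srand(seed);
        /* ... run the benchmark using rand() ... */
        return 0;
    }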

I'd also consider running the Synch write tests for the same duration as all the others. It would take more time overall, but right now the resulting numbers are apples and oranges: the B-trees are much faster at small DB sizes and slow down as they grow, so this result can't be usefully compared to the other numbers.
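
Here's one way a duration-bounded loop might look, as a sketch of my own; op() stands in for whatever synchronous-put call the harness would make:

    #include <time.h>

    /* Run op() repeatedly for a fixed wall-clock budget and return ops/sec,
     * so synchronous and asynchronous write tests can be compared over the
     * same elapsed time rather than the same record count. */
    static double timed_ops(void (*op)(long), double budget_secs)
    {
        struct timespec start, now;
        double elapsed;
        long ops = 0;
        clock_gettime(CLOCK_MONOTONIC, &start);
        do {
            op(ops);    /* e.g. one synchronous put of record #ops */
            ops++;
            clock_gettime(CLOCK_MONOTONIC, &now);
            elapsed = (now.tv_sec - start.tv_sec) +
                      (now.tv_nsec - start.tv_nsec) / 1e9;
        } while (elapsed < budget_secs);
        return ops / elapsed;
    }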

I'll probably rerun the tests with these changes in a couple of days.

Howard Chu

Jul 31, 2012, 1:13:38 PM
to lev...@googlegroups.com
Another update: http://highlandsun.com/hyc/mdb/microbench/MDB-fs.ods is an
OpenOffice spreadsheet tabulating the results from running the benchmarks
across many different filesystems. You can compare btrfs, ext2, ext3, ext4,
jfs, ntfs, reiserfs, xfs, and zfs to see which is best for the database
workloads being tested. In addition, ext3, ext4, jfs, reiserfs, and xfs are
tested in a second configuration, with the journal stored on a tmpfs device,
to show how much overhead the filesystem's journaling mechanism imposes.

The hard drive used is the same as in the main benchmark document, attached
via eSATA to my laptop. The filesystems were created fresh for each test. Each
test was run only once, due to the length of time needed to collect all of the
data. (It takes several minutes just to run mkfs for some of these
filesystems...) You will probably want to toggle through the tests in cell B13
of the spreadsheet to get the best view of the results.

With this drive, jfs with an external journal is the clear winner when you
need fully synchronous transactions. If you can tolerate some degree of
asynchronous operation, plain old ext2 is still the fastest for writes.

MDB read speed is largely independent of FS type. I believe any variation in
the reported speeds here is just measurement noise.



Howard Chu

Aug 3, 2012, 9:59:30 AM
to lev...@googlegroups.com
I just noticed something troubling about the Batched Sequential Write performance with large values. You can see it in Section 4 of my results:
http://highlandsun.com/hyc/mdb/microbench/#sec4

The non-batched results in 4C are faster than the batched results in 4E for LevelDB, MDB, and SQLite3. Running valgrind/callgrind on the MDB test shows that 80% of the time is spent in free(), freeing the write buffers of the large items. Switching between regular libc malloc and tcmalloc didn't change the outcome. I also tried different value sizes; the threshold seems to be between 15KB and 16KB. Below that threshold, batched writes are still faster than non-batched.
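
For anyone who wants to poke at the allocator in isolation, here's a small standalone harness (my own construction, independent of any of the databases) that builds up a batch of live buffers and times only the pass that frees them. Whether it reproduces the 15-16KB threshold will depend on the allocator and allocation pattern, but it separates malloc/free behavior from the database code:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    /* Allocate `count` live buffers of `size` bytes, then time only the
     * pass that frees them, mimicking a batch of large write buffers
     * being released at once. */
    static double free_pass_secs(size_t size, int count)
    {
        char **bufs = malloc(count * sizeof(*bufs));
        struct timespec t0, t1;
        int i;
        for (i = 0; i < count; i++) {
            bufs[i] = malloc(size);
            memset(bufs[i], 0xab, size);    /* touch pages so they're mapped */
        }
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < count; i++)
            free(bufs[i]);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        free(bufs);
        return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    }

    int main(void)
    {
        size_t sizes[] = { 8 * 1024, 15 * 1024, 16 * 1024, 64 * 1024 };
        size_t i;
        for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
            printf("%7zu bytes: free pass took %.4f s\n",
                   sizes[i], free_pass_secs(sizes[i], 10000));
        return 0;
    }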

