Help needed: How to performance tune LevelDB?


IM46

Jul 7, 2011, 5:31:12 PM
to leveldb
Hi everyone,

We've just gotten LevelDB to compile and run successfully as a drop-in
replacement for our Berkeley DB implementation. We're seeing some
promising results (linear scaling with database size, as opposed to the
rapid drop-off past a certain point with BDB) that are encouraging us
to continue development. However, we are running into performance
issues that I'm hoping someone can assist with.

Specifically, our test case inserts 10 million randomly generated keys
(produced with CityHash64) in sorted order (for data locality). There
are no duplicates. Prior to each write, we call DB->Get() to see if the
key exists (which it never does). The values are 1-byte strings
consisting of a single dash ("-").

We're running off of a Windows port provided by Edouard A here:

http://groups.google.com/group/leveldb/browse_thread/thread/56e10264c9eca1be/5fa21b67781725b4

Some other details (a sketch of these options in code follows the list):

* Block size is set to 64 KB.
* The block_cache is set to 256 MB.
* x86 release-mode build with full optimizations.
* Using TCMalloc (gained a 50% performance increase from using it).
* Not using TBB (it performed poorly); replaced it with native Windows
locking functions.
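
For reference, here is a minimal sketch of how those settings map onto
the LevelDB C++ API (the function name, path handling, and error
handling are illustrative, not from our actual code):

#include <string>

#include "leveldb/cache.h"
#include "leveldb/db.h"

// Open a database with a 64 KB block size and a 256 MB block cache,
// mirroring the settings listed above.
leveldb::DB* OpenTunedDb(const std::string& path) {
  leveldb::Options options;
  options.create_if_missing = true;
  options.block_size = 64 * 1024;                                 // 64 KB blocks
  options.block_cache = leveldb::NewLRUCache(256 * 1024 * 1024);  // 256 MB cache
  leveldb::DB* db = NULL;
  leveldb::Status s = leveldb::DB::Open(options, path, &db);
  return s.ok() ? db : NULL;
}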

The 10M inserts run at about 42,000 rows/second. Our Berkeley DB
implementation, running the same test case on the same machine, peaks
at 130,000 rows/second and drops to around 65,000 rows/second over the
same test conditions described here.

It doesn't seem that the machine is being taxed very much at all, so I
suspect something subtle is happening here. Maybe there are some other
settings we should try?

One other option we have at our disposal is to perform a full table
scan of the LevelDB database and merge-sort it with our existing
records. I'm hoping not to go there, though, as I'm worried about
scalability dropping off at larger database sizes.

Jeff Dean

Jul 7, 2011, 5:55:07 PM
to lev...@googlegroups.com

Can you batch things up so that you test, say, 1000 keys using Get
operations, and then commit a WriteBatch with 1000 keys at a time? I
suspect that will be somewhat faster (comparing the 'fillseq' vs.
'fillbatch' performance in the ./db_bench benchmark with one-byte
values shows it to be about 20% faster on my machine).
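
A minimal sketch of that pattern (assuming the keys arrive pre-sorted
in a std::vector; the function name and the batch size of 1000 are just
for illustration):

#include <string>
#include <vector>

#include "leveldb/db.h"
#include "leveldb/write_batch.h"

// Check each key with Get(), queue the missing ones, and commit them
// 1000 at a time in a single WriteBatch.
void InsertBatched(leveldb::DB* db, const std::vector<std::string>& keys) {
  leveldb::WriteBatch batch;
  std::string existing;
  int pending = 0;
  for (size_t i = 0; i < keys.size(); ++i) {
    leveldb::Status s = db->Get(leveldb::ReadOptions(), keys[i], &existing);
    if (s.IsNotFound()) {
      batch.Put(keys[i], "-");  // 1-byte value, as in the test case
      ++pending;
    }
    if (pending == 1000) {
      db->Write(leveldb::WriteOptions(), &batch);
      batch.Clear();
      pending = 0;
    }
  }
  if (pending > 0) {
    db->Write(leveldb::WriteOptions(), &batch);  // flush the final partial batch
  }
}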

It might also be interesting to compare the performance you're getting
with the performance of just doing the 10M inserts without the
verifying Get operations.

Up to fairly large database sizes, it's probably going to be faster to
do a sequential scan over ranges of the database rather than 10M
individual Get operations. For example, using db_bench, I simulated
this approach via this command:

% ./db_bench --value_size=1 --cache_size=256000000 --num=10000000
--benchmarks=fillseq,readseq,readrandom
LevelDB: version 1.2
Date: Thu Jul 7 14:44:19 2011
CPU: 4 * Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
CPUCache: 4096 KB
Keys: 16 bytes each
Values: 1 bytes each (1 bytes after compression)
Entries: 10000000
RawSize: 162.1 MB (estimated)
FileSize: 157.4 MB (estimated)
------------------------------------------------
fillseq : 1.248 micros/op; 13.0 MB/s
readseq : 0.277 micros/op; 58.5 MB/s
readrandom : 4.083 micros/op;

From this, you can see that the readrandom operations (each one
essentially a Get operation) on a database of 10M 16-byte keys with
1-byte values take about 4.083 microseconds/key, while iterating
sequentially over the whole database can be done in 0.277
microseconds/key. So, if your database is less than ~15X the size
of the new set of keys you're inserting, you're better off doing
a sequential scan rather than individual key Get operations. Note
that you don't have to do a full scan over the database: you should be
able to create an Iterator on the database, and then process the keys
in the iterator in parallel with the new keys you want to add to the
database.
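
Something along these lines, as a rough sketch (it assumes the new keys
are sorted with the default bytewise comparator, and in practice you
would flush the WriteBatch periodically rather than hold millions of
entries in it):

#include <string>
#include <vector>

#include "leveldb/db.h"
#include "leveldb/write_batch.h"

// Merge sorted new keys against one sequential pass over the database,
// inserting only the keys that are not already present.
void MergeInsert(leveldb::DB* db, const std::vector<std::string>& new_keys) {
  leveldb::Iterator* it = db->NewIterator(leveldb::ReadOptions());
  leveldb::WriteBatch batch;
  it->SeekToFirst();
  for (size_t i = 0; i < new_keys.size(); ++i) {
    // Advance the scan until it is at or past the next new key.
    while (it->Valid() && it->key().compare(new_keys[i]) < 0) {
      it->Next();
    }
    if (!it->Valid() || it->key().compare(new_keys[i]) != 0) {
      batch.Put(new_keys[i], "-");  // key is absent; queue the insert
    }
  }
  db->Write(leveldb::WriteOptions(), &batch);
  delete it;
}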

Please report back here if any of these suggestions make a difference.

-Jeff

IM46

Jul 7, 2011, 6:05:41 PM
to leveldb
Hi Jeff,

Good suggestions. Let me address your latter one first. Our database
is intended to hold about 1 billion keys and a typical insert will
consist of around 10 million rows. As your example indicated, we found
sequential scan to be much faster with smaller database sizes.
However, when we hit around 200 million keys performance quickly
degraded (to the point where we couldn't use it). This is what is
pushing us to prefer the individual seek model.

Also, we're working in a simulated testing environment which is
intended to be "worst case". By that, I mean the inserts are randomly
distributed throughout the keyspace with no duplicates (although they
are pre-sorted). In a real environment, I would expect around 85%
duplication, so there will be a nice performance gain from that. I do
like the suggestion of iterating through chunks of the keyspace, but I
don't think it will work in a real-world scenario: we are using
CityHash64 as our hash function, and it appears to do a great job of
distributing keys randomly throughout the keyspace.

Back to your first suggestion. I got a nice boost in performance by
increasing the block size to 256k (from 64k). This got me to around
54,000 rows/second. Then I commented out the Get() operation and
throughput increased to 130,000 rows/second.

-Rich



IM46

Jul 7, 2011, 6:16:57 PM
to leveldb
One more comment. I was under the impression that because LevelDB
batches writes internally, I wouldn't need to, so I am currently
inserting rows into LevelDB one at a time. Is that assumption wrong? If
so, I can easily batch them up, as that is exactly what we were doing
with Berkeley DB.

As a test, I put the Get() call back in and commented out the Put()
call. Throughput rose to 65,480 rows/sec (a 21% gain), which seems to
be in line with your 20% increase from batching.

I'll see if I can get db_bench built and see what it comes back with.

Paul Davis

Jul 7, 2011, 6:20:41 PM
to lev...@googlegroups.com

There's also an optimization to check if your key exists by using an
iterator instead of the call to Get(). Here's an example of it in use:

https://github.com/basho/eleveldb/blob/master/c_src/eleveldb.cc#L272

Not sure how much of an impact that overhead has in your case with
such small values, but it might be an acceptable middle ground if you
do need the existence check.
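
In LevelDB C++ terms it boils down to something like this (a sketch,
not the eleveldb code itself; reusing one iterator across a run of
sorted keys is presumably where most of the savings come from):

#include "leveldb/db.h"

// Check for a key by Seek()ing an iterator to it and comparing, instead
// of calling Get(). Create the iterator once and reuse it for many keys:
//   leveldb::Iterator* it = db->NewIterator(leveldb::ReadOptions());
//   ... call KeyExists(it, key) for each key ...
//   delete it;
bool KeyExists(leveldb::Iterator* it, const leveldb::Slice& key) {
  it->Seek(key);
  return it->Valid() && it->key().compare(key) == 0;
}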

IM46

Jul 7, 2011, 7:40:55 PM
to leveldb
Paul,

This was a very good tip. It resulted in another 20% performance
improvement (and, I suspect, less memory thrash as well, which is
always a good thing!).

Thank you!



Paul Davis

Jul 7, 2011, 7:46:38 PM
to lev...@googlegroups.com

Thank Dave Smith, that's who I stole it from. :D

IM46

Jul 7, 2011, 10:03:59 PM
to leveldb
Thanks, Dave Smith. ;-)

After this fix and installing Snappy, we're up to about 72,000
rows/sec (more than double this morning's rate).

Does anyone know if there's a way to check the cache hit rate? We had
some endianness issues with BDB and I'm wondering if perhaps we're now
looking keys up in reversed byte order. The reason I ask is that after
installing Snappy, I noticed that the library tended to load a block
from disk on every lookup. That doesn't seem right...




IM46

Jul 8, 2011, 12:40:38 PM
to leveldb
Just wanted to let everyone know that I've completed the first set of
benchmarks, and LevelDB solidly outperforms Berkeley DB for larger
databases. I've enabled Snappy and TCMalloc and implemented batch
writes and the iterator fix mentioned above.

Thanks for your help!

David Yu

Jul 8, 2011, 12:47:21 PM
to lev...@googlegroups.com
Nice work. You got it done in under 24 hours.
It would be nice if you could summarize your experience in a blog post :-)



--
When the cat is away, the mouse is alone.
- David Yu

Ehsan

Jul 14, 2011, 4:42:39 AM
to leveldb
Hi,
Just out of curiosity, have you tried larger value sizes (1 KB or
above) and a larger number of records (20+ million)? I have done some
performance tests comparing Sleepycat with LevelDB, and in my results
LevelDB outperformed Sleepycat, but only for smaller value sizes (less
than 256 bytes). For larger values, LevelDB reads were very poor
compared to Sleepycat, although writes were fine.

In my tests I inserted a large number of records (20-30 million) with
the fillsync flag on, and then read the same records back in random
order. I have tried increasing the cache and buffer sizes, but read
performance did not improve.
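
For reference, a run roughly matching that setup can be reproduced with
db_bench using the same flags as in Jeff's example earlier in the
thread (the sizes below just mirror my description; fillrandom and
readrandom are standard db_bench write/read benchmarks):

% ./db_bench --value_size=4096 --num=20000000 --cache_size=256000000
--benchmarks=fillrandom,readrandom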

Has anybody else tried these kinds of performance tests?

-Ehsan


Michi Mutsuzaki

Aug 10, 2011, 4:42:55 PM
to lev...@googlegroups.com
Hi Ehsan,

I did a LevelDB/MySQL comparison with 10 million 4 KB records. You can see my benchmark results here:

https://github.com/m1ch1/mapkeeper/wiki/MySQL-vs.-LevelDB

Let me know if you have any questions/comments.

Thanks!
--Michi

corncanon

Aug 20, 2011, 12:33:34 PM
to lev...@googlegroups.com
Hi Michi,
Have you compared the size of the DB files between LevelDB and MySQL after writing 10 million 4 KB records, with and without the compression option? (It will depend on the data pattern to some extent, of course.)

In a test between LevelDB and a BDB-like DB, each writing 100 million records, the LevelDB files came out to about half the size of the other DB's.
 
 