Various GUID Schemes Insert Performance


Mark Lewis

Sep 28, 2011, 8:00:46 PM
to mongod...@googlegroups.com
(executive summary: see pretty graph at http://i.imgur.com/clm9D.png)

I was recently trying to decide which GUID implementation to use in Mongo for one of our Java projects.

In the past, we have used RFC-4122 random UUID values (i.e. what you get from java.util.UUID.randomUUID()).  I know that ObjectId is recommended for best performance, but using UUIDs keeps Mongo-specific bits out of our business layer and plays nicely with other infrastructure components we've got.

But this new project could have a lot more data than our previous Mongo projects, and I wondered how bad insert performance would get with our random UUIDs, once the index size gets too big to fit in memory.

So I wrote a benchmark and ran it on a somewhat memory-constrained system to see how insert performance scales with collection size.  

Some details about the benchmark:

Insert 10 million documents, each with only an _id field.  The test machine is running MongoDB 2.0 (64-bit) on a Fedora 15 VM with 4 cores and 512 MB of memory.  I used the Java driver to drive the test.

I used the JUG utility (http://wiki.fasterxml.com/JugHome) to generate both type 1 (timestamp) and type 4 (random) UUIDs.  On the graph, you can see that, as expected, random UUIDs fall down badly once the collection size gets large.

One thing that I didn't expect was the odd cyclic timing of the timestamp-based UUIDs.  It turns out that's because the Mongo Java driver stores UUIDs in Java's native big-endian byte order, so the timestamp is in the wrong byte order for Mongo, which loses some of the nice mostly-increasing behavior.  So for kicks I tried reversing the byte order of the timestamp; that's what the "Timestamp (Order Swap)" line is for on the result graph.  I was a bit surprised at how significant the difference was: just changing the byte order on the insert reduced the time to run through the benchmark by 27%.
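To illustrate why byte layout matters for a type 1 UUID, here is a minimal sketch. It is not the exact swap used in the benchmark above, and it does not reproduce what any particular driver version writes. It relies only on the RFC 4122 layout, where the low 32 bits of the timestamp come first, and on the assumption that equal-length binary keys are ordered by unsigned byte comparison. The class name and all helper methods are hypothetical, and the rearranged form drops the version nibble, so it is a sort key rather than a valid UUID.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

class TimestampFirstUuid {

    // Hypothetical helper: build a version-1 style UUID from a 60-bit timestamp.
    // Clock sequence and node are fixed dummy values, for illustration only.
    public static UUID fromTimestamp(long ts) {
        long timeLow = ts & 0xFFFFFFFFL;          // low 32 bits, stored first
        long timeMid = (ts >>> 32) & 0xFFFFL;     // middle 16 bits
        long timeHi  = (ts >>> 48) & 0x0FFFL;     // high 12 bits
        long msb = (timeLow << 32) | (timeMid << 16) | 0x1000L | timeHi;
        long lsb = 0x8000000000000001L;           // RFC 4122 variant + dummy node
        return new UUID(msb, lsb);
    }

    // Standard big-endian layout: time_low, time_mid, version + time_hi, then the rest.
    public static byte[] standardBytes(UUID u) {
        return ByteBuffer.allocate(16)
                .putLong(u.getMostSignificantBits())
                .putLong(u.getLeastSignificantBits())
                .array();
    }

    // Rearranged layout: the whole 60-bit timestamp, most significant byte first.
    public static byte[] timestampFirstBytes(UUID u) {
        if (u.version() != 1) {
            throw new IllegalArgumentException("not a time-based UUID");
        }
        return ByteBuffer.allocate(16)
                .putLong(u.timestamp())
                .putLong(u.getLeastSignificantBits())
                .array();
    }

    // Unsigned byte-wise comparison (memcmp-like); an assumed stand-in for index ordering.
    public static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        // Two consecutive timestamps that straddle a wrap of the low 32 bits.
        UUID earlier = fromTimestamp(0xFFFFFFFFL);
        UUID later = fromTimestamp(0x100000000L);
        System.out.println("standard layout:  "
                + compareUnsigned(standardBytes(earlier), standardBytes(later)));
        System.out.println("timestamp first:  "
                + compareUnsigned(timestampFirstBytes(earlier), timestampFirstBytes(later)));
    }
}
```

In the standard layout the earlier UUID sorts after the later one whenever the low 32 bits wrap, which fits the cyclic pattern on the graph. With the timestamp written most significant byte first, the keys keep increasing, so new entries land at the right-hand edge of the index.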

I don't see any easy way to do it, but it would be interesting to see if there's a way to include that change in the Java driver without breaking existing binary compatibility.

Anyway, just thought that others might appreciate seeing the benchmark results.

Octavian Florescu

Sep 30, 2011, 4:55:39 PM
to mongod...@googlegroups.com
I also saw an increase in performance when switching from UUID to ObjectId.
What concerns me is that, in our tests, using a UUID as a shard key resulted in missing collection entries under high load in a sharded configuration. (We are using the Java driver. I also tested with a String field as a shard key and saw the same data loss: about 1-3 missed entries per 100K inserts.)

Since switching to ObjectId for the shard key, we have not seen (yet!) any data loss. We observed this behaviour with both 1.8.3 and 2.0.
I might attempt to create a test harness (our code is fairly complex) to repro this and post it.


--


Octavian Florescu
oflo...@gmail.com

Dwight Merriman

Oct 1, 2011, 11:10:21 AM
to mongodb-user
yes, ids starting with a timestamp, in the right byte order, will
perform better on inserts with a large index, as the entire index does
not have to be in ram for good insert speed.

be sure you store the UUIDs in BinData format for good performance.
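As a rough sketch of what the BinData advice buys: a UUID packs into 16 raw bytes, while its usual text form is a 36-character string. The class and method names below are hypothetical, and the plain big-endian byte order is an assumption; a given driver version may write the bytes in a different order, so check what your driver does before relying on this layout.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

class UuidBinary {

    // Pack a UUID into 16 bytes, most significant long first (assumed layout).
    public static byte[] toBytes(UUID u) {
        return ByteBuffer.allocate(16)
                .putLong(u.getMostSignificantBits())
                .putLong(u.getLeastSignificantBits())
                .array();
    }

    // Rebuild the UUID from the 16 bytes produced by toBytes.
    public static UUID fromBytes(byte[] b) {
        if (b.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes, got " + b.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(b);
        long msb = buf.getLong();
        long lsb = buf.getLong();
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        UUID u = UUID.randomUUID();
        System.out.println("binary form: " + toBytes(u).length + " bytes");
        System.out.println("string form: " + u.toString().length() + " chars");
    }
}
```

The byte array is what you would hand to the driver's binary type as the _id value, rather than the string form.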

note if you later query or update those documents and those operations
don't correlate with a certain "id space region", it will be hitting
disk a lot (like any database). in that situation consider a
different schema or more ram or ssd.


On Sep 28, 8:00 pm, Mark Lewis <m...@lewisworld.org> wrote:
> (executive summary: see pretty graph at http://i.imgur.com/clm9D.png)

Dwight Merriman

Oct 1, 2011, 11:11:13 AM
to mongodb-user
no data should be lost. please start a new thread with lots of
details to drill on that if need be.

if there are any collisions on the IDs, entries would be missing, as
_id is unique and those inserts will return an error. not sure that
was the case, just guessing.
