I was recently trying to decide which GUID implementation to use in Mongo for one of our Java projects.
In the past, we have used RFC 4122 random (version 4) UUIDs, i.e. what you get from java.util.UUID.randomUUID(). I know that ObjectId is recommended for best performance, but using UUIDs keeps Mongo-specific types out of our business layer and works nicely with other infrastructure components we use.
But this new project could have a lot more data than our previous Mongo projects, and I wondered how badly insert performance would degrade with random UUIDs once the index grows too large to fit in memory.
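To see why random keys hurt once the index no longer fits in RAM, it helps to look at where each new key lands in a sorted index. The sketch below is a deliberately simplified model: a `TreeSet` stands in for the `_id` index, and keys are compared as unsigned bytes (an assumption about how 16-byte binary keys are ordered). It counts how often a new key lands at the right-hand edge of the index, the one region an append-mostly workload keeps hot. It does not model B-tree pages or caching, and the class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;
import java.util.TreeSet;

public class IndexLocality {
    // Encode a 128-bit key as 16 big-endian bytes, like a UUID's two longs.
    static byte[] toBytes(long msb, long lsb) {
        return ByteBuffer.allocate(16).putLong(msb).putLong(lsb).array();
    }

    // Fraction of inserts whose key is larger than every key already indexed,
    // i.e. inserts that land at the right-hand edge of the sorted index.
    static double appendFraction(byte[][] keys) {
        Comparator<byte[]> unsignedOrder = Arrays::compareUnsigned;
        TreeSet<byte[]> index = new TreeSet<>(unsignedOrder);
        int appends = 0;
        for (byte[] key : keys) {
            if (index.isEmpty() || unsignedOrder.compare(key, index.last()) > 0) {
                appends++;
            }
            index.add(key);
        }
        return (double) appends / keys.length;
    }

    // Monotonically increasing keys (a stand-in for time-ordered ids).
    static byte[][] sequentialKeys(int n) {
        byte[][] keys = new byte[n][];
        for (int i = 0; i < n; i++) {
            keys[i] = toBytes(0L, i);
        }
        return keys;
    }

    // Random 128-bit keys from a seeded generator (a stand-in for type 4 UUIDs;
    // UUID.randomUUID() itself cannot be seeded).
    static byte[][] randomKeys(int n, long seed) {
        Random rnd = new Random(seed);
        byte[][] keys = new byte[n][];
        for (int i = 0; i < n; i++) {
            keys[i] = toBytes(rnd.nextLong(), rnd.nextLong());
        }
        return keys;
    }

    public static void main(String[] args) {
        int n = 100_000;
        System.out.printf("sequential keys appended at the edge: %.4f%n", appendFraction(sequentialKeys(n)));
        System.out.printf("random keys appended at the edge:     %.4f%n", appendFraction(randomKeys(n, 42L)));
    }
}
```

Sequential keys always append at the edge. For random keys the fraction shrinks toward zero (the i-th key is the largest so far with probability about 1/i), so almost every insert touches some arbitrary part of the index; once that index is bigger than RAM, many of those touches are likely to turn into disk reads.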
So I wrote a benchmark and ran it on a somewhat memory-constrained system to see how insert performance scales with collection size.
Some details about the benchmark:
The benchmark inserts 10 million documents, each containing only an _id field. The test machine runs MongoDB 2.0 (64-bit) on a Fedora 15 VM with 4 cores and 512 MB of memory. I used the Java driver to run the test.
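The benchmark code itself isn't part of this post. As a rough, hypothetical sketch of a harness with the same shape: insert a fixed number of `_id`-only documents and record the elapsed time of each batch, so per-batch times can be plotted against collection size. The `Inserter` interface is an invented seam, not a driver API; here it is backed by an in-memory set so the sketch runs without a server.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;
import java.util.function.Supplier;

public class InsertBenchmark {
    // Hypothetical seam: inserts one document whose only field is _id.
    interface Inserter<K> {
        void insert(K id);
    }

    // Inserts `total` ids and returns the elapsed milliseconds of each full
    // batch of `batchSize` inserts; a trailing partial batch is not recorded.
    static <K> List<Long> run(Inserter<K> inserter, Supplier<K> ids, int total, int batchSize) {
        List<Long> batchMillis = new ArrayList<>();
        long batchStart = System.nanoTime();
        for (int i = 1; i <= total; i++) {
            inserter.insert(ids.get());
            if (i % batchSize == 0) {
                long now = System.nanoTime();
                batchMillis.add((now - batchStart) / 1_000_000);
                batchStart = now;
            }
        }
        return batchMillis;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the collection, so this runs without MongoDB.
        Set<UUID> fakeCollection = new HashSet<>();
        Inserter<UUID> inserter = fakeCollection::add;
        int batchSize = 100_000;
        List<Long> timings = run(inserter, UUID::randomUUID, 1_000_000, batchSize);
        for (int b = 0; b < timings.size(); b++) {
            System.out.printf("docs %,d: batch took %d ms%n", (b + 1L) * batchSize, timings.get(b));
        }
    }
}
```

With the 2.x Java driver, the inserter would be something like `id -> coll.insert(new BasicDBObject("_id", id))` for a `DBCollection coll`, and the total would be 10 million.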
I used the JUG utility (http://wiki.fasterxml.com/JugHome) to generate both type 1 (timestamp-based) and type 4 (random) UUIDs. As expected, the graph shows random UUIDs degrading badly once the collection gets large. What I didn't expect was the odd cyclic pattern in the timings for timestamp-based UUIDs. It turns out that's because the Mongo Java driver stores UUIDs in Java's native big-endian byte order, so the timestamp bytes are in the wrong order for Mongo, which loses much of the nice mostly-increasing behavior.

So, for kicks, I tried reversing the byte order of the timestamp; that's what the "Timestamp (Order Swap)" line on the result graph shows. I was a bit surprised at how large the difference was: just changing the byte order of the inserted keys reduced the total benchmark run time by 27%.
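The exact transformation behind the "Timestamp (Order Swap)" run isn't spelled out here, so the following is only one plausible way to get the same effect, assuming the index compares the 16 UUID bytes in big-endian order. In the standard RFC 4122 layout, a version 1 UUID begins with `time_low` (the least significant 32 bits of the 60-bit timestamp), followed by `time_mid` and `time_hi_and_version`. Ordered byte-wise, such keys increase only until `time_low` wraps around (every 2^32 ticks of 100 ns, about 7 minutes) and then restart near the low end. The sketch moves `time_hi_and_version` to the front and `time_low` to the back so the most significant timestamp bits come first. `v1Msb`, `reorderMsb` and `restoreMsb` are illustrative helpers, not JUG or driver APIs.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.UUID;

public class OrderedTimeUuid {
    // Hypothetical helper: most significant 64 bits of a version 1 UUID built
    // from a 60-bit timestamp (time_low | time_mid | version 1 + time_hi).
    static long v1Msb(long timestamp) {
        long timeLow = timestamp & 0xFFFFFFFFL;
        long timeMid = (timestamp >>> 32) & 0xFFFFL;
        long timeHi = (timestamp >>> 48) & 0x0FFFL;
        return (timeLow << 32) | (timeMid << 16) | 0x1000L | timeHi;
    }

    // Rearrange to time_hi_and_version | time_mid | time_low so the most
    // significant timestamp bits lead; the version nibble stays constant.
    static long reorderMsb(long msb) {
        long timeLow = msb >>> 32;
        long timeMid = (msb >>> 16) & 0xFFFFL;
        long timeHiAndVersion = msb & 0xFFFFL;
        return (timeHiAndVersion << 48) | (timeMid << 32) | timeLow;
    }

    // Inverse of reorderMsb, to turn a stored key back into a standard UUID.
    static long restoreMsb(long reordered) {
        long timeHiAndVersion = reordered >>> 48;
        long timeMid = (reordered >>> 32) & 0xFFFFL;
        long timeLow = reordered & 0xFFFFFFFFL;
        return (timeLow << 32) | (timeMid << 16) | timeHiAndVersion;
    }

    // The 16 bytes in big-endian order (ByteBuffer's default).
    static byte[] bigEndian(long msb, long lsb) {
        return ByteBuffer.allocate(16).putLong(msb).putLong(lsb).array();
    }

    public static void main(String[] args) {
        long lsb = 0x8000_0000_0000_0001L; // RFC 4122 variant bits, fixed clock seq/node
        long t1 = 0x0000_0001_FFFF_FFFFL;  // time_low is about to wrap
        long t2 = t1 + 1;                  // one 100-ns tick later

        UUID u1 = new UUID(v1Msb(t1), lsb);
        System.out.println("version " + u1.version() + ", timestamp ok: " + (u1.timestamp() == t1));

        boolean standardOrdered = Arrays.compareUnsigned(
                bigEndian(v1Msb(t1), lsb), bigEndian(v1Msb(t2), lsb)) < 0;
        boolean reorderedOrdered = Arrays.compareUnsigned(
                bigEndian(reorderMsb(v1Msb(t1)), lsb), bigEndian(reorderMsb(v1Msb(t2)), lsb)) < 0;
        System.out.println("standard layout sorts t1 before t2: " + standardOrdered);   // false
        System.out.println("reordered layout sorts t1 before t2: " + reorderedOrdered); // true
    }
}
```

Because `reorderMsb` is reversible, application code could keep handing out standard version 1 UUIDs and convert only at the storage boundary. The trade-off is that the stored bytes no longer follow the RFC 4122 field layout, so other tools reading them raw would see different values.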
I don't see an easy way to do it, but it would be interesting to find out whether that change could be made in the Java driver without breaking binary compatibility with existing data.
Anyway, just thought that others might appreciate seeing the benchmark results.