I then wrote a small Ruby program that does the exact same thing.
I ran the Java program using the standard settings (regular heap, regular stack size, etc.) and mongo driver 2.1-master on Mac OS X 10.6.4 using the 64-bit JVM. I also ran the Ruby program using the 1.1 ruby driver on MRI 1.9.2p0 and JRuby 1.5.1 (same JVM).
Imagine my surprise when the Ruby program blew away the Java program while reading over 150 million documents! I find it shocking that the Ruby C extension was able to deserialize some of these documents, many of which contain a "vals" array with thousands of elements, in less time. I am so surprised that I am certain I did something wrong.
Please check the gist and tell me what I did wrong with this comparison.
cr
PS - I'm not including results for the 2.1 release driver (from 20100819), but I did notice that it was consistently about 10% faster than the current java git master.
If you are just trying to test the bson parsing (de-serialization)
then it seems like sorting on the server is not needed and will only
add more variance to the results.
Also, this test doesn't test the serialization (saving) of data from
the language to bson. Just something to keep in mind.
I would not be surprised if the java code needs to be optimized to
reduce the number of object creations and such. It would be
interesting to see what a profiler has to say is the hot spot in the
java code.
> --
> You received this message because you are subscribed to the Google Groups "mongodb-dev" group.
> To post to this group, send email to mongo...@googlegroups.com.
> To unsubscribe from this group, send email to mongodb-dev...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/mongodb-dev?hl=en.
>
>
> Can you post your sample data and the stats from your jvm (version,
> options, etc)? It is hard to tell what is being tested.
>
> If you are just trying to test the bson parsing (de-serialization)
> then it seems like sorting on the server is not needed and will only
> add more variance to the results.
>
> Also, this test doesn't test the serialization (saving) of data from
> the language to bson. Just something to keep in mind.
>
> I would not be surprised if the java code needs to be optimized to
> reduce the number of object creations and such. It would be
> interesting to see what a profiler has to say is the hot spot in the
> java code.
I have updated the gist to include the options as well as the JVM details.
I also modified the test programs to remove the sort operation. Results were unchanged.
BTW, I do realize this is really a deserialization test. That's by design. If there is a similar program already written in Java that exercises only the BSON stuff, please point it out and I'll use it instead.
cr
--
> Why is it surprising that C is faster than Java? C is a systems language, whereas Java sits on top of its virtual machine. The virtual machine introduces some overhead to code execution, and that is reflected in the execution time.
Because it isn't just C. As the BSON types are decoded, the runtime needs to create Ruby strings, hashes, arrays, etc. The deserialization has to work through the Ruby runtime's C API which has some significant overhead compared to what you would find in the C or C++ mongo driver.
So yes, it is still surprising to me.
cr
Chuck, I put together a sample data set like yours, ran the scripts, and got a very similar result: the Ruby driver was much faster. Here are my slightly modified scripts, plus a script to generate the sample data, in case anyone is interested in trying to reproduce:
> I submitted a patch to BSONDecoder which improves performance of
> decoding (http://github.com/theunique/mongo-java-driver/commit/
> efb91dbc42cff1d9c138bf27eda4062d27458741)
>
> Could someone repeat benchmark with the patch?
>
> ciao.hans.
I get a fatal error when trying to run a driver built from that commit.
I cloned your repository and reset to that commit before building. The build was clean but the run wasn't.
cr
> Sorry, was a bug.
> New BSONDecoder does readahead but beyond object boundary which is not
> allowed with multiple objects.
> So I had to limit the readahead to object boundary.
>
> Look at new commit of BSONDecoder
> http://github.com/theunique/mongo-java-driver/commit/c30892cc06baef0a7df1877f1d2efc941ec15142
>
> ciao.hans.
Running your code versus the 2.2 release showed some sizable differences.
2.2 took 342 seconds.
Your patch took 216 seconds.
That works out to roughly 40% faster.
However, the Java driver is still slower than the Ruby C extension, which ran the same test in 180 seconds.
cr
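For context on the patch being discussed: every top-level BSON document begins with a little-endian int32 giving its total size in bytes, so a decoder that buffers ahead has to cap its reads at that boundary or it will consume bytes belonging to the next document in the stream. A minimal sketch of that boundary logic (the stream contents and method names here are illustrative, not the driver's actual code):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BsonBoundary {
    // Read the little-endian int32 length prefix of a BSON document.
    static int readInt32LE(InputStream in) throws IOException {
        int b0 = in.read(), b1 = in.read(), b2 = in.read(), b3 = in.read();
        if ((b0 | b1 | b2 | b3) < 0) throw new IOException("EOF in length prefix");
        return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24);
    }

    // Read exactly one document's bytes, never reading past its boundary.
    static byte[] readOneDocument(InputStream in) throws IOException {
        int len = readInt32LE(in);       // total size, including the 4 prefix bytes
        byte[] doc = new byte[len];
        doc[0] = (byte) len;             // restore the prefix in the output buffer
        doc[1] = (byte) (len >> 8);
        doc[2] = (byte) (len >> 16);
        doc[3] = (byte) (len >> 24);
        int off = 4;
        while (off < len) {              // fill only up to the boundary
            int n = in.read(doc, off, len - off);
            if (n < 0) throw new IOException("EOF inside document");
            off += n;
        }
        return doc;
    }

    public static void main(String[] args) throws IOException {
        // Two back-to-back "documents": a 5-byte one and a 6-byte one (dummy payloads).
        byte[] stream = {5, 0, 0, 0, 0, 6, 0, 0, 0, 0, 0};
        InputStream in = new ByteArrayInputStream(stream);
        System.out.println(readOneDocument(in).length); // 5
        System.out.println(readOneDocument(in).length); // 6
    }
}
```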
> The reason the ruby driver is faster on that test is that ruby 1.9 has
> a native internal utf-8 representation and java does not.
> Almost all the time in java is spent converting utf-8 into java's
> string classes.
> On larger objects, or other types of tests, the result will be
> different of course.
I'm not sure I follow. The only strings in the test are the key names '_id', 'ts', 'drn', 'cid' and 'vals'.
Why would keys added by a Ruby program be saved as utf-8 if that is going to cause performance problems for other drivers? Shouldn't there be one string representation enforced for keys across all drivers?
Or am I misunderstanding this?
cr
So now I'm really confused. I don't think the test I have been doing uses utf-8 strings at all, so I think Eliot's explanation for the performance difference is in error.
cr
I don't know about the performance issue Eliot speaks of.
BTW, I did a quick google on 'java utf-8' and ran across an article about faster *encoding* of Java strings to utf-8.
http://blog.rapleaf.com/dev/2010/04/26/faster-string-to-utf-8-encoding-in-java/
I don't know if that's helpful or not. I imagine the driver is already using whatever tricks it can to minimize this cost. Too bad that String.getBytes("utf-8") is so slow.
cr
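The trick in articles like that one is an ASCII fast path: scan the string, and if every char is below 0x80 the UTF-8 bytes are just the chars narrowed to bytes, so the general-purpose encoder can be skipped entirely. A rough sketch of the idea (not the driver's code; whether it actually beats `String.getBytes` depends on the JDK version and the string contents):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class FastUtf8Encode {
    // Encode a String to UTF-8, taking a cheap path when it is pure ASCII.
    static byte[] encodeUtf8(String s) {
        int n = s.length();
        byte[] out = new byte[n];
        for (int i = 0; i < n; i++) {
            char c = s.charAt(i);
            if (c >= 0x80) {
                // Non-ASCII char found: fall back to the general encoder.
                return s.getBytes(StandardCharsets.UTF_8);
            }
            out[i] = (byte) c;           // ASCII: the UTF-8 byte IS the char value
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.equals(
                encodeUtf8("vals"), "vals".getBytes(StandardCharsets.UTF_8)));   // true
        System.out.println(Arrays.equals(
                encodeUtf8("héllo"), "héllo".getBytes(StandardCharsets.UTF_8))); // true
    }
}
```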
Trivial would be all low-ASCII chars; then maybe some other, faster Java path applies.
I don't know the details; just brainstorming...
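In the decode direction, that brainstorm looks like: scan the raw bytes, and if all of them are low ASCII build the String directly from a char array instead of going through the charset machinery. A hedged sketch (illustrative only; whether it wins over the JDK decoder depends heavily on the JDK version):

```java
import java.nio.charset.StandardCharsets;

public class FastUtf8Decode {
    // Decode UTF-8 bytes to a String, with a direct path for pure-ASCII input.
    static String decodeUtf8(byte[] b, int off, int len) {
        char[] chars = new char[len];
        for (int i = 0; i < len; i++) {
            byte v = b[off + i];
            if (v < 0) {
                // High bit set: real multi-byte UTF-8, use the general decoder.
                return new String(b, off, len, StandardCharsets.UTF_8);
            }
            chars[i] = (char) v;         // ASCII byte maps 1:1 to a char
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        byte[] key = {'v', 'a', 'l', 's'};
        System.out.println(decodeUtf8(key, 0, key.length)); // vals
    }
}
```

Short document keys like '_id' and 'vals' would always take the direct path here.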
See links from earlier in this thread to results, test programs, etc.
cr
> FYI, I tested again against the latest master this weekend. My original test showed that the Ruby C extension completed the test in 3 minutes (give or take a second or two). The latest Java driver clocks in at 5:45, which makes it roughly twice as slow. It's a tad better than 2.1 but still far behind.
>
> See links from earlier in this thread to results, test programs, etc.
I just tested the latest Java driver 2.4rc0 against the same dataset. *Vast* improvement in the results, very likely due to the DBList change [1]. It's nearly at parity with Ruby and its C extension; it's only about 10% slower now.
Nice work! Hopefully this improvement will make its way into the next Ruby driver release for JRuby.
cr
I'll integrate as soon as we have a final 2.4 release.