Some benchmarks & bottlenecks

mongotime

Jan 10, 2012, 9:10:09 AM

to mongodb...@googlegroups.com

Hi guys. Thanks for the C# driver work.

I have started playing with it and decided to begin with some random insert tests. Here are the results, in case they help you optimize serialization or anything else.

Inserting empty-ish objects (one field, 50 bytes)

48K inserts/second

Inserting a 5000-byte object with no structure (one field)

19K inserts/second

Inserting a simple 5000-byte object of type:

{ field1: value,
  field2: value,
  field3: value,
  .... 100 elements total
}

15K inserts/second

Inserting a 5000-byte object of type:

{ field1: value,
  array1: [
    { field2: value },
    .... 100 elements total
  ]
}

9K inserts/second

Inserting a 5000-byte object of type:

{ field1: value,
  subobj1: { field2: value },
  subobj2: { field2: value },
  subobj3: { field2: value },
  .... 100 elements total
}

7K inserts/second

Inserting a 5000-byte .NET object that ends up having following structure:

{ field1: value,
  array1: [
    { field2: value },
    .... 100 elements total
  ]
}

5K inserts/second

Notes:

* All inserts were done from multiple threads, and the client was the bottleneck

* Mongo and client running on same machine

* I made sure that the payload size was always around 5K regardless of structure.

* No indexes, no sharding, no replica sets

* Database dropped before each run

Key results:

- You lose around 50% moving from a blob to a 100-field structure with one level of nesting (19K to 9K)

- You lose almost another 50% by letting the driver build the BsonDoc from .NET objects (9K to 5K)

If this doesn't seem like more or less what you would expect, let me know and I will try to figure out what I did wrong.

Thanks!

Robert Stam

Jan 10, 2012, 10:50:14 AM

to mongodb...@googlegroups.com

I would say most of your results seem normal.

Losing only 50% performance when making the document 100 times more complicated doesn't seem too bad.

Losing another 50% when moving from serializing a BsonDocument to a .NET object is a little bigger than I would have expected.

How did you verify that the client was the bottleneck?

If you are willing to share your test code I can run it with the Visual Studio profiler and look for any low hanging fruit.

mongotime

Jan 10, 2012, 12:18:04 PM

to mongodb...@googlegroups.com

Thanks for responding Robert. I've posted the test code on pastebin. I'll warn you that it's ugly, but since it's just performance testing it's not worth prettifying.

http://pastebin.com/0MJYNZyL

I just ran it again and got the following output:

Empty-ish Inserts completed in: 00:00:02.1382805

( 46763.3145912204 inserts/second)

5000-byte (1 field) Inserts completed in: 00:00:04.9939312

( 20023.7176931332 inserts/second)

5000-byte (in 100 simple flat fields) Inserts completed in: 00:00:06.9638145

BsonDoc creation time:00:00:03.0097405

( 14359.582645623 inserts/second)

5000-byte (with 100 fields in nested Array) Inserts completed in: 00:00:11.5500828

BsonDoc creation time: 00:00:08.1388590

( 8657.81826096638 inserts/second)

5000-byte Page (with 100 subdocs) Inserts completed in: 00:00:15.9124533

BsonDoc creation time: 00:00:07.9837140

( 6284.31222692817 inserts/second)

5000-byte .NET Objects Inserts completed in: 00:00:19.9585423

( 5010.35721001024 inserts/second)

I guess you would mainly be interested in comparing results 4 and 6.

H.

mongotime

Jan 12, 2012, 4:49:01 PM

to mongodb...@googlegroups.com

I ran it through the VS analyzer just for 'fun'.

You can get a 15-20% boost across the board by replacing:

name.StartsWith("$") -----------> name.StartsWith("$", StringComparison.Ordinal)

name.Contains('.') ----------> name.IndexOf('.') >= 0

That doesn't affect the .NET/bsondoc performance ratio, which seems to be coming from the read lock in LookupSerializer(). If you remove it, the gap goes from 50% to 10%. Of course I'm not sure if it's safe to remove it :-)

H.

mongotime

Jan 12, 2012, 4:57:34 PM

to mongodb...@googlegroups.com

Actually the aforementioned string manipulation boost is more like 30-40% for complex documents, so definitely worth adding, if it's not breaking anything.

Robert Stam

Jan 13, 2012, 3:59:48 PM

to mongodb...@googlegroups.com

I also found that name[0] == '$' is even faster than name.StartsWith("$", StringComparison.Ordinal); about 4x faster.
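To make the variants discussed so far concrete, here is a minimal sketch. The class and method names are hypothetical and are not taken from the driver. Two things in it are assumptions rather than anything stated above: the length guard (indexing `name[0]` throws on an empty string), and the explicit `Enumerable.Contains` call, which is what `name.Contains('.')` most likely binds to on .NET Framework 3.5, where `string` has no `Contains(char)` overload.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not driver code: different ways to test an element
// name for a leading '$' or an embedded '.'.
public static class ElementNameChecks
{
    // Culture-sensitive comparison (the original form).
    public static bool StartsWithDollarCulture(string name)
    {
        return name.StartsWith("$");
    }

    // Ordinal comparison: compares char values directly, no culture rules.
    public static bool StartsWithDollarOrdinal(string name)
    {
        return name.StartsWith("$", StringComparison.Ordinal);
    }

    // Direct index check; the length guard avoids an exception on "".
    public static bool StartsWithDollarIndexed(string name)
    {
        return name.Length > 0 && name[0] == '$';
    }

    // LINQ form: enumerates the string as IEnumerable<char>.
    public static bool ContainsDotLinq(string name)
    {
        return Enumerable.Contains(name, '.');
    }

    // String-native search.
    public static bool ContainsDotIndexOf(string name)
    {
        return name.IndexOf('.') >= 0;
    }
}
```

All of the `StartsWithDollar*` methods should agree on ordinary element names, and likewise the two `ContainsDot*` methods; only the cost per call differs.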

Let me know if you'd like to send a pull request with these changes. If not, I can just change it myself later today.

Thanks for investigating this. These are good performance improvements.

mongotime

Jan 13, 2012, 4:51:08 PM

to mongodb...@googlegroups.com

All yours Robert.

Glad to have helped.

H.

Robert Stam

Jan 13, 2012, 5:53:06 PM

to mongodb...@googlegroups.com

OK. I'll do the edits then. And thanks again.

mongotime

Jan 16, 2012, 9:52:44 PM

to mongodb...@googlegroups.com

BTW, not sure if you want to mess with concurrency optimization, but there seem to be two solutions to the read-lock bottleneck that is causing a 50% drop in perf when using .NET serialization:

[1] .NET 4 has ConcurrentDictionary, a thread-safe equivalent of Dictionary.

[2] Use Hashtable instead of Dictionary. Hashtable is thread safe for multiple readers + 1 writer.

The __serializers changes are below, but I suppose you can do the same with idgenerator.

Old code:

private static Dictionary<Type, IBsonSerializer> __serializers = new Dictionary<Type, IBsonSerializer>();

[........]

 public static IBsonSerializer LookupSerializer(Type type)
        {
            __configLock.EnterReadLock();
            try
            {
                IBsonSerializer serializer;
                if (__serializers.TryGetValue(type, out serializer))
                {
                    return serializer;
                }
            }
            finally
            {
                __configLock.ExitReadLock();
            }

            __configLock.EnterWriteLock();
            try
            {
                IBsonSerializer serializer;
                if (!__serializers.TryGetValue(type, out serializer))

[........]

New Code:

private static Hashtable __serializers = new Hashtable();

[........]

public static IBsonSerializer LookupSerializer(Type type)
        {
            // No read lock: Hashtable is safe for multiple readers plus one writer.
            IBsonSerializer serializer = (IBsonSerializer)__serializers[type];
            if (serializer != null)
            {
                return serializer;
            }

            __configLock.EnterWriteLock();
            try
            {
                // Check again under the write lock.
                serializer = (IBsonSerializer)__serializers[type];
                if (serializer == null)

[........]
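For comparison, option [1] might look roughly like the sketch below on .NET 4. This is not the driver's code: `IBsonSerializer` is reduced to an empty placeholder interface so the block compiles on its own, and `SerializerRegistry`, `PlaceholderSerializer`, and `CreateSerializer` are hypothetical stand-ins for whatever the elided write-lock section actually does.

```csharp
using System;
using System.Collections.Concurrent;

// Placeholder for the driver's interface so the sketch is self-contained.
public interface IBsonSerializer { }

// Hypothetical serializer that just remembers the type it was created for.
public sealed class PlaceholderSerializer : IBsonSerializer
{
    public PlaceholderSerializer(Type type) { ValueType = type; }
    public Type ValueType { get; private set; }
}

public static class SerializerRegistry
{
    private static readonly ConcurrentDictionary<Type, IBsonSerializer> __serializers =
        new ConcurrentDictionary<Type, IBsonSerializer>();

    // Lookups of an existing key take no lock. On a miss, the factory runs
    // and the result is cached. Under a race GetOrAdd may call the factory
    // more than once, but every caller gets the single stored instance.
    public static IBsonSerializer LookupSerializer(Type type)
    {
        return __serializers.GetOrAdd(type, CreateSerializer);
    }

    // Hypothetical factory standing in for the elided creation logic.
    private static IBsonSerializer CreateSerializer(Type type)
    {
        return new PlaceholderSerializer(type);
    }
}
```

The trade-off compared with the Hashtable version is that the factory has to tolerate being called twice for the same type, since there is no write lock around creation.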

Hrair Mekhsian

Jan 16, 2012, 9:56:20 PM

to mongodb...@googlegroups.com

Before:

5000-byte .NET Objects Inserts completed in: 00:00:08.2458765

( 5963.47426767632 inserts/second)

After:

5000-byte .NET Objects Inserts completed in: 00:00:05.2379726

( 9545.43857853911 inserts/second)

Robert Stam

Jan 16, 2012, 11:12:36 PM

to mongodb...@googlegroups.com

I wouldn't want to change the concurrency code without careful thought.

I would be reluctant to use Hashtable. It's pretty obsolete. So far we only require .NET Framework 3.5, so ConcurrentDictionary is not an option until we make .NET Framework 4 required.

Robert Stam

Jan 17, 2012, 11:00:13 AM

to mongodb...@googlegroups.com

I think there is something wrong with your benchmarks, because the numbers don't seem right.

I'm attaching a single threaded benchmark I used to attempt to reproduce your results. I'm using a class with 6 properties:

public class C
{
    public string S { get; set; }
    public string T { get; set; }
    public string U { get; set; }
    public int X { get; set; }
    public int Y { get; set; }
    public int Z { get; set; }
}

I'm initializing an instance of the class like this:

var c = new C
{
    S = "stuvwxyz",
    T = "tuvwxyz",
    U = "uvwxyz",
    X = 1,
    Y = 2,
    Z = 3
};

And the benchmarking loop looks like this:

var stopwatch = new Stopwatch();
var iterations = 1000000;
stopwatch.Start();
for (int i = 0; i < iterations; i++)
{
    var bytes = c.ToBson(); // serializes c to binary BSON
}
stopwatch.Stop();
Console.WriteLine("{0} elapsed", stopwatch.Elapsed);
Console.WriteLine("{0}/second", (int)(iterations * 1000 / stopwatch.ElapsedMilliseconds));

On my laptop I get approx 100,000 serializations per second. If I REMOVE the lock in LookupSerializer that number goes up to approx 110,000 per second. Of course we can't remove the lock but this is just for testing purposes. The point is that the lock is at most causing a 10% overhead. In real life the overhead would be much less because there is a lot more going on (your code, networking calls, etc...).

Program.cs

mongotime

Jan 17, 2012, 11:08:48 AM

to mongodb...@googlegroups.com

Well, it makes sense that you would not see a performance drop-off that's related to concurrency/locking in a single-threaded test. Try again with a Parallel.For if you want.

That being said, your previous concerns about messing with the locking are all reasonable. I was just optimizing for my own education, and thought I would share it just in case... As you say, in a real-life scenario you'd probably not notice.

mongotime

Jan 17, 2012, 11:17:49 AM

to mongodb...@googlegroups.com

FYI, with your code:

Single-threaded: 120K/second

Multi-Threaded: 259K/second

Multi with hashtable: 460K/second

Robert Stam

Jan 17, 2012, 12:22:35 PM

to mongodb...@googlegroups.com

A single threaded test is useful because even that single thread is acquiring and releasing the lock.

When I switched from a regular for loop to a Parallel.For loop throughput dropped from approx 100,000 to approx 40,000. I attribute that to the overhead of Parallel.For, not to locking. Parallel.For is probably not a good choice when each step of the For loop is very very small, as the overhead of calling the lambda starts to be significant.

Interesting that I would see a drop in throughput with Parallel.For and you would see an increase. I wonder why. Here's my Parallel.For loop:

//for (int i = 0; i < iterations; i++)
Parallel.For(0, iterations, i =>
{
    var bytes = c.ToBson();
});
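One way to separate per-iteration delegate overhead from lock contention is to hand each worker a contiguous range of indices, so the lambda runs once per chunk rather than once per iteration. Below is a minimal sketch, assuming .NET 4's `Partitioner.Create`. `ChunkedLoop` and `DoWork` are hypothetical names, and `DoWork` is only a stand-in for `c.ToBson()` so the block compiles without the driver.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class ChunkedLoop
{
    // Hypothetical stand-in for c.ToBson(): a small amount of work per call.
    private static int DoWork(int i)
    {
        return (i * 31) ^ (i >> 3);
    }

    // Calls DoWork for every index in [0, iterations) and returns the number
    // of calls made. iterations must be greater than zero, because
    // Partitioner.Create rejects an empty range.
    public static long Run(int iterations)
    {
        long total = 0;

        // Each range is a Tuple<int, int> of (fromInclusive, toExclusive).
        Parallel.ForEach(Partitioner.Create(0, iterations), range =>
        {
            long local = 0;
            for (int i = range.Item1; i < range.Item2; i++)
            {
                DoWork(i);
                local++;
            }
            // One interlocked update per chunk instead of one per iteration.
            Interlocked.Add(ref total, local);
        });

        return total;
    }
}
```

If throughput with chunking is much higher than with a plain `Parallel.For`, the lambda overhead was significant; if it barely changes, contention on the lock is the more likely cause.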

Robert Stam

Jan 17, 2012, 1:08:16 PM

to mongodb...@googlegroups.com

My calculation of serializations per second was wrong because 32-bit arithmetic was overflowing.

Using 64-bit arithmetic to compute serializations per second I get:

approx 175,000/second with ReaderWriterLockSlim

approx 280,000/second with NO lock (not practical, just for testing)

while pegging 8 CPUs instead of just 1.

I suspect the reason I don't see an 8x performance improvement with 8 CPUs using Parallel.For is the overhead of invoking the lambda as well as any overhead internal to Parallel.For.
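The overflow is easy to reproduce in isolation. In the earlier loop, `iterations * 1000` is an `int` multiplication that is widened to `long` only afterwards, for the division by `ElapsedMilliseconds`. With 1,000,000 iterations the product still fits in 32 bits, so the sketch below assumes a larger count of 10,000,000. That count is an assumption; the thread does not state which value was used. `RateMath` and its method names are hypothetical.

```csharp
using System;

public static class RateMath
{
    // Mirrors the original expression: the int * int product wraps around
    // before it is widened to long for the division. The unchecked keyword
    // makes the wrap explicit regardless of compiler settings.
    public static long PerSecondInt32(int iterations, long elapsedMilliseconds)
    {
        return unchecked(iterations * 1000) / elapsedMilliseconds;
    }

    // The 1000L literal forces a 64-bit multiplication, so nothing wraps.
    public static long PerSecondInt64(int iterations, long elapsedMilliseconds)
    {
        return iterations * 1000L / elapsedMilliseconds;
    }
}
```

For 10,000,000 iterations in 1000 ms, `PerSecondInt32` returns 1410065 while `PerSecondInt64` returns 10000000. For 1,000,000 iterations both return the same value.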

mongotime

Jan 17, 2012, 1:39:58 PM

to mongodb...@googlegroups.com

Hehe. I just noticed the overflow as well :-)

Numbers look closer to mine now. I get 1.5x going from single to multi-threaded with the current locking mechanism, 3x with no read lock using a hashtable.

3x is not bad since I have 4 cores. I suspect you also have a 4 core machine with hyper-threading so 8x would be an unreasonable target.
