I think there is something wrong with your benchmarks, because the numbers don't seem right.
I'm attaching the single-threaded benchmark I used to try to reproduce your results. It uses a class with six properties:
public class C
{
    public string S { get; set; }
    public string T { get; set; }
    public string U { get; set; }
    public int X { get; set; }
    public int Y { get; set; }
    public int Z { get; set; }
}
I'm initializing an instance of the class like this:
var c = new C
{
    S = "stuvwxyz",
    T = "tuvwxyz",
    U = "uvwxyz",
    X = 1,
    Y = 2,
    Z = 3
};
And the benchmarking loop looks like this:
var stopwatch = new Stopwatch();
var iterations = 1000000;
stopwatch.Start();
for (int i = 0; i < iterations; i++)
{
    var bytes = c.ToBson(); // serializes c to binary BSON
}
stopwatch.Stop();
Console.WriteLine("{0} elapsed", stopwatch.Elapsed);
Console.WriteLine("{0}/second", (int)(iterations * 1000 / stopwatch.ElapsedMilliseconds));
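If you want to run the same kind of measurement against other code, the loop can be wrapped in a small helper. This is only a sketch: the Bench class, the OpsPerSecond method, and the warm-up pass are my additions for illustration, not part of the benchmark I actually ran, and the Guid workload in Main is just a stand-in for c.ToBson().

```csharp
using System;
using System.Diagnostics;

public static class Bench
{
    // Hypothetical helper: runs `action` `iterations` times and returns operations per second.
    public static double OpsPerSecond(Action action, int iterations, int warmup = 1000)
    {
        // Warm-up pass so JIT compilation and one-time initialization are not timed.
        for (int i = 0; i < warmup; i++)
        {
            action();
        }

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();

        // TotalSeconds is a double, so the division is floating point
        // rather than integer division on whole milliseconds.
        return iterations / stopwatch.Elapsed.TotalSeconds;
    }

    public static void Main()
    {
        // Stand-in workload (an assumption); replace the lambda with c.ToBson() to time serialization.
        double rate = OpsPerSecond(() => Guid.NewGuid().ToByteArray(), 1000000);
        Console.WriteLine("{0:N0}/second", rate);
    }
}
```

Running it once with the lock in place and once without gives the two rates to compare.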
On my laptop I get approximately 100,000 serializations per second. If I remove the lock in LookupSerializer, that number goes up to approximately 110,000 per second. Of course we can't actually remove the lock; this is just for testing purposes. The point is that the lock accounts for at most about 10% overhead in this benchmark. In real-world use the relative overhead would be much smaller, because a lot more is going on (your code, network calls, etc.).
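To make the arithmetic behind that estimate explicit, here is a small sketch. The class and method names are hypothetical, and the two inputs are just the approximate rates quoted above, not new measurements.

```csharp
using System;

public static class LockOverhead
{
    // Microseconds spent per operation at a given throughput.
    public static double MicrosPerOp(double opsPerSecond)
    {
        return 1000000.0 / opsPerSecond;
    }

    // Fraction of the locked per-operation time that disappears when the lock is removed.
    public static double OverheadFraction(double opsWithLock, double opsWithoutLock)
    {
        return 1.0 - opsWithLock / opsWithoutLock;
    }

    public static void Main()
    {
        double withLock = 100000;    // approximate rate with the lock in place
        double withoutLock = 110000; // approximate rate with the lock removed (test only)

        double costMicros = MicrosPerOp(withLock) - MicrosPerOp(withoutLock);
        Console.WriteLine("lock cost: {0:F2} microseconds per call", costMicros);
        Console.WriteLine("overhead: {0:P1}", OverheadFraction(withLock, withoutLock));
    }
}
```

With these inputs the lock costs a little under one microsecond per serialization, roughly 9% of the time each call takes with the lock in place, which is why it shrinks to noise once networking and application code are added on top.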