I haven't done 4.0 yet, but here are some interesting results in a couple of new 3.5 tests:
DatabaseCommands_Single_Read|esent: 7745 reads in 5.0002523 seconds (1548 per second).
DatabaseCommands_Single_Read|esent: 7826 reads in 5.0000266 seconds (1565 per second).
DatabaseCommands_Single_Read|esent: 7779 reads in 5.0001305 seconds (1555 per second).
DatabaseCommands_Single_Read|esent: 7841 reads in 5.0006551 seconds (1567 per second).
DatabaseCommands_Single_Read|esent: 7794 reads in 5.0003173 seconds (1558 per second).
AsyncDatabaseCommands_Single_Read|esent: 7987 reads in 5.000086 seconds (1597 per second).
AsyncDatabaseCommands_Single_Read|esent: 8001 reads in 5.0001393 seconds (1600 per second).
AsyncDatabaseCommands_Single_Read|esent: 8052 reads in 5.0004048 seconds (1610 per second).
AsyncDatabaseCommands_Single_Read|esent: 8055 reads in 5.0000439 seconds (1610 per second).
AsyncDatabaseCommands_Single_Read|esent: 7987 reads in 5.0002991 seconds (1597 per second).
DatabaseCommands_Concurrent_Read|esent: 12362 reads in 5.0011508 seconds (2471 per second).
DatabaseCommands_Concurrent_Read|esent: 38874 reads in 5.0010916 seconds (7773 per second).
DatabaseCommands_Concurrent_Read|esent: 38697 reads in 5.0005325 seconds (7738 per second).
DatabaseCommands_Concurrent_Read|esent: 38837 reads in 5.0006701 seconds (7766 per second).
AsyncDatabaseCommands_Concurrent_Read|esent: 42172 reads in 5.0009754 seconds (8432 per second).
AsyncDatabaseCommands_Concurrent_Read|esent: 42403 reads in 5.0009181 seconds (8479 per second).
AsyncDatabaseCommands_Concurrent_Read|esent: 41499 reads in 5.0009974 seconds (8298 per second).
AsyncDatabaseCommands_Concurrent_Read|esent: 42411 reads in 5.0009181 seconds (8480 per second).
AsyncDatabaseCommands_Concurrent_Read|esent: 42053 reads in 5.0009025 seconds (8409 per second).
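As you can see, async doesn't improve much (roughly 3% on single reads and under 10% on concurrent reads, ignoring the slow first concurrent run), however: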
DatabaseCommands_Single_MultiGet|sequential|100|esent: 31800 reads in 5.0137222 seconds (6342 per second).
DatabaseCommands_Single_MultiGet|sequential|100|esent: 31500 reads in 5.0020688 seconds (6297 per second).
DatabaseCommands_Single_MultiGet|sequential|100|esent: 31800 reads in 5.0026138 seconds (6356 per second).
DatabaseCommands_Single_MultiGet|sequential|100|esent: 31900 reads in 5.0126452 seconds (6363 per second).
DatabaseCommands_Single_MultiGet|sequential|100|esent: 31400 reads in 5.0037426 seconds (6275 per second).
DatabaseCommands_Single_MultiGet|parallel|100|esent: 83100 reads in 5.002608 seconds (16611 per second).
DatabaseCommands_Single_MultiGet|parallel|100|esent: 83700 reads in 5.0047585 seconds (16724 per second).
DatabaseCommands_Single_MultiGet|parallel|100|esent: 90700 reads in 5.0039441 seconds (18125 per second).
DatabaseCommands_Single_MultiGet|parallel|100|esent: 93200 reads in 5.0030808 seconds (18628 per second).
DatabaseCommands_Single_MultiGet|parallel|100|esent: 92000 reads in 5.0012391 seconds (18395 per second).
DatabaseCommands_Concurrent_MultiGet|sequential|100|esent: 86500 reads in 5.0075925 seconds (17273 per second).
DatabaseCommands_Concurrent_MultiGet|sequential|100|esent: 146400 reads in 5.0197814 seconds (29164 per second).
DatabaseCommands_Concurrent_MultiGet|sequential|100|esent: 151500 reads in 5.0452868 seconds (30028 per second).
DatabaseCommands_Concurrent_MultiGet|sequential|100|esent: 176100 reads in 5.0341453 seconds (34981 per second).
DatabaseCommands_Concurrent_MultiGet|sequential|100|esent: 186100 reads in 5.0254123 seconds (37031 per second).
DatabaseCommands_Concurrent_MultiGet|parallel|100|esent: 56800 reads in 5.1071263 seconds (11121 per second).
DatabaseCommands_Concurrent_MultiGet|parallel|100|esent: 121100 reads in 5.0275505 seconds (24087 per second).
DatabaseCommands_Concurrent_MultiGet|parallel|100|esent: 162400 reads in 5.0243965 seconds (32322 per second).
DatabaseCommands_Concurrent_MultiGet|parallel|100|esent: 182100 reads in 5.0201605 seconds (36273 per second).
DatabaseCommands_Concurrent_MultiGet|parallel|100|esent: 195100 reads in 5.0337229 seconds (38758 per second).
Using multi-get brings performance back into the realm of acceptable. "parallel" means &parallel=yes is used in the request, and the 100 is how many GETs go into each multi-get.
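To put a number on "acceptable", here is a small sketch that averages the per-second figures from the runs above and compares them with the plain synchronous single read. The dictionary keys and the `speedup` helper are names made up for this illustration; only the numbers come from the test output.

```python
from statistics import mean

# Per-second rates copied from the esent runs above (single-client tests only).
rates = {
    "single_read_sync": [1548, 1565, 1555, 1567, 1558],
    "single_read_async": [1597, 1600, 1610, 1610, 1597],
    "single_multiget_sequential": [6342, 6297, 6356, 6363, 6275],
    "single_multiget_parallel": [16611, 16724, 18125, 18628, 18395],
}


def speedup(name, baseline="single_read_sync"):
    """Ratio of the mean throughput of `name` to the mean of the baseline."""
    return mean(rates[name]) / mean(rates[baseline])


if __name__ == "__main__":
    for name, values in rates.items():
        print(f"{name}: {mean(values):.0f}/s, {speedup(name):.2f}x baseline")
```

On these figures, sequential multi-get comes out at roughly 4x the single-read rate and parallel multi-get at roughly 11x, while async single reads stay within a few percent of sync.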
This suggests most of the performance loss is in the HTTP request rather than in serialization. It is localhost, so network latency should be minimal, but is Nagle's algorithm enabled?
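For reference, Nagle's algorithm is switched off per socket with the TCP_NODELAY option. The sketch below shows the idea in Python using only the standard `socket` module; the helper names `disable_nagle` and `nagle_disabled` are invented for this example. As far as I know the .NET client-side equivalent is setting `ServicePointManager.UseNagleAlgorithm = false` before the first request, but I have not checked whether the client already does that.

```python
import socket


def disable_nagle(sock):
    """Turn off Nagle's algorithm so small writes are sent immediately."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nagle_disabled(sock):
    """Return True if TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    # The option can be set before connecting, so no server is needed here.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        print("before:", nagle_disabled(s))
        disable_nagle(s)
        print("after:", nagle_disabled(s))
    finally:
        s.close()
```

If the per-request cost drops noticeably with the option set, that would point at small-packet delays rather than at the HTTP stack itself.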
I will get round to trying 4.0 soon, but for now I think I can work out some kind of client-side batching with multi-get requests for our high-throughput use cases.
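A rough sketch of the batching I have in mind, in Python for brevity: collect the document ids, split them into groups of 100, and turn each group into one multi-get request. Note the assumptions: the `/multi_get` path, the `parallel=yes` placement in the query string, and the `{"Url": "/docs/<id>"}` payload shape are my guesses at the wire format and should be checked against the server docs; `chunked`, `build_multi_get_body` and `plan_requests` are hypothetical helper names. Nothing is sent over the network here.

```python
import json
from itertools import islice


def chunked(ids, size):
    """Yield successive lists of at most `size` ids."""
    it = iter(ids)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def build_multi_get_body(doc_ids):
    # ASSUMPTION: the body is a JSON array with one {"Url": ...} entry per GET.
    return json.dumps([{"Url": f"/docs/{doc_id}"} for doc_id in doc_ids])


def plan_requests(doc_ids, batch_size=100, parallel=True):
    """Return (path, body) pairs, one per multi-get request to issue."""
    # ASSUMPTION: endpoint path and query-string form of the parallel flag.
    path = "/multi_get" + ("?parallel=yes" if parallel else "")
    return [(path, build_multi_get_body(batch))
            for batch in chunked(doc_ids, batch_size)]


if __name__ == "__main__":
    requests_to_send = plan_requests([f"users/{i}" for i in range(250)])
    for path, body in requests_to_send:
        print(path, len(json.loads(body)), "gets")
```

The batch size of 100 matches the tests above; a real client would probably also flush a partial batch after a short timeout so that low-traffic callers are not left waiting.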