You're right, it sucks. I was never happy with it, but I haven't had time to
add adjustments to the protocol for this. Of note, .19 lifted some
inefficiencies in the protocol, and most network cards are fast enough for
most situations, even if it's one packet per response (and large enough
responses split into multiple packets anyway).
The reason this was done is latency and streaming of responses:
- In ascii multiget, I can send 10,000 keys, but then I'm forced to wait for
the server to look up all of the keys before it sends any responses. That
delay isn't typically very high, but there is some latency to it.
- In binary multiget, the responses are sent back more or less as the server
receives the requests from the network (there's a rough wire-level sketch
after this list). This reduces the latency until you start seeing responses,
regardless of how large your multiget is. It's useful if you have the kind of
client which can start processing responses in a streaming fashion, and it
potentially reduces the total time to render your response, since you can
keep the CPU busy unmarshalling responses instead of sleeping.
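To make the difference concrete, here's roughly what each looks like on the
wire (key names, flags, and values are made up for illustration). An ascii
multiget is a single request, and nothing comes back until the server has
walked every key:

    get key1 key2 key3\r\n
    VALUE key1 0 3\r\nfoo\r\n
    VALUE key3 0 3\r\nbar\r\n
    END\r\n

A binprot multiget is typically pipelined as a series of quiet gets
(GetQ/GetKQ) terminated by a Noop. Hits are answered as the server works
through the pipeline, misses produce no response at all, and the Noop
response marks the end of the batch:

    -> GetKQ(key1)  GetKQ(key2)  GetKQ(key3)  Noop
    <- response for key1 (hit)
    <- response for key3 (hit)      (key2 missed, so nothing is sent)
    <- Noop response (end of batch)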
However, it should have some tunables: one where it at least does one
write per complete packet (TCP_CORK'ed, or similar), and one where it
buffers up to some size (a rough sketch of the corked approach is below).
In my tests I can get ascii multiget up to 16.2 million keys/sec, but (with
the fixes in .19) binprot caps out at 4.6m and spends all of its time
calling sendmsg(). Most people need far, far less than that, so binprot
as-is should be okay, though.
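For reference, this is a minimal sketch of what the corked-write tunable
could look like on Linux; it is not the actual memcached code, and
send_pending_responses() is a hypothetical stand-in for whatever loop writes
out the queued responses:

    #include <netinet/in.h>    /* IPPROTO_TCP */
    #include <netinet/tcp.h>   /* TCP_CORK */
    #include <sys/socket.h>    /* setsockopt */

    void send_pending_responses(int fd);  /* hypothetical */

    static void send_corked(int fd)
    {
        int on = 1, off = 0;

        /* Cork the socket: the kernel holds partial frames instead of
         * emitting one small packet per write()/sendmsg() call. */
        setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));

        /* Many small writes, roughly one per response. */
        send_pending_responses(fd);

        /* Uncork: flush whatever is buffered as full-sized packets. */
        setsockopt(fd, IPPROTO_TCP, TCP_CORK, &off, sizeof(off));
    }

The second tunable would be the same idea in userspace: append responses to
a buffer and only call sendmsg() once it crosses some size threshold.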
The code isn't too friendly to this, and there are other, higher-priority
things I'd like to get done sooner. The relatively small number of people
who do 500,000+ requests per second over binprot (they're almost always on
ascii at that scale) is the other reason.