Multi-get implementation in binary protocol


Byung-chul Hong

May 7, 2014, 4:24:58 AM
to memc...@googlegroups.com
Hello,

I'm currently evaluating the performance of a memcached server using several client workloads.
I have a question about the multi-get implementation in the binary protocol.
As I understand it, the ascii protocol lets us send multiple keys in a single request packet to implement a multi-get.

In the binary protocol, however, it seems we have to send multiple request packets (one request packet per key) to implement a multi-get.
Even if we send multiple getQ requests and then a get for the last key, we only save response packets on cache misses.
If I understand correctly, a multi-get in the binary protocol cannot reduce the number of request packets, and
it also cannot reduce the number of response packets when the hit ratio is very high (say, a 99% get hit rate).
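
For reference, this is the pattern I mean, as a minimal Python sketch (illustrative only, not a real client; it assumes a local memcached at 127.0.0.1:11211, and the opcodes come from the binary protocol header definitions):

    import socket
    import struct

    # Binary protocol request header (24 bytes): magic, opcode, key length,
    # extras length, data type, vbucket id, total body length, opaque, CAS.
    HEADER = struct.Struct('>BBHBBHIIQ')
    MAGIC_REQUEST = 0x80
    OP_GET = 0x00   # plain get: always answered
    OP_GETQ = 0x09  # quiet get: the server stays silent on a miss

    def get_request(opcode, key, opaque=0):
        k = key.encode()
        # No extras for get requests; the body is just the key.
        return HEADER.pack(MAGIC_REQUEST, opcode, len(k), 0, 0, 0,
                           len(k), opaque, 0) + k

    keys = ['key1', 'key2', 'key3']
    # getQ for every key but the last, a plain get for the last one, all
    # written as a single buffer so the requests can share packets.
    buf = b''.join(get_request(OP_GETQ, k, opaque=i)
                   for i, k in enumerate(keys[:-1]))
    buf += get_request(OP_GET, keys[-1], opaque=len(keys) - 1)

    sock = socket.create_connection(('127.0.0.1', 11211))
    sock.sendall(buf)
    # (Response parsing omitted; the opaque field maps replies back to keys.)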

If the performance bottleneck is on the network side rather than on the CPU, I think reducing the number of packets is still very important,
but I don't understand why the binary protocol doesn't address this.
Did I miss something?

Thanks in advance,
Byungchul.

Ryan McElroy

May 7, 2014, 5:10:15 PM
to memc...@googlegroups.com
At least in my experience at Facebook, 1 request != 1 packet. That is, if you send several/many requests to the same memcached box quickly, they will tend to go out in the same packet or group of packets, so you still get the benefit of fewer packets (in fact, we take advantage of this because it is very important at very high request rates -- e.g., over 1M gets per second). The same thing happens on the reply side -- the results tend to come back in just one packet (or more, if the replies are larger than a packet). At Facebook, our main way of talking to memcached (mcrouter) doesn't even support multi-gets on the client side, and it *doesn't matter*, because the batching happens anyway.

I don't have any experience with the memcached-defined binary protocol, but I suspect something similar is going on there. You can verify this with a tool like tcpdump or ngrep by watching what goes into each packet when you do a series of gets to the same box over the binary protocol. My bet is that you'll see them going out in the same packet (as long as there aren't any delays in sending them from your client application). That said, I'd love to see what you learn if you run this experiment.
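
If you want to try it, here's a rough Python sketch of that experiment over the ascii protocol (assuming a memcached on localhost:11211; the packet boundaries are what you'd be checking with tcpdump, not something I'm promising):

    import socket

    # Fire a series of ascii gets back-to-back without waiting for replies.
    # Watching the wire in another terminal (e.g. tcpdump -i lo -nn -X
    # port 11211) should show the requests coalescing into few packets.
    sock = socket.create_connection(('127.0.0.1', 11211))
    for i in range(10):
        sock.sendall(b'get key%d\r\n' % i)
    # Simplification: a single recv may not return every reply; each miss
    # comes back as just "END\r\n".
    print(sock.recv(65536).decode())
    sock.close()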

Cheers,

~Ryan



dormando

May 7, 2014, 8:30:39 PM
to memc...@googlegroups.com
You're right, it sucks. I was never happy with it, but haven't had time to
add adjustments to the protocol for this. Note that with .19 some
inefficiencies in the protocol were lifted, and most network cards are
fast enough for most situations, even if it's one packet per response (and
large enough responses split into multiple packets anyway).

This was done for the sake of latency and the streaming of responses:

- With an ascii multiget, I can send 10,000 keys, but then I'm forced to
wait for the server to look up all of the keys before it sends any
responses. That delay typically isn't very high, but there's some latency
to it.

- With a binary multiget, the responses are sent back as the server pulls
requests off the network, more or less. This reduces the latency until you
start seeing responses, regardless of how large your multiget is. It's
useful if you have the kind of client that can start processing responses
in a streaming fashion (see the sketch after this list), and it
potentially reduces the total time to render your response, since you can
keep the CPU busy unmarshalling responses instead of sleeping.
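
To make the streaming idea concrete, here's a rough client-side sketch
(Python; the helper names are made up, not from any real client library)
that reads one 24-byte binary response header at a time and processes each
value as it arrives:

    import struct

    RESPONSE = struct.Struct('>BBHBBHIIQ')  # same 24-byte layout as requests

    def recv_exact(sock, n):
        buf = b''
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError('server closed the connection')
            buf += chunk
        return buf

    def stream_responses(sock, handle, last_opcode=0x00):
        # Hand each value to `handle` the moment it arrives instead of
        # buffering the whole multiget reply.
        while True:
            (magic, opcode, keylen, extlen, _, status,
             bodylen, opaque, cas) = RESPONSE.unpack(
                 recv_exact(sock, RESPONSE.size))
            body = recv_exact(sock, bodylen)
            value = body[extlen + keylen:]  # body = extras + key + value
            handle(opaque, status, value)
            if opcode == last_opcode:  # the trailing plain get ends the batch
                break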

However, it should have some tunables: one where it does at least one
write per complete packet (TCP_CORK'ed, or similar), and one where it
buffers up to some size. In my tests I can get ascii multiget up to 16.2
million keys/sec, but (with the fixes in .19) binprot caps out at 4.6m and
spends all of its time calling sendmsg(). Most people need far, far less
than that, so binprot as it stands should be okay.
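
For anyone unfamiliar with TCP_CORK: it isn't a memcached tunable today,
so the sketch below just illustrates the general mechanism on an arbitrary
Linux socket (the tunables described above would live inside the server's
response path):

    import socket

    def corked_send(sock, chunks):
        # Linux-only: TCP_CORK holds partial frames in the kernel until
        # uncorked, so many small writes leave as full-sized packets.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)
        try:
            for chunk in chunks:
                sock.sendall(chunk)
        finally:
            # Clearing the option flushes whatever is still buffered.
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 0)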

The code isn't too friendly to this, and there are other, higher-priority
things I'd like to get done sooner. The relatively small number of people
who do 500,000+ requests per second over binprot (they're almost always on
ascii at that scale) is the other reason.

Byung-chul Hong

May 9, 2014, 9:12:33 AM
to memc...@googlegroups.com
Hello, Ryan, dormando,

Thanks a lot for the clear explanation and the comments.
I'm trying to find out how many requests I can batch into a multi-get within the allowed latency.
I think multi-get has many advantages; the only penalty is the longer latency, as pointed out in the answers above.
But the longer latency may not be a real issue unless it exceeds some threshold that end users can notice.
So now I'm trying to use multi-get as much as possible.

Actually, I had assumed that the binary protocol would always be better than the ascii protocol, since the binary
protocol reduces the parsing burden on the server side, but it seems I need to test both cases.

Thanks again for the comments; I will share the results if I get interesting or useful data.

Byungchul.





dormando

May 9, 2014, 2:25:46 PM
to memc...@googlegroups.com
Unfortunately, binprot isn't that much faster processing-wise... what it
does give you is a bunch of safe features (batching sets, mixing
sets/gets, and the like).

You *can* reduce the packet load on the server a bit by ensuring your
client actually batches the binary multiget packets together; then it's
only the server's responses that increase the packet load...

Yongming Shen

Jan 5, 2015, 4:30:39 PM
to memc...@googlegroups.com
Hi Ryan, by "mcrouter doesn't even support multi-gets on the client side," do you mean that mcrouter won't send multi-gets to the memcached servers, that the frontend servers won't send multi-gets to mcrouter, or both?

Ryan McElroy

Jan 6, 2015, 1:32:36 AM
to memc...@googlegroups.com
The first is correct -- mcrouter won't send out multi-gets. Specifically, mcrouter will accept multi-gets on the server side; that is, it will correctly parse a command like "get key1 key2 key3\r\n". But when it sends the requests out, it sends them as "get key1\r\nget key2\r\nget key3\r\n", even if they all go to the same memcached server. We considered changing this a few times, but found that it increased complexity significantly and really didn't matter for the way we used memcache at Facebook.