> This worked! However it seems like TCP and UDP latency now is about the same with my code as well as with a real
> benchmarking tool (memaslap).
I don't use memaslap so I can't speak to it. I use mc-crusher for the
"official" testing, though admittedly it's harder to configure.
> Not sure I understand the scalability point. From my observations, if I do a multiget, I get separate packet
> sequences for each response. So each get value could be about 2^16 * 1400 bytes big and still be ok via UDP
> (assuming everything arrives)? One thing that seemed hard is each separate sequence has the same requestId, which
> makes deciding what to do difficult in out-of-order arrival scenarios.
Mostly RE: kernel/syscall stuff. Especially after the TCP optimizations in
1.6, UDP mode will just be slower at high request rates. It ends up
running a lot more syscalls.
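On the reassembly question quoted above: memcached's protocol.txt documents an
8-byte frame header on every UDP datagram, made of four big-endian 16-bit fields
(request ID, sequence number, total datagrams, reserved). Below is a minimal sketch
of parsing that header and reassembling one response in userspace, assuming a single
sequence per request ID. `parse_frame` and `Reassembler` are hypothetical names, and
the sketch does not resolve the case you observed of several sequences sharing one ID.

```python
import struct

# Memcached UDP frame header: request ID, sequence number,
# total datagrams, reserved (four big-endian 16-bit fields).
HEADER = struct.Struct(">HHHH")


def parse_frame(datagram: bytes) -> tuple:
    """Split one UDP datagram into (request_id, seq, total, payload)."""
    if len(datagram) < HEADER.size:
        raise ValueError("datagram shorter than the 8-byte frame header")
    request_id, seq, total, _reserved = HEADER.unpack_from(datagram)
    return request_id, seq, total, datagram[HEADER.size:]


class Reassembler:
    """Collect datagrams for ONE request ID until all `total` have arrived.

    Assumption: one sequence per request ID. Datagrams may arrive in any
    order; duplicates of an already-seen sequence number are ignored.
    """

    def __init__(self, request_id: int):
        self.request_id = request_id
        self.total = None
        self.parts = {}

    def add(self, datagram: bytes):
        """Add a datagram; return the full payload once complete, else None."""
        request_id, seq, total, payload = parse_frame(datagram)
        if request_id != self.request_id:
            return None  # belongs to some other request; ignore it
        if self.total is None:
            self.total = total
        elif total != self.total:
            raise ValueError("inconsistent datagram count for this request ID")
        if seq >= total:
            raise ValueError("sequence number out of range")
        self.parts.setdefault(seq, payload)  # keep the first copy, drop duplicates
        if len(self.parts) == self.total:
            return b"".join(self.parts[i] for i in range(self.total))
        return None
```

Every one of those datagrams is also a separate receive on the client and a separate
send on the server, which is where the extra syscall cost comes from.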
> SO_REUSEPORT seems to be supported in the Linux kernel since 3.9. But I definitely understand the decision to not
> spend much time optimizing the UDP protocol. I did see higher rusage_user and much higher rusage_system when
> using UDP, which maybe corresponds to what you are saying. I tried with memaslap and observed the same thing.
Yeah, see above.
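For illustration, a minimal sketch of what SO_REUSEPORT buys for UDP, assuming
Linux 3.9+: several sockets bound to the same address and port, which the kernel
load-balances incoming datagrams across by hashing the source and destination
address/port. The helper name `open_reuseport_udp_sockets` is hypothetical. Note
that it spreads UDP work across threads, but each datagram still costs its own
receive syscall, so it does not remove the per-request overhead mentioned above.

```python
import socket


def open_reuseport_udp_sockets(count: int, host: str = "127.0.0.1", port: int = 0) -> list:
    """Bind `count` UDP sockets to one address using SO_REUSEPORT.

    With port=0 the first bind picks an ephemeral port and the rest reuse it.
    Raises AttributeError up front if this platform lacks SO_REUSEPORT.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    reuseport = socket.SO_REUSEPORT  # fail early, before any socket is opened
    socks = []
    try:
        for _ in range(count):
            s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            socks.append(s)
            # The option must be set on every socket before bind().
            s.setsockopt(socket.SOL_SOCKET, reuseport, 1)
            s.bind((host, port))
            port = s.getsockname()[1]  # later sockets reuse the first socket's port
    except OSError:
        for s in socks:
            s.close()
        raise
    return socks
```

In a server, each worker thread would then loop on `recvfrom()` on its own socket.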
> No pressing issue really. We saw this (admittedly old) paper discussing how Facebook was able to reduce get
> latency by 20% by switching to UDP. Memcached get latency is a key factor in our overall system latency so we
> thought it would be worth a try, and it would ease some pressure on our network infrastructure as well. Do you
> know if Facebook's changes ever made it back into the main memcached distribution?
I wish there was some way I could make that paper stop existing. Those
changes went into memcached 1.2, 13+ years ago. I'm reasonably certain
facebook doesn't use UDP for memcached and hasn't in a long time. None of
their more recent papers (which also stop around 2014) mention UDP at all.
The best performance you can get is by ensuring multiple requests are
pipelined at once, and there are a reasonable number of worker threads
(not more than one per CPU). If you see anything odd or have questions
please bring up specifics, share server settings, etc.
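As a rough illustration of the pipelining point, here is a minimal client-side
sketch over the text protocol, assuming plain single-key `get` commands: all
requests go out in one `sendall()`, and responses are read back in bulk rather
than one round trip per key. `build_pipeline`, `parse_get_responses`,
`pipelined_get` and `IncompleteResponse` are hypothetical names, not part of any
memcached client library.

```python
import socket


class IncompleteResponse(Exception):
    """Raised when the buffer does not yet hold every expected response."""


def build_pipeline(keys: list) -> bytes:
    """Concatenate one text-protocol `get` per key so they go out in a single write."""
    return b"".join(b"get " + key.encode() + b"\r\n" for key in keys)


def parse_get_responses(buf: bytes, count: int) -> list:
    """Parse `count` consecutive single-key `get` responses from `buf`.

    Returns the value bytes for each hit, or None for a miss.
    """
    results = []
    pos = 0
    for _ in range(count):
        value = None
        while True:
            end = buf.find(b"\r\n", pos)
            if end == -1:
                raise IncompleteResponse()
            line = buf[pos:end]
            pos = end + 2
            if line == b"END":
                break
            fields = line.split()
            if len(fields) < 4 or fields[0] != b"VALUE":
                raise ValueError(f"unexpected response line: {line!r}")
            nbytes = int(fields[3])  # VALUE <key> <flags> <bytes> [<cas unique>]
            if len(buf) < pos + nbytes + 2:
                raise IncompleteResponse()
            if buf[pos + nbytes:pos + nbytes + 2] != b"\r\n":
                raise ValueError("data block not terminated by CRLF")
            value = buf[pos:pos + nbytes]
            pos += nbytes + 2
        results.append(value)
    return results


def pipelined_get(sock: socket.socket, keys: list) -> list:
    """Send all gets in one write, then read until every response has arrived."""
    sock.sendall(build_pipeline(keys))
    buf = b""
    while True:
        try:
            return parse_get_responses(buf, len(keys))
        except IncompleteResponse:
            chunk = sock.recv(65536)
            if not chunk:
                raise ConnectionError("server closed the connection mid-response")
            buf += chunk
```

For brevity the sketch reparses the buffer from the start after each `recv()`;
a real client would keep its parse position between reads.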