facebook memcached on github


marc kwiatkowski

Dec 13, 2008, 8:24:58 AM
to memcached
As some of you may have noticed we've put facebook's version of
memcached up on github

http://github.com/fbmarc/facebook-memcached

We've done this for a couple of reasons.

First, as far as we know, we run the largest installation of memcached
in the world. We think this is true in terms of ops/sec, number of
objects, amount of RAM, and number of servers. To get memcached to
support these levels we've had to make a lot of customizations. We
believe the memcached community would appreciate seeing what we've
done.

Second, a less complete repository of our version of memcached was
already up on github. We thought it would be better to have a
repository with our complete revision history both for our production
version and our attempt at integration with the 1.2.5 release.

As far as licensing goes, memcached was released under the BSD
license. We cannot change that even if we wanted to, and it goes
without saying that we don't want to.

Finally, I produced these branches by running git-svn and
git-filter-branch against our internal SVN repositories. The addition
of the root "src" directory and the lack of a clear branch from 1.2.3
make comparison with existing branches unnecessarily difficult. I
apologize for the inconvenience. I will redo the git-svn extraction
to correct those issues later this week.

In the meantime I hope you will let us know what you think and what
else we can do to make memcached even better. We at facebook would be
happy to discuss questions and issues about our release in this group.

Thanks

dormando

Dec 13, 2008, 6:45:15 PM
to memcached
Hey,

> Finally, I produced these branches by running git-svn and
> git-filter-branch against our internal SVN repositories. The addition
> of the root "src" directory and the lack of a clear branch from 1.2.3
> make comparison with existing branches unnecessarily difficult. I
> apologize for the inconvenience. I will redo the git-svn extraction
> to correct those issues later this week.

Thanks for the clarification. That's similar to what our own git repo
was until Dustin rewrote it, and to what mine presently is.

What we have right now are actually three trees with unrelated
history: mine, the original cleaned-up git-svn import; Dustin's, the
rewritten binary tree; and now FB's, which is a weird rewrite based
off of 1.2.3.

So right now I have all the benefits of a nice distributed RCS, but none
of the benefits of being able to manage these branches, since we're all
claiming to have the same thing with unrelated histories. My work in the
short term is to get the binary and stable trees related again, since even
if the binary protocol goes out, we will have to maintain the 1.2 series
for a while.

> In the meantime I hope you will let us know what you think and what
> else we can do to make memcached even better. We at facebook would be
> happy to discuss questions and issues about our release in this group.

Sounds good. I'll reiterate the things we've gone over privately
already and we'll work from there.

First, for clarity, a few small bits that were discernible from the last
merge request were committed and credited in the stable tree. I don't
think that's been in a release yet, however. Apologies on my part.

As Dustin said, the facebook tree, in its current state, is basically
unmergeable. The handful of us who know the codebase well enough can
readily tell that this tree is actually a rewrite. There's no items.c at
all, there're new files spread about, #ifdef's everywhere. Also, as Dustin
said, the original diffs were longer than the original codebase.

During our last discussion at the hackathon in October, we talked about
our inability to work with the original code dump. I had directly asked
for "at least a tree with history", so we could get a *hint* of insight
into what the changes were, and get smaller diffs to examine. The github
repository is exactly this. I even okay'ed the usage of github instead of
an SVN branch. Easier for all of us to work against.

The hope was that we'd be able to work together on messaging and figuring
out /what parts/ of the tree were mergeable. Instead I've been reading a
flood from users chattering about the amazingly performant facebook fork
of memcached. While it's an implementation of memcached provided by
facebook, I'm hard pressed to consider it a fork at this point. As another
poster has pointed out, FB isn't obligated to do anything; even a code
dump is nice. However, most people don't put up code dumps and claim them
as merge requests.

Even on other projects that I effectively own now (see: MogileFS) I
actually bother to rewrite topic branches into clean mail-able, documented
patchsets, which go up for review. This is a project where I really don't
have to do that at all, and the changes are all internal
performance/feature fixes for livejournal and typepad.

A lot of things stand between us and mergeability:

- The FB tree is not compatible with either tree, as CAS was removed.
Counters are also very different, but that's minor. Trond has a patch
making the CAS requirement a runtime option... I haven't examined it yet,
but I recall the FB complaint was not wanting to waste 8 bytes per item
storing the CAS identifier.

- The FB tree contains a binary protocol that was proprietary up until the
release. The memcached binary protocol we have was designed and written by
a consortium of users, and has existing clients.

- The FB tree contains a weird mishmash of storage engines. Flat
allocator? Ok. Memory pools, ok... #ifdef's for the old engine. Fair
enough. The memcached community (mainly Toru and Trond) have been working
towards a pluggable storage engine interface for memcached, with the
intention of pulling in a lot of these forks and allowing us to ship
multiple storage engines which can be runtime pluggable.

I recall this was brought up during the hackathon, and FB had serious
concerns about the usage of indirect function calls for a storage engine
interface. You'd prefer to #ifdef and compile it all out in order to save
a few cycles from some primary calls. This goes against what the
community has been working towards. (there're even handy public drafts of
the interface: http://code.google.com/p/memcached/wiki/EngineInterface).

- The FB tree contains dead ends, debug code, and dangerous features. We
would never release 'flush_regex' on the main release. Not on purpose,
anyway.

- All of the stats output code was rewritten into long, ugly
'append_to_buffer' lines. This doesn't seem necessary, and fills in the
gap of what little code wasn't rewritten elsewhere...

- The FB tree appears to have written very few, if any, tests around its
changes. Do your tests exist in a different tree, or do they not exist?

- Being nitpicky, the changelog wasn't even updated :)

---

There are certainly interesting things in the branch that we'd like. Maybe
the flat allocator? I don't know. Certainly bugfixes, the stats lock
changes, and the per-thread memory pools.

During the hackathons, in IRC, and elsewhere, we've been discussing
changes around the client memory buffers and the stats locks. I've been
personally nitpicky about the connection buffers. In my MySQL Proxy I'd
prototyped a shared connection buffer, so idle connections would use a
minimal amount of memory. This seemed like a good idea, but hadn't come up
on the 'super important' radar for memcached; the binary protocol seemed
more important, as it increases flexibility for users. In the grand scheme
of things they need that more than extra speed. That's dangerous for me to
say, I know, but true.

The other thing is the stats locks. It'll be fun to see how you folks did
it, since I couldn't figure out how it was done without memory barriers on
the thread-local level. Or if that just worked okay. The "correct way" is
to use atomic CAS around a many-writer single-reader or single-writer
single-reader pattern, which sucks a bit for 8-byte values and isn't
perfectly fast, since those are still memory barriers.

What else is merge worthy? I'm not sure. The facebook article seems to
mention a lot of linux kernel hacks were required in order to achieve that
speed. How does the FB branch benchmark without those kernel
modifications? Are those changes also public?

If you folks are truly uninterested in maintaining a "fork" of the
software, you'll have to decide what distinct changes actually go
upstream. We have a binary protocol, we have a storage engine interface on
the way, and many of the other changes are unmergable. Will you work with
us to get the most beneficial performance changes upstream and adopt the
main tree, or continue to use your internal tree? We have to decide what
work will be merged at all before flocking to a "merge-a-thon".

have fun,
-Dormando

Tony Tung

Dec 14, 2008, 6:21:33 AM
to memc...@googlegroups.com
Comments inline.

Thanks,
Tony


On 12/13/08 3:45 PM, "dormando" <dorm...@rydia.net> wrote:

> - The FB tree contains a binary protocol that was proprietary up until the
> release. The memcached binary protocol we have was designed and written by
> a consortium of users, and has existing clients.

I don’t think it would be difficult to bring the FB tree’s binary protocol to a compatible state.  It would also be pretty easy to remove it altogether, as it is pretty modular.

> - The FB tree contains a weird mishmash of storage engines. Flat
> allocator? Ok. Memory pools, ok... #ifdef's for the old engine. Fair
> enough. The memcached community (mainly Toru and Trond) have been working
> towards a pluggable storage engine interface for memcached, with the
> intention of pulling in a lot of these forks and allowing us to ship
> multiple storage engines which can be runtime pluggable.
>
> I recall this was brought up during the hackathon, and FB had serious
> concerns about the usage of indirect function calls for a storage engine
> interface. You'd prefer to #ifdef and compile it all out in order to save
> a few cycles from some primary calls. This goes against what the
> community has been working towards. (there're even handy public drafts of
> the interface: http://code.google.com/p/memcached/wiki/EngineInterface).

Flat allocator is a storage engine.  Memory pools are for finding memory hogs/leaks.

In any case, we started our work on supporting multiple storage engines before we heard of Toru and Trond’s work.  When we did hear about it, we examined their proposal but found that it did not fit our model for two reasons:

  1. As you mentioned, we don’t see any benefit to runtime selection of the storage engine; thus the indirect call is unnecessary in our environment.
  2. It does not support spreading the data out to multiple blocks as we ended up doing with our flat allocator.  While it is true that it could be managed by copying the data to a contiguous buffer, that’s very expensive.

> - The FB tree contains dead ends, debug code, and dangerous features. We
> would never release 'flush_regex' on the main release. Not on purpose,
> anyway.

Besides flush_regex, could you be more specific?

> - All of the stats output code was rewritten into long, ugly
> 'append_to_buffer' lines. This doesn't seem necessary, and fills in the
> gap of what little code wasn't rewritten elsewhere...

It’s actually quite relevant.  If someone were to add to the stats output without adjusting the size of the output buffers, we could easily overrun them.  Tracking down memory corruption is such a pain and append_to_buffer guarantees that will not happen (at least not for stats).

Tony Tung

Dec 14, 2008, 6:36:19 AM
to memc...@googlegroups.com
On 12/14/08 3:21 AM, "Tony Tung" <tt...@facebook.com> wrote:

> In any case, we started our work on supporting multiple storage engines before we heard of Toru and Trond’s work.  When we did hear about it, we examined their proposal but found that it did not fit our model for two reasons:
>
>   1. As you mentioned, we don’t see any benefit to runtime selection of the storage engine; thus the indirect call is unnecessary in our environment.
>   2. It does not support spreading the data out to multiple blocks as we ended up doing with our flat allocator.  While it is true that it could be managed by copying the data to a contiguous buffer, that’s very expensive.

One more thing -- I am certainly not saying that the way we implemented multiple storage engines is The Right Way.  I’m just saying that this was right for us; YMMV. :)

Thanks,
Tony

Anatoly Vorobey

Dec 14, 2008, 6:40:49 AM
to memc...@googlegroups.com
Tony,
>   1. As you mentioned, we don't see any benefit to runtime selection of the storage engine; thus the indirect call is unnecessary in our environment.
I'm curious about this choice. Have you tried to benchmark the benefit of using a compile-time choice of a storage engine vs. indirect calls at runtime? Do you happen to have any statistics on hand regarding whether that change consistently influences latency/throughput?

--
Anatoly Vorobey, avor...@gmail.com
http://avva.livejournal.com (Russian)
http://www.lovestwell.org (English)

Aaron Stone

Dec 15, 2008, 4:28:43 AM
to memc...@googlegroups.com
On Sun, Dec 14, 2008 at 3:40 AM, Anatoly Vorobey <avor...@gmail.com> wrote:
> Tony,
>>
>> As you mentioned, we don't see any benefit to runtime selection of the
>> storage engine; thus the indirect call is unnecessary in our environment.
>
> I'm curious about this choice. Have you tried to benchmark the benefit of
> using a compile-time choice of a storage engine vs. indirect calls at
> runtime? Do you happen to have any statistics on hand regarding whether that
> change consistently influences latency/throughput?

If it turns out in benchmarks to really honestly be painful enough to
warrant compiling with direct function calls, maybe a macro could be
used that maps at compile-time either directly to one storage engine
or to the run-time selected function pointer.

Aaron

Tony Tung

Dec 15, 2008, 2:14:12 PM
to memc...@googlegroups.com

On 12/14/08 3:40 AM, "Anatoly Vorobey" <avor...@gmail.com> wrote:

> Tony,
>>   1. As you mentioned, we don't see any benefit to runtime selection of the storage engine; thus the indirect call is unnecessary in our environment.
> I'm curious about this choice. Have you tried to benchmark the benefit of using a compile-time choice of a storage engine vs. indirect calls at runtime? Do you happen to have any statistics on hand regarding whether that change consistently influences latency/throughput?

We haven’t done any benchmarks.  However, as I understand Toru and Trond’s design, I suspect it would not be particularly expensive.  Because the key is at a fixed location no matter the storage engine, the assoc_find call would not need to call into the storage engine.

Thanks,
Tony

Anatoly Vorobey

Dec 17, 2008, 4:18:28 AM
to memc...@googlegroups.com

Thanks.

I would be surprised if using direct vs indirect calls to the storage engine registered in any meaningful way at all, even if it were done inside assoc_find.

Advt

Jan 4, 2009, 10:43:15 AM
to memcached


On Dec 13 2008, 11:24 am, marc kwiatkowski
<marc.kwiatkow...@gmail.com> wrote:
> As some of you may have noticed we've put facebook's version of
> else we can do to make memcached even better. We at facebook would be
> happy to discuss questions and issues about our release in this group.
>
> Thanks


Hi Marc,

Thanks for making this public, and congratz on the wonderful work!
Any idea on when the PHP client that uses UDP will be available, if
ever?

Thanks,
Daniel

Du Song

Jan 8, 2009, 11:22:30 PM
to memc...@googlegroups.com

It's already there, in PECL-memcache 3.x:
http://pecl.php.net/package/memcache

Regards,
Du Song

Arjen van der Meijden

Jan 9, 2009, 1:52:21 AM
to memc...@googlegroups.com
On 9-1-2009 5:22 Du Song wrote:
> It's already there.
> in PECL-memcache 3.x
> http://pecl.php.net/package/memcache
>
> Regards,
> Du Song

That code is still beta, which afaik it should be, since we at least
repeatedly run into segfaults in PHP which disappear when using the
stable 2.2 version. Unfortunately we can't produce a reliable test case
that also segfaults on the developer's systems.

So if you have complex object structures in your memcached, especially
when they do some work in __wakeup(), you should be very careful about
choosing the 3.x versions. Or hope you find another test case for the
same bug:
http://pecl.php.net/bugs/bug.php?id=13623

Best regards,

Arjen

Josef Finsel

Jan 9, 2009, 11:22:10 AM
to memc...@googlegroups.com
To unsubscribe, please go to http://googlegroups.com/group/memcached/subscribe

On Fri, Jan 9, 2009 at 11:14 AM, Erick Dennis <erick....@epublica.de> wrote:

--
"If you see a whole thing - it seems that it's always beautiful. Planets, lives... But up close a world's all dirt and rocks. And day to day, life's a hard job, you get tired, you lose the pattern."
Ursula K. Le Guin

http://www.finsel.com/words,-words,-words.aspx (My blog) - http://www.finsel.com/photo-gallery.aspx (My Photogallery) - http://www.reluctantdba.com/dbas-and-programmers/blog.aspx (My Professional Blog)

Advt

Jan 9, 2009, 3:05:49 PM
to memcached
I see;
I'll check this 3.0.2b and see how it goes.

On another note, by any chance, would it be possible for you Facebook
guys to better explain the changes you made to the kernel to optimize
it for UDP? Diffs would be perfect =)
I'm in the process of getting some machines up to use exclusively with
memcached (8x2.5GHz w/ 64GB RAM) and it would be great if I could poke
around with the optimizations you guys made (based on what I read at
http://www.facebook.com/note.php?note_id=39391378919)

Thanks!