Geographically distributed memcached clusters

903 views
Skip to first unread message

Chris

unread,
Sep 11, 2008, 3:27:11 PM9/11/08
to memcached
I was wondering if anyone had any better solutions for cache
consistency with geographically distributed memcached clusters.

The problem: Having just one big memcached cluster is great if you
only have one datacenter, but if you have datacenters in a couple
different locations around the world, latency becomes a big problem.
Making a couple memcached queries from US -> Europe for a single
client request can make page loads unacceptably slow.

Our current solution to this problem is to have multiple memcached
clusters, one for each geographic region (Europe/US/Asia).
Unfortunately, keeping them in sync with the underlying data (mysql,
using replication) in an unpleasant problem.

Facebook had a solution to this that they wrote about on their
engineering blog ( http://www.facebook.com/note.php?note_id=23844338919
). They modified the mysql query grammar to support a list of keys to
invalidate.

Does anyone have any other interesting solutions to this problem?
(Keeping in mind that "only using one memcached cluster" likely won't
work because there is too much latency)

Gavin M. Roy

unread,
Sep 11, 2008, 3:44:43 PM9/11/08
to memc...@googlegroups.com
You could use something like Apache ActiveMQ and consumer scripts to accomplish this.  You could support the whole memcache grammar and have a consumer that just repeats commands into distributed memcached clusters.

Regards,

Gavin

Chris

unread,
Sep 11, 2008, 3:56:25 PM9/11/08
to memcached
Another "nice to have" - security. Each datacenter has its own
private network, and memcached only listens on the private networks.

It would be nice to configure any services listening on public
interfaces to only accept connections from a specific IP address
range, or only from a list of users who authenticate themselves with a
signed certificate.

ActiveMQ may accomplish these goals - I'll take a look.


On Sep 11, 3:44 pm, "Gavin M. Roy" <g...@myyearbook.com> wrote:
> You could use something like Apache ActiveMQ and consumer scripts to
> accomplish this.  You could support the whole memcache grammar and have a
> consumer that just repeats commands into distributed memcached clusters.
> Regards,
>
> Gavin
>
> On Thu, Sep 11, 2008 at 3:27 PM, Chris <christem...@gmail.com> wrote:
>
> > I was wondering if anyone had any better solutions for cache
> > consistency with geographically distributed memcached clusters.
>
> > The problem: Having just one big memcached cluster is great if you
> > only have one datacenter, but if you have datacenters in a couple
> > different locations around the world, latency becomes a big problem.
> > Making a couple memcached queries from US -> Europe for a single
> > client request can make page loads unacceptably slow.
>
> > Our current solution to this problem is to have multiple memcached
> > clusters, one for each geographic region (Europe/US/Asia).
> > Unfortunately, keeping them in sync with the underlying data (mysql,
> > using replication) in an unpleasant problem.
>
> > Facebook had a solution to this that they wrote about on their
> > engineering blog (http://www.facebook.com/note.php?note_id=23844338919

James Ranson

unread,
Sep 11, 2008, 4:20:03 PM9/11/08
to memc...@googlegroups.com
I disagree with employing native IP filtering within the memcached server. If your environment utilizes multiple datacenters, it's probably best to manage access control lists at the physical network layer (e.g., your ingress router). Otherwise, on linux, IPTables exists for host kernel-level IP restriction. If it's absolutely necessary to become an application-based restriction, work should be done to incorporate TCP Wrappers on the linux builds. To encrypt cross-datacenter traffic, you can employ out-of-the-box IP Sec tunneling, or even something more basic like TCP tunneling over SSH. Lots of standardized options here that don't require any modification to the memcached server.

PlumbersStock.com

unread,
Sep 11, 2008, 4:28:40 PM9/11/08
to memcached
Wouldn't it be easiest just to employ local caches that pull from a
central cache that pulls from your original data? That's probably what
I'd do but I don't do anything fancy with the cache. Maybe with a
little extra logic to cause the local cache items to expire when the
central cache expires.

dormando

unread,
Sep 11, 2008, 10:13:33 PM9/11/08
to memcached
I would hope that your datacenters are connected by some form of private
link...

Anyway, I *do* in fact like keeping the mysql updates paired with the
memcached updates. This only applies if your remote datacenters are
allowed to update their cache using local read slaves (which they should,
I'd guess). Odds are whatever other system you implement will at least
sometimes replay a remote mysql delete before the mysql update hits the
remote datacenter. This lag is also why facebook's article talks about
having that cookie which'll pin you to your active datacenter for a few
seconds after an update...

There's some text on this... was hoping to put up a nice FAQ entry or post
on it, but.. time... urgh.. anyway.

Instead of modifying the MySQL grammar to do it, you can use the
libmemcached MySQL UDF's by brian and patrick and embed DELETE or SET
commands in with your INSERT's. So long as all of your clients are
libmemcached based, your server lists should hash correctly. It's a little
complicated but doable.

-Dormando

Chris

unread,
Sep 11, 2008, 10:16:38 PM9/11/08
to memcached
Whenever a database write occurs, you then need to invalidate every
cache. Or, as you mentioned, invalidate your local cache, the central
cache, and then have some additional application to notify all the
other local caches that something has changed.

That's along the lines of what Gavin recommended above - using
ActiveMQ to send those notification messages to all the appropriate
caches.

On Sep 11, 4:28 pm, "PlumbersStock.com" <micha...@plumbersstock.com>
wrote:

wenxing zheng

unread,
Sep 11, 2008, 10:48:31 PM9/11/08
to memc...@googlegroups.com
Hello guys,

    We design our application on the Sun Solaris 10 and Oracle 10g under the Real Application Cluster. For the high cost of the memory db Timesten from Oracle, we decide to implement our own memory cache to improve the efficiency of the db access in pure C++.
 
    The RAC is a 2-node system, and we deploy our application respecitively onto these 2 nodes. Each node has 20 active processes and 20 standby processes,standby processes are the backup of the active processes on the other node.  As you know, under the RAC, actually there is only one physical database and the synchronization between the nodes is the clusterware's staff.
    I have googled the Memory db on the network, and find that the topic is not so rich. It's said that there are many open sources on the memory db cache, but i can't find any except the memcached.
    As to our application scenario, can anyone here give me some hints? Apprecicated for your help.
 
--
Best regards, Zheng wenxing


Reply all
Reply to author
Forward
0 new messages