save/restore memcache db?


PlumbersStock.com

Sep 9, 2008, 9:04:04 PM
to memcached
Is there any technical reason that memcache shouldn't dump its db to
disk when shut down and restore it when started again? If save/restore
were options I could rewrite my start/stop scripts to do it - those
who didn't want it wouldn't have to have it.

This would be a handy option for me as it takes me hours to rebuild
after a system shutdown. My backend system is proprietary and slow
which is the main reason for caching everything in memcached in the
first place. I was caching everything in a MySQL db before but
memcached is quite a bit faster and less intensive on my server.

Josef Finsel

Sep 9, 2008, 9:15:34 PM
to memc...@googlegroups.com
If you really want to do this, there are a couple of people who have created versions that do such a thing.

Or you could switch to Microsoft's Velocity.

One problem with reloading data from disk on restart is the question of whether the data changed between the stop and start of the memcache instance. If it has changed, then your cache is bad. Does that make sense?

Josef
--
"If you see a whole thing - it seems that it's always beautiful. Planets, lives... But up close a world's all dirt and rocks. And day to day, life's a hard job, you get tired, you lose the pattern."
Ursula K. Le Guin

PlumbersStock.com

Sep 9, 2008, 9:21:31 PM
to memcached
Do you know how to find these variants? Name? I doubt Velocity works
on my Linux servers but I could be wrong. Not that I'd purposely use a
Microsoft product anyway.

I don't see how it matters if the data has changed or not. Expire what
is outdated as you'd normally do and assume the rest is valid until
it's updated. For that matter you never know if data is valid unless
you check it which sort of defeats the concept of caching it.

Joseph Engo

Sep 9, 2008, 9:24:48 PM
to memc...@googlegroups.com
You could have a secondary memcache pool that you warm up during
maintenance periods, then switch over temporarily (using a load
balancer).

Josef Finsel

Sep 9, 2008, 9:34:59 PM
to memc...@googlegroups.com
No, Velocity wouldn't run on Linux.

As for the other variants, scan the lists and you'll find them quickly enough. Tugela has a disk-based component you could try.

PlumbersStock.com

Sep 9, 2008, 9:38:14 PM
to memcached
That sounds more complex than simply dumping a copy of what's in
memory to disk and restoring it five minutes later when the system is
restarted. If I understand correctly, memcached doesn't offer
replication so I'd have to somehow make it do so myself - right?

Josef Finsel

Sep 9, 2008, 9:52:04 PM
to memc...@googlegroups.com
That's correct. memcached doesn't offer a way to dump itself to disk, primarily because the process of listing all of the existing entries is a blocking process.

If you can't find what you're looking for in the lists, you can always mod memcached to do this for you.




Josef Finsel

Sep 9, 2008, 9:53:41 PM
to memc...@googlegroups.com
FYI... Brad expounds on why he didn't implement a save to disk function here (http://lists.danga.com/pipermail/memcached/2003-November/000368.html)

Joseph Engo

Sep 9, 2008, 10:00:00 PM
to memc...@googlegroups.com
I played around with a version of memcached that supported replication
and it didn't work very well. Normal get and set were OK, but
increment / decrement got all messed up during failovers.

Just curious, how much data are you storing in your memcache?

PlumbersStock.com

Sep 9, 2008, 10:03:12 PM
to memcached
I don't see how it matters if it's a blocking process if it only
happens when memcached is being started or stopped. Sure I could mod
stuff or use other people's mods but then I have to keep changes in
sync. In general I hate trying to maintain patched copies of stuff.

I can understand there being some issues to think through with regard
to multiple cache servers and not using cache as persistent storage
but those issues wouldn't be issues for me. I'm just trying to avoid
slow data access. I don't store anything very important in cache but I
store a lot of slow to retrieve stuff in cache. I forgot it'd wipe my
cache when I slapped more RAM in the server last night - kind of
sucked. ;)

And sure you could ask why the backend ERP software on a nice IBM AIX
server takes forever to reload and uses up all our user licenses while
doing so but like I said it's a nasty 3rd party app I have no choice
over using.

PlumbersStock.com

Sep 9, 2008, 10:04:53 PM
to memcached
Usually store maybe between 1 and 4GB of data but the backend the data
comes from is unreasonably slow. Nothing we can do about it. We've
bought a new $40,000 server for it this year just to make it bearable.

Clint Webb

Sep 9, 2008, 10:43:39 PM
to memc...@googlegroups.com
With open-source software you basically get the features that programmers have wanted over time and that were good enough to be included for others. One of the cool things about open source is that if the product is mostly what you want but lacks a key feature that you want (but others don't seem to), you can normally find someone familiar with the code who would be willing to work on your feature for some dollars.

If you are willing to spend $40k on a server, put down a bounty of a few hundred bucks and see if anyone will pick it up.
--
"Be excellent to each other"

Steve Clay

Sep 10, 2008, 12:39:17 AM
to memc...@googlegroups.com
PlumbersStock.com wrote:
> slow data access. I don't store anything very important in cache but I
> store a lot of slow to retrieve stuff in cache. I forgot it'd wipe my

Here's an idea you could use to avoid maintaining a forked memcache...

Request handling:

generate cache id
if mc->get(id)
    serve from cache
else
    generate data
    serve data (possibly close client connection here)
    mc->store(timestamp + data, id)
    *store id to a file*

(*This basically becomes the key list that memcache doesn't have. You'd
want to save the id as fast as possible.)

Shutdown:

switch app to use different memcached server
run script to fetch every id in file and save to disk
shutdown memcached

Startup:

start memcached
run script to re-store() all data from disk
    for any data you feel is too old (you'll have the timestamp)
        generate it now (or just don't store it)
repoint app to use this memcached

If you want to make shutdown/startup quicker you could have a separate
process (not on user time, and throttled down in peak hours) whose job
is to watch the id list and sync memcached data to more permanent storage.
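
The request, shutdown and startup steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: FakeCache is a hypothetical in-memory stand-in for a real memcached client, a Python set stands in for the id file, the JSON dump format is an arbitrary choice, and the names fetch, dump and restore are invented here.

```python
import json
import time


class FakeCache:
    """In-memory stand-in for a memcached client (hypothetical; swap in a real one)."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        return self._items.get(key)

    def set(self, key, value):
        self._items[key] = value


def fetch(mc, key, generate, id_log):
    """Serve from cache, or generate, store with a timestamp, and log the key."""
    hit = mc.get(key)
    if hit is not None:
        return hit["data"]
    data = generate(key)
    mc.set(key, {"ts": time.time(), "data": data})
    id_log.add(key)  # the key list that memcached itself does not expose
    return data


def dump(mc, id_log, path):
    """Shutdown step: fetch every logged key and write the survivors to disk."""
    snapshot = {}
    for key in id_log:
        item = mc.get(key)
        if item is not None:  # may have been evicted since it was logged
            snapshot[key] = item
    with open(path, "w") as fh:
        json.dump(snapshot, fh)


def restore(mc, path, max_age):
    """Startup step: re-store items younger than max_age seconds, skip the rest."""
    with open(path) as fh:
        snapshot = json.load(fh)
    now = time.time()
    restored = 0
    for key, item in snapshot.items():
        if now - item["ts"] <= max_age:
            mc.set(key, item)
            restored += 1
    return restored
```

Items that fail the age check are simply left out, so the next request regenerates them through the normal cache-miss path.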

--
Steve Clay
http://mrclay.org/

PlumbersStock.com

Sep 10, 2008, 1:13:45 AM
to memcached
For now I just created a second memcached server that gets a copy of
everything and which will automatically get hit when the first server
doesn't respond. Still seems less than ideal to me but we'll see how
it goes.
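
A setup like this might look roughly like the sketch below. It is not the poster's actual code: FakeServer, CacheDown and MirroredCache are hypothetical names, and the servers are in-memory stand-ins so the example runs without a memcached instance.

```python
class CacheDown(Exception):
    """Raised by a client when its server does not respond (hypothetical)."""


class FakeServer:
    """In-memory stand-in for one memcached server."""

    def __init__(self):
        self.items = {}
        self.up = True

    def get(self, key):
        if not self.up:
            raise CacheDown()
        return self.items.get(key)

    def set(self, key, value):
        if not self.up:
            raise CacheDown()
        self.items[key] = value


class MirroredCache:
    """Write every item to both servers; read from the first one that answers."""

    def __init__(self, primary, secondary):
        self.servers = [primary, secondary]

    def set(self, key, value):
        stored = 0
        for server in self.servers:
            try:
                server.set(key, value)
                stored += 1
            except CacheDown:
                pass  # a dead server simply misses this write
        return stored

    def get(self, key):
        for server in self.servers:
            try:
                return server.get(key)
            except CacheDown:
                continue  # fall through to the next server
        return None
```

Two things follow from this design: every set costs two round trips, and a primary that comes back empty after a restart answers with misses until it is refilled, because the fallback only triggers when the primary does not respond at all.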

PlumbersStock.com

Sep 10, 2008, 10:10:03 PM
to memcached
The problem with any software though is that if you can't convince the
owner of the software to add your feature into the main tree then
you're forever playing catchup. I could pay someone to add the feature
I want but I'd have to keep paying someone to keep it up-to-date. I
could write it myself but then I'd have to invest my own time into it
on an on-going basis. Neither is really practical. Oh well - I have a
sort of hack job to fix the problem now but I'm afraid it'll have a
small negative speed impact all the time rather than just taking care
of things on the rare occasions I shut down and restart memcached.

Stephen Johnston

Sep 10, 2008, 10:23:17 PM
to memc...@googlegroups.com
One would hope that any important contribution, written by a competent developer, would find its way into the main trunk of code instead of forking. This is the catch-22 of open source. It really becomes a project management exercise.

PlumbersStock.com

Sep 10, 2008, 10:32:11 PM
to memcached
It sounds as if this feature has been added by others already and
hasn't found its way into the main tree. Maybe it just wasn't coded
well enough to make it in. It seems it should be a really simple
feature to add, that shouldn't interfere with the running cache in any
way, and those whose use of the cache would make using save/restore a
bad thing could just choose not to use it. I can understand not adding
features that would have a negative impact on the project but stuff
that is essentially painless I can't understand leaving out. Project
management is never fun. At least with open source projects if I have
to have the feature I don't have to pay $300+ an hour and wait months
to get it done.

Clint Webb

Sep 10, 2008, 11:02:30 PM
to memc...@googlegroups.com
I've not seen any patches submitted that do this.  The other things that do this are either different products, supplemental products, or forks that attempt entirely different things (but happen to do what you are after).

I expect a well-implemented patch that does this would be accepted into the main branch.

PlumbersStock.com

Sep 10, 2008, 11:10:58 PM
to memcached
In that case.. Would anyone be interested in collecting a bounty on
getting a save/restore feature created that would be accepted into the
main branch? What would be a fair bounty for something like this?

Option to save all items in memory to hdd on shutdown of memcached.
Option to load saved items from hdd to memory on start of memcached.
Option to load, in addition to memory dump, a changes list from a text
file (some simple to produce format - up to you) on start of
memcached. Changes would include anything memcached can be asked to
do.

Chris Goffinet

Sep 10, 2008, 11:15:54 PM
to memc...@googlegroups.com, JOHN JAWED
John Jawed from Yahoo has created a patch for this exact thing, that
we run in our version. CC'ing John.

John any word from legal? I would say just release the damn thing ;)


--
Chris Goffinet
MyBlogLog Senior Performance Engineer

Yahoo!
San Francisco, CA
United States

Trond Norbye

Sep 11, 2008, 1:04:51 AM
to memc...@googlegroups.com

On Sep 11, 2008, at 5:10 AM, PlumbersStock.com wrote:

>
> In that case.. Would anyone be interested in collecting a bounty on
> getting a save/restore feature created that would be accepted into the
> main branch? What would be a fair bounty for something like this?
>
> Option to save all items in memory to hdd on shutdown of memcached.
> Option to load saved items from hdd to memory on start of memcached.
> Option to load, in addition to memory dump, a changes list from a text
> file (some simple to produce format - up to you) on start of
> memcached. Changes would include anything memcached can be asked to
> do.

We are working on creating a "storage interface" to memcached, so that
you can create your own back-end. This sounds like a pretty easy task
to implement in the prototype we have....

Cheers,

Trond

Toru Maesaka

Sep 11, 2008, 2:05:01 AM
to memc...@googlegroups.com
Hi!

> We are working on creating a "storage interface" to memcached, so that you
> can create your own back-end. This sounds like a pretty easy task to
> implement in the prototype we have....

Indeed, with the pluggable storage engine that is going on, what's
been debated in this thread would be trivial to achieve. I'm sure
heaps of people would like this asap but at the moment, bug fixes and
the binary protocol take priority so it would be awesome if people
could refrain from taking action, such as forking.

One problem with memcached forks is that some forks have really nice
features but most (if not all) forks are unlikely to be
noticed/exposed/trusted/used by a lot of web developers/shops. This is
sad since this means that so much effort and knowledge is going to
waste. We should hopefully be able to eliminate that waste and the
likelihood of people forking memcached with the pluggable storage
engine architecture :)

Cheers,
Toru

dormando

Sep 11, 2008, 9:40:27 PM
to memc...@googlegroups.com
Yeah, what Toru/Trond said.

dormando

Sep 11, 2008, 9:54:26 PM
to memcached
I don't actually know how Yahoo's implementation works, but before anyone
goes off the deep end:

- dumping in protocol format is a good idea.
- because you don't have to write code to reload that data.
- and you can edit it.
- but you might have to translate the expires times into explicit dates.
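
To illustrate the points above, here is a minimal sketch of writing one item as a text-protocol `set` command with its remaining TTL turned into an explicit expiry. The function names are hypothetical and this is not how any existing patch works. It relies on the protocol rule that an exptime larger than 30 days' worth of seconds is read as an absolute Unix timestamp.

```python
import time


def absolute_exptime(ttl, now):
    """Turn a remaining TTL in seconds into an explicit expiry for a dump file."""
    if ttl == 0:
        return 0  # 0 means "never expires" and needs no translation
    # Any current Unix time is far above the 30-day threshold, so memcached
    # will treat the result as an absolute timestamp when the dump is replayed.
    return int(now) + ttl


def dump_item(key, flags, ttl, value, now=None):
    """Render one item as a text-protocol `set` command (value is bytes)."""
    if now is None:
        now = time.time()
    header = "set %s %d %d %d\r\n" % (key, flags, absolute_exptime(ttl, now), len(value))
    return header.encode("ascii") + value + b"\r\n"
```

A file made of such commands stays editable by hand and could be replayed into a fresh memcached over its normal TCP port, so no separate reload code is needed on the server side.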

PlumbersStock.com ... if that is your real name, can you please be a
little more polite with the list?

Memcached has a long history (notable excerpts of which have been pasted
for your reading pleasure) of hashing out this feature. While we've agreed
that one adhering to the constraints listed above (i.e., no explicit funky
dump/reload format) might be accepted, I would hope you don't view us (and
hope others don't get the impression of us as) a bunch of idiots who don't
understand your problems.

This approach simply doesn't work for most of us. I've already seen some
good suggestions of alternatives you could implement with technology that
exists today, without paying a bounty, and would work just fine. No need
to complain.

One idea would be to actually use an existing fork, like memcachedb. Sure,
our ultimate goal is to fold that feature back in, but hey it exists now
and someone put some effort into making that fork.

Another method would be to just cache into MySQL and use the
libmemcached-based MySQL UDF's to update memcached. I suspect you've only
got the one box? Maybe two?

If not, it's not terribly hard to switch to a libmemcached based client.
If it is hard you can still do this without the UDF's... They just need to
be consistent on both ends so the key hashing lines up. Which doesn't
matter if you only have the one box.

Anyway, nothing fancy. Just create tables with the key, flags, expiredate,
value. INSERT into MySQL and SET into memcached at the same time. Then
read off of memcached. Then you just need a simple wrapper that SELECT *
FROM table WHERE expiredate > NOW(); and reload your data. From the
description of your awful horrible terrible backend system, it sounds like
it'd benefit you more from being able to fall back to MySQL on a memcached
cache miss anyway.
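
A rough sketch of that write-through idea follows, with loud assumptions: sqlite3 from the Python standard library stands in for MySQL so the example is self-contained, a plain dict stands in for the memcached client, and the table and function names are made up. With MySQL, the upsert would be written as REPLACE INTO or INSERT ... ON DUPLICATE KEY UPDATE rather than SQLite's INSERT OR REPLACE.

```python
import sqlite3
import time


def open_store(path=":memory:"):
    """Create the backing table; sqlite3 stands in for MySQL here."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS cache ("
        "k TEXT PRIMARY KEY, flags INTEGER, expiredate REAL, value BLOB)"
    )
    return db


def store(db, mc, key, value, ttl, flags=0, now=None):
    """INSERT into the database and SET into the cache at the same time."""
    now = time.time() if now is None else now
    db.execute(
        "INSERT OR REPLACE INTO cache (k, flags, expiredate, value) VALUES (?, ?, ?, ?)",
        (key, flags, now + ttl, value),
    )
    db.commit()
    mc[key] = value  # a real client would call its set() with the same ttl


def reload(db, mc, now=None):
    """After a restart, copy every unexpired row back into the cache."""
    now = time.time() if now is None else now
    rows = db.execute("SELECT k, value FROM cache WHERE expiredate > ?", (now,))
    count = 0
    for key, value in rows:
        mc[key] = value
        count += 1
    return count
```

The same table can also serve as the fallback on a cache miss, which is the benefit suggested above for a slow backend.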

Most of us view these various scalability tools more like UNIX programs
than a batshit huge enterprise system. The idea is to take simple concepts
and plug them together until they work.

-Dormando

dormando

Sep 11, 2008, 9:58:18 PM
to memcached
Almost everybody who initially asks for this feature later figures out
that restarting a memcached with stale data doesn't work for their
application.

So at least with the dozens of people I've talked with about this subject,
the demand drops quickly. No big push, no follow up. I guess one or two
people have implemented it, but not gotten the code back to us.
Personally, I still don't see a use for it so I won't be writing that code
myself.
