Expiration. Again.

WP in Canada

Mar 18, 2010, 3:36:30 PM
to memcachedb
It seems bizarre to me that MemCacheDB still requires that a SET
command have an expiration parameter,
which it ignores.

I would like this to be a command-line switch (--enable-expiration).
Doing your own is a pain. What if you want client information to be
persisted for 4 days, but deleted after that? It would be great if,
when the record was created, an entry were also added to a queue
somewhere else that said "hey, delete this if it hasn't been touched
by now".

It's been mentioned as a possible future item. Also, it would be
great if you could give an error on non-zero expiration. That way the
client knows that their request was denied: with a switch such as
--enforce-zero-expiration-set, a SET command with a nonzero expiration
would fail.
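
As a rough illustration of the second switch, here is a minimal Python sketch of the check a server could run on the SET header line. The header layout (set <key> <flags> <exptime> <bytes>) and the CLIENT_ERROR reply prefix come from the memcached text protocol; the function name, the enforce_zero_expiration flag, and the error message wording are made up for this example and are not part of MemcacheDB.

```python
from typing import Optional


def check_set_header(line: str, enforce_zero_expiration: bool = True) -> Optional[str]:
    """Return an error reply for a rejected SET header, or None if it is accepted.

    Hypothetical helper: MemcacheDB has no such switch today.
    """
    parts = line.strip().split()
    # memcached text protocol: set <key> <flags> <exptime> <bytes> [noreply]
    if len(parts) < 5 or parts[0].lower() != "set":
        return "ERROR\r\n"
    try:
        exptime = int(parts[3])
    except ValueError:
        return "CLIENT_ERROR bad command line format\r\n"
    if enforce_zero_expiration and exptime != 0:
        # Tell the client plainly instead of silently ignoring the value.
        return "CLIENT_ERROR expiration is not supported\r\n"
    return None
```

With the flag off, the same header is accepted and the expiration is ignored, which matches the current behaviour described above.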

These seem a matter not only of completeness but of correctness. Just
as C/C++ compilers generate warnings when the code you write is
obviously wrong, so should this server.

Warren

Clint Byrum

Mar 18, 2010, 3:59:06 PM
to memca...@googlegroups.com
In the past I've felt strongly that because MemcacheDB is a database, and not a cache, it shouldn't implement the expiration features. However, seeing how it has been used now, it seems that since we have an expire parameter, we might as well use it.

It would be pretty simple to, at set time, store the unix timestamp of when this record ceases to be valid. Upon read of a record where that timestamp is in the past, memcachedb would see that, and return as if it were not in existence. Most applications will immediately come along and replace the record, so no need to delete it. Further, this time stamp could be used as the key to a secondary index b-tree to allow an expunger thread to simply get rid of them periodically (tunables to prevent this from deleting too fast would be nice, so that users can choose to let the db grow a bit rather than overwhelm their system with deletes).
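
The scheme above (store the expiry timestamp at set time, treat expired records as missing on read, and let a rate-limited expunger delete them later) can be sketched in a few lines of Python. This is only an in-memory illustration under stated assumptions: a dict stands in for the primary database, a heap ordered by timestamp stands in for the secondary index b-tree, and the class name, method names, and the max_deletes tunable are invented for the example. The TTL is treated as plain seconds from now.

```python
import heapq
import time


class ExpiringStore:
    """Toy model of lazy expiration plus a periodic expunger (hypothetical)."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._records = {}    # key -> (value, expires_at or None)
        self._by_expiry = []  # min-heap of (expires_at, key): stand-in for the secondary index

    def set(self, key, value, ttl=0):
        # ttl == 0 means "never expires", as with a zero expiration today.
        expires_at = self._clock() + ttl if ttl > 0 else None
        self._records[key] = (value, expires_at)
        if expires_at is not None:
            heapq.heappush(self._by_expiry, (expires_at, key))

    def get(self, key):
        entry = self._records.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and expires_at <= self._clock():
            return None  # expired: report a miss, leave the delete to the expunger
        return value

    def expunge(self, max_deletes=100):
        """Delete at most max_deletes expired records; return how many were deleted."""
        now = self._clock()
        deleted = 0
        while (self._by_expiry and deleted < max_deletes
               and self._by_expiry[0][0] <= now):
            expires_at, key = heapq.heappop(self._by_expiry)
            entry = self._records.get(key)
            # Skip stale index entries left behind when a record was overwritten.
            if entry is not None and entry[1] == expires_at:
                del self._records[key]
                deleted += 1
        return deleted
```

Calling expunge() from a background thread on a timer, with a small max_deletes, is one way to get the "don't delete too fast" tunable: the database is allowed to grow a little between passes instead of being hit with a burst of deletes.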

So, the next question is, who wants to sponsor such an effort? I've already done this all in a PHP wrapper, but yeah, it would be cool to have the daemon doing it.

Brad Bendy

Apr 17, 2010, 1:17:58 PM
to memcachedb
Anyone willing to go in on a bounty for this? I'm having to write a
custom wrapper for another app since there is no expiration in
memcachedb; it would sure be easier if it were built in.

I'd attempt to add it myself, but I'm not a C/C++ coder and it would
take much longer than it's worth to try it myself.

I think Clint's approach sounds like a winner. Having a thread come
around and delete the data would be ideal, even at a certain time of
day: our cache is hit heavily for about 16 hours, then dead for 4-6,
then back on fire again.

Gregory Burd

Apr 17, 2010, 2:30:12 PM
to memca...@googlegroups.com
Brad, et al.,

We have a new access method coming along in the next release
specifically designed for caching applications called HEAP. Here's a
bit from the spec:

"""
1. Introduction
The HEAP access method project will add a brand new access method to
Berkeley DB, storing records in a heap file. This access method will
avoid the use of a b-tree structure entirely, records will be referenced
solely by the page and offset at which they are written. The goal of the
heap access method is to provide better reuse of space following
deletes. Further optimizations are being considered which would use bulk
load to improve write speeds.

2. Use cases
The motivating use case for the heap access method came from [a large
cached content provider] looking for a way to run BDB on multiple,
identically configured machines without needing to manage the space
consumed by BDB. Simply aging old data out of a b-tree database,
however, does not immediately free up reusable space in the database.
The database must be frequently compacted to consolidate free space on
leaf pages, but compaction is an expensive operation. The heap access
method will eliminate the need for compaction and will typically be used
when the application runs in an environment with constrained disk space.
"""

Help me understand more about your use case, and if this might cover it,
so that I can feed that into engineering. Perhaps a future version of
MemcacheDB could optionally use HEAP rather than BTREE.

Let me know your thoughts.

regards,

-greg
Product Manager, Oracle Berkeley DB

Brad Bendy

Apr 17, 2010, 2:47:19 PM
to memca...@googlegroups.com
Greg,

This sounds ideal. We are using memcachedb to store phone numbers. Every 30 days we want to remove the entries; our other application will then re-add them if needed, and they will be removed again after another 30 days.

I did think about the issue with BDB and how to free up the space once we remove the numbers, and figured we would cross that bridge when we get there. We are going to be storing around 500,000 entries per day, so the cache store will get quite large, and each month it will get more and more inserts. At some point I think we will need to run multiple stores and split the data so that each store has its own data set. In our case we would break them down by area code: all the 2xx area codes on one store, 3xx on another, etc.
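
A minimal Python sketch of that area-code split, assuming 10-digit North American numbers and one store per leading area-code digit. The function name, the host names, and the port in the STORES map are placeholders for illustration, not a real deployment.

```python
def store_for_number(phone_number, stores):
    """Pick the store for a phone number by the first digit of its area code."""
    digits = "".join(ch for ch in phone_number if ch.isdigit())
    # Drop a leading country code "1" if present.
    if len(digits) == 11 and digits[0] == "1":
        digits = digits[1:]
    if len(digits) != 10:
        raise ValueError("expected a 10-digit number, got %r" % phone_number)
    # Raises KeyError if no store is configured for this leading digit.
    return stores[digits[0]]


# Hypothetical layout: 2xx area codes on one store, 3xx on another, and so on.
STORES = {str(d): "store-%dxx.example:21201" % d for d in range(2, 10)}
```

Because the routing depends only on the key, the client application can pick the store itself; the stores do not need to know about each other.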

Memcached would work great for this, except that the amount of RAM we would need is costly, and if we ever had to reboot it would take a very long time to re-populate this data from our MySQL DB, which is a bad option. Our software works with memcached directly, so memcachedb is the perfect solution minus these few issues.

Do you know what performance this new access method will have? I'm assuming it's the same speed as b-tree, just a different way of accessing the data.

Thanks for the info and let me know if I can be of any assistance in testing or anything.

Steve Chu

Apr 18, 2010, 12:14:38 AM
to memca...@googlegroups.com
Hi Greg,

Really nice feature! One of the most important reasons that memcachedb
does not support expiration is the poor deletion performance of btree.
It is not suitable for a cache. Compaction really hurts performance.
Once you guys support the HEAP access method, I will support it
in memcachedb immediately.

Another requirement for HEAP is expiration, based on LRU or some other
policy. Let us call this a persistent cache engine, which has a
fixed-size memory cache with a fixed-size swap file as the backend. It
is self-maintaining and can recover from a power failure, etc. Let me
know if you need more info.
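
One possible reading of that idea, sketched in Python: a fixed number of slots in memory, a fixed number of slots in a "swap" tier, and LRU eviction from each. This is an assumption-laden toy, not a design for HEAP or memcachedb: both tiers are in-memory OrderedDicts (the second merely stands in for a swap file), sizes are counted in records rather than bytes, and the class and attribute names are invented.

```python
from collections import OrderedDict


class TwoTierLRU:
    """Toy fixed-size cache: LRU memory tier spilling into an LRU swap tier."""

    def __init__(self, memory_slots, swap_slots):
        self.memory_slots = memory_slots
        self.swap_slots = swap_slots
        self.memory = OrderedDict()  # least recently used first
        self.swap = OrderedDict()    # stand-in for a fixed-size swap file

    def set(self, key, value):
        self.swap.pop(key, None)     # a key lives in one tier at a time
        self.memory[key] = value
        self.memory.move_to_end(key)
        self._spill()

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)
            return self.memory[key]
        if key in self.swap:
            value = self.swap.pop(key)
            self.set(key, value)     # promote back into memory
            return value
        return None

    def _spill(self):
        # Memory overflow moves the oldest records to swap.
        while len(self.memory) > self.memory_slots:
            key, value = self.memory.popitem(last=False)
            self.swap[key] = value
        # Swap overflow is where records actually expire.
        while len(self.swap) > self.swap_slots:
            self.swap.popitem(last=False)
```

The point of the sketch is only that total space is bounded by construction, so nothing ever has to be compacted or expunged on a schedule.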

Regards,

Steve
--
Best Regards,

Steve Chu
http://stvchu.org