membase + memcached


Matthew John

Jul 8, 2011, 7:40:53 AM
to Couchbase
Hi all,

I am pretty new to memcached + membase. I tried using a combination
of both to serve my app's storage. Previously I worked with
memcached + mysql. In that scenario, I used to frequently commit my
cache data to the database in order to avoid data loss in the cache,
and mysql is designed for that (immediate writes).

In this context, I have two queries:

1) Now, with membase advocating lazy writes, how would I save the
data in my memcached in case of a server failure? Is there some way of
committing the memcached data into membase at regular intervals?

2) Every time I recover from a failure, I would want to fill up my
memcached with data. In the case of mysql, I could comfortably do a
"select * from *" in order to fill up the memcached with the required
data. How would I do such a "warm up" with the memcached + membase
combo?

Please help me with these queries! :) :)

Thanks,
Matthew

Chad Kouse

Jul 8, 2011, 11:58:07 AM
to couc...@googlegroups.com
Matthew,

Membase can basically replace memcached if you want to persist all of
your data to disk. Someone else will need to answer your question
about "lazy writes"; as far as I know it only waits a very
small amount of time before flushing to disk (< 1/2 a second?).

Secondly, you don't need to do anything to refill your cache in the
case of a crash, membase does that automatically at startup.

--chad

Matthew John

Jul 8, 2011, 12:21:16 PM
to couc...@googlegroups.com

Hi Chad and others!

Thanks for the reply!

I have been following this tutorial: http://sujee.net/tech/articles/membase-tutorial-1/ which talks about using spymemcached (v2.5) with Java in order to write data to disk using Membase.

I set up membase and followed the tutorial.
MemcachedClient cache = new MemcachedClient(new InetSocketAddress(server, port));
where server = localhost and port = 11211.

I used the following specs for the buckets: memcached bucket - 100MB, membase bucket - 50 GB.
I performed continuous writes (exceeding 100MB) and saw that the cache-evicted records did get written to disk (not sure about the less-than-half-a-second factor). After every batch of writes (say 25MB), I shut down the MemcachedClient gracefully so that it had time to commit the data to the disk. But still, what I came to understand is that the last few write batches stayed in the cache and didn't get committed. I came to this conclusion because when I restarted the Membase server, the records that were supposed to reside in the cache returned NULL. Please enlighten me on this behavior :).
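For what it's worth, the write-and-shutdown loop I was running looked roughly like the following (a sketch only; it assumes spymemcached on the classpath and a local Membase on port 11211, and the key names and sizes are made up):

```java
import java.net.InetSocketAddress;
import java.util.concurrent.TimeUnit;
import net.spy.memcached.MemcachedClient;

public class BatchWriter {
    public static void main(String[] args) throws Exception {
        MemcachedClient cache = new MemcachedClient(
                new InetSocketAddress("localhost", 11211));

        // Write one ~25MB batch as many small records (hypothetical keys).
        for (int i = 0; i < 25000; i++) {
            cache.set("batch-key-" + i, 0, new byte[1024]); // ~1KB each
        }

        // Graceful shutdown: wait up to 10s for the client's queued
        // operations to drain. Note this only drains the client-side
        // queue -- it does not force Membase to persist to disk
        // before returning.
        cache.shutdown(10, TimeUnit.SECONDS);
    }
}
```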

I am a newbie at Membase. Kindly let me know what the problem could be, or how to ensure that data is actually written to Membase.

Thanks,
Matthew

Perry Krug

Jul 8, 2011, 12:33:16 PM
to couc...@googlegroups.com
I think the confusion here is that there is no direct relationship between a memcached bucket and a Membase bucket.

Any data you write into a memcached bucket will be subject to the memcached semantics of eviction (when the cache fills up, it will throw data away to make room for more) and volatility (it's only stored in RAM, if the server reboots, that data is gone).

However, any data you write into the Membase bucket has slightly different semantics.  A Membase bucket combines a memcached caching layer with a disk persistence layer.  This is all transparent to you.  Data access is done via the same memcached protocol, but that's pretty much where the similarities end.  Membase buckets "cache" their data within the RAM layer, and asynchronously replicate and persist it to disk.

In your case, it sounds like you only need the Membase bucket.  You should see the same read/write performance as long as your data is held in RAM (it's written to disk automatically in the background, but cached in RAM for performance).  If you put more data in than you have RAM available, it will "eject" some data.  This means that the value cached in RAM is replaced with a pointer to its location on disk (meaning it's already been written) so that the RAM can be reclaimed for other data.  A request for this data would then generate a "disk fetch", pulling it in from disk and re-caching it in RAM.

Try only writing data into and out of a Membase bucket, and then repeat your test of rebooting the server...let me know how it goes!
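A minimal version of that test might look like the sketch below (assumptions: spymemcached on the classpath, a Membase bucket on the default port, and a manual server restart between the two runs; the key name is made up):

```java
import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class PersistenceCheck {
    public static void main(String[] args) throws Exception {
        MemcachedClient client = new MemcachedClient(
                new InetSocketAddress("localhost", 11211));

        if (args.length > 0 && args[0].equals("write")) {
            // Run 1: write a marker value, wait for the operation to
            // complete, then restart the Membase server by hand.
            client.set("persistence-check", 0, "still-here").get();
        } else {
            // Run 2 (after the restart): a Membase bucket should return
            // the value from disk; a memcached bucket would return null.
            Object value = client.get("persistence-check");
            System.out.println(value == null ? "data lost" : "data persisted");
        }
        client.shutdown();
    }
}
```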

Perry Krug
Solutions Architect
direct: 831-824-4123
email: pe...@couchbase.com

Matthew John

Jul 11, 2011, 3:00:58 AM
to couc...@googlegroups.com
Hi Perry!

You are right! I never used memcached in my configuration. I had misunderstood the RAM allotted to the Membase bucket as the memcached cache :)!

So this time I did try to write data chunks to the membase bucket:
1) as you said, the data was getting written to the disk (persistence)

2) after a server restart, I was able to retrieve all the data since everything got persisted.

Certain doubts are still looming in my mind:

1) After the restart, I tried reading a certain chunk of data again and again. I was expecting the RAM associated with the Membase bucket to store this data chunk the first time I read it from the disk. But this did not happen; I had to hit the disk every time I wanted to read the same data set. Why is this happening? Does this have something to do with the RAM/caching semantics within Membase?

2) As I mentioned earlier, I was using a Java API (spymemcached client) for connecting to the Membase bucket:
MemcachedClient cache = new MemcachedClient(new InetSocketAddress(server, port));
where server = localhost and port = 11211.
But when I issued a "cache.flush" it deleted all the pointers to the data residing on the disk too. That means I was not able to access any data after that (persisted or not). Is this behavior expected?

3) I understand that it's the port number and the IP address that are used to configure a membase/memcached bucket in the Membase server GUI. Now if I wanted to integrate a memcached bucket into my dataflow, how would I do that? I understand that I will need to provide a separate port number for the memcached bucket. But while initializing a MemcachedClient instance I can only provide a single port number (the membase or memcached one). How can I ensure that both buckets come into the flow (data to memcached first and then to membase (persistence))?

Please help me with these queries,

Thanks,
Matthew

Perry Krug

Jul 13, 2011, 11:32:13 AM
to couc...@googlegroups.com
Matthew, responses inline:

1) After the restart, I tried reading a certain chunk of data again and again. I was expecting the RAM associated with the Membase bucket to store this data chunk the first time I read it from the disk. But this did not happen; I had to hit the disk every time I wanted to read the same data set. Why is this happening? Does this have something to do with the RAM/caching semantics within Membase?
[pk] - It's possible that you don't have enough RAM configured to effectively "cache" the data as you're expecting.  By design, that data SHOULD be kept in RAM so that you don't have to hit disk each time.  Can you post a screenshot of the statistics screen for your bucket with the "summary" and "vbucket resources" tabs open? 

2) As I mentioned earlier, I was using a Java API (spymemcached client) for connecting to the Membase bucket:
MemcachedClient cache = new MemcachedClient(new InetSocketAddress(server, port));
where server = localhost and port = 11211.
But when I issued a "cache.flush" it deleted all the pointers to the data residing on the disk too. That means I was not able to access any data after that (persisted or not). Is this behavior expected?
[pk] - "cache.flush" was originally designed for the memcached protocol and is designed to completely clear out all the data.  Membase does not understand this as anything different, and so will delete all the data contained within a bucket (both RAM and disk).  While it may be useful from a testing perspective, we certainly don't recommend using this in production.  Also, you may want to take a look at http://www.couchbase.org/products/sdk/membase-java which has links and information regarding the updated spymemcached which is designed to work directly against Membase (rather than treating those servers as regular memcached servers).  The main benefits come in performance gains and the ability to dynamically add/remove servers from a cluster.
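For completeness, here is a sketch of the call in question and why it is dangerous (assumptions: spymemcached on the classpath and a local bucket on the default port; on a Membase bucket this wipes both the RAM cache and the persisted copies on disk):

```java
import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class FlushExample {
    public static void main(String[] args) throws Exception {
        MemcachedClient cache = new MemcachedClient(
                new InetSocketAddress("localhost", 11211));

        // WARNING: flush() deletes ALL data in the bucket -- both the
        // RAM-cached copies and the copies persisted on disk.
        // Useful for resetting a test bucket, never for production.
        boolean flushed = cache.flush().get();
        System.out.println("flush acknowledged: " + flushed);

        cache.shutdown();
    }
}
```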


3) I understand that it's the port number and the IP address that are used to configure a membase/memcached bucket in the Membase server GUI. Now if I wanted to integrate a memcached bucket into my dataflow, how would I do that? I understand that I will need to provide a separate port number for the memcached bucket. But while initializing a MemcachedClient instance I can only provide a single port number (the membase or memcached one). How can I ensure that both buckets come into the flow (data to memcached first and then to membase (persistence))?
[pk] - You won't be able to have both a memcached bucket AND a Membase bucket within the same client.  It is up to your code to determine which data to put in which bucket and when.  Thus, you can create multiple client instances and control the data flow with your application.  The above link will also show you how to effectively use multiple buckets.  As I mentioned before, there shouldn't be a need to have both a memcached bucket and a Membase bucket in the same workflow as Membase provides a RAM-based caching layer built-in.
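A sketch of that two-client arrangement (assumptions: a Membase bucket on the default port 11211 and a memcached bucket configured on a hypothetical dedicated port 11212; the key names and TTLs are made up, and your application code decides which client each datum goes through):

```java
import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class TwoBuckets {
    public static void main(String[] args) throws Exception {
        // One client per bucket: Membase (persistent) and memcached (volatile).
        MemcachedClient membase = new MemcachedClient(
                new InetSocketAddress("localhost", 11211));
        MemcachedClient memcached = new MemcachedClient(
                new InetSocketAddress("localhost", 11212)); // hypothetical port

        // The application chooses the bucket per datum:
        membase.set("user:42", 0, "must-survive-restart");  // cached + persisted
        memcached.set("session:42", 300, "ok-to-lose");     // RAM only, 5min TTL

        membase.shutdown();
        memcached.shutdown();
    }
}
```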

Hope that helps, keep the questions coming! 