Questions about RAM size and Persistent Storage limits


trav

Mar 15, 2011, 1:40:14 PM
to membase
I have some questions about the relationship between the size of the
working set of a node vs. the size of the node's RAM.

1. Does a Membase node work by keeping all keys of the working set
data in RAM? (I remember reading somewhere that it does this)

2. If question 1 is yes, does this mean there is a limit to the ratio
of RAM size to persistent storage? In other words, does the amount of
RAM in a node limit the total size of the working set?

3. If question 2 is yes, is there any way to work around this
limitation, using the ram as a cache for more frequently used data
while making slower requests to persistent storage for the rest of the
data (as opposed to the ram being the first point of i/o for sets/gets
while resorting to storage only when a value isn't in ram)?

4. Is there a way to immediately write to storage when doing a set,
rather than writing to ram first then having the persistence to
storage happen in the background?

5. Could you elaborate on the difference between memcached and
Membase? Is it that Membase is sort of like memcached but with the
ability to persist data to disk automatically?

I'd like to use a key-value store for my database, and I like Membase
because (if I understand it right) storage capacity can be added to
the "database" by adding nodes to a cluster. The only problem is I'm
starting with a limited budget and I want to get away with using as
little ram as possible on a VPS while not limiting the size of my data
set (since persistent storage is cheap). Is Membase the right fit for
this scenario (need to start with low ram, unlimited secondary
storage, desire to upgrade to more ram later on), or should I use a
more traditional cache / database solution? I apologize if I'm asking
obvious questions, but I don't have a ton of experience. Thanks in
advance for your help!

Perry Krug

Mar 15, 2011, 2:01:15 PM
to mem...@googlegroups.com, trav
Hey Trav, inline:
 
I have some questions about the relationship between the size of the
working set of a node vs. the size of the node's RAM.
 

1. Does a Membase node work by keeping all keys of the working set
data in RAM? (I remember reading somewhere that it does this)
[pk] - Membase keeps as much data as possible in RAM.  This includes both replicas and active keys.  Replicas are the first to be "ejected" when space is needed, active items will come next.  When a request comes in for a key that is only on disk, it will be copied into RAM and remain there until space is needed again.  Also remember that we keep ALL our metadata and indexes in RAM for the best performance.  The sizing guideline above takes this into account, but you'll need to make sure you have enough RAM to track all your keys (active and replica) as well as having enough RAM left over to cache enough of your data for the best performance.   Some applications can deal with some data residing on disk, whereas others require the entire dataset to remain in RAM.  This is the concept of a "working set" and is very application dependent.
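
To make the ejection order concrete, here is a toy model of a single node. It is not Membase code and does not reproduce its real eviction algorithm. It only follows the behaviour described in the answer: replicas are ejected before active items, every item is also kept on "disk", and a read of an ejected key is served from disk and copied back into RAM. The class name `ToyNode`, the one-unit-per-item cost, and least-recently-used ordering within each class are all assumptions made for illustration.

```python
from collections import OrderedDict


class ToyNode:
    """Toy model of one node's RAM/disk behaviour (illustrative, not Membase code).

    Assumptions: every item costs one unit of RAM, a key is never both active
    and replica, and within each class the least recently used item goes first.
    """

    def __init__(self, ram_capacity):
        assert ram_capacity >= 1
        self.ram_capacity = ram_capacity
        self.disk = {}                     # stands in for the persisted copy of every item
        self.ram_active = OrderedDict()    # active items, least recently used first
        self.ram_replica = OrderedDict()   # replica items, least recently used first

    def _make_room(self):
        # Eject replicas first; eject active items only once no replicas remain in RAM.
        while len(self.ram_active) + len(self.ram_replica) >= self.ram_capacity:
            victims = self.ram_replica or self.ram_active
            victims.popitem(last=False)

    def set(self, key, value, replica=False):
        target = self.ram_replica if replica else self.ram_active
        if key not in target:
            self._make_room()
        target[key] = value
        target.move_to_end(key)
        self.disk[key] = value             # real persistence happens in the background

    def get(self, key):
        """Read an active key; returns (value, where it was served from)."""
        if key in self.ram_active:
            self.ram_active.move_to_end(key)
            return self.ram_active[key], "ram"
        value = self.disk[key]             # slow path: the item was ejected to disk
        self._make_room()
        self.ram_active[key] = value       # copied back into RAM until space is needed
        return value, "disk"


node = ToyNode(ram_capacity=2)
node.set("a", 1)
node.set("r1", 10, replica=True)
node.set("b", 2)                           # RAM full: replica "r1" is ejected first
node.set("c", 3)                           # no replicas left, so active "a" is ejected
print(node.get("a"))                       # served from disk, then cached again
```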

2. If question 1 is yes, does this mean there is a limit to the ratio
of RAM size to persistent storage?  In other words, does the amount of
RAM in a node limit the total size of the working set?
[pk] - There is no limit to the ratio of RAM to disk, but the amount of RAM in a node does limit your "working set".
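
Putting answers 1 and 2 together gives a back-of-the-envelope sizing rule: RAM must hold the metadata for every key (active and replica), plus whatever share of the active values you want served from RAM. The function below is only a sketch of that arithmetic. `per_key_overhead_bytes` and `resident_fraction` are hypothetical inputs, not official Membase figures; the real per-key overhead depends on the version and key length.

```python
def estimate_cluster_ram(num_keys, num_replicas, per_key_overhead_bytes,
                         avg_value_bytes, resident_fraction):
    """Rough RAM estimate following the rule of thumb in the answers above.

    per_key_overhead_bytes (key plus metadata held in RAM per copy) and
    resident_fraction (share of active values served from RAM) are
    hypothetical inputs; measure or look up real figures for your version.
    """
    copies = 1 + num_replicas                                       # active copy + replicas
    metadata_bytes = num_keys * copies * per_key_overhead_bytes     # always kept in RAM
    cached_value_bytes = num_keys * avg_value_bytes * resident_fraction  # the working set
    return metadata_bytes + cached_value_bytes


# Hypothetical numbers: 10M keys, 1 replica, 100 B overhead, 1 KB values, 10% hot.
need = estimate_cluster_ram(10_000_000, 1, 100, 1_000, 0.10)
print(f"{need / 1e9:.1f} GB")   # 2.0 GB of metadata + 1.0 GB of cached values
```

With these made-up inputs the key metadata alone outweighs the cached values, which is why the ratio of disk to RAM is unbounded in principle while the number of keys and the working set are not.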

3. If question 2 is yes, is there any way to work around this
limitation, using the ram as a cache for more frequently used data
while making slower requests to persistent storage for the rest of the
data (as opposed to the ram being the first point of i/o for sets/gets
while resorting to storage only when a value isn't in ram)?
[pk] - There's no way to control this currently, though we are investigating different algorithms that will let us adapt better to the workloads of different applications. 

4. Is there a way to immediately write to storage when doing a set,
rather than writing to ram first then having the persistence to
storage happen in the background?
[pk] - No.  In the next major release (1.7) we are planning to implement a "synchronous replication" feature.  The actual 'set' operation will still be asynchronous to RAM, but you will be able to have a separate command that "waits" for an item (or list of items) to be replicated so you can have synchronous replication within your application while maintaining the pure performance that Membase provides.  Our 2.0 release plans to provide the same capability for persistence. 

5. Could you elaborate on the difference between memcached and
Membase?  Is it that Membase is sort of like memcached but with the
ability to persist data to disk automatically?
[pk] - Membase is designed to be a database.  It works quite well as a cache, and is completely compatible with the memcached protocol...but all the design considerations and architecture are driven by Membase being a database.  The higher-level feature differences are replication, persistence and dynamic scalability (our vbucket and rebalancing concepts).  The last one is probably one of the most widely sought-after features because it eliminates any "cold-cache" or "rehashing" problems when adding or removing servers.
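
The vbucket idea can be sketched in a few lines. A key hashes to one of a fixed number of vbuckets, and a separate map assigns vbuckets to servers, so adding a server only rewrites part of the map instead of changing every key's hash. This is a simplified illustration under stated assumptions: the vbucket count of 1024, plain CRC32 modulo that count, and the round-robin map and `add_server` rebalancing rule are choices made for this sketch, not the exact Membase implementation.

```python
import zlib

NUM_VBUCKETS = 1024   # assumed fixed vbucket count for this sketch


def vbucket_for(key):
    # Plain CRC32 modulo the vbucket count: a simplification of the real hash.
    return zlib.crc32(key.encode()) % NUM_VBUCKETS


def initial_map(servers):
    # Hypothetical round-robin assignment of vbuckets to servers.
    return [servers[vb % len(servers)] for vb in range(NUM_VBUCKETS)]


def add_server(vb_map, new_server):
    """Hand just enough vbuckets to new_server; every other vbucket keeps its owner."""
    owners = sorted(set(vb_map))
    share = len(vb_map) // (len(owners) + 1)    # vbuckets the new server should own
    counts = {s: vb_map.count(s) for s in owners}
    new_map = list(vb_map)
    moved = 0
    for vb, owner in enumerate(vb_map):
        if moved == share:
            break
        if counts[owner] > share:               # only take from servers above the target
            new_map[vb] = new_server
            counts[owner] -= 1
            moved += 1
    return new_map


keys = [f"user:{i}" for i in range(10_000)]
old_map = initial_map(["A", "B"])
new_map = add_server(old_map, "C")

# With vbuckets, a key changes server only if its vbucket was handed to "C".
moved_vb = sum(old_map[vbucket_for(k)] != new_map[vbucket_for(k)] for k in keys)
# Naive "hash mod number-of-servers" placement, for comparison.
moved_naive = sum(zlib.crc32(k.encode()) % 2 != zlib.crc32(k.encode()) % 3 for k in keys)
print(moved_vb / len(keys), moved_naive / len(keys))   # roughly 1/3 vs. roughly 2/3
```

The sketch only models placement. During a real rebalance the data for each moved vbucket is transferred to its new owner, which is why the added server does not start with a cold cache.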

I'd like to use a key-value store for my database, and I like Membase
because (if I understand it right) storage capacity can be added to
the "database" by adding nodes to a cluster.  The only problem is I'm
starting with a limited budget and I want to get away with using as
little ram as possible on a VPS while not limiting the size of my data
set (since persistent storage is cheap).  Is Membase the right fit for
this scenario (need to start with low ram, unlimited secondary
storage, desire to upgrade to more ram later on), or should I use a
more traditional cache / database solution?  I apologize if I'm asking
obvious questions, but I don't have a ton of experience.  Thanks in
advance for your help!
[pk] - No apologies necessary!  I hope I answered your questions well enough above.  Whether Membase is right for you will depend a bit on the performance requirements of your application.  The software will work very well at whatever level you use it at, but you need to make sure that it is sized appropriately for your application.  We have a number of customers with relatively small amounts of RAM compared to their overall data size, but the tradeoff in performance is pretty significant.  I can't say whether that is acceptable to you, but I can say that you will not find any better-performing database on the market right now when Membase is serving data from RAM.
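
One way to see why the tradeoff can be large is the usual weighted-average estimate of read latency: the fraction of reads served from RAM pays RAM latency, and the rest pay the cost of a disk fetch. The latency figures below are hypothetical placeholders, not Membase measurements; the point is only how quickly a small miss rate dominates.

```python
def expected_latency_ms(ram_hit_ratio, ram_latency_ms, disk_latency_ms):
    """Average read latency when a fraction of reads must go to disk.

    The latency figures passed in are hypothetical; measure your own hardware.
    """
    return ram_hit_ratio * ram_latency_ms + (1 - ram_hit_ratio) * disk_latency_ms


# Hypothetical: 0.1 ms from RAM, 10 ms for a disk fetch on a small VPS.
for ratio in (1.0, 0.99, 0.9):
    print(ratio, round(expected_latency_ms(ratio, 0.1, 10.0), 3))
```

With these assumed numbers, serving 90% of reads from RAM is already roughly ten times slower on average than serving all of them from RAM.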

Perry

Perry Krug

Mar 15, 2011, 3:59:30 PM
to mem...@googlegroups.com, trav
Trav, to follow up a bit more on #3 below...

The interaction between RAM and disk should result in the behavior that you're looking for.  More frequently used data will stay in RAM while less frequently used data will reside on disk.  There are cases where two datasets from differing workloads may conflict with each other...and that's the case where multiple buckets make sense, so that each one can be managed and sized independently.  The dataset with very low latency requirements could be given a bucket with more RAM, while a different dataset may have a different working set and its bucket can be sized differently.
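
A small simulation illustrates why separate buckets help when workloads conflict. A small, frequently reused dataset and a large scan with no reuse are interleaved, first in one shared RAM pool and then in two pools with the same total size. This is only a sketch: `LRUPool` is a hypothetical stand-in for a bucket's RAM quota, and least-recently-used eviction is an assumption, not Membase's actual ejection policy.

```python
from collections import OrderedDict


class LRUPool:
    """RAM quota of one bucket (LRU eviction is an assumption of this sketch)."""

    def __init__(self, quota):
        self.quota = quota
        self.items = OrderedDict()

    def touch(self, key):
        """Return True on a RAM hit; on a miss, load the key and evict if over quota."""
        if key in self.items:
            self.items.move_to_end(key)
            return True
        self.items[key] = True
        if len(self.items) > self.quota:
            self.items.popitem(last=False)
        return False


hot = [f"session:{i % 50}" for i in range(1000)]   # small, latency-sensitive dataset
scan = [f"log:{i}" for i in range(1000)]           # large dataset with no reuse


def hot_hit_rate_shared(capacity):
    pool = LRUPool(capacity)                 # both datasets in one bucket
    hits = 0
    for h, s in zip(hot, scan):
        hits += pool.touch(h)
        pool.touch(s)
    return hits / len(hot)


def hot_hit_rate_split(hot_quota, scan_quota):
    hot_bucket, scan_bucket = LRUPool(hot_quota), LRUPool(scan_quota)
    hits = 0
    for h, s in zip(hot, scan):
        hits += hot_bucket.touch(h)
        scan_bucket.touch(s)
    return hits / len(hot)


# Same total RAM (80 items) in both cases.
print(hot_hit_rate_shared(80), hot_hit_rate_split(60, 20))   # 0.0 0.95
```

In the shared pool the scan keeps pushing the hot keys out before they are reused; giving the hot dataset its own, adequately sized quota lets it stay resident while the scan churns through a small quota of its own.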

Perry Krug
System Engineering, Couchbase Inc.
direct: 831-824-4123
email: pe...@couchbase.com
