Few questions about new Couchbase Server 2.0


yojimbo87

Jul 30, 2011, 3:32:48 PM
to Couchbase
Recently I stumbled upon two articles [1][2] about the release of new
Couchbase Server 2.0 and I have a few questions:

1. If I understand correctly, Couchbase Server is a combination of
CouchDB (or Couchbase Single Server) and Membase merged into a single
NoSQL solution which offers the advantages of both products: Membase
Server as a key-value in-memory cache store and CouchDB as a
disk-persistent document database?

2. What will happen with the separate database products - Couchbase
Single Server and Membase - will they eventually be replaced by the
Couchbase Server or do they continue to be offered as standalone
solutions?

3. Will Couchbase Server in the future be offered both as enterprise
and also in free/open source/community editions? What would be the
marginal difference?

4. Regarding the new UnQL stuff - will Couchbase Server (2.0+) support
ad hoc queries based on this technology alongside its map/reduce
views (or does it already)? Are there any resources available yet?

5. Will Couchbase Server require a new set of client libraries to be
written for different languages/platforms? For example, I'm using the
cradle node.js module to communicate with both CouchDB and Couchbase
Single Server, but what about the new Couchbase Server?

Thanks in advance for your time and patience when answering my
questions.

[1]: http://www.readwriteweb.com/cloud/2011/07/couchbase-releases-developer-preview.php
[2]: http://gigaom.com/cloud/couchbase-2-0-unql-sql-nosql/

Jan Lehnardt

Jul 30, 2011, 8:38:16 PM
to couc...@googlegroups.com
Hi,

thanks for writing, I'll add my replies inline:

On 30 Jul 2011, at 12:32, yojimbo87 wrote:

> Recently I stumbled upon two articles [1][2] about the release of new
> Couchbase Server 2.0 and I have a few questions:
>
> 1. If I understand it correctly Couchbase Server is a combination of
> CouchDB (or Couchbase Single Server) and Membase merged into single
> NoSQL solution which offers advantages of both products - Membase
> Server as a key-value in-memory cache store and the CouchDB as disk
> persistent document database?

Membase has had persistence functionality for a while now. Prior to
2.0 we used SQLite for that, with 2.0 and onward we are using CouchDB
as the persistence layer.


> 2. What will happen with the separate database products - Couchbase
> Single Server and Membase - will they eventually be replaced by the
> Couchbase Server or do they continue to be offered as standalone
> solutions?

The Membase product line will be superseded by Couchbase 2.0.

Couchbase Single Server 2.0 is our edition for users that do not need
the advanced cluster functionality of Couchbase 2.0.

So Membase is going to be replaced, while Couchbase Single Server is
the (duh) single server edition of our Couchbase 2.0 product line
(there is also a mobile edition, to round off our offering).


> 3. Will Couchbase Server in the future be offered both as enterprise
> and also in free/open source/community editions? What would be the
> marginal difference?

Yes, we are very much committed to the open source community as well
as our paying customers. We believe in open source and open participation
to build better products. Enterprise users and customers can purchase
commercial licenses to gain access to our support team that is there for
them in times of need.

At this point we do not plan to have any proprietary extensions that
wouldn't be available in the community edition. All our code is free
and open source if you choose to. (I can't stress enough how proud
this makes me personally :)


> 4. Regarding the new UnQL stuff - will (or is) Couchbase Server (2.0+)
> support adhoc queries based on this technology alongside with it's map/
> reduce views? Are there any resources availabable yet?

The current state of UnQL is documented on http://unqlspec.org/ — UnQL
will work alongside the traditional Map/Reduce views and other querying
options like GeoCouch and couchdb-lucene (and everything else that is
out there).

> 5. Will Couchbase Server require to write new set of client libraries
> for different languages/platforms? For example I'm using cradle
> node.js module to communicate both with CouchDB and Couchbase Single
> Server, but what about the new Couchbase Server?

We are aiming to make Couchbase a drop-in replacement for all memcached,
Membase and CouchDB users. That said, we are not there yet (bear with us :)

Couchbase 2.0 will be fully memcached and Membase compatible and you will
not need to rewrite anything written for those servers. If you want to
use new features, we encourage you to use our (new, open source) SDK
packages for your programming language.

Couchbase Single Server 2.0 is 100% Apache CouchDB compatible and, again,
you won't need to rewrite anything.

Future versions (currently planned for 3.0, but let us get 2.0 out of the
door first :) will be compatible as outlined above (i.e. with everything :)

For you (node.js / cradle) this means:

- You can use Couchbase Single Server 2.0 just like you are using Apache
CouchDB (and earlier Couchbase Single Server releases) today.

- To use Couchbase 2.0 (the cluster version), you will need to use a
memcached or Membase compatible client library for k/v access and
a plain old HTTP library for view requests. In fact though, regular
CouchDB APIs should also work. We haven't tried cradle yet, but it's
definitely on our list of things to do.

We understand that the initial promise (just works) is not there yet,
but we are working hard on getting there. In addition, all this is open
source, so we are as grateful for your feedback as we are interested in
it, in the form of questions, code, or complaints :)

Please let us know if you have any more questions!

Cheers
Jan
--

yojimbo87

Jul 31, 2011, 5:14:47 AM
to Couchbase
Hi Jan,

thank you very much for your comprehensive answers to my questions. I
would also like to thank the whole Couchbase team for making this
technology and products real.

4. So UnQL will eventually be used for ad hoc querying of data from
Couchbase Server? In other words, what I want to know is whether UnQL
will be a *standalone* ad hoc querying mechanism or whether it will
depend on the data populated by map/reduce views. Will UnQL also be
supported by Couchbase Single Server in the future?

6. I have recently done some simple disk space consumption tests [1],
and now that Couchbase Server consists of both the Membase k/v store
and CouchDB document-based technologies, I'm curious whether the
situation with frequently updated data and its compaction has
changed.

7. Also regarding saving data and its durability: since Membase is an
in-memory k/v store, how often is data between the Membase and CouchDB
layers safely persisted to disk, to prevent its loss in case of
unexpected node failure?

8. Is Couchbase Server as durable as CouchDB/Couchbase Single Server
in case of node failure? That is, if a node is unexpectedly shut down
or restarted, is there no need for an initial data restoration process
when the node is online again?

Thanks again for your effort and time when answering my questions.

[1]: http://yojimbo87.github.com/2011/07/15/couchdb-disk-space.html

James Phillips - Personal

Jul 31, 2011, 6:27:55 AM
to couc...@googlegroups.com
4. So the UnQL will eventually be used for adhoc querying of data from
Couchbase Server (in other words what I want to know is if UnQL will
be used as *standalone* adhoc querying mechanism or it will be
dependent on the data populated by map/reduce views)? Will be UnQL in
the future supported also by Couchbase Single Server?

>>> UnQL is intended to provide ad hoc query capability that does not depend on the pre-creation of map-reduce views. It *may* be the case that UnQL uses the underlying map-reduce engine to return the answers to these ad hoc queries, but that would be invisible to the user. As far as the user is concerned, an ad hoc UnQL query string is sent and results are returned. Everything else is an internal implementation detail, and we are still working out those details. Yes, we currently intend to support this in Single Server as well.
 
6. I have recently done some simple disk space consumption tests [1]
and I'm curious now when the Couchbase Server consists of both,
membase k/v store and CouchDB document based technologies, if the
situation with the frequently updated data and it's compaction changed
somehow.
 
>>> Couchbase Server won't really consist of both a K/V store and the CouchDB document store. All items are stored in CouchDB underneath (whether it is a K-V pair or whether the V is actually a document). Couchbase Server does autocompaction, so if frequent changes are being made to the data, they will be compacted down regularly. In addition, if changes are being made more rapidly than the disk is capable of ingesting, there is an additional layer of de-duplication that occurs in the queue, minimizing the number of disk writes that occur.
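
The queue de-duplication described in this answer can be illustrated with a small sketch. It is not Couchbase's actual implementation: the class and method names (`DedupWriteQueue`, `enqueue`, `drain`) are hypothetical, and the only behavior assumed is what the answer states, namely that repeated changes to a key still waiting in the queue collapse into a single disk write.

```python
from collections import OrderedDict


class DedupWriteQueue:
    """Toy persistence queue holding one pending entry per key (hypothetical)."""

    def __init__(self):
        self._pending = OrderedDict()  # key -> latest value waiting for disk
        self.disk = {}                 # stands in for the on-disk store
        self.disk_writes = 0

    def enqueue(self, key, value):
        # A newer change to a key that is still queued replaces the old one,
        # so only the latest value is ever written out.
        self._pending[key] = value

    def drain(self, max_items=None):
        """Write up to max_items queued entries to 'disk', oldest key first."""
        written = 0
        while self._pending and (max_items is None or written < max_items):
            key, value = self._pending.popitem(last=False)
            self.disk[key] = value
            self.disk_writes += 1
            written += 1
        return written


queue = DedupWriteQueue()
for i in range(1000):
    queue.enqueue("counter", i)   # 1000 rapid updates to the same key
queue.enqueue("other", "x")
queue.drain()
print(queue.disk_writes, queue.disk["counter"])  # 2 999
```

In this toy model a thousand updates that arrive before the queue is drained cost one disk write, which is the effect the answer attributes to the real queue.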
 
7. Also regarding saving data and it's durability, since membase is in-
memory k/v store, how often are data between membase and couchdb layer
safely persisted to disk to prevent their loss in case of unexpected
node failure?
 
>>>As fast as the disk can ingest it, is the short answer. Things are immediately queued for persistence upon any change, and there are threads that exist solely to drain the persistence queue as fast as possible. Unless you are making changes to data faster than the disk permits writes, the write will occur immediately. You can easily see, on the Couchbase Admin console, precisely how much data is sitting in the write queue, how fast the queue is being filled, and how fast it is being drained. This can help you size your cluster appropriately based on your needs.
 
8. Is Couchbase Server similary durable as CouchDB/Couchbase single
server is in case of node failure - if node is unexpectedly shut down
or restarted is there no need for initial data restoration process
when the node is online again?
 
>>> Yes, you can pull the plug in the middle of a write and your data will be in a consistent state when the node powers back up. No need to run any special "restoration process" in that case. Not completely sure if that answered your question?

yojimbo87

Jul 31, 2011, 11:02:32 AM
to Couchbase
Thanks James for your answers, they helped me a lot. I have briefly
gone through Couchbase Manual 2.0 [1] and I have several more
questions:

9. Is Couchbase Server suitable to be used in scenarios or use cases
in which data are not expected (or not necessarily needed) to fit into
memory? For example the database on disk would take 30 GB of disk
space and RAM would have only 2GB. I understand the performance
difference between RAM and disk, but I would like to know if Couchbase
Server is suitable also to be used in more general, "non-performance
centric" situations.

10. Will map/reduce views (in the future) be also queryable through
binary protocol or do they stay REST oriented?

11. Sorry if I'm too annoying with the UnQL questions, I understand
it's a work in progress and implementation details may not be ready
yet, but I would like to know if UnQL querying in Couchbase Server is
planned through binary or REST protocols (or maybe both).

[1]: http://docs.couchbase.org/couchbase-manual-2.0/index.html

Perry Krug

Aug 1, 2011, 11:22:40 AM
to couc...@googlegroups.com
Hi there!  While James and Jan are getting some much needed rest, I'll jump in here and answer these few questions:
 
9. Is Couchbase Server suitable to be used in scenarios or use cases
in which data are not expected (or not necessarily needed) to fit into
memory? For example the database on disk would take 30 GB of disk
space and RAM would have only 2GB. I understand the performance
difference between RAM and disk, but I would like to know if Couchbase
Server is suitable also to be used in more general, "non-performance
centric" situations.
[pk] - Absolutely!  Just like Membase, we fully support having more data on disk than you have RAM available.  We call this "disk>RAM" and you can read more about how it works here: http://www.couchbase.org/wiki/display/membase/Growing+Data+Sets+Beyond+Memory. This is currently documented for Membase, but the internals do not change with Couchbase Server 2.0 and will function the same way or better.  You may also want to review our sizing guidelines (http://www.couchbase.org/wiki/display/membase/Sizing+Guidelines) to get a better feeling of just how much RAM you will need. 

10. Will map/reduce views (in the future) be also queryable through
binary protocol or do they stay REST oriented?
[pk] - They will likely be REST oriented for the near to mid future.  REST works very well for cross-compatibility and extending features, while the binary protocol gives us a bit better performance in very latency-sensitive environments (which is "usually" not the case when dealing with views/queries).  With our SDKs, we are abstracting the difference between the two protocols, so you won't need to worry about where to send a request.

11. Sorry if I'm too annoying with the UnQL questions, I understand
it's a work in progress and implementation details may not be ready
yet, but I would like to know if UnQL querying in Couchbase Server is
planned through binary or REST protocols (or maybe both).
[pk] - Don't apologize!  We're happy to help and we know this is all new (and sometimes changing) so please keep the questions coming.  And thank you for asking in a public place so others can learn from your questions and our answers.  Unfortunately I don't have a firm answer to this one as it's still very much in development.  James and Jan might have better insight, but I would recommend waiting a bit until we've worked out more of the details.  I'll make sure your questions re: REST vs. binary get answered and documented. 

yojimbo87

Aug 1, 2011, 12:43:49 PM
to Couchbase
Thanks for the answers Perry, I really appreciate that various members
of the Couchbase team are willing to answer my questions. Here is
another set:

9. That's great, because during my brief study of the Couchbase Server
manual I got the feeling that it's a bad/unwanted situation (in cases
where performance is not critical) when the data stored on disk
exceeds the memory designated for Couchbase. I think this situation
is normal in CMS, CRM or other similar systems which manage large
amounts of various data, so I was curious whether Couchbase would also
be suitable as a database backend for these kinds of systems.

12. I read in Couchbase manual that Couchbase Server is "always
consistent for any given item" [1]. If I'm not mistaken CouchDB is
eventually consistent thanks to its MVCC model, so my question is -
if CouchDB technology is used for data persistence, how come Couchbase
Server is now (strongly?) consistent (it may be my fault that I have
misunderstood something during the study of the manual)?

13. There are now 3 client SDK libraries for .NET, Java and PHP for
communicating with Couchbase Server. Are there any plans to make
libraries also for other languages/platforms? As I mentioned in
question #5 I'm mainly interested in node.js, but what I want to know
is if Couchbase (as a company) would support its own library for
node.js or the community can/should make a move in this field. I know
there are already memcached/CouchDB oriented modules which are
probably compatible with Couchbase Server, but I'm interested in
something which would be dedicated especially for the Couchbase Server
(so that users wouldn't have to use different modules for different
functionality).

14. Does Couchbase Server also support a changes feed similar to
CouchDB's?

[1]: http://docs.couchbase.org/couchbase-manual-2.0/couchbase-architecture.html#idp140451472765472


Perry Krug

Aug 1, 2011, 1:06:25 PM
to couc...@googlegroups.com
More inline:
9. That's great because during the brief study of Couchbase Server
manual I got a feeling that it's a bad/unwanted situation (in case
where performance is not critical) when database data stored on disk
outreach memory designated for Couchbase. I think that this situation
is normal in CMS, CRM or other similar systems which can manage large
amounts of various data so I was curious if Couchbase would be
suitable as a database backend also for these kind of systems.
[pk] - I would say it is definitely suitable, depending on your performance requirements.  If you can point me to the section of the manual you are reading I can help to either clarify the specific points or make the manual clearer.  We recommend keeping your "working set" (the portion of your overall dataset that is in use at any one given time) in RAM for the best performance (sub-ms latency).  A request for data outside of this working set will be serviced from disk, with longer (~10ms) and less consistent latencies.  SSDs have been shown to help greatly here, for what it's worth.

12. I read in Couchbase manual that Couchbase Server is "always
consistent for any given item" [1]. If I'm not mistaken CouchDB is
eventually consistent thanks to it's MVCC model, so my question is -
if CouchDB technology is used for data persistence, how come Couchbase
Server is now (strongly?) consistent (it may be my fault that I have
misunderstood something during the study of the manual)?
[pk] - Your understanding is correct, and this mostly comes out of our heritage as memcached/Membase.   The caching and clustering layer (Membase) is what controls this strong consistency.  We are in the process of developing a cross-datacenter replication feature which will allow for eventual consistency between two separate clusters.  Does that answer your question?

13. There are now 3 client SDK libraries for .NET, Java and PHP for
communicating with Couchbase Server. Are there any plans to make
libraries also for other languages/platforms? As I mentioned in
question #5 I'm namely interested in node.js, but what I want to know
is if Couchbase (as a company) would support it's own library for
node.js or the community can/should make a move in this field. I know
there are already memcached/CouchDB oriented modules which are
probably compatible with Couchbase Server, but I'm interested in
something which would be dedicated especially for the Couchbase Server
(so that users wouldn't have to use different modules for different
functionality).
[pk] - We're actually up to 5 (+ Ruby and Python) but nothing official for node.js yet.  I'll pass that request on to our SDK team, but we would most certainly invite any community effort to accelerate this. 

14. Is Couchbase Server supporting also changes feed similar to the
CouchDB's one?
[pk] - Not quite yet, but that's planned for a not-too-far-off release. 

yojimbo87

Aug 1, 2011, 2:50:50 PM
to Couchbase
9. If I recall correctly my concern was in the "Consequences of Memory
faster than Disk" section [1] when I saw SERVER_ERROR being returned
"when there is not enough space" and around the three possible
approaches.

12. Yes, that's the answer I was looking for, however I have another
question regarding this matter: is CouchDB's MVCC model still used
"behind the scenes" since membase takes care of data consistency? I'm
curious if the versioning of persisted data still takes place since
key values are not necessarily JSON documents now in Couchbase Server.

13. Are there any other resources apart from "Developing Couchbase
Clients" section [2] in the Couchbase Server manual which I should
read/study in case I would be interested in developing node.js based
client for Couchbase Server? Or maybe some documentation or (internal)
design principles on how existing clients are programmed (and how new
ones should be coded).

16. During the browsing of Couchbase forums I stumbled upon a question
about "REST API for getting data" [3] which I'm also interested in.

17. How do existing clients store key values which consist of a JSON
string? Are they stored as a string or encoded in another format for
efficiency [4]? Considering that I would probably want to store a lot
of JSON objects in Couchbase, is it better to use string value or
encoded format such as protocol buffers?

[1]: http://docs.couchbase.org/couchbase-manual-2.0/couchbase-architecture.html#idp140451472796688
[2]: http://docs.couchbase.org/couchbase-manual-2.0/couchbase-client-development.html
[3]: http://www.couchbase.org/forums/thread/rest-api-getting-data
[4]: http://docs.couchbase.org/couchbase-manual-2.0/couchbase-developing.html#couchbase-developing-bestpractices-objectstorage

Perry Krug

Aug 1, 2011, 4:30:23 PM
to couc...@googlegroups.com
9. If I recall correctly my concern was in the "Consequences of Memory
faster than Disk" section [1] when I saw SERVER_ERROR being returned
"when there is not enough space" and around the three possible
approaches.
[pk] - Now I understand.  The things to keep in mind with respect to that are:
-Memory is faster than disk.  If you fill up memory faster than we can get the data to disk, Membase will temporarily return "I'm out of memory, back off" messages.  It is up to the application to do the appropriate back-off and retry thing here.  We've only really seen it happen during bulk-loading situations.
-Certain pieces of data cannot be "ejected" to disk and so could end up filling up the available memory space.  You can assume roughly 150 bytes per item for this metadata and should size appropriately.  

Keep in mind, both of those situations are mitigated with more RAM...either more RAM on the nodes within the cluster or more nodes. 
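
The "back off and retry" behavior that is left to the application could be sketched roughly as follows. This is only an illustration: `TemporaryFailure` and `store_with_backoff` are hypothetical names, the `store` callable stands in for whatever client call you use, and the delays are arbitrary assumptions rather than recommended values.

```python
import random
import time


class TemporaryFailure(Exception):
    """Hypothetical stand-in for the server's temporary out-of-memory reply."""


def store_with_backoff(store, key, value, max_attempts=6,
                       base_delay=0.01, max_delay=1.0, sleep=time.sleep):
    """Call store(key, value), backing off exponentially on TemporaryFailure."""
    for attempt in range(max_attempts):
        try:
            return store(key, value)
        except TemporaryFailure:
            if attempt == max_attempts - 1:
                raise  # give up and let the caller decide what to do
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Jitter keeps many clients from retrying in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))


# Usage with a fake store that rejects the first two writes.
calls = {"n": 0}


def flaky_store(key, value):
    calls["n"] += 1
    if calls["n"] <= 2:
        raise TemporaryFailure("out of memory, back off")
    return True


result = store_with_backoff(flaky_store, "user:1", "{}", sleep=lambda s: None)
print(result, calls["n"])  # True 3
```

Capping the number of attempts matters during bulk loads: if the disk cannot keep up for a long time, the caller should slow the whole load down rather than retry each item indefinitely.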

12. Yes, that's the answer I was looking for, however I have another
questions regarding this matter: is CouchDB's MVCC model still used
"behind the scenes" since membase takes care of data consistency? I'm
curious if the versioning of persisted data still takes place since
key values are not necessarily JSON documents now in Couchbase Server.
[pk] - Yes, we still maintain that versioning under the hood, but it's not designed to be exposed or used by the user. There is also automatic compaction that can be configured to take place, which will remove any previous versions of an item.  Can you give me some more insight as to what you're looking for here?

13. Are there any other resources apart from "Developing Couchbase
Clients" section [2] in the Couchbase Server manual which I should
read/study in case I would be interested in developing node.js based
client for Couchbase Server? Or maybe some documentation or (internal)
design principles of how are existing clients programmed (and how new
one should be coded).
[pk] - We're still building up this information, but you can get more from http://couchbase.org

16. During the browsing of Couchbase forums I stumbled upon a question
about "REST API for getting data" [3] which I'm also interested in.
[pk] - This is something that we're planning on adding with the 3.0 version of Couchbase Server.  At the moment, data operations are only supported through the memcached interface. 

17. How does existing clients store key values which consists of JSON
string? Are they stored as a string or encoded in other format for
efficiency [4]? Considering that I would probably want to store a lot
of JSON objects in Couchbase, is it better to use string value or
encoded format such as protocol buffers?
[pk] - Couchbase Server 2.0 can store data in one of two ways: valid JSON, and everything else.  If valid JSON is used as the value for a key/document, then the server will parse it and allow you to perform map-reduce queries/indexes on that.  If valid JSON is not used (binary data for example, or anything else really) then you will only be able to query against the key names themselves, but will still have the same access characteristics as you do with Membase today.  Under the hood, we are actually doing a bit of compression to save on disk IO and space.  I would recommend using string value JSON.
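
A client-side sketch of the "valid JSON vs. everything else" distinction is shown below. It only uses Python's standard `json` module; the function names and the `"json"`/`"opaque"` labels are made up for illustration, and the server's exact definition of "valid JSON" (for example, whether bare scalars count) is not specified in this thread.

```python
import json


def classify_value(raw: bytes) -> str:
    """Return 'json' if raw parses as JSON, else 'opaque' (hypothetical labels)."""
    try:
        json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        return "opaque"
    return "json"


def encode_document(doc: dict) -> bytes:
    """Serialize a dict as a compact UTF-8 JSON string for storage."""
    return json.dumps(doc, separators=(",", ":")).encode("utf-8")


value = encode_document({"type": "user", "name": "yojimbo87"})
print(classify_value(value))            # json
print(classify_value(b"\x00\x01\xff"))  # opaque
```

Storing the plain JSON string, as recommended above, keeps the value eligible for map-reduce views; a binary encoding such as protocol buffers would fall into the "everything else" category in this sketch.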

yojimbo87

Aug 1, 2011, 5:39:43 PM
to Couchbase
12. That answered my question completely. I was just curious if the
MVCC model is used in some way by the Couchbase Server when membase
technology is responsible for maintaining consistency.

I think I have exhausted all of my questions for now, so thanks again
for your time and effort and for having patience with me. You all
helped me a lot.

yojimbo87

Aug 7, 2011, 6:07:30 AM
to Couchbase
I have a few more questions after spending some time with Couchbase
Server:

18. When I'm adding more nodes to the existing couchbase server
cluster - am I actually sharding bucket(s) data across the cluster in
order to distribute data to various nodes?

19. Can bucket(s) be replicated only up to three instances, or is it
possible to increase this number of replicas?

20. I had previously installed the Couchbase Single Server 2.0 deb
package on my Ubuntu-based machine, and when I also installed Couchbase
Server 2.0 on the very same machine, my Single Server 2.0 installation
was upgraded to Server 2.0 during the installation. Is this normal
behavior? I'm asking because after the installation Server 2.0 showed
some problems in the logs and was constantly restarting some of its
processes, so I proceeded with uninstalling both Single Server 2.0 and
Server 2.0 and then installing just Server 2.0.

Jan Lehnardt

Aug 8, 2011, 8:05:47 AM
to couc...@googlegroups.com
Excellent questions, as usual!

On 7 Aug 2011, at 12:07, yojimbo87 wrote:

> I have another few questions after some while spent with couchbase
> server:
>
> 18. When I'm adding more nodes to the existing couchbase server
> cluster - am I actually sharding bucket(s) data across the cluster in
> order to distribute data to various nodes?

When you start out with a Couchbase setup, your data will already be
sharded over 1024 so-called "vbuckets" (think virtual buckets) totally
transparently for the application. When the cluster changes in either
direction, only these vbuckets (which only hold 1/1024th of your total
data) are moved to their new respective servers, so rebalancing data
churn is minimised. For a more detailed explanation, see

http://dustin.github.com/2010/06/29/memcached-vbuckets.html
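
To make the vbucket idea concrete, here is a simplified sketch. It is not the server's code: the hash (CRC32 modulo 1024), the round-robin initial layout, and the helper names (`vbucket_for`, `build_map`, `add_server`) are all assumptions chosen for illustration. The point it shows is that a key always maps to the same vbucket, and only the vbucket-to-server map changes when the cluster grows.

```python
import zlib
from collections import Counter

NUM_VBUCKETS = 1024


def vbucket_for(key: str) -> int:
    # Assumption: CRC32 modulo the vbucket count; the real hash may differ.
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_map(servers):
    """Assign vbuckets to servers round-robin (illustrative layout only)."""
    return [servers[vb % len(servers)] for vb in range(NUM_VBUCKETS)]


def add_server(vbucket_map, new_server):
    """Give the new server its fair share, taking vbuckets from fuller servers."""
    counts = Counter(vbucket_map)
    target = NUM_VBUCKETS // (len(counts) + 1)
    new_map = list(vbucket_map)
    taken = 0
    for vb, owner in enumerate(vbucket_map):
        if taken == target:
            break
        if counts[owner] > target:
            new_map[vb] = new_server  # only this vbucket's data has to move
            counts[owner] -= 1
            taken += 1
    return new_map


old_map = build_map(["a", "b"])
new_map = add_server(old_map, "c")
moved = sum(1 for old, new in zip(old_map, new_map) if old != new)
print(moved, Counter(new_map))

# The client looks up the owner through the map; the key's vbucket is stable.
vb = vbucket_for("user:1")
print(vb, old_map[vb], new_map[vb])
```

In this sketch, growing from two to three servers moves 341 of the 1024 vbuckets, and every key in an unmoved vbucket stays where it was.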


> 19. Can bucket(s) be replicated up to three instances or is there a
> possibility to increase this number of replicas.

When setting up a new bucket, you can specify the number of replicas
you want for that bucket. If I remember correctly changing that later
is not possible, but you can always make more buckets with different
replica counts, so you have places for "important data" and "less
important data".

> 20. I had previously installed couchbase single server 2.0 deb package
> on my ubuntu based machine and when I installed also couchbase server
> 2.0 on the very same machine, my single server 2.0 installation was
> during the installation upgraded to server 2.0. Is this normal
> behavior? I'm asking because server 2.0 had after the installation
> some problems in logs and was constantly restarting some of it's
> processes, so I proceeded with uninstalling both single server 2.0,
> server 2.0 and then just installing server 2.0 version.

Sorry, I can't help with that, but I'm forwarding this to our packaging
team.

Thanks for your patience :)

Cheers
Jan
--

yojimbo87

Aug 8, 2011, 6:08:41 PM
to Couchbase
18. Thanks for the link Jan, I actually just wanted to know (or rather
make clear for myself) whether I'm doing (auto)sharding of the data
(and not replication like CouchDB) when I add more nodes to the
Couchbase cluster.

21. Will there be an option in couchbase server admin interface to add/
delete/modify data in buckets?

Jan Lehnardt

Aug 8, 2011, 6:10:02 PM
to couc...@googlegroups.com

On Aug 9, 2011, at 12:08 AM, yojimbo87 wrote:

> 18. Thanks for the link Jan, I just actually wanted to know (or rather
> make things clear for me) that if I'm doing (auto)sharding of the data
> (and not replication like couchdb) when I add more nodes to the
> couchbase cluster.

So your question is now sufficiently answered? :)

> 21. Will there be an option in couchbase server admin interface to add/
> delete/modify data in buckets?

Yes.

yojimbo87

Aug 10, 2011, 11:36:16 AM
to Couchbase
18. Yes it is, I just need to read that article.

22. Is there a roadmap somewhere with the functionality planned
for future Couchbase Server releases?

23. (related to Q12) Is there still a need for CouchDB's MVCC model
since membase takes care of data consistency?

24. (related to Q12) Also since couchbase server is strongly
consistent - does it affect high availability in some way?

John L. Cheng

Aug 20, 2011, 2:39:24 AM8/20/11
to Couchbase
Great Q&A guys! I really want to thank Jan and Perry for being so
helpful and Yojimbo for starting this thread.

I have a couple of questions of my own after reading this.

1. Currently, when updating a document, CouchDB requires that you have
the correct rev id to update successfully. How does Couchbase 2.0
handle conflicts? Is it simply last writer wins? Is there an option to
fetch old revisions and let the client merge changes?

2. Tying into the question regarding write conflicts, does Couchbase
2.0 support in-place updates (i.e., write without read)? How about
document update handlers?

3. What happens when the server crashes? Is all the data that is
waiting to be flushed to disk lost?

Jan Lehnardt

Aug 23, 2011, 8:42:29 AM8/23/11
to couc...@googlegroups.com
Hi John,

thanks for writing. I pinged Frank Weigel, Product Manager of Couchbase 2.0 and he replied with this:

In order to get the best performance out of the clustered setup (latency
and throughput) for data ops, primary path data access for Couchbase
Server actually uses the memcached protocol and as a result data access
semantics are a little different from CouchDB. But in return you can do
well over 100k ops/s on a single node :)

As a result there is no versioning exposed at that interface, so "last
write wins" is the correct semantic. There is no support for reading
specific versions via the memcached interface; instead, the last written
version will be returned by the cluster.

Instead, the memcached protocol offers CAS as its optimistic locking
approach, which can be used for the typical use cases of versioning, i.e.
to avoid and detect write conflicts from multiple writers with a
read-modify-write cycle. Just like the version id, the CAS changes if a
document is updated, and if the CAS obtained at read time is passed along
with the write, the write will only succeed if the CAS matches, i.e. the
document was unchanged since the read.
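
To make the read-modify-write cycle described above concrete, here is a
minimal sketch in Python. It does not talk to a real server: CasStore is
a hypothetical in-memory stand-in, and the method names (set, gets, cas)
are only modelled on the memcached commands, not on any particular SDK.

```python
import itertools


class CasStore:
    """Toy in-memory store with memcached-style CAS semantics (not a real client)."""

    def __init__(self):
        self._data = {}                      # key -> (value, cas token)
        self._next_cas = itertools.count(1)  # every write gets a fresh token

    def set(self, key, value):
        """Unconditional write: last write wins. Returns the new CAS token."""
        cas = next(self._next_cas)
        self._data[key] = (value, cas)
        return cas

    def gets(self, key):
        """Return (value, cas) for the key."""
        return self._data[key]

    def cas(self, key, value, cas):
        """Write only if the stored CAS token still matches; True on success."""
        _, current = self._data[key]
        if current != cas:
            return False
        self.set(key, value)
        return True


def increment_visits(store, key, max_retries=10):
    """Read-modify-write loop that retries when another writer got in first."""
    for _ in range(max_retries):
        doc, cas = store.gets(key)
        updated = dict(doc, visits=doc["visits"] + 1)
        if store.cas(key, updated, cas):
            return updated
    raise RuntimeError("too many CAS conflicts")


store = CasStore()
store.set("user:1", {"visits": 0})

_, stale_cas = store.gets("user:1")      # writer A reads the document
increment_visits(store, "user:1")        # writer B updates it in between
# Writer A now tries to write with the token it read earlier.
conflict = store.cas("user:1", {"visits": 99}, stale_cas)
final_doc, _ = store.gets("user:1")
```

Writer A's write is rejected because the token it holds is stale, so it
has to re-read and retry instead of silently overwriting writer B.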

There is some Couch API support for CRUD in 2.0, but it is slow and doesn't
support all Couch API operations at this point. We'll expand the Couch API
over time. For high-performance use, however, we recommend the new SDKs,
which use the memcached interface as described above. Please let us
know which of the Couch API features you guys see as most helpful to be
prioritized for support!


Persistence to disk is asynchronous in Couchbase Server 2.0, as is
replication to other nodes in the cluster.

So any data that has not been written to disk, won't be there after a
restart or crash.

The primary means of data availability is to fall back onto the replica,
i.e. do a failover (or use the auto failover) and the replica will take
over. As replication over the network is typically pretty fast, the delay
window is very small (it gets put on the replication queue straight in
memory).

Of course there are applications where you just need to know whether data
has been either replicated or persisted before you continue. For those
cases Couchbase Server 2.0 offers the SYNC instruction. The SYNC
instruction basically lets you wait until a document has been either
persisted or replicated, depending on which one you use. This way you can
block and make sure you have all your data on disk, but of course it means
you now have to wait for disk I/O. The nice thing is that you can choose
which documents you want to do synchronous writes for in this way, so you
have very fine-grained control over where to incur the penalty.
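
The trade-off between asynchronous and synchronous writes can be
illustrated with a small toy model in Python. Everything in it is an
assumption made for illustration: AsyncPersistStore and the
wait_for_persist flag are hypothetical names, not the server's SYNC
command or any SDK call, and the "disk" is just a second dictionary.

```python
from collections import deque


class AsyncPersistStore:
    """Toy model: writes land in memory first and reach 'disk' later."""

    def __init__(self):
        self.memory = {}
        self.disk = {}
        self.pending = deque()   # keys waiting to be flushed to disk

    def set(self, key, value, wait_for_persist=False):
        self.memory[key] = value
        self.pending.append(key)
        if wait_for_persist:
            # Stand-in for a synchronous write: block until the queue is
            # drained (simplification: this flushes everything pending).
            self.flush()

    def flush(self):
        """Drain the write queue to disk (normally done in the background)."""
        while self.pending:
            key = self.pending.popleft()
            self.disk[key] = self.memory[key]

    def crash_and_restart(self):
        """Lose everything that only lived in memory, then reload from disk."""
        self.pending.clear()
        self.memory = dict(self.disk)


store = AsyncPersistStore()
store.set("order:1", "paid", wait_for_persist=True)   # pays the disk I/O cost
store.set("session:9", "active")                      # fast, not yet durable
store.crash_and_restart()

survived = sorted(store.memory)
```

Only the document written synchronously is guaranteed to survive the
simulated crash; the other one was still sitting in the write queue.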

Of course if you just want to stick with the CouchDB API, Couchbase Single Server 2.0 Developer Preview has the performance work Damien and Filipe have been doing and provides the full CouchDB API, so take a look there as well, if you don't need clustering or the built-in memory caching for the additional low latency and high throughput.

Cheers
Jan
--

John Cheng

Aug 23, 2011, 10:15:31 AM8/23/11
to couc...@googlegroups.com
Jan,

Thanks for your response. Now I have a much better understanding of
Couchbase Server 2.0.

--
---
John L Cheng

John Cheng

Aug 30, 2011, 11:08:52 AM8/30/11
to couc...@googlegroups.com
>
> Instead, memcached protocol offers CAS as its optimistic locking approach,
> which can be used for the typical use cases of versioning, i.e. to avoid
> and detect write conflicts from multiple writers with a read-modify-write
> cycle. Just like the version id, the CAS changes if a document is updated
> and if the CAS obtained at read time is passed along with the write, the
> write will only succeed if the CAS matched, i.e. the document was
> unchanged since the read.

Hi, another follow-up here. When there are multiple servers serving the
same key (multiple servers per vbucket), how does a CAS ensure the
original document has not changed? In case of a split-brain scenario
[1], two concurrent writes can go to two different servers and both
succeed. Now you can have inconsistent state in that cluster. I can
think of a few ways this could be handled. How does Membase resolve this
inconsistency? Is it last writer wins? Does the data stay inconsistent
until a client issues an update (once the network partition goes away)
to bring all servers in sync?

[1] - http://docs.couchbase.org/couchbase-manual-2.0/couchbase-architecture.html#couchbase-architecture-failover-automatic-considerations

Perry Krug

Aug 30, 2011, 1:51:58 PM8/30/11
to couc...@googlegroups.com
John, a single key can never be active in two places at once by design.  Therefore CAS only needs to worry about the authoritative copy.

Your statement about split-brain does throw a wrench into that though.  IF you have a split-brain scenario AND both sides get failed over, you will have inconsistencies.  Depending on your application and environment, this could lead to some nasty problems.

This is one of the main reasons we did not have an automatic failover option in the beginning...and also the reason we put so many restrictions on the automatic failover feature.

Basically, if you have a split-brain situation it's better to be missing access to some data rather than introduce inconsistencies, so don't fail over.
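
The statement above that a key is only ever active in one place can be sketched as a key-to-vBucket-to-node lookup. This is a rough Python illustration under stated assumptions: the vBucket count of 1024, the CRC32-modulo hash, the round-robin assignment and the node names are all made up for the example and are not the server's exact algorithm.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed partition count for this sketch


def vbucket_for(key):
    """Hash a key to a vBucket id (illustrative hash, not the server's exact one)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_map(nodes):
    """Assign each vBucket exactly one active node, round-robin."""
    return {vb: nodes[vb % len(nodes)] for vb in range(NUM_VBUCKETS)}


def active_node(key, vbucket_map):
    """Every client with the same map resolves a key to the same single node."""
    return vbucket_map[vbucket_for(key)]


vbucket_map = build_map(["node-a", "node-b", "node-c"])
owner = active_node("user:1", vbucket_map)
```

Because each vBucket has exactly one active owner in the map, a CAS check only ever has to be made against that one copy. The trouble described above starts when a partition plus failover leaves two nodes each believing they are the active owner.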

Perry Krug
Solutions Architect
direct: 831-824-4123
email: pe...@couchbase.com

John Cheng

Aug 30, 2011, 5:19:10 PM8/30/11
to couc...@googlegroups.com
> Your statement about split-brain does throw a wrench into that though.  IF you have a split-brain scenario AND both sides get failed over, you will have inconsistencies.  Depending on your application and environment, this could lead to some nasty problems.
>
> This is one of the main reasons we did not have an automatic failover option in the beginning...and also the reason we put so many restrictions on the automatic failover feature.

Hi Perry,

I did read about the thinking over handling failovers, especially the consideration over the split brain scenario. When I read about CAS support in Couchbase, it made me question the consistency level of a CAS "set value" in light of Couchbase's design, i.e., if a cluster wide lock is implied. It does not sound like this is the case. Thanks for the clarification!


Perry Krug

Aug 30, 2011, 6:55:27 PM8/30/11
to couc...@googlegroups.com
Correct, there is 0 cluster-wide locking and we plan on keeping it that way.

As Matt is fond of saying "locking in a distributed system is the very definition of slow"...and we are not going to sacrifice that!

Perry Krug
Solutions Architect
direct: 831-824-4123
email: pe...@couchbase.com





