Thanks for writing, I'll add my replies inline:
On 30 Jul 2011, at 12:32, yojimbo87 wrote:
> Recently I stumbled upon two articles [1][2] about the release of new
> Couchbase Server 2.0 and I have a few questions:
>
> 1. If I understand it correctly Couchbase Server is a combination of
> CouchDB (or Couchbase Single Server) and Membase merged into single
> NoSQL solution which offers advantages of both products - Membase
> Server as a key-value in-memory cache store and the CouchDB as disk
> persistent document database?
Membase has had persistence functionality for a while now. Prior to
2.0 we used SQLite for that, with 2.0 and onward we are using CouchDB
as the persistence layer.
> 2. What will happen with the separate database products - Couchbase
> Single Server and Membase - will they eventually be replaced by the
> Couchbase Server or do they continue to be offered as standalone
> solutions?
The Membase product line will be superseded by Couchbase 2.0.
Couchbase Single Server 2.0 is our edition for users that do not need
the advanced cluster functionality of Couchbase 2.0.
So Membase is going to be replaced, while Couchbase Single Server is
the (duh) single server edition of our Couchbase 2.0 product line
(there is also a mobile edition, to round off our offering).
> 3. Will Couchbase Server in the future be offered both as enterprise
> and also in free/open source/community editions? What would be the
> marginal difference?
Yes, we are very much committed to the open source community as well
as our paying customers. We believe in open source and open participation
to build better products. Enterprise users and customers can purchase
commercial licenses to gain access to our support team that is there for
them in times of need.
At this point we do not plan to have any proprietary extensions that
wouldn't be available in the community edition. All our code is free
and open source if you choose to. (I can't stress enough how proud
this makes me personally :)
> 4. Regarding the new UnQL stuff - will (or does) Couchbase Server (2.0+)
> support ad hoc queries based on this technology alongside its map/
> reduce views? Are there any resources available yet?
The current state of UnQL is documented on http://unqlspec.org/ — UnQL
will work alongside the traditional Map/Reduce views and other querying
options like GeoCouch and couchdb-lucene (and everything else that is
out there).
> 5. Will Couchbase Server require to write new set of client libraries
> for different languages/platforms? For example I'm using cradle
> node.js module to communicate both with CouchDB and Couchbase Single
> Server, but what about the new Couchbase Server?
We are aiming to make Couchbase a drop-in replacement for all memcached,
Membase and CouchDB users. That said, we are not there yet (bear with us :)
Couchbase 2.0 will be fully memcached and Membase compatible and you will
not need to rewrite anything written for those servers. If you want to
use new features, we encourage you to use our (new, open source) SDK packages
for your programming language.
Couchbase Single Server 2.0 is 100% Apache CouchDB compatible and you won't
need to rewrite anything again.
Future versions (currently planned for 3.0, but let us get 2.0 out of the
door first :) will be compatible as outlined above (i.e. with everything :)
For you (node.js / cradle) this means:
- You can use Couchbase Single Server 2.0 just like you are using Apache
CouchDB (and earlier Couchbase Single Server releases) today.
- To use Couchbase 2.0 (the cluster version), you will need to use a
memcached or Membase compatible client library for k/v access and
a plain old HTTP library for view requests. In fact though, regular
CouchDB APIs should also work. We haven't tried cradle yet, but it's
definitely on our list of things to do.
We understand that the initial promise (just works) is not there yet,
but we are working hard on getting there. In addition, all this is open
source, so we are as grateful as we are interested in your feedback in
form of questions, code, or complaints :)
Please let us know if you have any more questions!
Cheers
Jan
--
9. Is Couchbase Server suitable to be used in scenarios or use cases
in which data are not expected (or not necessarily needed) to fit into
memory? For example the database on disk would take 30 GB of disk
space and RAM would have only 2GB. I understand the performance
difference between RAM and disk, but I would like to know if Couchbase
Server is suitable also to be used in more general, "non-performance
centric" situations.
10. Will map/reduce views (in the future) be also queryable through
binary protocol or do they stay REST oriented?
11. Sorry if I'm too annoying with the UnQL questions, I understand
it's a work in progress and implementation details may not be ready
yet, but I would like to know if UnQL querying in Couchbase Server is
planned through binary or REST protocols (or maybe both).
9. That's great, because during my brief study of the Couchbase Server
manual I got the feeling that it's a bad/unwanted situation (even in cases
where performance is not critical) when the database data stored on disk
exceeds the memory designated for Couchbase. I think that this situation
is normal in CMS, CRM or other similar systems which can manage large
amounts of various data, so I was curious if Couchbase would be
suitable as a database backend for these kinds of systems as well.
12. I read in Couchbase manual that Couchbase Server is "always
consistent for any given item" [1]. If I'm not mistaken CouchDB is
eventually consistent thanks to its MVCC model, so my question is -
if CouchDB technology is used for data persistence, how come Couchbase
Server is now (strongly?) consistent (it may be my fault that I have
misunderstood something during the study of the manual)?
13. There are now 3 client SDK libraries for .NET, Java and PHP for
communicating with Couchbase Server. Are there any plans to make
libraries also for other languages/platforms? As I mentioned in
question #5 I'm namely interested in node.js, but what I want to know
is if Couchbase (as a company) would support its own library for
node.js or the community can/should make a move in this field. I know
there are already memcached/CouchDB oriented modules which are
probably compatible with Couchbase Server, but I'm interested in
something which would be dedicated especially for the Couchbase Server
(so that users wouldn't have to use different modules for different
functionality).
14. Is Couchbase Server supporting also changes feed similar to the
CouchDB's one?
9. If I recall correctly my concern was in the "Consequences of Memory
faster than Disk" section [1] when I saw SERVER_ERROR being returned
"when there is not enough space" and around the three possible
approaches.
12. Yes, that's the answer I was looking for, however I have another
question regarding this matter: is CouchDB's MVCC model still used
"behind the scenes" since membase takes care of data consistency? I'm
curious if the versioning of persisted data still takes place since
key values are not necessarily JSON documents now in Couchbase Server.
13. Are there any other resources apart from "Developing Couchbase
Clients" section [2] in the Couchbase Server manual which I should
read/study in case I would be interested in developing node.js based
client for Couchbase Server? Or maybe some documentation or (internal)
design principles of how existing clients are programmed (and how a new
one should be coded).
16. During the browsing of Couchbase forums I stumbled upon a question
about "REST API for getting data" [3] which I'm also interested in.
17. How do existing clients store key values which consist of a JSON
string? Are they stored as a string or encoded in other format for
efficiency [4]? Considering that I would probably want to store a lot
of JSON objects in Couchbase, is it better to use string value or
encoded format such as protocol buffers?
On 7 Aug 2011, at 12:07, yojimbo87 wrote:
> I have another few questions after some while spent with couchbase
> server:
>
> 18. When I'm adding more nodes to the existing couchbase server
> cluster - am I actually sharding bucket(s) data across the cluster in
> order to distribute data to various nodes?
When you start out with a Couchbase setup, your data will already be
sharded over 1024 so called "vbuckets" (think virtual buckets) totally
transparently for the application. When the cluster changes in either
direction, only these vbuckets (which only hold 1/1024th of your total
data) are moved to their new respective servers, so rebalancing data
churn is minimised. For a more detailed explanation, see
http://dustin.github.com/2010/06/29/memcached-vbuckets.html
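To make the vbucket idea concrete, here is a minimal sketch of it in Python. It is not the actual Couchbase implementation: the CRC32-modulo hash, the round-robin initial layout, the rebalancing rule and the node names are all assumptions made for illustration. Only the figure of 1024 vbuckets comes from the explanation above. The point it shows is that keys never move individually; a key always hashes to the same vbucket, and only the vbucket-to-server map changes when a node joins.

```python
import zlib

NUM_VBUCKETS = 1024  # figure taken from the reply above


def vbucket_for(key: str) -> int:
    # Assumption: CRC32 modulo the vbucket count; the real hash may differ.
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def even_map(servers):
    # Hypothetical initial layout: deal the vbuckets out round-robin.
    return [servers[i % len(servers)] for i in range(NUM_VBUCKETS)]


def add_server(vbucket_map, new_server):
    """Return a new map in which new_server takes over an even share."""
    servers = sorted(set(vbucket_map)) + [new_server]
    target = NUM_VBUCKETS // len(servers)
    counts = {s: vbucket_map.count(s) for s in servers}
    new_map = list(vbucket_map)
    for vb, owner in enumerate(vbucket_map):
        if counts[new_server] >= target:
            break  # the new server has its share, leave the rest alone
        if counts[owner] > target:
            new_map[vb] = new_server
            counts[owner] -= 1
            counts[new_server] += 1
    return new_map


def server_for(key, vbucket_map):
    # Clients look up the vbucket first, then the server that owns it.
    return vbucket_map[vbucket_for(key)]


old_map = even_map(["node-a", "node-b"])
new_map = add_server(old_map, "node-c")
moved = sum(1 for a, b in zip(old_map, new_map) if a != b)
print(f"{moved} of {NUM_VBUCKETS} vbuckets moved to the new node")
```

In this toy version, going from two to three nodes reassigns roughly a third of the vbuckets, and every key in an untouched vbucket stays on the server it was on before.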
> 19. Can bucket(s) be replicated up to three instances or is there a
> possibility to increase this number of replicas.
When setting up a new bucket, you can specify the number of replicas
you want for that bucket. If I remember correctly changing that later
is not possible, but you can always make more buckets with different
replica counts, so you have places for "important data" and "less
important data".
> 20. I had previously installed couchbase single server 2.0 deb package
> on my ubuntu based machine and when I installed also couchbase server
> 2.0 on the very same machine, my single server 2.0 installation was
> during the installation upgraded to server 2.0. Is this normal
> behavior? I'm asking because server 2.0 had after the installation
> some problems in logs and was constantly restarting some of its
> processes, so I proceeded with uninstalling both single server 2.0,
> server 2.0 and then just installing server 2.0 version.
Sorry, I can't help with that, but I'm forwarding this to our packaging
team.
Thanks for your patience :)
Cheers
Jan
--
> 18. Thanks for the link Jan, I just actually wanted to know (or rather
> make things clear for me) that if I'm doing (auto)sharding of the data
> (and not replication like couchdb) when I add more nodes to the
> couchbase cluster.
So your question is now sufficiently answered? :)
> 21. Will there be an option in couchbase server admin interface to add/
> delete/modify data in buckets?
Yes.
Thanks for writing. I pinged Frank Weigel, Product Manager of Couchbase 2.0, and he replied with this:
In order to get the best performance out of the clustered setup (latency
and throughput) for data ops, primary path data access for Couchbase
Server actually uses the memcached protocol and as a result data access
semantics are a little different from CouchDB. But in return you can do
well over 100k ops/s on a single node :)
As a result there is no versioning exposed at that interface, so "last
write wins" is the correct semantic. There is no support for reading
specific versions via the memcached interface; instead, the last written
version will be returned by the cluster.
Instead, the memcached protocol offers CAS as its optimistic locking approach,
which can be used for the typical use cases of versioning, i.e. to avoid
and detect write conflicts from multiple writers with a read-modify-write
cycle. Just like the version id, the CAS changes if a document is updated,
and if the CAS obtained at read time is passed along with the write, the
write will only succeed if the CAS matches, i.e. the document was
unchanged since the read.
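As an illustration of that read-modify-write cycle, here is a small self-contained Python sketch. It uses an in-memory stand-in rather than a real server or client library, so `ToyStore`, `CasMismatch`, `increment` and the method names are all hypothetical; they only mimic the behaviour described above (an unconditional set is "last write wins", a CAS write succeeds only if the value is unchanged since the read).

```python
class CasMismatch(Exception):
    """Raised when the CAS passed with a write no longer matches."""


class ToyStore:
    """In-memory stand-in for a server that tracks a CAS value per key."""

    def __init__(self):
        self._data = {}      # key -> (value, cas)
        self._next_cas = 1

    def gets(self, key):
        # Read the value together with its current CAS.
        return self._data[key]

    def set(self, key, value):
        # Unconditional write: last write wins, and the CAS changes.
        self._data[key] = (value, self._next_cas)
        self._next_cas += 1

    def cas(self, key, value, cas):
        # Conditional write: only succeeds if nobody wrote since the read.
        _, current = self._data[key]
        if current != cas:
            raise CasMismatch(key)
        self.set(key, value)


def increment(store, key, retries=10):
    """Read-modify-write loop that retries when another writer got in first."""
    for _ in range(retries):
        value, cas = store.gets(key)
        try:
            store.cas(key, value + 1, cas)
            return value + 1
        except CasMismatch:
            continue
    raise RuntimeError("too much contention on " + key)


store = ToyStore()
store.set("counter", 0)

value, cas = store.gets("counter")   # writer A reads
store.set("counter", 5)              # writer B updates in the meantime
try:
    store.cas("counter", value + 1, cas)
    conflict_detected = False
except CasMismatch:
    conflict_detected = True         # writer A's stale write is rejected

result = increment(store, "counter")  # retry from a fresh read
print(conflict_detected, result)
```

The design choice here is the usual optimistic one: nothing is locked while the client thinks, and the cost of a conflict is a re-read and a retry.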
There is some CouchAPI support for CRUD in 2.0, but it is slow and doesn't
support all Couch API operations at this point. We'll expand the Couch API
over time, however, for high performance use, we recommend using the new
SDKs, that leverage the memcached interface as per above. Please let us
know which of the Couch API features you guys see as most helpful to be
prioritized for support!
Persistence to disk is asynchronous in Couchbase Server 2.0, as is
replication to other nodes in the cluster.
So any data that has not been written to disk won't be there after a
restart or crash.
The primary means of data availability is to fall back onto the replica, i.e.
do a failover (or use the auto failover) and the replica will take over.
As replication over the network is typically pretty fast, the delay window
is very small (it gets put on the replication queue straight in memory).
Of course there are applications where you just need to know whether data
has been either replicated or persisted before you continue. For those
cases Couchbase Server 2.0 offers the SYNC instruction. The SYNC
instruction basically lets you wait until a document has been either
persisted or replicated, depending on which one you use. This way you can
block and make sure you have all your data on disk, but of course it means
you now have to wait for disk I/O. The nice thing is that you can choose which
documents you want to do synchronous writes for in this way, so you have
very fine-grained control over where to incur the penalty.
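The trade-off can be sketched with a toy model in Python. This is not the SYNC wire protocol or any real client API: `ToyNode`, the `wait_for_persist` flag and the inline `flush()` are assumptions standing in for "block until the background writer has persisted this item". It only illustrates the behaviour described above, namely that asynchronous writes sitting in the disk queue are lost on a crash, while writes that waited for persistence survive.

```python
from collections import deque


class ToyNode:
    """Toy node with an in-memory store and an asynchronous disk queue."""

    def __init__(self):
        self.memory = {}
        self.disk = {}
        self._disk_queue = deque()

    def set(self, key, value, wait_for_persist=False):
        # Every write lands in memory first and is queued for disk.
        self.memory[key] = value
        self._disk_queue.append(key)
        if wait_for_persist:
            # Hypothetical stand-in for waiting on persistence: drain the
            # queue inline instead of blocking on a background writer.
            self.flush()

    def flush(self):
        # Write everything that is queued to "disk", in order.
        while self._disk_queue:
            key = self._disk_queue.popleft()
            self.disk[key] = self.memory[key]

    def crash_and_restart(self):
        # Anything still queued never reached disk and is gone.
        self._disk_queue.clear()
        self.memory = dict(self.disk)


node = ToyNode()
node.set("order:1", "paid", wait_for_persist=True)  # pays the disk I/O cost
node.set("cart:1", "3 items")                        # fast, but only in memory
node.crash_and_restart()
print(sorted(node.memory))
```

In a real cluster the asynchronously written item would normally still be available from a replica after a failover; the sketch models a single node only.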
Of course if you just want to stick with the CouchDB API, Couchbase Single Server 2.0 Developer Preview has the performance work Damien and Filipe have been doing and provides the full CouchDB API, so take a look there as well, if you don't need clustering or the built-in memory caching for the additional low latency and high throughput.
Cheers
Jan
--
Thanks for your response. Now I have a much better understanding of
Couchbase Server 2.0.
--
---
John L Cheng
Hi, another follow-up here. When there are multiple servers serving the
same key (multiple servers per vbucket), how does a CAS ensure the
original document has not changed? In case of a split-brain scenario
[1], two concurrent writes can go to two different servers and both
succeed. Now you can have inconsistent state in that cluster. I can
think of a few ways this could be handled. How does Membase resolve this
inconsistency? Is it last writer wins? Does the data stay inconsistent
until a client issues an update (once the network partition goes
away) to bring all servers in sync?
Perry Krug
Solutions Architect
direct: 831-824-4123
email: pe...@couchbase.com
Your statement about split-brain does throw a wrench into that though. If you have a split-brain scenario AND both sides get failed over, you will have inconsistencies. Depending on your application and environment, this could lead to some nasty problems. This is one of the main reasons we did not have an automatic failover option in the beginning...and also the reason we put so many restrictions on the automatic failover feature.