Newbie question: BigCouch data propagation schema? Data location


Thomas

Oct 31, 2013, 11:52:25 AM
to bigcou...@googlegroups.com
Hi,

I recently started with BigCouch and now have my first newbie question:
* How is data actually propagated across nodes within BigCouch?
In my small test setup I have just two nodes, which I fed with some documents, and reading between the two worked fine. I then tested how the setup behaves when I stop one of the instances. Against my (naive) expectations, the remaining node still knew about the documents that I had added to the other node. I had assumed that data is stored only on the node where it was added to the db (without replication, of course).

Thus, my second question:
* Can I actually limit the data propagation between the nodes?
Ideally, I would like to make a query against the cluster and have only the query propagated. I would like to avoid propagating data beforehand, since that would put the network under constant stress, and the availability of a node-specific document does not have high priority.

Cheers and thanks for ideas,
  Thomas

Mike Miller

unread,
Oct 31, 2013, 1:32:31 PM10/31/13
to bigcou...@googlegroups.com
Hi Thomas, 

Data is propagated for writes between nodes in two main ways:

1) Quorum writes. With the defaults of n=3 storage, single document PUTs are scattered by the coordinating node to the three nodes that are responsible for the particular shard file that document belongs to (based on a hash of the doc._id). When w=2 nodes succeed within a timeout limit, we return a 201 "Created". When only 1 node succeeds within the timeout limit, we return a 202 "Accepted".

2) Anti-entropy. There is a continuous background process that uses a non-HTTP form of shard-to-shard replication to ensure that shards remain consistent on disk, even if there was a failure or a timeout somewhere in the past. It cannot be disabled and is an important part of the system.
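
The quorum rule in point 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not BigCouch's actual code: the CRC32-modulo shard mapping, the shard count `Q = 4`, and both function names are made up for the example, and the zero-acknowledgement case is not described above, so the sketch simply raises an error there.

```python
import zlib

N = 3  # copies kept of each shard (the default named above)
W = 2  # write quorum (the default named above)
Q = 4  # number of shards; hypothetical value chosen for this sketch


def shard_for(doc_id: str, q: int = Q) -> int:
    """Pick a shard index from a hash of the document id.

    CRC32 modulo q is an assumption of this sketch; it only shows the
    idea that the doc id alone determines the shard.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % q


def status_for_write(acks: int, w: int = W) -> int:
    """Translate the number of copies that acknowledged a write within
    the timeout into the HTTP status described above."""
    if acks >= w:
        return 201  # quorum reached
    if acks >= 1:
        return 202  # accepted, but fewer than w copies confirmed in time
    # Not covered by the description above; placeholder behaviour.
    raise RuntimeError("no copy acknowledged the write within the timeout")


if __name__ == "__main__":
    doc_id = "node-0042"  # hypothetical document id
    print("shard index:", shard_for(doc_id))
    for acks in (3, 2, 1):
        print(acks, "acks ->", status_for_write(acks))
```

In this picture, a 202 does not mean the remaining copies are lost: the anti-entropy process in point 2 is what later brings them up to date.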

In our operational experience, network chatter is not a limiting factor up to at least 200 nodes in a single cluster. Beyond that, there are optimizations for intra-cluster network traffic (not yet in bigcouch/couchdb2.0 master).

-M



Thomas

Nov 4, 2013, 4:10:51 AM
to bigcou...@googlegroups.com
Hi Mike,

Many thanks for the detailed information.

So, can I limit the storage replication to n=1 as well...?

Just to give an idea of what I would like to try:
I would like to track/log the system information of nodes in a way that is 'nice' to the network. My idea (I am not sure whether this is possible at all with BigCouch) is to run an instance on each node in a rack, each knowing the instance on the rack manager (and maybe each other). Depending on stability/performance, either the rack managers would know each other plus three head nodes, or, without intermediate communication between manager instances, only the three separate head instances would know the rack managers, and each rack manager would know only the heads.
Thus, I would like to ask one of the head instances for the state of one node, with the query propagating down the tree until it reaches the right node (since it is not time critical, lags/delays for queries would be acceptable). If replication with n>=2 is 'irrevocable', could I limit it to a subset, i.e., just between the nodes within one rack?

Would such a tree-like setup be possible or would it be a misuse of BigCouch...? ;)

Cheers and many thanks,
  Thomas

Mike Miller

Nov 4, 2013, 3:51:40 PM
to bigcou...@googlegroups.com
Hi,

That's an interesting idea, but in our experience that's not an optimization that you should/would want to do. n=1 storage means that a single drive failure loses data, which is against the whole spirit of BigCouch (or any Dynamo-style system). Assumptions of n >= 3 are pretty heavy in BigCouch. That said, you can define the n value at DB creation time by doing:

curl -X PUT 'https://..../dbname?n=1'
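
For scripted setups, the same request can be built from Python. This is a minimal sketch, not an official client: the helper name `create_db_request` and the base URL `http://localhost:5984` are assumptions for illustration (the host is elided in the command above), and the request is only constructed here, not sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def create_db_request(base_url: str, dbname: str, n: int) -> Request:
    """Build (but do not send) a PUT request that creates `dbname`
    with `n` copies of each shard, mirroring the curl command above."""
    if n < 1:
        raise ValueError("n must be at least 1")
    url = f"{base_url.rstrip('/')}/{quote(dbname, safe='')}?{urlencode({'n': n})}"
    return Request(url, method="PUT")


# Hypothetical host and port; replace with your own cluster address.
req = create_db_request("http://localhost:5984", "dbname", 3)
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would actually issue the request against a running cluster.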

But n=1 is such a poor idea that we don't allow Cloudant users to do it. Further, if you're worried about network chatter, that hasn't been a limitation for us in production well into the range of billions of writes/day. Generally speaking, you'll run out of IOPS before you run out of inter-node network. You may be able to accomplish some of what you want with zone placement, see, e.g.:


-M

