Post: Brandon Philips Explains etcd

Post: Brandon Philips Explains etcd Phil Whelan 3/18/14 5:02 PM
Hi,

I just posted an interview with Brandon Philips that I did last week. It’s focused on etcd, with some discussion on how it relates to Docker.

Cheers,
Phil
Re: [docker] Post: Brandon Philips Explains etcd Evan Krall 3/18/14 6:47 PM
Hi Phil, 

I've got a couple questions / requests for clarification:

"It's a data-store for really important information. It's tolerant of nodes going down. It gives you a way to store configuration data with consistent changes and distributed locks. The data is always available and always correct."

Did he just claim that etcd is consistent, available, and partition tolerant? This is generally considered to be impossible - two nodes that can't talk to each other (are partitioned) cannot possibly both accept writes and still contain the same data.

In the face of quorum loss, does etcd try to stay available for reads (possibly returning stale data), available for both reads and writes (possibly having diverged views of the data), or refusing reads and writes, guaranteeing that nobody receives data that could be incorrect?

"ZooKeeper is not recommended for virtual environments. This is the key reason ActiveState chose Doozerd over ZooKeeper when we added clustered configuration into our Cloud Foundry solution, Stackato."

You also brought this up in the blog post from last month about Docker, but you haven't provided much detail about why you think ZooKeeper is inappropriate for a virtual environment. What issues does ZooKeeper run into in virtual environments, and how do Doozerd and etcd avoid the same issues?

Thanks,
Evan
Re: [docker] Post: Brandon Philips Explains etcd Brandon Philips 3/18/14 9:00 PM
On Tue, Mar 18, 2014 at 6:47 PM, Evan Krall <kr...@yelp.com> wrote:
> I've got a couple questions / requests for clarification:
>
>> "It's a data-store for really important information. It's tolerant of
>> nodes going down. It gives you a way to store configuration data with
>> consistent changes and distributed locks. The data is always available and
>> always correct."
>
> Did he just claim that etcd is consistent, available, and partition
> tolerant? This is generally considered to be impossible - two nodes that
> can't talk to each other (are partitioned) cannot possibly both accept
> writes and still contain the same data.

You are right, etcd doesn't solve CAP. This is the problem with
discussing distributed systems in an informal chat. :)

The underlying consensus algorithm for etcd is Raft; it is consistent
and partition tolerant in CAP terms. What I meant by "available"
is that the data is available for reads when quorum is lost.

> In the face of quorum loss, does etcd try to stay available for reads
> (possibly returning stale data), available for both reads and writes
> (possibly having diverged views of the data), or refusing reads and writes,
> guaranteeing that nobody receives data that could be incorrect?

In the face of quorum loss you can continue to read by default. If you
want to have consistent reads you can add the consistent=true flag.
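To make that concrete, here is a quick sketch against a hypothetical local member (the endpoint, port, and key are made up, and the exact URL shape depends on your etcd version):

```shell
# Hypothetical single-member endpoint; substitute your own cluster address.
ETCD="http://127.0.0.1:4001"

# Default read: may be answered locally, so during a partition it can
# return stale data.
curl -s "$ETCD/v1/keys/message"

# With consistent=true the read is routed through the leader, trading
# availability during quorum loss for up-to-date data.
curl -s "$ETCD/v1/keys/message?consistent=true"
```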

Thank you,

Brandon
Re: [docker] Post: Brandon Philips Explains etcd Evan Krall 3/18/14 9:03 PM
Thanks for the clarification, Brandon; that's very helpful.

Evan
Re: [docker] Post: Brandon Philips Explains etcd Li Xiang 3/18/14 9:04 PM


On Tuesday, March 18, 2014 9:47:07 PM UTC-4, Evan Krall wrote:
> Hi Phil,
>
> I've got a couple questions / requests for clarification:
>
> "It's a data-store for really important information. It's tolerant of nodes going down. It gives you a way to store configuration data with consistent changes and distributed locks. The data is always available and always correct."
>
> Did he just claim that etcd is consistent, available, and partition tolerant? This is generally considered to be impossible - two nodes that can't talk to each other (are partitioned) cannot possibly both accept writes and still contain the same data.

etcd is a CP system. As you state, a full CAP system is impossible. I believe the practical assumption is that, in most cases, a majority of the nodes are working properly.

> In the face of quorum loss, does etcd try to stay available for reads (possibly returning stale data), available for both reads and writes (possibly having diverged views of the data), or refusing reads and writes, guaranteeing that nobody receives data that could be incorrect?

Consistency means the same data at the same time, and in zk, etcd, or any similar system, a logical clock is used to represent time. Doozer has a version as a logical clock for each key.
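As an illustration (the endpoint and sample response here are hypothetical, shaped like the early etcd HTTP API): every write response carries a monotonically increasing index, which is exactly this logical clock. Two clients that see the same index for a key are seeing the same version of the data.

```shell
# Hypothetical local member; substitute your own address.
ETCD="http://127.0.0.1:4001"

# Each successful write advances a cluster-wide index; the response
# echoes it back, serving as the logical timestamp for that change.
curl -s "$ETCD/v1/keys/message" -d value="hello"
# illustrative response shape:
# {"action":"SET","key":"/message","value":"hello","index":3}
```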
 


Re: [docker] Post: Brandon Philips Explains etcd Phil Whelan 3/19/14 10:30 AM
On Mar 18, 2014, at 9:00 PM, Brandon Philips <bra...@ifup.co> wrote:

On Tue, Mar 18, 2014 at 6:47 PM, Evan Krall <kr...@yelp.com> wrote:
I've got a couple questions / requests for clarification:

"It's a data-store for really important information. It's tolerant of
nodes going down. It gives you a way to store configuration data with
consistent changes and distributed locks. The data is always available and
always correct."

Did he just claim that etcd is consistent, available, and partition
tolerant? This is generally considered to be impossible - two nodes that
can't talk to each other (are partitioned) cannot possibly both accept
writes and still contain the same data.

Great point, Evan.

On Mar 18, 2014, at 9:00 PM, Brandon Philips <bra...@ifup.co> wrote:
The underlying consensus algorithm for etcd is Raft; it is consistent
and partition tolerant in the CAP terms. What I meant by "available"
is that the data is available for reads when quorum is lost.

Thanks Brandon. I’ll add an update to the post.

I should also note that this quote does not include certain assumptions that were mentioned later in the post, such as only using a small dataset. This is not your average key-value data-store. It’s designed to do a specific job well.

On Mar 18, 2014, at 6:47 PM, Evan Krall <kr...@yelp.com> wrote:
ZooKeeper is not recommended for virtual environments. This is the key reason ActiveState chose Doozerd over ZooKeeper when we added clustered configuration into our Cloud Foundry solution, Stackato.

You also brought this up in the blog post from last month about Docker, but you haven't provided much detail about why you think ZooKeeper is inappropriate for a virtual environment. What issues does ZooKeeper run into in virtual environments, and how do Doozerd and etcd avoid the same issues?

I don’t want to stray too far from the Docker path on this list, so I will try to be brief…

The reason below is why we previously went with Doozerd over ZK. We’re creating a virtual appliance and we want it to be able to run anywhere.


Virtual environments

We've seen situations where users run the entire ZK cluster on a set of VMware VMs, all on the same host system. Latency in this configuration was >>> 10sec in some cases due to resource issues (in particular IO - see the link I provided above; dedicated log devices are critical to low-latency operation of the ZK cluster). Obviously no one should be running this configuration in production - in particular there will be no reliability in cases where the host storage fails!

Virtual environments - "Cloud Computing"

In one scenario involving EC2, ZK was seeing frequent client disconnects. The user had configured a timeout of 5 seconds, which is too low, probably much too low. Why? You are running in virtualized environments on non-dedicated hardware outside your control/inspection. There is typically no way to tell (unless you are running on the 8-core EC2 systems) whether the EC2 host you are running on is over/under subscribed (other VMs). There is no way to control disk latency either. You could be seeing large latencies due to resource contention on the EC2 host alone. In addition to that, I've heard that network latencies in EC2 are high relative to what you would see if you were running in your own dedicated environment. It's hard to tell what latency between the servers, and client->server, you are seeing within the EC2 environment without measuring it.


Re: [docker] Post: Brandon Philips Explains etcd Ranjib Dey 3/19/14 11:29 AM
i want to add a couple of other points on why etcd may be preferred over zookeeper:

1) operational efficiency: 
  a) as of now, it is not possible to dynamically resize a zookeeper cluster, i.e. to add more members without restarting the cluster,
  b) monitoring: as of now it is not possible to get stats about the entire zk cluster in one api call. trivial things like "who is the master" require multiple queries across the cluster. etcd provides a stats api (leader, followers, state etc.) out of the box
  c) deployment: zookeeper deployment requires some more tooling (though this is not specific to zk; the same goes for most cluster-based services) to capture the context (who the leader is, who the existing members are) during provisioning. etcd provides discovery/bootstrapping (this functionality is still being refined though), where one can bootstrap a cluster by pointing it at a discovery endpoint (a pre-existing etcd cluster). etcd also provides dynamic configuration manipulation over the api.
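as a rough sketch of the monitoring point (the paths below are from memory of the 0.x stats api and may differ between versions; the address is made up):

```shell
# hypothetical member address; point this at any node in your cluster
ETCD="http://127.0.0.1:4001"

# one call to any member answers "who is the leader" plus per-follower
# state, instead of querying every node individually as with zk
curl -s "$ETCD/v2/stats/leader"

# per-member view: name, state (leader/follower), uptime, etc.
curl -s "$ETCD/v2/stats/self"
```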

2) api: 
a) zookeeper does not provide locking directly; it provides primitives to do so. this means locks/barriers are implemented with the help of client-side logic as well, which has resulted in duplicated effort across the client libs. etcd, on the other hand, provides a core set of modules (lock, leader election) out of the box. this does not prevent one from using the atomic key/value manipulation operations, but it does allow all bug fixes/effort to be put in one place, which every client can use
b) simple http-based REST-like interface
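for example, the atomic primitives and the lock module are both plain http calls. the paths below are a sketch from memory of the 0.x api (endpoint, key names and values are all made up), so treat them as illustrative rather than exact:

```shell
# hypothetical member address
ETCD="http://127.0.0.1:4001"

# test-and-set: the write succeeds only if the current value matches
# prevValue, so two racing clients cannot silently clobber each other
curl -s "$ETCD/v1/keys/leader" -d value="node2" -d prevValue="node1"

# lock module: acquire a lock named "db" with a 60 second ttl; the call
# blocks until the lock is free
curl -s -X POST "$ETCD/mod/v2/lock/db?ttl=60"
```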


zookeeper is awesome, and i think etcd addresses similar use cases but with much better operational benefits (and with cloud-based deployments in mind)

regards
ranjib