etcd discovery protocol "for dummies"


Kurt Yoder

Jul 25, 2014, 6:38:19 PM
to coreo...@googlegroups.com
Hello again,

Can someone give me a step-by-step example of how to get etcd and CoreOS working together? My first attempt failed, possibly because I misunderstood the instructions.

Here's what I've done so far:
However, the above has resulted in my CoreOS VMs connecting very strangely (if at all) to my etcd.private.net host. See my previous post for the gory details.

Can somebody give me a hint how I can get this working? I'm currently dead in the water after several days of attempts :(


Thanks!

-Kurt

Brandon Philips

Jul 28, 2014, 3:01:20 PM
to Kurt Yoder, coreos-user
On Fri, Jul 25, 2014 at 3:38 PM, Kurt Yoder <kyo...@data-tactics.com> wrote:
> Set up an Ubuntu box, containing a compiled etcd. The box has IP
> 10.10.10.10, hostname etcd.private.net
> Start up etcd on the ubuntu box like so:
> ./etcd/bin/etcd -bind-addr=0.0.0.0
> (the next two steps are me trying to interpret discovery-protocol.md)
> curl -X PUT
> "http://127.0.0.1:4001/v2/keys/_etcd/registry/cluster2/etcd?ttl=604800" -d
> value='10.10.10.10:7001'
These steps are not necessary and are very likely causing the problem.
If you want, you can create the discovery directory first:

curl -X PUT 'http://127.0.0.1:4001/v2/keys/_etcd/registry/cluster2?dir=true'
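The same directory-creation PUT can be sketched in Python. This is only an illustration of the request shape (the registry lives under `/v2/keys/_etcd/registry/<cluster>` in the v2 keys API); the host and cluster name are the ones from this thread, and the request is built but not sent since the endpoint is only reachable inside Kurt's network:

```python
import urllib.request

# Pre-create the discovery registry directory for "cluster2".
# Note the /v2/keys/ prefix -- the keys API path, not /v2/_etcd/.
url = "http://127.0.0.1:4001/v2/keys/_etcd/registry/cluster2?dir=true"
req = urllib.request.Request(url, method="PUT")

# urllib.request.urlopen(req) would actually send it; omitted here
# because it needs a running etcd on 127.0.0.1:4001.
print(req.get_method(), req.full_url)
```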

> coreos:
>   etcd:
>     discovery: http://etcd.private.net:4001/v2/keys/cluster2
>     addr: $private_ipv4:4001
>     peer-addr: $private_ipv4:7001
>   units:
>     - name: etcd.service
>       command: start
>     - name: fleet.service
>       command: start

This all looks fine.

Brandon

Brandon Philips

Jul 28, 2014, 4:26:33 PM
to Kurt Yoder, coreos-user
On Fri, Jul 25, 2014 at 3:38 PM, Kurt Yoder <kyo...@data-tactics.com> wrote:
> Can someone give me a step-by-step example of how to get etcd and CoreOS
> working together? My first attempt failed, possibly because I misunderstood
> the instructions.

BTW, this document should all work:
https://github.com/coreos/etcd/blob/master/Documentation/cluster-discovery.md#running-your-own-discovery-endpoint

Brandon

Kurt Yoder

Jul 28, 2014, 4:50:11 PM
to coreo...@googlegroups.com, kyo...@data-tactics.com
Yes this is the document I was using. The _etcd/registry entry came from the instructions in https://github.com/coreos/etcd/blob/master/Documentation/discovery-protocol.md.

New piece of information: I reset my etcd database, which removed the _etcd/registry entry, as you said above. 

Then I started a cluster using Vagrant, cluster name "cluster3". All CoreOS nodes connected to etcd immediately, with no errors!

Then I started another cluster in Openstack, cluster name "cluster1". It started throwing errors "501: All the given peers are not reachable", which is what I have been having trouble with. Despite the 501 error, I confirmed that the etcd registration URL is accessible from the CoreOS host on Openstack.

To me, this indicates the failure is not on the etcd discovery server. Is this correct? What is etcd on the CoreOS host actually looking for?
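(For reference: what the joining node fetches from the discovery URL is a v2 keys-API listing of the registry directory, and it then tries to contact each registered peer address. A sketch of extracting those peers, using a hand-written sample response rather than anything captured from this thread:)

```python
import json

# Illustrative v2 keys-API response for
# GET /v2/keys/_etcd/registry/cluster1 -- sample data, not real output.
sample = '''
{"action": "get",
 "node": {"key": "/_etcd/registry/cluster1", "dir": true,
          "nodes": [{"key": "/_etcd/registry/cluster1/node1",
                     "value": "10.10.10.10:7001"}]}}
'''

def peer_addrs(body):
    """Return the peer addresses a joining node would try to reach."""
    doc = json.loads(body)
    return [n["value"] for n in doc["node"].get("nodes", [])]

print(peer_addrs(sample))  # ['10.10.10.10:7001']
```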

Kurt Yoder

Jul 28, 2014, 5:40:24 PM
to coreo...@googlegroups.com, kyo...@data-tactics.com
I found the solution.

I installed a new etcd client on Ubuntu so I could more easily diagnose the problem. While starting etcd on Ubuntu, I ran a wire sniffer and saw two DNS requests for my etcd.private.net host. The IPv4 (A) query got a response, but the IPv6 (AAAA) query did *not*.

Solution: in my cloud-config, connect to the etcd discovery IP address instead of its hostname. This worked immediately and flawlessly.

So this turns out to be a head-scratcher:
  • CoreOS instances on Vagrant + VirtualBox can use hostname lookups to the etcd server on Openstack (presumably both IPv4 and IPv6 queries return).
  • CoreOS instances on Openstack must use the IP address of the etcd server (maybe our Openstack networking is weird).
Take-away: it would be nice if etcd gave a more precise message when it fails to connect to the discovery server. "501: All the given peers are not reachable" totally threw me off the scent. In this case, it would have been very helpful if it had mentioned the failing DNS lookup.
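(The failing lookup Kurt found with a sniffer can also be checked without one. A sketch using Python's `getaddrinfo`, which triggers the same per-family A/AAAA lookups; `localhost` stands in here for a host like etcd.private.net:)

```python
import socket

def resolve(host):
    """Return (ipv4_addrs, ipv6_addrs) for a hostname, mirroring the
    separate A and AAAA lookups seen on the wire."""
    v4, v6 = [], []
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
            host, None, proto=socket.IPPROTO_TCP):
        if family == socket.AF_INET:
            v4.append(sockaddr[0])
        elif family == socket.AF_INET6:
            v6.append(sockaddr[0])
    return v4, v6

# Substitute your discovery host to see which address families resolve.
v4, v6 = resolve("localhost")
print(v4, v6)
```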