etcd discovery protocol "for dummies"


Kurt Yoder

Jul 25, 2014, 6:38:19 PM
to coreo...@googlegroups.com
Hello again,

Can someone give me a step-by-step example of how to get etcd and CoreOS working together? My first attempt failed, possibly because I misunderstood the instructions.

Here's what I've done so far:
However, the above has resulted in my CoreOS VMs connecting very strangely (if at all) to my etcd.private.net host. See my previous post for the gory details.

Can somebody give me a hint how I can get this working? I'm currently dead in the water after several days of attempts :(


Thanks!

-Kurt

Brandon Philips

Jul 28, 2014, 3:01:20 PM
to Kurt Yoder, coreos-user
On Fri, Jul 25, 2014 at 3:38 PM, Kurt Yoder <kyo...@data-tactics.com> wrote:
> Set up an Ubuntu box, containing a compiled etcd. The box has IP
> 10.10.10.10, hostname etcd.private.net
> Start up etcd on the ubuntu box like so:
> ./etcd/bin/etcd -bind-addr=0.0.0.0
> (the next two steps are me trying to interpret discovery-protocol.md)
> curl -X PUT
> "http://127.0.0.1:4001/v2/keys/_etcd/registry/cluster2/etcd?ttl=604800" -d
> value='10.10.10.10:7001'
These steps are not necessary and are very likely causing the problem.
If you want, you can create the discovery directory first:

curl -X PUT 'http://127.0.0.1:4001/v2/keys/_etcd/registry/cluster2?dir=true'
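The same directory-creation PUT can be sketched in Python. This is only an illustration of the request shape (the registry lives under `/v2/keys/_etcd/registry/<cluster>` in the v2 keys API); the host and cluster name are the ones from this thread, and the request is built but not sent since the endpoint is only reachable inside Kurt's network:

```python
import urllib.request

# Pre-create the discovery registry directory for "cluster2".
# Note the /v2/keys/ prefix -- the keys API path, not /v2/_etcd/.
url = "http://127.0.0.1:4001/v2/keys/_etcd/registry/cluster2?dir=true"
req = urllib.request.Request(url, method="PUT")

# urllib.request.urlopen(req) would actually send it; omitted here
# because it needs a running etcd on 127.0.0.1:4001.
print(req.get_method(), req.full_url)
```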

> coreos:
>   etcd:
>     discovery: http://etcd.private.net:4001/v2/keys/cluster2
>     addr: $private_ipv4:4001
>     peer-addr: $private_ipv4:7001
>   units:
>     - name: etcd.service
>       command: start
>     - name: fleet.service
>       command: start

This all looks fine.

Brandon

Brandon Philips

Jul 28, 2014, 4:26:33 PM
to Kurt Yoder, coreos-user
On Fri, Jul 25, 2014 at 3:38 PM, Kurt Yoder <kyo...@data-tactics.com> wrote:
> Can someone give me a step-by-step example of how to get etcd and CoreOS
> working together? My first attempt failed, possibly because I misunderstood
> the instructions.

BTW, this document should all work:
https://github.com/coreos/etcd/blob/master/Documentation/cluster-discovery.md#running-your-own-discovery-endpoint

Brandon

Kurt Yoder

Jul 28, 2014, 4:50:11 PM
to coreo...@googlegroups.com, kyo...@data-tactics.com
Yes this is the document I was using. The _etcd/registry entry came from the instructions in https://github.com/coreos/etcd/blob/master/Documentation/discovery-protocol.md.

New piece of information: I reset my etcd database, which removed the _etcd/registry entry, as you said above. 

Then I started a cluster using Vagrant, cluster name "cluster3". All CoreOS nodes connected to etcd immediately, with no errors!

Then I started another cluster in Openstack, cluster name "cluster1". It started throwing errors "501: All the given peers are not reachable", which is what I have been having trouble with. Despite the 501 error, I confirmed that the etcd registration URL is accessible from the CoreOS host on Openstack.

To me, this indicates the failure is not on the etcd discovery server. Is this correct? What is etcd on the CoreOS host actually looking for?
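(For reference: what the joining node fetches from the discovery URL is a v2 keys-API listing of the registry directory, and it then tries to contact each registered peer address. A sketch of extracting those peers, using a hand-written sample response rather than anything captured from this thread:)

```python
import json

# Illustrative v2 keys-API response for
# GET /v2/keys/_etcd/registry/cluster1 -- sample data, not real output.
sample = '''
{"action": "get",
 "node": {"key": "/_etcd/registry/cluster1", "dir": true,
          "nodes": [{"key": "/_etcd/registry/cluster1/node1",
                     "value": "10.10.10.10:7001"}]}}
'''

def peer_addrs(body):
    """Return the peer addresses a joining node would try to reach."""
    doc = json.loads(body)
    return [n["value"] for n in doc["node"].get("nodes", [])]

print(peer_addrs(sample))  # ['10.10.10.10:7001']
```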

Kurt Yoder

Jul 28, 2014, 5:40:24 PM
to coreo...@googlegroups.com, kyo...@data-tactics.com
I found the solution.

I installed a new etcd client on Ubuntu so I could more easily diagnose the problem. While starting etcd on Ubuntu, I ran a wire sniffer and saw two DNS requests for my etcd.private.net host. The IPv4 (A) query got a response, but the IPv6 (AAAA) query did *not*.

Solution: in my cloud-config, connect to the etcd discovery IP address instead of its hostname. This worked immediately and flawlessly.

So this turns out to be a head-scratcher:
  • CoreOS instances on Vagrant + VirtualBox can use hostname lookups to the etcd server on Openstack (presumably both IPv4 and IPv6 queries return).
  • CoreOS instances on Openstack must use the IP address of the etcd server (maybe our Openstack networking is weird).
Take-away: it would be nice if etcd gave a more precise message when it fails to connect to the discovery server. "501: All the given peers are not reachable" totally threw me off the scent. In this case, it would have been very helpful if it had mentioned the failing DNS lookup.
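(The failing lookup Kurt found with a sniffer can also be checked without one. A sketch using Python's `getaddrinfo`, which triggers the same per-family A/AAAA lookups; `localhost` stands in here for a host like etcd.private.net:)

```python
import socket

def resolve(host):
    """Return (ipv4_addrs, ipv6_addrs) for a hostname, mirroring the
    separate A and AAAA lookups seen on the wire."""
    v4, v6 = [], []
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
            host, None, proto=socket.IPPROTO_TCP):
        if family == socket.AF_INET:
            v4.append(sockaddr[0])
        elif family == socket.AF_INET6:
            v6.append(sockaddr[0])
    return v4, v6

# Substitute your discovery host to see which address families resolve.
v4, v6 = resolve("localhost")
print(v4, v6)
```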