Consul multi data center with floating / elastic IP


Saswat Praharaj

Aug 14, 2015, 6:17:39 PM
to Consul
Hi All,

I am trying to enable WAN discovery with Consul and am running into some issues.

I have two data centers running on OpenStack: dc-richardson and dc-texas.
dc-texas uses a floating IP for communicating with the outside; dc-richardson has a public IP.

I ran a WAN join from the primary Consul node in dc-richardson to the primary node in dc-texas, using consul join -wan "richardson_public_ip texas_floating_ip"

It responded with "Successfully joined cluster by contacting 2 nodes."

Output of consul members -wan below:


[root@pb-dc-richardson-control-01 state846218620]# consul members -wan
Node                                       Address              Status  Type    Build  Protocol  DC
pb-dc-richardson-control-01.dc-richardson  <public_ip>:8302    alive   server  0.5.2  2         dc-richardson
pb-dc-texas-control-01.dc-texas            192.168.128.10:8302  alive   server  0.5.2  2         dc-texas



As you can see, the dc-texas entry doesn't show the floating / elastic IP; it shows the private IP instead.
In the log messages I see a raft failure, as dc-richardson can't reach the private IP.

Aug 14 22:14:52 pb-dc-richardson-control-01 consul: 2015/08/14 22:14:52 [INFO] memberlist: Suspect pb-dc-texas-control-01.dc-texas has failed, no acks received
Aug 14 22:14:57 pb-dc-richardson-control-01 consul: 2015/08/14 22:14:57 [INFO] memberlist: Marking pb-dc-texas-control-01.dc-texas as failed, suspect timeout reached
Aug 14 22:14:57 pb-dc-richardson-control-01 consul: 2015/08/14 22:14:57 [INFO] serf: EventMemberFailed: pb-dc-texas-control-01.dc-texas 192.168.128.10
Aug 14 22:14:57 pb-dc-richardson-control-01 consul: 2015/08/14 22:14:57 [INFO] consul: removing server pb-dc-texas-control-01.dc-texas (Addr: 192.168.128.10:8300) (DC: dc-texas)
Aug 14 22:14:57 pb-dc-richardson-control-01 consul[13228]: consul: removing server pb-dc-texas-control-01.dc-texas (Addr: 192.168.128.10:8300) (DC: dc-texas)

Now I am not able to unjoin the two either.

consul force-leave 192.168.128.10
consul force-leave <floating_ip>

Consul can't reach 192.168.128.10, so that command fails.
force-leave with the floating IP also fails.

Is this a configuration problem, a bug, or has WAN discovery using a floating IP not been implemented yet?


Thanks,
Saswat

Lukas Grossar

Aug 14, 2015, 6:33:08 PM
to Consul
Hi Saswat

While building my multi-datacenter cluster with floating IPs I ran into similar issues, but they are pretty easy to fix.


On Saturday, August 15, 2015 at 12:17:39 AM UTC+2, Saswat Praharaj wrote:
I did a consul join -wan from primary consul node in dc-richardson to primary node in dc-texas using consul join -wan "richardson_public_ip texas_floating_ip"

It responded with " Successfully joined cluster by contacting 2 nodes. "

This is to be expected, because the Consul agent answered correctly on the public/floating IPs you provided.

As you can see in the dc-texas IP it doesn't show floating / elastic ip , instead shows the private ip.
In the log message I see raft failure as it can't reach the private IP.
[...]
Is this a configuration problem , bug or wan discovery using floating ip has not been implemented yet ?

The problem here is that by default Consul advertises the first available private IP address. Since Consul only sees the private address on the interface, that is the address it advertises. You need to use '-advertise' or '-advertise-wan' to advertise a specific IP address. Within OpenStack you can query the floating IP using the metadata service. This, for example, is what we're currently using:

consul agent -server -advertise $(curl -s http://169.254.169.254/2009-04-04/meta-data/public-ipv4) -join-wan ...
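
If the metadata query fails or the instance has no floating IP attached, the one-liner above would pass an empty or garbage value to -advertise. A slightly more defensive version might look like the sketch below. It reuses the metadata URL from the command above; the function names is_ipv4 and start_agent are made up for illustration, and the shape check is an assumption about what a usable answer looks like, not something Consul requires.

```shell
#!/bin/sh
# Sketch only: resolve the floating IP first, check it, then start the agent.
# METADATA_URL is the endpoint from the command above; is_ipv4 and
# start_agent are hypothetical helper names.
METADATA_URL="http://169.254.169.254/2009-04-04/meta-data/public-ipv4"

# Succeeds only if $1 has the shape of a dotted-quad IPv4 address.
# This checks the shape only, not that each octet is <= 255.
is_ipv4() {
    printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

# Query the metadata service and start the agent with the result.
# Extra arguments (for example -join-wan and its address) are passed through.
start_agent() {
    ip=$(curl -sf --max-time 5 "$METADATA_URL") || {
        echo "metadata service not reachable" >&2
        return 1
    }
    is_ipv4 "$ip" || {
        echo "unexpected metadata answer: '$ip'" >&2
        return 1
    }
    exec consul agent -server -advertise "$ip" "$@"
}
```

On the node you would then call start_agent followed by whatever other agent flags you normally use. It refuses to start the agent rather than advertising an address that other datacenters cannot reach.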

I hope this helps you.

Best regards
Lukas

Saswat Praharaj

Aug 14, 2015, 6:44:03 PM
to Consul
Thanks. That should fix the problem.