Multiple instances of consul on the same server

Jeff Weeks

May 13, 2016, 10:10:27 AM
to Consul
Hello,

I'd like to run multiple instances of consul on a server.  I believe consul always binds to port 8300, and so multiple instances require multiple Ethernet cards, correct?

I tried to work around this by using a virtual interface as follows:

neuralFour:/home/dcipher/Downloads/consul # ifconfig enp4s0:0 192.168.250.1
neuralFour:/home/dcipher/Downloads/consul # ifconfig
enp4s0    Link encap:Ethernet  HWaddr 54:42:49:65:FA:75  
          inet addr:192.168.214.95  Bcast:192.168.215.255  Mask:255.255.254.0
          inet6 addr: fe80::5642:49ff:fe65:fa75/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:29651 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8475 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:15223001 (14.5 Mb)  TX bytes:1703315 (1.6 Mb)
          Interrupt:18 

enp4s0:0  Link encap:Ethernet  HWaddr 54:42:49:65:FA:75  
          inet addr:192.168.250.1  Bcast:192.168.250.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:18 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:2366 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2366 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:319703 (312.2 Kb)  TX bytes:319703 (312.2 Kb)

vboxnet0  Link encap:Ethernet  HWaddr 0A:00:27:00:00:00  
          inet addr:172.20.20.1  Bcast:172.20.20.255  Mask:255.255.255.0
          inet6 addr: fe80::800:27ff:fe00:0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:95 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 b)  TX bytes:16078 (15.7 Kb)


And I'm able to create a consul server bound to enp4s0, but not enp4s0:0.

The first works:

[dcipher@neuralFour 09:59:16 ~/Downloads/consul]$ ./consul agent -server -data-dir /tmp/consul -node=ldServer2 -config-dir ./server.2 -bind 192.168.214.95
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
         Node name: 'ldServer2'
        Datacenter: 'dc1'
            Server: true (bootstrap: false)
       Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600, RPC: 8400)
      Cluster Addr: 192.168.214.95 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas: <disabled>
<etc>...

But the second just hangs:

[dcipher@neuralFour 10:03:32 ~/Downloads/consul]$ ./consul agent -server -data-dir /tmp/consul -node=ldServer3 -config-dir ./server.3 -bind 192.168.250.1
==> Starting Consul agent...
<hang>

Is there some debug logging I can enable to determine what's going on?  (strace shows it waiting on a futex)

Is what I'm doing reasonable/correct?

Thanks,
Jeff

David Adams

May 13, 2016, 10:36:59 AM
to consu...@googlegroups.com
Looks like the first agent is binding to localhost for ports 8400, 8500, and 8600, and so presumably the next agent is attempting the same thing.

By default the `-bind` argument only controls where the 8300-8302 RPC and Serf ports bind. You can control which IPs and ports each service binds to using the `addresses` and `ports` hashes in the configuration file along with the `bind` setting. See:



--
This mailing list is governed under the HashiCorp Community Guidelines - https://www.hashicorp.com/community-guidelines.html. Behavior in violation of those guidelines may result in your removal from this mailing list.
 
GitHub Issues: https://github.com/hashicorp/consul/issues
IRC: #consul on Freenode
---
You received this message because you are subscribed to the Google Groups "Consul" group.
To unsubscribe from this group and stop receiving emails from it, send an email to consul-tool...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/consul-tool/48710019-28a8-478e-8681-97fccfa31d38%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
David Adams | Systems Administrator

Jeff Weeks

May 13, 2016, 10:45:23 AM
to Consul
Right; I discovered an article talking about specifying -client for this as well, and tried this:

[dcipher@neuralFour 10:45:47 ~/Downloads/consul]$ ./consul agent -server -bootstrap -data-dir /tmp/consul -node=ldServer2 -config-dir ./server.2 -bind 192.168.214.95 -client=192.168.214.95
<this succeeds>

[dcipher@neuralFour 10:46:09 ~/Downloads/consul]$ ./consul agent -server -data-dir /tmp/consul -node=ldServer3 -config-dir ./server.3 -bind 192.168.250.1 -client=192.168.250.1
==> Starting Consul agent...

It looks like the correct addresses are being used now, but still the second instance hangs here (I see that both 8300 sockets have the same file descriptor... is that expected?  I'm not really sure how virtual interfaces work under the covers, so wasn't sure if that's related to the issue, or normal/expected):

neuralFour:/home/dcipher/Downloads/consul # ss -antp | grep -i consul
LISTEN     0      128           192.168.250.1:8300                     *:*      users:(("consul",pid=7564,fd=4))
LISTEN     0      128          192.168.214.95:8300                     *:*      users:(("consul",pid=7550,fd=4))
LISTEN     0      128          192.168.214.95:8301                     *:*      users:(("consul",pid=7550,fd=8))
LISTEN     0      128          192.168.214.95:8302                     *:*      users:(("consul",pid=7550,fd=11))
LISTEN     0      128          192.168.214.95:8400                     *:*      users:(("consul",pid=7550,fd=13))
LISTEN     0      128          192.168.214.95:8500                     *:*      users:(("consul",pid=7550,fd=14))
LISTEN     0      128          192.168.214.95:8600                     *:*      users:(("consul",pid=7550,fd=17))

Jeff Weeks

May 13, 2016, 3:14:30 PM
to Consul
Further info; I'm hoping someone can help me debug this, as I'm running out of ideas.

1. If I close the server that works, the one that was hung starts up, so it appears as though the second/hung server is waiting on a resource which the first one owns.

2. If I try this in a vagrant vm (same as the ones provided in the getting started tutorial), the second server doesn't start at all:

# ifconfig eth1:0 172.20.30.10

# ifconfig
eth0      Link encap:Ethernet  HWaddr 08:00:27:31:e9:32  
          inet addr:10.0.2.15  Bcast:10.0.2.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe31:e932/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3199 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2020 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:267161 (260.8 KiB)  TX bytes:183709 (179.4 KiB)

eth1      Link encap:Ethernet  HWaddr 08:00:27:ad:d9:b5  
          inet addr:172.20.20.10  Bcast:172.20.20.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fead:d9b5/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:40 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:22574 (22.0 KiB)  TX bytes:1016 (1016.0 B)

eth1:0    Link encap:Ethernet  HWaddr 08:00:27:ad:d9:b5  
          inet addr:172.20.30.10  Bcast:172.20.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

# consul agent -server -bootstrap-expect 1     -data-dir /tmp/consul -node=agent-one -bind=172.20.20.10 -client=172.20.20.10    -config-dir /etc/consul.d
<the above command works>

# consul agent -server -data-dir /tmp/consul -node=agent-true -bind=172.20.30.15 -client=172.20.30.15 -config-dir /etc/consul.d
==> Starting Consul agent...
==> Error starting agent: Failed to start Consul server: Failed to start RPC layer: listen tcp 172.20.30.15:8300: bind: cannot assign requested address

--Jeff
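
Worth noting about the transcript above: the alias was configured as 172.20.30.10, while the second command binds to 172.20.30.15, which is consistent with the "cannot assign requested address" error. Below is a minimal sketch for catching that kind of mismatch, assuming the older `inet addr:` ifconfig format shown in this thread. `configured_ipv4` and `IFCONFIG_SAMPLE` are illustrative names, not part of any Consul tooling.

```python
import re
from typing import Set

# Trimmed copy of the ifconfig output from the Vagrant VM above.
IFCONFIG_SAMPLE = """\
eth0      Link encap:Ethernet  HWaddr 08:00:27:31:e9:32
          inet addr:10.0.2.15  Bcast:10.0.2.255  Mask:255.255.255.0
eth1      Link encap:Ethernet  HWaddr 08:00:27:ad:d9:b5
          inet addr:172.20.20.10  Bcast:172.20.20.255  Mask:255.255.255.0
eth1:0    Link encap:Ethernet  HWaddr 08:00:27:ad:d9:b5
          inet addr:172.20.30.10  Bcast:172.20.255.255  Mask:255.255.0.0
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
"""


def configured_ipv4(ifconfig_output: str) -> Set[str]:
    """Collect every IPv4 address that follows 'inet addr:' in the text."""
    return set(re.findall(r"inet addr:(\d{1,3}(?:\.\d{1,3}){3})", ifconfig_output))


addresses = configured_ipv4(IFCONFIG_SAMPLE)

# Compare the alias that was created with the address passed to -bind.
for candidate in ("172.20.30.10", "172.20.30.15"):
    status = "configured" if candidate in addresses else "NOT configured"
    print(f"{candidate}: {status}")
```

Newer ifconfig builds print `inet 10.0.2.15` without the `addr:` prefix, so the pattern would need adjusting there.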

David Adams

May 13, 2016, 3:57:12 PM
to consu...@googlegroups.com
I guess my next suggestion would be to try setting all the services to different ports on each server and see if it works. That would not necessarily solve your problem but it would indicate where the problem may lie.
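
A minimal sketch of that suggestion, assuming the `ports` key names used in the consul.json posted later in this thread and the default port numbers visible in the agent output and `ss` listings. `ports_for_instance` and the step of 1000 are hypothetical choices for illustration, not Consul settings.

```python
import json
from typing import Dict

# Default ports as they appear in the agent output and ss listings in this thread.
DEFAULT_PORTS: Dict[str, int] = {
    "server": 8300,
    "serf_lan": 8301,
    "serf_wan": 8302,
    "rpc": 8400,
    "http": 8500,
    "dns": 8600,
}


def ports_for_instance(index: int, step: int = 1000) -> Dict[str, int]:
    """Shift every default port by index * step (instance 0 keeps the defaults)."""
    return {name: port + index * step for name, port in DEFAULT_PORTS.items()}


# One config fragment per instance on the same host.
configs = [{"server": True, "ports": ports_for_instance(i)} for i in range(2)]

# Sanity check: no port number is used by more than one instance.
all_ports = [port for cfg in configs for port in cfg["ports"].values()]
assert len(all_ports) == len(set(all_ports))

# Print the fragment for the second instance as JSON.
print(json.dumps(configs[1], indent=4))
```

With a step of 1000 the second instance gets 9300, 9301, 9302 and 9400 for `server`, `serf_lan`, `serf_wan` and `rpc`. If the hang persists even with fully disjoint ports, that points away from a port collision.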



Jeff Weeks

May 16, 2016, 2:51:15 PM
to Consul
I'm not convinced it's a socket thing.

So I've started server1 successfully, and it's using the following connections:

neuralFour:/home/dcipher/Downloads/consul # ss -antp | grep -i consul
LISTEN     0      128          192.168.214.97:8300                     *:*      users:(("consul",pid=5847,fd=4))
LISTEN     0      128          192.168.214.97:8301                     *:*      users:(("consul",pid=5847,fd=7))
LISTEN     0      128          192.168.214.97:8302                     *:*      users:(("consul",pid=5847,fd=11))
LISTEN     0      128          192.168.214.97:8080                     *:*      users:(("consul",pid=5847,fd=14))
LISTEN     0      128          192.168.214.97:8400                     *:*      users:(("consul",pid=5847,fd=13))
LISTEN     0      128          192.168.214.97:8500                     *:*      users:(("consul",pid=5847,fd=15))
LISTEN     0      128          192.168.214.97:8600                     *:*      users:(("consul",pid=5847,fd=18))

I've modified server2's consul.json to disable some services, and ensure that it has no duplicate ports for the remaining services:

{
    "server": true,
    "addresses": {
        "rpc": "192.168.250.1",
        "https": "192.168.250.1",
        "dns": "192.168.250.1",
        "http": "192.168.250.1"

    },
    "ports": {
        "dns": -1,
        "http": -1,
        "https": -1,
        "rpc": 9400,
        "serf_lan": 9301,
        "serf_wan": 9302,
        "server": 9300
    }
}

And yet, still, starting server2 hangs:

[dcipher@neuralFour 14:49:40 ~/Downloads/consul]$ ./consul agent -server -data-dir /tmp/consul -node=ldServer2 -config-dir ./server.2 -bind 192.168.250.1
==> Starting Consul agent...
<hang>

At this point, the socket make-up looks like this:

neuralFour:/home/dcipher/Downloads/consul # ss -antp | grep -i consul
LISTEN     0      128          192.168.214.97:8300                     *:*      users:(("consul",pid=5847,fd=4))
LISTEN     0      128          192.168.214.97:8301                     *:*      users:(("consul",pid=5847,fd=7))
LISTEN     0      128          192.168.214.97:8302                     *:*      users:(("consul",pid=5847,fd=11))
LISTEN     0      128          192.168.214.97:8080                     *:*      users:(("consul",pid=5847,fd=14))
LISTEN     0      128          192.168.214.97:8400                     *:*      users:(("consul",pid=5847,fd=13))
LISTEN     0      128           192.168.250.1:9300                     *:*      users:(("consul",pid=6075,fd=4))
LISTEN     0      128          192.168.214.97:8500                     *:*      users:(("consul",pid=5847,fd=15))
LISTEN     0      128          192.168.214.97:8600                     *:*      users:(("consul",pid=5847,fd=18))

There are no duplicates here. server2 should be able to bind to 192.168.250.1:9300 without contention, and given the ss output, I suspect it has. I think some post-processing after creating this socket is causing the hang (although I'm not entirely sure how to debug further).
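
The "no duplicates" check can be done mechanically rather than by eye. This is a small sketch that assumes the `ss -antp` column layout shown above, where the fourth whitespace-separated field is the local address:port. `listen_addresses` and `duplicates` are hypothetical helper names, and the sample is a subset of the listing above.

```python
from collections import Counter
from typing import Iterable, List

# A few lines copied from the ss listing above.
SS_SAMPLE = """\
LISTEN     0      128          192.168.214.97:8300                     *:*      users:(("consul",pid=5847,fd=4))
LISTEN     0      128          192.168.214.97:8301                     *:*      users:(("consul",pid=5847,fd=7))
LISTEN     0      128           192.168.250.1:9300                     *:*      users:(("consul",pid=6075,fd=4))
LISTEN     0      128          192.168.214.97:8500                     *:*      users:(("consul",pid=5847,fd=15))
"""


def listen_addresses(ss_output: str) -> List[str]:
    """Return the local address:port column of every LISTEN line."""
    result = []
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == "LISTEN":
            result.append(fields[3])
    return result


def duplicates(addresses: Iterable[str]) -> List[str]:
    """Return every address:port that appears more than once, sorted."""
    return sorted(a for a, n in Counter(addresses).items() if n > 1)


found = listen_addresses(SS_SAMPLE)
print("listening on:", found)
print("duplicates:", duplicates(found))
```

An empty duplicates list only rules out clashing listen sockets; it says nothing about other shared resources.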

Does the provided 'consul' binary have any debug symbols in it?  Is it worth trying to run it in a 'go' debugger?

--Jeff

Uberwrensch

Nov 18, 2016, 1:32:41 PM
to Consul
Hi Jeff,

You ever get this working? Looks like I might need to do the same/similar.

Thanks,
Song

Jeff Weeks

Nov 18, 2016, 6:00:06 PM
to consu...@googlegroups.com

I didn't, no.
I've been sidetracked for a while on other design work and haven't revisited this.
I believe I probably won't need to solve this anymore, as I can use a single server and have multiple providers of a service register with it.

-Jeff



Uberwrensch

Nov 19, 2016, 1:19:47 AM
to Consul

Thanks for the response.


I managed to get two -server instances running on the same host, binding to the same IP. I did not go the separate-network-interface route; I'm simply advertising and binding to different ports. Seems to work OK. Continuing to test.


Thanks,

Song
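
A rough sketch of the same-IP layout described above. It only assembles command lines from flags that already appear in this thread (`-server`, `-data-dir`, `-node`, `-config-dir`, `-bind`) and does not launch anything. The per-instance data directories are an assumption: every command earlier in the thread reuses `-data-dir /tmp/consul`, and the thread does not establish whether that matters. `agent_command` and the directory layout are hypothetical, and the port overrides are assumed to live in each instance's own config directory.

```python
from typing import List


def agent_command(name: str, bind_ip: str, base_dir: str = "/tmp/consul") -> List[str]:
    """Build an argv list for one server instance with its own directories."""
    return [
        "./consul", "agent", "-server",
        "-data-dir", f"{base_dir}/{name}/data",    # assumption: one data dir per instance
        "-node=" + name,
        "-config-dir", f"{base_dir}/{name}/conf",  # hypothetical path holding the port overrides
        "-bind", bind_ip,
    ]


# Two instances sharing one IP, as in the message above.
commands = [agent_command(node, "192.168.214.95") for node in ("ldServer2", "ldServer3")]

for cmd in commands:
    print(" ".join(cmd))
```

Keeping the node name, data directory and config directory distinct per instance means the two agents share only the IP address.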
