Scaling of WAN gossip communication with multiple datacenters


Lukas Grossar

unread,
Aug 14, 2015, 8:16:51 AM8/14/15
to Consul
Hello

We have a project where we are trying to add multitenancy to an application that cannot handle it on its own. The approach is to run multiple instances of the application within OpenStack tenants, in combination with consul service discovery and a custom service gateway (nginx+lua).

Our first plan was to start a consul client agent in every tenant to announce the services, and to limit the capabilities of the client agents using ACL tokens and service name prefixes. The whole consul cluster would be a single datacenter with 3-5 server agents and multiple (up to 100) client agents.
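A minimal sketch of what such a tenant client agent configuration might look like (the datacenter name, token value, and service name are made up, and the JSON keys follow the 2015-era consul agent config format, so treat this as an illustration rather than a verified config):

```json
{
  "datacenter": "dc1",
  "data_dir": "/var/lib/consul",
  "server": false,
  "acl_token": "tenant42-scoped-token",
  "service": {
    "name": "tenant42-web",
    "port": 8080
  }
}
```

The idea is that the token only grants write access to services matching the tenant's prefix (here `tenant42-`), enforced via an ACL rule on the servers.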

During development we also had the idea to take the separation of tenants a step further: run a single server agent as its own datacenter within every tenant, and run a 3-5 node consul cluster in the "master" datacenter, which would also be the acl_datacenter. The outcome would be a master datacenter with only a small number of nodes and up to 100 datacenters connected to it. This idea might sound completely crazy, but it worked out pretty well in our first small tests (3-5 DCs).
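A sketch of what the single-server tenant datacenter config might look like (hostnames and the datacenter name are hypothetical, and `retry_join_wan` is an assumption about the config key available at the time; the equivalent `consul join -wan` command could also be run manually):

```json
{
  "datacenter": "tenant-42",
  "server": true,
  "bootstrap_expect": 1,
  "acl_datacenter": "master",
  "retry_join_wan": ["master-server-1.internal", "master-server-2.internal"]
}
```

Each tenant server joins the WAN gossip pool of the master datacenter's servers, so every tenant appears as its own datacenter while ACLs are resolved centrally.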

The question that arose is about the scaling capability of the WAN gossip pool. Is consul able to handle such a high number of datacenters, and what problems might we face at scale? Keep in mind that the usual WAN considerations (latency, timeouts, ...) wouldn't apply here, because all tenants would be in the same private cloud.

I would be very grateful if a developer could provide some input on this idea. Also, if you think this question is better suited for the serf mailing list, just tell me.

Best regards
Lukas

Armon Dadgar

unread,
Aug 14, 2015, 2:16:33 PM8/14/15
to consu...@googlegroups.com, Lukas Grossar
Hey Lukas,

The WAN and LAN gossip layers are both built on the Serf gossip library. The San Diego Supercomputer Center
currently runs a 10K node cluster with Serf, so the limit on the number of members in a gossip pool is very high.
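A back-of-envelope sketch of why SWIM-style gossip (the protocol family Serf is based on) scales this way: each member does a fixed amount of gossip work per protocol period regardless of pool size, so per-node load stays roughly constant and total load grows only linearly with membership. The intervals and fanout below are illustrative assumptions, not Consul's actual WAN tuning parameters:

```python
# Rough load model for a SWIM-style gossip pool. The default values are
# made-up placeholders, NOT Consul's real WAN gossip settings.

def per_node_msgs_per_sec(probe_interval_s=5.0,
                          gossip_interval_s=0.5,
                          gossip_fanout=4):
    """Messages a single member sends per second, independent of pool size."""
    probes = 1.0 / probe_interval_s             # one failure-detector probe per period
    gossips = gossip_fanout / gossip_interval_s  # broadcasts to a fixed fanout
    return probes + gossips

def pool_total_msgs_per_sec(n_members, **kw):
    """Aggregate load grows linearly; per-node load stays flat."""
    return n_members * per_node_msgs_per_sec(**kw)
```

Doubling the WAN pool from 100 to 200 members doubles aggregate traffic but leaves each server's own gossip load unchanged, which is why member counts far beyond 100 are unproblematic for the protocol itself.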

In terms of other considerations, there will be some additional TCP connections used to forward requests
between the different “datacenters”, but the impact should be negligible. The biggest concern with that setup is
that having only a single server per DC could lead to data loss, as there is no cross-DC replication.

Best Regards,
Armon Dadgar

Lukas Grossar

unread,
Aug 14, 2015, 6:43:35 PM8/14/15
to Consul, lukas....@gmail.com
Hi Armon


On Friday, August 14, 2015 at 8:16:33 PM UTC+2, Armon Dadgar wrote:
> The WAN and LAN gossip layers are both built on the Serf gossip library. The San Diego Supercomputer Center
> currently runs a 10K node cluster with Serf, so the limit on the number of members in a gossip pool is very high.

That is very good news, thanks for the information.
 
> In terms of other considerations, there will be some additional TCP connections used to forward requests
> between the different “datacenters”, but the impact should be negligible. The biggest concern with that setup is
> that having only a single server per DC could lead to data loss, as there is no cross-DC replication.

I'm aware of the additional TCP connections and decided that they shouldn't cause much trouble. The single server per DC is a known problem, but currently we would only use the client to announce a small number of services within the tenants; we probably won't make use of the K/V store or anything else, so we would only lose the service information for as long as the server is down.

Thanks a lot for your input.

Best regards
Lukas Grossar