Nats.io multi datacenter architecture


Alexander B

Oct 6, 2016, 5:13:28 PM
to nats
Hi,
we are evaluating NATS as a messaging system for our company. Looks solid so far.
But we were unable to find anything in the docs about how a cluster of NATS servers distributes workload between its nodes, or how to build multi-datacenter clusters.
The immediate question we have is: is it a good idea to have a single cluster that consists of servers from multiple data centers?

For example:

server1@DC1 -> has NatsNode1
server2@DC1 -> has NatsNode2
... etc

server1@DC2 -> has NatsNodeN
server2@DC2 -> has NatsNode(N+1)


When we deploy a NATS client wrapper (which we have to write ourselves for DC-awareness) in DC1, we give it a set of "preferred" servers to connect to (DC1) and a set of "backup" servers (DC2).
Then, if the DC1 NATS nodes are not available, clients from DC1 will try DC2.

Is the above a good way to build an "always on" system?

Thanks,
Alex


Colin Sullivan

Oct 6, 2016, 7:54:06 PM
to nats
Alex,

Thank you for using NATS!

It sounds like you have a good design, particularly with the preferred servers approach.  You may want to ensure local clients use a different server order (within your preferred list) to distribute clients more evenly across your local NATS servers.
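
As a rough sketch of that ordering (Python, standard library only): each client shuffles its local servers and then appends the backup datacenter's servers. The host names and the `build_server_list` helper are made up for illustration and are not part of any NATS client.

```python
import random

def build_server_list(preferred, backup, rng=None):
    """Return local servers in a per-client random order, then the backups."""
    rng = rng or random.Random()
    local = list(preferred)
    rng.shuffle(local)       # each client gets its own order within its DC
    remote = list(backup)
    rng.shuffle(remote)      # spread failover load across the backup DC too
    return local + remote

# Hypothetical URLs, for illustration only.
DC1 = ["nats://server1.dc1:4222", "nats://server2.dc1:4222"]
DC2 = ["nats://server1.dc2:4222", "nats://server2.dc2:4222"]

servers = build_server_list(DC1, DC2)
print(servers)
```

For this order to be honored, the client library's own shuffling of the server list would have to be turned off.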

One thing to also be aware of when distributing your workload is that NATS servers will only propagate subject interest in a cluster as needed.  So, if you have a cluster of servers (A, B, and C), with clients connected to server A subscribing to subject "foo", and a publisher connected to B publishes to "foo", the message will only flow from server B to server A - no messages will be sent to server C.
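
The A/B/C example can be modeled in a few lines of Python. This is only a toy model of interest-based routing, not how the NATS server is actually implemented; the `Cluster` class and its methods are invented for the illustration.

```python
from collections import defaultdict

class Cluster:
    """Toy model: messages are forwarded only to servers with interest."""

    def __init__(self, servers):
        self.servers = set(servers)
        self.interest = defaultdict(set)  # subject -> servers with subscribers

    def subscribe(self, server, subject):
        self.interest[subject].add(server)

    def route(self, origin, subject):
        """Servers that a publish arriving at `origin` is forwarded to."""
        return sorted(self.interest.get(subject, set()) - {origin})

cluster = Cluster(["A", "B", "C"])
cluster.subscribe("A", "foo")       # a client on server A subscribes to "foo"
print(cluster.route("B", "foo"))    # ['A']  (server C never sees the message)
```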

To leverage that in a case like yours, you could use the subject namespace in your clients to keep traffic "local" to servers in the datacenter near them - one approach could be to designate a datacenter as part of your subject.  When using your backup servers, you may send traffic between data centers, but when all is well, you'll keep traffic local yet maintain high availability with a NATS cluster crossing data centers.
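
A minimal sketch of what a datacenter-prefixed subject namespace could look like. The subject names and both helpers are hypothetical. The matcher follows the NATS wildcard rules (`*` matches exactly one token, `>` matches one or more trailing tokens), but it is a simplified stand-in written for this example, not code from the server or any client.

```python
def dc_subject(datacenter, topic):
    """Prefix a topic with its datacenter, e.g. 'dc1' + 'events.views'."""
    return f"{datacenter}.{topic}"

def subject_matches(pattern, subject):
    """Token-wise match of a subscription pattern against a subject."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be the last token and needs at least one token to match
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)

local = dc_subject("dc1", "events.views")
print(subject_matches("dc1.>", local))             # True: a DC1 subscriber sees it
print(subject_matches("dc2.>", local))             # False: DC2 subscribers do not
print(subject_matches("*.events.views", local))    # True: a subscriber across all DCs
```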

Can you share more of your use case?

Thanks,
Colin

Alexander Buynyachenko

Oct 6, 2016, 8:55:54 PM
to nats
Hi Colin,
thank you for the prompt response! So I conclude that having a multi-datacenter cluster is a valid case. The topic-based datacenter separation approach looks interesting.
 
So we may have "DC1.topic1", which all DC1 clients publish/subscribe to, and "DC2.topic1" for the DC2 clients. Then our local messages are only propagated within the local cluster, which is good. And if the local cluster is not available in DC1, DC1 clients move to "DC2.topic1".
If we need some "aggregator" subscriber in a single place, it can subscribe to both topics.

About our use cases:
We run a mid-size online marketplace. We'd like to use a messaging system for inter-system data exchange, like:
  • Send event streams for product views, purchases, etc. to a future "data platform" that will store/aggregate all that for uses like recommendations, marketing, etc.
  • Inter system (service) communication - sending command messages to the search service so it executes some saved searches
  • Mailing (more info below)
Mailing (not sure how to do it with NATS) 
We need to be able to send a message to a mailer consumer group. The message will contain the email contents, and one of the mailers will send it.
The problem is that if the mailers are too slow to process mails in the group, they will need to have their own queue built in, is that right?
So their logic might be: when a message is received from NATS, add it to an internal queue and process it at the mailer's own pace. But then those mailers have to be highly available themselves...
Maybe the mailing use case can be better achieved with NATS Streaming, where you have persistence.

Thanks a lot
Alex

Colin Sullivan

Oct 7, 2016, 10:57:19 AM
to nats
Alex,

Thank you for sharing more of your use case.

I'd like to further refine my answer about using the preferred server list, since I made an assumption about your usage.  If it doesn't make much of a difference which data center clients connect to, simply use a server list containing servers in each data center, and Apcera-supported clients will connect randomly, evenly distributing the workload across the entire cluster (across data centers).  Do not worry about the subject namespace.  Simpler is better!  What clients are you using (e.g. Go, .NET, Java)?

I'd suggest trying the simpler method above first.  However, if you find you are sensitive to traffic crossing data centers (e.g. there is a slow WAN in between them), you'll want to order your list of servers as you described with the preferred list.  This would be considered advanced/more complex usage.  To target local servers, you'll want to be sure to enable the NoRandomize flag (disabling randomization of the server list) and then use your preferred list to ensure your clients will attempt connections to local servers first.  There is no need to change your topics if a client connects to a different datacenter in a failure scenario.
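
To make the two modes concrete, here is a toy Python model of the connection order. It does not use a real NATS client: `connect_order`, `first_reachable` and the URLs are hypothetical, and the `no_randomize` parameter only imitates what a client's NoRandomize setting does to the server list.

```python
import random

def connect_order(servers, no_randomize=False, rng=None):
    """Order in which a client would try the servers (toy model)."""
    order = list(servers)
    if not no_randomize:
        (rng or random.Random()).shuffle(order)  # default: spread clients evenly
    return order

def first_reachable(order, reachable):
    """Return the first server in `order` that is up, or None."""
    for url in order:
        if url in reachable:
            return url
    return None

# Hypothetical URLs: local (DC1) servers first, backups (DC2) last.
SERVERS = [
    "nats://server1.dc1:4222", "nats://server2.dc1:4222",
    "nats://server1.dc2:4222", "nats://server2.dc2:4222",
]

order = connect_order(SERVERS, no_randomize=True)
print(first_reachable(order, set(SERVERS)))   # a DC1 server while DC1 is up
print(first_reachable(order, set(SERVERS[2:])))  # falls back to DC2 when DC1 is down
```

In both cases the client keeps publishing and subscribing on the same subjects; only the server it is attached to changes.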

Regarding mailing, what you describe could be achieved using a NATS queue group.  Some buffering will help you with backlog, but you'll want to scale to avoid this by adding subscribers to the queue group.  NATS Streaming may be a better fit for this case - it supports queue groups, will store messages for you, and more intelligently distributes workload to the queue subscribers that are most available to process messages (subscribers with the least number of outstanding acknowledgements).
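
The "least outstanding acknowledgements" idea can be sketched as a small Python model. This is only an illustration of the selection rule, not NATS Streaming's actual code; the `QueueGroup` class and the mailer names are invented.

```python
class QueueGroup:
    """Toy queue group: each message goes to exactly one member."""

    def __init__(self, members):
        self.outstanding = {m: 0 for m in members}  # unacknowledged messages

    def deliver(self, msg):
        # Pick the member with the fewest unacknowledged messages
        # (ties go to the member that was added first).
        member = min(self.outstanding, key=self.outstanding.get)
        self.outstanding[member] += 1
        return member

    def ack(self, member):
        self.outstanding[member] = max(0, self.outstanding[member] - 1)

group = QueueGroup(["mailer-1", "mailer-2"])
print(group.deliver("email 1"))   # mailer-1
print(group.deliver("email 2"))   # mailer-2
group.ack("mailer-2")             # mailer-2 finishes; mailer-1 is still busy
print(group.deliver("email 3"))   # mailer-2 again, since mailer-1 has not acked
```

Adding another mailer to the group simply adds one more entry that can take messages, which is the scaling step described above.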

Thanks,
Colin

Alexander Buynyachenko

Oct 9, 2016, 9:57:40 PM
to nats
Hi Colin,
in our case local-to-local communication is preferred; we don't want traffic to go over the internet if we can avoid it.
Yes, I thought NATS queue groups or Streaming could be the way to go for mailing.
Thank you for the great responses!

Regards,
Alex

Anumodh N.K

Feb 16, 2017, 2:52:08 PM
to nats
Hi Colin,

Can a NATS queue group span data centers (DCs)?

Thanks,
Anumodh

Colin Sullivan

Feb 16, 2017, 6:30:53 PM
to nats
Hi Anumodh,

Absolutely - NATS will work anywhere you have TCP connectivity.

Regards,
Colin