RabbitMQ cluster on Azure virtual machine


Davide Icardi

Nov 13, 2014, 6:07:52 PM
to rabbitm...@googlegroups.com
We are trying to install a RabbitMQ cluster with 2 virtual machines on Azure. Unfortunately, it seems that the RabbitMQ connection is not stable; from the client we get a lot of strange errors like:

Publisher did not confirm message
- Publish not confirmed before channel closed
SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 104.40.186.27:5672

We have also noticed general slowness, although sometimes it works.
In the RabbitMQ logs we see errors like:

- closing AMQP connection <0.390.0> (100.73.204.90:61152 -> 100.73.205.2:5672):
   {handshake_timeout,handshake}

Here is our procedure to set up the environment:

We first create the 2 Ubuntu 14.04 virtual machines (A3: 4 cores, 7 GB) in the same cloud service.
We create 2 public endpoints with a load balancer for ports 5672 and 15672. Our clients are hosted inside Azure websites in the same region.

Then we run the following script to install RabbitMQ on both machines:
 
# Add the RabbitMQ apt repository and install a pinned server version
sudo add-apt-repository 'deb http://www.rabbitmq.com/debian/ testing main'
sudo apt-get update
sudo apt-get -q -y --force-yes install rabbitmq-server=3.4.1-1
# Write the same Erlang cookie on both nodes while the server is stopped
sudo invoke-rc.d rabbitmq-server stop
echo 'MYCOOKIEVALUE' | sudo tee /var/lib/rabbitmq/.erlang.cookie
sudo chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie
sudo chmod 400 /var/lib/rabbitmq/.erlang.cookie
sudo invoke-rc.d rabbitmq-server start
# Enable the management plugin, then restart the server
sudo rabbitmq-plugins enable rabbitmq_management
sudo invoke-rc.d rabbitmq-server stop
sudo invoke-rc.d rabbitmq-server start
# Create an administrator user with full permissions on the default vhost
sudo rabbitmqctl add_user user1 pwd1
sudo rabbitmqctl set_user_tags user1 administrator
sudo rabbitmqctl set_permissions -p / user1 '.*' '.*' '.*'
And then we create the cluster using:
 
sudo rabbitmqctl stop_app
sudo rabbitmqctl join_cluster rabbit@myhostname1
sudo rabbitmqctl start_app
sudo rabbitmqctl set_cluster_name my_cluster_name


We have not opened any other ports (like 4369 and 25672) because we assume that these are only used for internal communication between nodes. Is that right?
We connect to RabbitMQ from the client using the cloud service host name.

Do you have any idea?

Should I follow the instructions at http://www.rabbitmq.com/ec2.html, for example the ones about hostnames?

thanks
davide

Michael Klishin

Nov 13, 2014, 11:55:50 PM
to Davide Icardi, rabbitm...@googlegroups.com
This message, like the one from the client, suggests really bad latency from your client to Azure. Note that the cluster may be fine (given that you do not see other errors, of course); it's connectivity from your client that is the issue. RabbitMQ's connection negotiation timeout is several seconds.
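One way to check this from the client side is to time the first steps of a connection. The following is a minimal sketch, not something posted in this thread: it opens a TCP connection, sends the AMQP 0-9-1 protocol header, and measures how long the broker takes to send its first bytes back. The function name `time_amqp_handshake` and the 10-second default timeout are assumptions chosen for illustration.

```python
import socket
import time

# Protocol header an AMQP 0-9-1 client sends right after connecting.
AMQP_0_9_1_HEADER = b"AMQP\x00\x00\x09\x01"


def time_amqp_handshake(host, port=5672, timeout=10.0):
    """Return (connect_seconds, first_reply_seconds) for one connection attempt.

    Raises OSError (which includes socket timeouts) if the broker cannot be
    reached or does not answer within `timeout` seconds.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        connected = time.monotonic()
        sock.sendall(AMQP_0_9_1_HEADER)
        # Any reply is enough here; we only care how long the first bytes take.
        first_bytes = sock.recv(7)
        replied = time.monotonic()
    if not first_bytes:
        raise ConnectionError("broker closed the connection without replying")
    return connected - start, replied - connected
```

Running this repeatedly from the environment that hosts the client, against the cloud service host name, and comparing the numbers with a run from inside one of the VMs should indicate whether the delay is on the network path or in the broker.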

MK

Mariusz Wojcik

Nov 14, 2014, 4:11:04 AM
to rabbitm...@googlegroups.com, davide...@gmail.com
Hi,

  As Michael said, the problem seems to be the connection between the client and the RabbitMQ cluster.
  Your configuration seems to be all right. As you noticed, ports 4369 and 25672 are for internal cluster communication, so there is no need to open them in the load balancer; assuming you do not have any firewall running on the instances, it should work fine. Also, you usually do not need to worry about hostnames unless you plan to change machine names.

   You can run some simple tests to pinpoint the problem:

* go to the management plugin (https://www.rabbitmq.com/management.html) and publish a few messages from it
* try grouping the Azure VMs and Web Sites in the same Affinity Group
* simplify your message publishing code: create a test page which just publishes a message, then 100 messages, 10K, etc. Keep the code as simple as possible

MW

Davide Icardi

Nov 14, 2014, 9:37:40 AM
to Mariusz Wojcik, rabbitm...@googlegroups.com
Thanks Mariusz, 

Unfortunately, we are trying to use Azure WebSites instead of VMs for better scalability...

Anyway I will try to contact Microsoft and see what they think.

Regards
Davide



On Fri, Nov 14, 2014 at 3:11 PM, Mariusz Wojcik <mariusz...@hotmail.co.uk> wrote:
Hi Davide,

   Sorry, my bad. What I meant was to create the VM in the Affinity Group that will host your web site.
   Did you try asking Azure Customer Care whether they notice any connectivity problems?

Cheers,

Mariusz


Date: Fri, 14 Nov 2014 10:47:54 +0100
Subject: Re: [rabbitmq-users] RabbitMQ cluster on Azure virtual machine
From: davide...@gmail.com
To: mariusz...@hotmail.co.uk


Thanks Mariusz for your suggestions.

I'm worried because it doesn't seem to be reliable: sometimes it works and sometimes it doesn't.
In our on-premises installation we don't have any of these problems, so I really suspect it is something in the Azure network environment.

Do you have some more information on how to set the affinity group for a WebSite? I cannot find any setting.

Regards
Davide
 

Neil Mackenzie

Nov 15, 2014, 2:59:54 PM
to rabbitm...@googlegroups.com
Affinity groups are used only for PaaS and IaaS cloud services and Azure Storage. They cannot be used for Azure Websites. Furthermore, Azure networking has been modified over the years so there is little benefit to be gained from using an affinity group in any situation.
 
I have no knowledge of RabbitMQ so the following should be viewed as general Azure IaaS guidance that could inform a RabbitMQ deployment. Before doing a RabbitMQ deployment in Azure I would investigate whether Azure Service Bus Brokered Messaging could be used instead.
 
An Azure Website can be connected to a VM in an Azure cloud service in one of the following ways:
 
  1. through a public endpoint on the cloud service that is mapped to a port on the VM. This traffic is routed through the Azure Load Balancer and will use either hash-based load balancing or port forwarding.
  2. through a public instance-level endpoint directly on the VM. This traffic goes directly to the VM, bypassing the Azure Load Balancer. Appropriate firewalling is needed for this VM, since it sits directly on the internet. There would be no need to declare a public endpoint for RabbitMQ.
  3. through a VPN connection between the Azure Website and the VNET in which the VM sits (assuming it has been added to one). This traffic flows over a private network directly to the VMs, and bypasses the Azure Load Balancer. There would be no need to declare a public endpoint for RabbitMQ.
In case 1, you should also configure a health probe if you use hash-based load balancing so that the Azure Load Balancer can take the VM out of rotation if it is failing. Azure now supports various forms of load-balancing algorithm:
 
  • source IP & port, destination IP & port, protocol
  • source IP, destination IP, protocol
  • source IP, destination IP
 
The latter two provide session-based routing for legacy applications that cannot take advantage of horizontally-scaled architectures.
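To illustrate how the three modes differ, here is a toy sketch. It is not Azure's algorithm (the real hash function is not described in this thread); SHA-256 and the names `MODES` and `pick_backend` are stand-ins chosen for illustration. The point is only that dropping the source port from the hash key makes every connection from the same client address land on the same VM.

```python
import hashlib

# Header fields that feed the hash in each distribution mode listed above.
MODES = {
    "5-tuple": ("src_ip", "src_port", "dst_ip", "dst_port", "protocol"),
    "3-tuple": ("src_ip", "dst_ip", "protocol"),
    "2-tuple": ("src_ip", "dst_ip"),
}


def pick_backend(flow, backends, mode="5-tuple"):
    """Map a flow (dict of header fields) to one backend by hashing the mode's fields."""
    key = "|".join(str(flow[field]) for field in MODES[mode])
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer and reduce it to an index.
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]
```

With the 5-tuple mode, two connections from the same client can reach different VMs because their source ports differ; with the 3-tuple and 2-tuple modes they always reach the same one.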

In case 2, each VM with a public instance-level IP address (PIP) will have two public IP addresses - one for the cloud service VIP and one for the PIP. Either can be addressed from the outside, with the rules for the VIP being the same as case 1. Outbound traffic will flow over the PIP. There is a limit of 5 PIPs per subscription.

In case 3, traffic from the website would go over the VPN to the internal DIP of the VM. It would not be load balanced unless it were routed via an Internal Load Balancer, which has similar capabilities to the Azure Load Balancer.
 
If you want HA for multiple VMs, you would typically put them in an availability set which directs the Azure Fabric Controller to distribute them among different "racks" and take specific actions when host OS upgrades are performed. Note that an availability set is completely different from an affinity group.
 
 
 
 

Davide Icardi

Nov 16, 2014, 3:37:59 PM
to Neil Mackenzie, rabbitm...@googlegroups.com
Thank you Neil, very clear explanation.

Currently we are using solution 1 (public endpoint with azure load balancer). 

Just one more clarification about probe settings.
Currently I have configured all 3 ports (private, public, probe) with the same RabbitMQ port, 5672. Is that OK? And what about the probe timeout? How do you suggest configuring it?
Considering that RabbitMQ has a handshake timeout of 10 seconds, should I use a lower value?

I can't find much information about Load Balancer settings on the internet...

Regards
Davide



Michael Klishin

Nov 16, 2014, 3:42:17 PM
to Neil Mackenzie, Davide Icardi, rabbitm...@googlegroups.com
 On 16 November 2014 at 20:38:00, Davide Icardi (davide...@gmail.com) wrote:
> Considering that RabbitMQ has an handshake timeout of 10 seconds
> I should use a lower value?

While handshake_timeout can be configured, its default value (5 seconds IIRC) is reasonable
for reasonable networks. I'm not sure bumping the value is going to work very well in the end
but feel free to try.
--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Davide Icardi

Nov 16, 2014, 3:58:52 PM
to Michael Klishin, Neil Mackenzie, rabbitm...@googlegroups.com
From this page it seems to be 10000 milliseconds.

Anyway, I don't want to modify the RabbitMQ timeout; I'm just asking about the Azure Load Balancer probe timeout. Maybe the balancer settings can interfere with the environment in some way?
Sorry if this sounds stupid, I'm quite ignorant on this subject...

davide

Michael Klishin

Nov 16, 2014, 4:09:01 PM
to Davide Icardi, Neil Mackenzie, rabbitm...@googlegroups.com
On 16 November 2014 at 20:58:51, Davide Icardi (davide...@gmail.com) wrote:
> Anyway I don't want to modify rabbitmq timeout, I'm just asking
> about Azure Load Balancer probe timeout. Maybe balancer settings
> can hinder in some way the environment?
> Sorry if sound stupid, I'm quite ignorant in this subject...

According to http://azure.microsoft.com/blog/2014/08/14/new-configurable-idle-timeout-for-azure-load-balancer/,
you can configure a TCP connection idle timeout for load balancer backends. This has to be higher
than 1/2 of your connection's heartbeat timeout (ideally higher than the whole heartbeat timeout)
because idle connections are a perfectly common scenario in some apps.
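The rule of thumb above can be written down as a small check. This is only a sketch of that rule; the function name `check_idle_timeout` and the three labels are made up for illustration, and both values are expected in seconds.

```python
def check_idle_timeout(idle_timeout_s, heartbeat_timeout_s):
    """Classify a load balancer idle timeout against a connection's heartbeat timeout.

    "too low"  : not higher than half the heartbeat timeout
    "marginal" : higher than half, but not higher than the whole heartbeat timeout
    "ok"       : higher than the whole heartbeat timeout
    """
    if idle_timeout_s <= heartbeat_timeout_s / 2:
        return "too low"
    if idle_timeout_s <= heartbeat_timeout_s:
        return "marginal"
    return "ok"
```

For example, with illustrative numbers that are not taken from this thread, an idle timeout of 240 seconds against a heartbeat timeout of 60 seconds is classified as "ok", while the same idle timeout against a heartbeat timeout of 580 seconds is "too low".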

Some other Azure settings, e.g. ProbeIntervalInSeconds, seem to be entirely undocumented,
hopefully because the names are so telling.

Neil Mackenzie

Nov 16, 2014, 9:07:52 PM
to rabbitm...@googlegroups.com, davide...@gmail.com, nmack...@live.com
Most Azure services have a RESTful API and that is typically the place to look for the details of service configuration. The intervalInSeconds and timeoutInSeconds are documented here.

The idea with the load balancer probe is that it probes every intervalInSeconds and takes the VM out of rotation if it has not had a valid response in timeoutInSeconds. The defaults are 15 and 31 seconds respectively, but these times can be dropped as low as 5 and 11 if desired.

The load balancer probe is either HTTP to a specified URL, when it expects to receive a 200 OK, or TCP to a specified endpoint, when it expects to connect successfully. The VM can, of course, do work before deciding how to respond - for example, the VM itself could be ready but an underlying service may not be.
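The probe behaviour described above can be sketched as a small simulation. This is a simplified model built only from that description (one probe per interval, VM removed once no valid response has been seen for the timeout); it is not Azure's actual implementation, and the function name `rotation_timeline` is hypothetical.

```python
def rotation_timeline(probe_results, interval_s=15, timeout_s=31):
    """Return, for each probe, whether the VM is still in rotation afterwards.

    probe_results holds one bool per probe (True = valid response). The VM is
    assumed to have answered at time 0, and probes fire at interval_s,
    2 * interval_s, and so on.
    """
    states = []
    last_ok = 0
    for index, ok in enumerate(probe_results, start=1):
        now = index * interval_s
        if ok:
            last_ok = now
        # In rotation while the last valid response is more recent than the timeout.
        states.append(now - last_ok < timeout_s)
    return states
```

Under this model, both the default settings (15 and 31) and the minimum settings (5 and 11) tolerate two consecutive failed probes and take the VM out of rotation on the third; a single valid response puts it back.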

Tomasz

Nov 29, 2014, 1:06:42 PM
to rabbitm...@googlegroups.com
Hello Davide,

I am currently investigating a RabbitMQ cluster in Azure and was wondering if you have managed to solve your problem in a satisfactory way?
To me it looks like a cluster could be a solution if there were a load balancer available in Azure that supported node prioritization and an active/passive node option (traffic would always be directed to node A as long as it is available; in that case the master would almost always remain master, which should make recovery from network partitions fairly easy).

Thanks,
Tomasz

Davide Icardi

Nov 29, 2014, 1:32:20 PM
to Tomasz, rabbitm...@googlegroups.com

Unfortunately, after many attempts, we changed our implementation to use Azure Service Bus.

We weren't able to run RabbitMQ on Azure in a reliable way. It doesn't seem to be a problem with the cluster (we have used the default endpoint load balancer, and the cluster seems to be OK).

--

Tomasz

Dec 2, 2014, 7:11:53 AM
to rabbitm...@googlegroups.com, tomasz...@gmail.com
Hello Davide,

Could you please elaborate a bit more on why you could not run RabbitMQ in Azure in a reliable way?
Do you mean that if you went with a cluster, everything would be fine?
Our initial tests show that we are not facing network partitioning, at least.

Thanks a lot,
Tomasz

Davide Icardi

Dec 2, 2014, 1:08:48 PM
to Tomasz, rabbitm...@googlegroups.com
I mean that we are unable to install a stable RabbitMQ instance on Azure (with a cluster or without a cluster).
Every instance seems to be unable to send/receive messages to and from an Azure Web Site in a reliable way.
Sometimes the messages arrive, and some minutes later everything seems to be broken.
On RabbitMQ and on our clients we have a lot of timeout errors (handshake or connection).

It is probably some wrong configuration, but after many days we finally decided to replace it with Azure Service Bus. For now we use RabbitMQ only on our own hardware.

best regards
davide

Tomasz

Dec 2, 2014, 4:22:43 PM
to rabbitm...@googlegroups.com, tomasz...@gmail.com
Thanks, Davide.
So far we have only tested a setup in one VNET, between VMs, and so far so good. Hence I believe it also depends on how you finally connected the website to your VNET - did you use a Point-to-Site VPN?

Carl Hörberg

Dec 3, 2014, 2:03:08 AM
to rabbitm...@googlegroups.com, tomasz...@gmail.com
We at CloudAMQP recently released full support for Azure, i.e. we now set up and maintain RabbitMQ clusters on Azure VMs as well.

We don't use Azure's load balancer but the recently released public instance-level IPs and DNS for load balancing and failover. So I would recommend anyone to try that, and avoid their load balancer. Unfortunately they don't make it easy to enable PIP: you can only do it through the API, and the API isn't great (XML-based, cryptic error reporting, etc.).