Azure Load Balancer and health probes

OffColour

Jun 13, 2019, 12:21:19 PM
to rabbitmq-users
Hi,

I've got a Windows RabbitMQ cluster running in Azure behind a load balancer, and everything is working fine apart from the logging of health probes.

Under 3.7.7 I'm getting the following log entries:

2019-06-13 00:02:54.946 [info] <0.25982.17> accepting AMQP connection <0.25982.17> (168.63.129.16:60023 -> 10.20.96.47:5672)
2019-06-13 00:02:54.946 [warning] <0.25982.17> closing AMQP connection <0.25982.17> (168.63.129.16:60023 -> 10.20.96.47:5672):{handshake_timeout,handshake}

Under 3.7.15 I get:
2019-06-13 14:59:13.235 [error] <0.15104.4> closing AMQP connection <0.15104.4> (168.63.129.16:60028 -> 10.20.97.47:5672):{handshake_timeout,handshake}

I'd read elsewhere that connections which send no data are no longer logged, so I wondered whether the Azure load balancer sends at least one byte in its TCP health probe, but the packet capture doesn't show that.

Odd that what was previously a warning has now become an error though.
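To see how many of these entries come from the health probe, and at which level each version logs them, a small Python sketch can tally the handshake_timeout lines by log level and source address. It assumes the log line format quoted above and is not a RabbitMQ tool; the function name is hypothetical. For reference, 168.63.129.16 is the address Azure uses as the source of load balancer health probes.

```python
import re
from collections import Counter

# Matches "closing AMQP connection" lines in the format quoted above, e.g.
# ... [error] <0.15104.4> closing AMQP connection <0.15104.4>
#     (168.63.129.16:60028 -> 10.20.97.47:5672):{handshake_timeout,handshake}
LINE_RE = re.compile(
    r"\[(?P<level>\w+)\] <[\d.]+> closing AMQP connection <[\d.]+> "
    r"\((?P<src>[\d.]+):\d+ -> [\d.]+:\d+\):\{handshake_timeout,handshake\}"
)


def count_handshake_timeouts(lines):
    """Count handshake_timeout closes per (log level, source IP)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[(match.group("level"), match.group("src"))] += 1
    return counts
```

Feeding it the three lines above would count one warning (from 3.7.7) and one error (from 3.7.15), both from 168.63.129.16; the "accepting" line does not match.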

Input appreciated!

Thanks.


OffColour

Jun 18, 2019, 6:49:07 AM
to rabbitmq-users
Quick bump, as I'd like to get rid of all the handshake timeouts from the logs if at all possible.

Luke Bakken

Jun 18, 2019, 12:45:02 PM
to rabbitmq-users
Hello,

The only way to get {handshake_timeout,handshake} is if the TCP connection stays open longer than the handshake timeout (10 seconds by default). You should configure the health probe to close its connection sooner. I tested this using the following command:

nc -4 localhost 5672

If I just let it sit, RabbitMQ closes the connection at 10 seconds and logs it. If I kill nc before 10 seconds, nothing is logged.
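The same experiment can be scripted. The sketch below is a rough Python equivalent of the nc test, not anything shipped with RabbitMQ, and tcp_probe is a hypothetical name: it opens a TCP connection, sends no AMQP bytes, waits hold_seconds, then closes the socket normally. Going by the behaviour described above, a hold well under the 10-second handshake timeout should leave nothing in the log, while a longer hold should produce the handshake_timeout entry.

```python
import socket
import time


def tcp_probe(host, port, hold_seconds=0.0, timeout=5.0):
    """Open a TCP connection, stay idle for hold_seconds, then close it.

    Like `nc` or a plain TCP health probe, this never sends any AMQP
    bytes. Returns True once the connection has been opened and closed;
    raises OSError if the connection cannot be established.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        time.sleep(hold_seconds)
        # Leaving the with-block calls sock.close(), which sends a FIN.
    return True
```

For example, compare tcp_probe("localhost", 5672, hold_seconds=2) with tcp_probe("localhost", 5672, hold_seconds=15) and check the RabbitMQ log after each.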


Thanks,
Luke

Luke Bakken

Jun 18, 2019, 12:48:27 PM
to rabbitmq-users
Well, that link isn't as helpful as I thought. View the files in that diff, then go to the changes in rabbit_reader.erl; you'll see the log level change there.

OffColour

Jun 21, 2019, 11:59:20 AM
to rabbitmq-users
Hi Luke,

We have no control over the load balancer's probe timeout, but judging by the RabbitMQ log entries, the accept and the timeout seem to happen at the same moment, unless there's something under the covers I'm missing.

OffColour

Jun 21, 2019, 12:42:27 PM
to rabbitmq-users
I've also tested with ncat, as you suggested, and I get the log entry after 10 seconds, as you said.
As an experiment, I set handshake_timeout=30000 in the config file, but the connection was still forcibly closed after 10 seconds.
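One way to check which handshake timeout is actually in effect is to time how long the server keeps an idle connection open. The Python sketch below connects, sends nothing, and measures the seconds until the server closes the socket with either a FIN or an RST. The helper name is hypothetical. If the reading stays around 10 seconds after changing handshake_timeout, that would suggest the new value has not been picked up.

```python
import socket
import time


def seconds_until_server_closes(host, port, limit=60.0):
    """Connect, send nothing, and time how long until the server closes.

    Returns the elapsed seconds, or None if no single wait on the socket
    saw a close within `limit` seconds.
    """
    with socket.create_connection((host, port), timeout=limit) as sock:
        start = time.monotonic()
        try:
            while sock.recv(1024):  # ignore any bytes the server sends
                pass
        except ConnectionResetError:
            pass  # an RST also means the server closed the connection
        except socket.timeout:
            return None  # still open after `limit` seconds
        return time.monotonic() - start
```

Against a broker this would be called as, for example, seconds_until_server_closes("localhost", 5672, limit=60).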

OffColour

Jun 21, 2019, 1:13:20 PM
to rabbitmq-users
I think I've found the issue.
The Azure Load Balancer health probe only sends an ACK rather than a FIN/ACK, so RabbitMQ resets (RST) the connection after 10 seconds.
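For comparison when reading a packet capture, this Python sketch shows the two ways a client can end a TCP connection: a normal close(), which sends a FIN, and an abortive close with SO_LINGER set to a zero timeout, which sends an RST instead. It assumes a Linux or BSD-style sockets API and runs entirely on the loopback interface. It does not model Azure's probe, and close_connection and peer_sees are hypothetical names.

```python
import socket
import struct


def close_connection(sock, abortive=False):
    """Close a TCP socket gracefully (FIN) or abortively (RST).

    With SO_LINGER enabled and a linger time of 0, close() discards any
    unsent data and sends an RST instead of the usual FIN.
    """
    if abortive:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                        struct.pack("ii", 1, 0))
    sock.close()


def peer_sees(abortive):
    """Return what the accepting side observes: 'FIN', 'RST' or 'data'."""
    with socket.socket() as srv:
        srv.bind(("127.0.0.1", 0))
        srv.listen(1)
        client = socket.create_connection(srv.getsockname())
        conn, _ = srv.accept()
        with conn:
            close_connection(client, abortive=abortive)
            conn.settimeout(2)
            try:
                return "FIN" if conn.recv(1) == b"" else "data"
            except ConnectionResetError:
                return "RST"
```

peer_sees(False) reports a FIN and peer_sees(True) reports an RST, matching the two close patterns you would look for in the capture.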

Ticket raised with Microsoft.

Thanks for your help, Luke. I've learned a lot along the way!

Luke Bakken

Jun 24, 2019, 12:45:01 PM
to rabbitmq-users
Thank you for following up with the mailing list.