Integrating with MQTT for scalable load-balancing cluster?


RabbitMQ User

Sep 15, 2014, 2:14:10 AM
to rabbitm...@googlegroups.com
Hi all,

Currently we have a push notification service based on MQTT + Apache Apollo. But our current implementation does not tolerate failure well since there is only one broker instance. 

Since there is no clustering implementation in Apache Apollo, I wonder if I can use RabbitMQ instead of Apollo for load-balancing HA & scalability?


Regards.

--
Sig

Michael Klishin

Sep 15, 2014, 2:55:11 AM
to rabbitm...@googlegroups.com, RabbitMQ User
On 15 September 2014 at 10:14:18, RabbitMQ User (wua...@gmail.com) wrote:
> Since there is no clustering implementation in Apache Apollo,
> I wonder if I can use RabbitMQ instead of Apollo for load-balancing
> HA & scalability?

You can: RabbitMQ's distributed features (queue mirroring, federation) apply
to MQTT. There's one limitation you need to be aware of: the RabbitMQ MQTT plugin
does not support QoS 2. But QoS 1 is sufficient for almost every use case,
if you ask me.

So give it a try and take a look at

 * http://www.rabbitmq.com/mqtt.html 
 * http://www.rabbitmq.com/distributed.html
 * http://www.rabbitmq.com/ha.html
 * http://www.rabbitmq.com/partitions.html
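
For reference, the plugin ships with RabbitMQ and only needs to be enabled
on each node, using the standard rabbitmq-plugins CLI:

    rabbitmq-plugins enable rabbitmq_mqtt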

If you have any questions about using e.g. queue mirroring with MQTT specifically,
simply ask on this list.
--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Sigmund Lee

Sep 15, 2014, 9:46:25 PM
to Michael Klishin, rabbitm...@googlegroups.com
Hi Michael,

Thank you so much for your prompt reply.

"Queue mirroring" means Highly Available Queues in RabbitMQ Clustering, is that right? If a queue has mirroring to all nodes of clustering(let me say 2 nodes), and consumer app need to subscribe to all nodes(because load-balancing), is there any change that we got two copied of original single message on the consumer side?


Also, I am a little confused about the relationship between a RabbitMQ application, node and broker, since I am new to Erlang. Could you please explain broker, node and application in more detail?


Appreciated.

--
Sig

Michael Klishin

Sep 16, 2014, 3:15:18 AM
to Sigmund Lee, rabbitm...@googlegroups.com
On 16 September 2014 at 05:46:24, Sigmund Lee (wua...@gmail.com) wrote:
> "Queue mirroring" means Highly Available Queues in RabbitMQ
> Clustering, is that right? 

Correct.
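
As an aside, mirroring is enabled via a policy. A sketch of the command
(the policy name and queue-name pattern here are my assumptions):

    rabbitmqctl set_policy ha-mqtt "^mqtt-subscription-" '{"ha-mode":"all"}'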

> If a queue is mirrored to all nodes
> of the cluster (let's say 2 nodes), and the consumer app needs to subscribe
> to all nodes (because of load balancing), is there any chance that
> we get two copies of a single original message on the consumer side?

Long story short: not under normal conditions (without network splits or
node failures), and not unless one of the apps explicitly re-queues
a delivery (which is a feature MQTT lacks). These scenarios are
why QoS 2 ("exactly once delivery") won't work very well with clustering:
it would require cluster-wide coordination of deliveries.

Another way to answer this: a consumer is connected
to a single node, so without reconnecting it cannot possibly consume
anything from both queue master and a mirror. 

> Also, I am a little confused about the relationship between a RabbitMQ
> application, node and broker, since I am new to Erlang. Could you
> please explain broker, node and application in more detail?

Clients connect to the broker and work with it. The broker can be a single
node or a cluster of nodes (clients can work with any node in a cluster).
These are RabbitMQ terms.

An Erlang node is just a single Erlang VM. This is an Erlang term.
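
To make the terms concrete, forming a two-node cluster looks roughly like
this (a sketch; the node names are hypothetical):

    # run on rabbit@nodeB to join the cluster rabbit@nodeA belongs to
    rabbitmqctl stop_app
    rabbitmqctl join_cluster rabbit@nodeA
    rabbitmqctl start_app

After that, the two Erlang nodes together form a single logical broker.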

Sigmund Lee

Sep 19, 2014, 8:48:55 AM
to Michael Klishin, rabbitm...@googlegroups.com
Hi MK,

Thanks again for your response, and sorry for my late reply.

 
> Another way to answer this: a consumer is connected
> to a single node, so without reconnecting it cannot possibly consume
> anything from both queue master and a mirror.


Does that mean that both producers and consumers should point to the same load balancer, for example HAProxy, and let HAProxy take care of all input & output traffic?


One more question:
Are Retained Messages and Keep Alive (of the CONNECT packet) supported in the latest release of RabbitMQ?

Regards.
--
Sig

Michael Klishin

Sep 19, 2014, 8:56:25 AM
to Sigmund Lee, rabbitm...@googlegroups.com
On 19 September 2014 at 16:48:54, Sigmund Lee (wua...@gmail.com) wrote:
> Does that mean that both producers and consumers should point
> to the same load balancer, for example HAProxy, and let HAProxy
> take care of all input & output traffic?

Clients can connect to any node; RabbitMQ will take care of intra-cluster
traffic routing. However, if they all connect to the same node you will see
better throughput and lower latency because of data locality.

If you use HAProxy in front of a cluster, usually all clients connect to it.
A minimal TCP-mode HAProxy section for MQTT might look like the sketch below
(the server addresses are made up):
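
    listen mqtt
        bind *:1883
        mode tcp
        balance leastconn
        server nodeA 192.168.0.10:1883 check
        server nodeB 192.168.0.11:1883 check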

> One more question:
> Are Retained Messages and Keep Alive (of the CONNECT packet)
> supported in the latest release of RabbitMQ?

Keepalive is supported, Retain is not. FWIW, I don't think I've seen Retain used
in the wild.
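
If it helps, with the Eclipse Paho Java client the keepalive is set on the
connect options (a sketch; the broker URL and client ID are made up):

    import org.eclipse.paho.client.mqttv3.MqttClient;
    import org.eclipse.paho.client.mqttv3.MqttConnectOptions;

    MqttConnectOptions opts = new MqttConnectOptions();
    opts.setKeepAliveInterval(60); // seconds; Paho then sends PINGREQ on its own
    MqttClient client = new MqttClient("tcp://broker.example.com:1883", "client-1");
    client.connect(opts);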

Sigmund Lee

Sep 23, 2014, 11:58:13 AM
to Michael Klishin, rabbitm...@googlegroups.com
Hi MK,

1. When I said Keep Alive, I meant the Keep Alive flag of the MQTT CONNECT packet, as described in the OASIS MQTT spec (http://docs.oasis-open.org/mqtt/mqtt/v3.1.1/cos01/mqtt-v3.1.1-cos01.html#_Toc385349238):

If the Keep Alive value is non-zero and the Server does not receive a Control Packet from the Client within one and a half times the Keep Alive time period, it MUST disconnect the Network Connection to the Client as if the network had failed

But I noticed that RabbitMQ doesn't close the underlying TCP connection even if we set Keep Alive to a non-zero value on the client side and no communication occurs between client and broker.

2. I noticed that the queues generated by RabbitMQ MQTT are named mqtt-subscription-<clientId><qos-level>. Does that mean:
    2.1 Every Android device should have a unique clientId?
    2.2 A particular Android device should always use the same clientId to connect/reconnect to the broker? Otherwise the sticky-session messages of this client will be lost.

Thanks in advance, MK.


Regards.
--
Sig





Michael Klishin

Sep 23, 2014, 2:09:07 PM
to Sigmund Lee, rabbitm...@googlegroups.com
On 23 September 2014 at 19:58:18, Sigmund Lee (wua...@gmail.com) wrote:
> http://docs.oasis-open.org/mqtt/mqtt/v3.1.1/cos01/mqtt-v3.1.1-cos01.html#_Toc385349238
> If the Keep Alive value is non-zero and the Server does not receive
> a Control Packet from the Client within one and a half times the
> Keep Alive time period, it MUST disconnect the Network Connection
> to the Client as if the network had failed
>
> But I noticed that RabbitMQ doesn't close the underlying TCP
> connection even if we set Keep Alive to a non-zero value on the client
> side and no communication occurs between client and broker.

It should, and there should be messages about missed keep-alive frames
in the log. Please post your code.

> 2. I noticed that the queues generated by RabbitMQ MQTT are named
> mqtt-subscription-<clientId><qos-level>. Does that mean:
> 2.1 Every Android device should have a unique clientId?

Every client, period. If you have clients A and B using the same client-id,
and A connects first, then as soon as B connects and authenticates, A will
be disconnected.

Which is a known opinionated decision in MQTT and not specific to RabbitMQ.

> 2.2 A particular Android device should always use the same clientId
> to connect/reconnect to the broker? Otherwise the sticky-session
> messages of this client will be lost.

Correct. This is not RabbitMQ-specific.
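
A common pattern is to generate the client ID once on the device and persist
it, so every reconnect reuses the same identity. A minimal Paho Java sketch
(the file path, broker URL and topic are made up):

    import org.eclipse.paho.client.mqttv3.MqttClient;
    import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.UUID;

    public class StableClientId {
        // Load the client ID from local storage, generating it on first run,
        // so the same device always reconnects under the same identity.
        static String loadClientId() throws IOException {
            Path p = Paths.get("client-id.txt");
            if (Files.exists(p)) {
                return new String(Files.readAllBytes(p)).trim();
            }
            String id = "device-" + UUID.randomUUID();
            Files.write(p, id.getBytes());
            return id;
        }

        public static void main(String[] args) throws Exception {
            MqttConnectOptions opts = new MqttConnectOptions();
            opts.setCleanSession(false); // keep the subscription queue across reconnects
            MqttClient client = new MqttClient("tcp://broker.example.com:1883", loadClientId());
            client.connect(opts);
            client.subscribe("push/notifications", 1); // QoS 1
        }
    }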

Sigmund Lee

Oct 16, 2014, 9:22:36 AM
to Michael Klishin, rabbitm...@googlegroups.com
Hi MK,

Sorry for the late reply. Recently I've been working on our push notifications service.

Keep Alive in the RabbitMQ MQTT plugin works perfectly. It was my mistake: I thought it was our responsibility to send PINGREQ to the server during the Keep Alive interval, and I didn't notice that the Paho client already does it for us.

But I found an issue that really bugs me when using the MQTT plugin on a RabbitMQ cluster.
According to the MQTT spec [MQTT-3.1.4-2]:
If the ClientId represents a Client already connected to the Server then the Server MUST disconnect the existing Client.

To reproduce (let's say the RabbitMQ cluster contains two nodes: nodeA & nodeB):
1. Using a clientId, for example abc, connect to nodeA. A queue named mqtt-subscription-abcqos0 will be declared on the RabbitMQ side.
2. Using the same clientId abc, connect to nodeB. You will find that a new connection is established as a consumer of mqtt-subscription-abcqos0, and the old one is not disconnected! Thus queue mqtt-subscription-abcqos0 has two consumers.

As we know, device networks are unstable. If a client reconnects a short time later and HAProxy redirects it to a different node of the RabbitMQ cluster than the one it last connected to, and the keep alive interval is big enough that the server hasn't yet noticed the abnormal termination of the last connection, we risk losing messages, because a round-robin algorithm will be used to dispatch the queue's messages to these two consumers.

Is this a bug, or am I just doing it wrong?

Appreciated.
--
Sig

Michael Klishin

Oct 16, 2014, 2:59:16 PM
to Sigmund Lee, rabbitm...@googlegroups.com
On 16 October 2014 at 17:22:31, Sigmund Lee (wua...@gmail.com) wrote:
> To reproduce (let's say the RabbitMQ cluster contains two nodes:
> nodeA & nodeB):
> 1. Using a clientId, for example abc, connect to nodeA. A queue
> named mqtt-subscription-abcqos0 will be declared on the RabbitMQ side.
> 2. Using the same clientId abc, connect to nodeB. You will find
> that a new connection is established as a consumer of mqtt-subscription-abcqos0,
> and the old one is not disconnected! Thus queue mqtt-subscription-abcqos0
> has two consumers.

We will take a look at reproducing this after 3.4.0 is out (should be next week).

Sigmund Lee

Apr 7, 2015, 3:13:01 AM
to Michael Klishin, rabbitm...@googlegroups.com
Hi MK,

Is this bug confirmed and fixed in the latest release?

Thanks in advance.

Bests.
--Sig

Michael Klishin

Apr 7, 2015, 9:19:18 AM
to Sigmund Lee, rabbitm...@googlegroups.com
On 7 April 2015 at 10:13:00, Sigmund Lee (wua...@gmail.com) wrote:
> Is this bug confirmed and fixed in the latest release?

We have a change log you can consult :) 
http://www.rabbitmq.com/changelog.html

Emre gündoğdu

Dec 3, 2018, 9:59:00 AM
to rabbitmq-users
Hi Michael,

We have the same issue. We are using a three-node cluster. Our cluster has the properties below:
- Queue mirrors
- HA sync mode: all
- Queue master locator: random

The rabbitmq_mqtt plugin is enabled.

SSL termination is done on the ELB load balancer.

We are using an AWS Classic Load Balancer in front of the cluster, and our load balancer health check is as below:

Ping Target: TCP:1883
Timeout: 3 seconds
Interval: 30 seconds
Unhealthy threshold: 10
Healthy threshold: 2

The broker configuration includes the specific parameters below:

mqtt.allow_anonymous = false

mqtt.subscription_ttl = 86400000
mqtt.tcp_listen_options.sndbuf = 16384
mqtt.tcp_listen_options.recbuf = 16384
mqtt.listeners.tcp.default = 1883
mqtt.listeners.ssl = none
mqtt.tcp_listen_options.backlog = 8192
mqtt.tcp_listen_options.nodelay = true
mqtt.tcp_listen_options.linger.timeout = 0
mqtt.tcp_listen_options.linger.on = true

num_acceptors.tcp = 50
vm_memory_high_watermark.relative = 0.7
vm_memory_high_watermark_paging_ratio = 0.5
hipe_compile = true

auth_backends.1 = internal
auth_backends.2 = cache
auth_cache.cached_backend = http
auth_http.http_method = post

Our case only occurs when cluster nodes are unavailable at the same time. I think it is a split brain. When I restart all nodes at the same time, so that the RabbitMQ application on every cluster node restarts, my single consumer tries to connect to each node behind the load balancer, successfully connects to each individual node, and does not disconnect after cluster peering has completed.

When I restart all nodes at the same time (systemctl restart rabbitmq-server), cluster peering takes approximately 30-35 seconds to complete. During this time the consumer connects immediately to each node, as though it does not realize these nodes form a single cluster. So the consumer ends up with more than one connection after cluster peering completes.

And then when I publish a few messages to the topic our consumer subscribes to, some of them are not delivered and stay in the queue in an unacked state.

How can I prevent this situation?

On Tuesday, 7 April 2015 at 16:19:18 UTC+3, Michael Klishin wrote:

Michael Klishin

Dec 3, 2018, 11:29:42 AM
to rabbitm...@googlegroups.com
I'm afraid I don't understand the scenario. You haven't provided any evidence of a split brain, e.g. server logs,
or said what exactly "unavailable" means in your system.

If your consumers connect to more than one node, you will end up with effectively duplicate connections, in particular
when all nodes are restarted at the same time. I'm not sure what RabbitMQ can do about that: nodes can be booted in parallel
and do not coordinate that, so their handling of connection attempts is not and cannot be coordinated before they
form a cluster.

I suspect that you may be observing [1]. That issue is currently under consideration for 3.8, as it will benefit from a new consensus protocol
adopted by RabbitMQ in that version for certain things.

[1] https://github.com/rabbitmq/rabbitmq-mqtt/issues/91



Emre gündoğdu

Dec 3, 2018, 11:59:30 AM
to rabbitmq-users
Hi Michael,

Thanks for your response.

We are using the MQTT plugin on both the publish and consume sides.
I think you understood me correctly. We are using only one consumer and multiple publishers with a mirrored-queue RabbitMQ cluster, and I am using the AWS peer discovery plugin. This case only occurs when I restart the nodes within a short time of each other, nearly at the same time: because I am changing the rabbitmq.conf file, I have to restart every node for the configuration changes to take effect.

The load balancer health check watches the MQTT port (TCP 1883) while I restart the nodes. The consumer behind the load balancer reconnects to each individual node (with 2 nodes, that's 2 consumer connections). At this point not all published messages are consumed: when I send 50 messages to the broker on the topic my consumer subscribes to, only 40 are delivered and 10 remain on the queue in an unacked state.

I would be very pleased to hear of any workaround for this problem, because 3.8 is not yet a release candidate; it is still in beta, so we cannot upgrade our broker to that version soon.

Thanks
Emre

On Monday, 3 December 2018 at 19:29:42 UTC+3, Michael Klishin wrote:

Emre gündoğdu

Dec 3, 2018, 12:04:49 PM
to rabbitmq-users
And also, yes, you're right: I am observing issue [1], https://github.com/rabbitmq/rabbitmq-mqtt/issues/91

On Monday, 3 December 2018 at 19:59:30 UTC+3, Emre gündoğdu wrote:

Michael Klishin

Dec 3, 2018, 12:23:02 PM
to rabbitm...@googlegroups.com
Restarting all nodes at the same time WILL be problematic in just about
every distributed system. With 3.8, RabbitMQ clusters will only support losing a minority of nodes (less than half).

It's easy to see how a cluster-wide restart could result in some published messages being lost. AMQP 0-9-1 publishers have
a way of knowing whether their messages were handled [1], for example, while with MQTT the confirmation flow is more opaque.

I don't know what may be going on with the unacknowledged messages, but as [1] covers, they will be requeued on connection closure.

It also matters what acknowledgement mode is used (QoS).
Applications/clients are just as responsible for data safety as the broker.

Emre gündoğdu

Dec 7, 2018, 2:59:15 AM
to rabbitmq-users
Hi Michael,

Thanks for your response. We are using MQTT QoS = 1, and you're right: when I close one of the duplicated consumer connections, the published messages are requeued and delivered successfully.

I have also observed that during a restart of the two-node cluster, the duplicated consumer connections (two connections from one consumer, one to each individual node) show different MQTT versions: one shows MQTT 3.1.1 and the other MQTT 3.1.0. Why is that? Do you have any idea?

Thanks
Emre

On Monday, 3 December 2018 at 20:23:02 UTC+3, Michael Klishin wrote:

Michael Klishin

Dec 7, 2018, 7:04:17 AM
to rabbitm...@googlegroups.com
I don't really understand the question. There aren't many differences between MQTT 3.1.1 and 3.1.0.

Per MQTT 3.1.x spec two connections with identical client ID cannot coexist. The broker must disconnect the older one (IIRC).
This is not version-specific in any way.

Some clients can implement 3.1.1, others 3.1.0. Consider sharing more evidence.
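
FWIW, the Paho Java client lets you pin the protocol version on the connect
options, which may help rule out version fallback (a sketch):

    import org.eclipse.paho.client.mqttv3.MqttConnectOptions;

    MqttConnectOptions opts = new MqttConnectOptions();
    // ask for 3.1.1 explicitly instead of letting the client fall back to 3.1
    opts.setMqttVersion(MqttConnectOptions.MQTT_VERSION_3_1_1);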