On 30 August 2014 at 13:24:45, Alexey Makarov (
zip...@gmail.com) wrote:
> > 1) Are mirrored queues enough to save data when some node goes
> down? Or what can I do to save the information?
You can "lose messages" (quoted because the details may vary but people tend
to put the same label on every possible issue) at several stages:
* Before they reach RabbitMQ
* At routing
* When RabbitMQ restarts
* During a network split between nodes
* When messages are delivered to consumers
Most of these are covered in [3].
So, you need to
* Use publisher confirms, or possibly keep a write-ahead log in the client (which may go down before it has a chance to send a message and receive a confirm)
* Either ensure your messages route somewhere, or publish them as mandatory and handle returns
* Use durable, non-exclusive queues *and* publish messages as persistent
* Use queue mirroring
* Pick your poison [1][2] when it comes to how exactly network splits are handled
* Use manual confirmations with consumers and be careful to only ack the deliveries that have been fully processed
All of these features affect overall throughput: high throughput is often
at odds with replication and extra confirmations.
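To illustrate the mirroring bullet above: mirroring is enabled with a policy rather than per-queue arguments in modern releases. A minimal example, where the policy name "ha-all" and the queue name pattern are just illustrative choices:

```shell
# Mirror every queue whose name begins with "ha." across all cluster nodes.
# "ha-all" is an arbitrary policy name picked for this example.
rabbitmqctl set_policy ha-all "^ha\." '{"ha-mode":"all"}'
```

See [1] and the mirroring guide for the other ha-mode values (exactly, nodes).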
> 2) If mirrored queues are on, must the information on the hard drive
> of the first node be replicated to the second node automatically? Or
> not? My second node's database folder is empty, but if the master
> falls, messages are still on the second node, and I don't know where
> they are stored.
Even if you start a node as a RAM node, there should be something in its
database directory. So either you are not looking in the correct location [4],
or something is wrong with the node.
> 3) Is there any client for viewing the RabbitMQ database? Like Aqua?
> I found the Erlang observer, but don't know how to use it.
The observer app from OTP inspects the Erlang runtime and its processes.
RabbitMQ queues are generally not "listable": currently, retrieving
a message with basic.get may affect queue ordering.
That said, the on-disk format is not particularly complex, so developing
such a tool and using it on a hot standby (e.g. a spare federated cluster) should be possible.
> 4) I'm planning to do load balancing with HAProxy on the frontend;
> for example, if I push 8000 messages, must they be distributed across
> the two nodes?
RabbitMQ connections are long-lived. If you use HAProxy with it (which is very common,
and some Pivotal products do, too), your connection "sticks" to a node.
So HAProxy helps you distribute connections, not messages. In addition, to guarantee
operation ordering, every RabbitMQ queue has a master node in the cluster (note: queues have
masters, not RabbitMQ clusters!). All operations on a queue first go through the master
and are then replicated to mirrors, if any.
Which is why there are several partition handling schemes: when a node that hosted some
masters becomes disconnected from other nodes, it can handle it in multiple ways,
e.g. by refusing to accept publishes and connections as soon as possible.
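For completeness, the partition handling scheme is selected in the broker configuration
file; a minimal rabbitmq.config fragment (see [2] for what each mode does):

```
%% rabbitmq.config (classic Erlang-term format)
[
  {rabbit, [
    %% Alternatives: ignore (the default) and autoheal.
    {cluster_partition_handling, pause_minority}
  ]}
].
```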
A related subject is data locality: when you publish to a queue whose master is on node A
and your consumers are on node B, RabbitMQ has to route extra traffic from A to B. If your
consumer is also connected to A, no extra intra-cluster transfers are needed.
Keeping this in mind can help you scale your overall cluster throughput to pretty
non-trivial values:
http://googlecloudplatform.blogspot.ru/2014/06/rabbitmq-on-google-compute-engine.html
> Like 4000 on the 1st node and the rest on the second? Because when
> I'm sending messages to my nodes, I see 8000 messages on each node.
Don't you think that sending 4000 messages to a cluster of 2 nodes with all queues
mirrored should result in both nodes eventually (after a really short delay) having
4000 messages each?
Finally, something else you should know: when a publisher receives a confirm for message
M, this means that
* For persistent messages routed to durable queue(s), the message was pushed to disk
* For queues that have mirrors, M was also replicated to all mirrors
But before you get too excited about this: using publisher confirms on a per-message basis
(publish, wait for the confirm, publish another one, wait again, and so on) will ruin your
throughput. For decent throughput you want to publish in batches, and re-deliver
in batches when you detect that a confirm for at least one message in the batch is missing.
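To make the batching idea concrete, here is a broker-free sketch in Python that only
models the bookkeeping: publish a batch, note which messages went unconfirmed, and
re-deliver those until everything is confirmed. FakeBroker is a stand-in for this
illustration, not a real client API.

```python
class FakeBroker:
    """Stand-in broker: confirms every message except those whose
    sequence number is in `drop` (simulating lost confirms)."""
    def __init__(self, drop=()):
        self.drop = set(drop)
        self.stored = []

    def publish(self, seq_no, message):
        # Returns True when the broker confirms the message.
        if seq_no in self.drop:
            self.drop.discard(seq_no)   # succeed on re-delivery
            return False
        self.stored.append(message)
        return True


def publish_batch(broker, messages):
    """Publish a batch, collect confirms, and keep re-delivering the
    unconfirmed remainder as a new batch until all are confirmed."""
    while messages:
        unconfirmed = [m for i, m in enumerate(messages, 1)
                       if not broker.publish(i, m)]
        messages = unconfirmed  # re-deliver only what is still missing


broker = FakeBroker(drop={3})
publish_batch(broker, ["m1", "m2", "m3", "m4"])
```

In a real client you would track the confirm sequence numbers the broker reports
instead of a boolean return value, but the retry loop has the same shape.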
HTH.
1. http://www.rabbitmq.com/clustering.html
2. http://www.rabbitmq.com/partitions.html
3. https://www.rabbitmq.com/reliability.html
4. https://www.rabbitmq.com/relocate.html
--
MK
Staff Software Engineer, Pivotal/RabbitMQ