[rabbitmq-discuss] Disk node vs ram node

vinse

Aug 5, 2010, 2:08:47 PM
to rabbitmq...@lists.rabbitmq.com

So I know, from an instructions point of view, how to set up a clustered
environment with disk nodes and ram nodes, but from the documentation/man
pages I am still unclear about a couple of things.
1. Assuming I have 3 nodes, node A, B, and C. Does node A become a disk
node by default when I set up the vhosts/users/permissions?

2. With the same node arrangement listed above.... I set up A(created vhost,
users, set permissions etc) and then clustered B against A,B (rabbitmqctl
cluster nodeA nodeB). I took down A and my messages did not persist even
though B should be a disk node. Is there something wrong with my steps?

3. What is the conventional/non-conventional way to determine whether
individual nodes in a cluster are disk or ram nodes?

Thanks.

--
View this message in context: http://old.nabble.com/Disk-node-vs-ram-node-tp29356379p29356379.html
Sent from the RabbitMQ mailing list archive at Nabble.com.

Alexandru Scvortov

Aug 5, 2010, 3:43:09 PM
to vinse, rabbitmq...@lists.rabbitmq.com
> 1. Assuming I have 3 nodes, node A, B, and C. Does node A become a disk
> node by default when I set up the vhosts/users/permissions?

The disk nodes are the ones explicitly listed in the "rabbitmqctl
cluster" command.

So,
% rabbitmqctl cluster rabbit@A rabbit@B
results in a cluster where A and B are disc nodes. If that command is
run on either A or B, the cluster has just those two nodes.
If, on the other hand, that command is run on C, you get a cluster with
A and B as disc nodes (same as above) and C as a ram node.

> 2. With the same node arrangement listed above.... I set up A(created vhost,
> users, set permissions etc) and then clustered B against A,B (rabbitmqctl
> cluster nodeA nodeB). I took down A and my messages did not persist even
> though B should be a disk node. Is there something wrong with my steps?

Queues exist physically only on the node on which they were declared.
Clustering just makes them (and exchanges) accessible transparently on
all of the nodes.

Thus, if the queue was declared on A, and A was taken down, the messages
stored on the queue would become inaccessible.

If you want to never lose messages, have a look at our high-availability
clustering guide in the documentation section of the website.

> 3. What is the conventional/non-conventional way to determine whether
> individual nodes in a cluster are disk or ram nodes?

To find out whether nodes are disc or ram, run
% rabbitmqctl status
on any node in the cluster.

Unfortunately, due to a bug, that command will not report the correct
node type, so it's not of much help to you right now. This problem has
been fixed in our default branch and will be in the next release.

Cheers,
Alex

Vince Chang

Aug 5, 2010, 4:49:51 PM
to Alexandru Scvortov, rabbitmq...@lists.rabbitmq.com
Thanks for your response Alex,

1. So if I start node A by itself, and then cluster B against A, then A becomes a disk node because I specified A in the rabbitmqctl command? Here are the steps

Node1:
start_app
add_vhost host1
add_user user pass

Node2:
stop_app
reset
cluster node1 <--- (is this the step that makes node A a disk node?)
start_app

Node3:
stop_app
reset
cluster node1
start_app

2. To be clear: in a cluster of nodes A, B, and C, if I send a message that creates the queue Q1 on node A and node A then goes down, I will lose those messages on node A. Node A is down, but B and C should still be alive. If I send the same message to be queued on Q1, will Q1 be created on node B or node C? For me so far, that is not the case. When I send a message to be queued on Q1 (where the messages on Q1 on node A are now lost because node A is down), I do not see the queue getting created on either node B or node C. Is there something wrong with my steps?

3. I am using RabbitMQ 1.7.2. Just to be clear there is _no way_ on 1.7.2 to determine whether or not a node is a disk node or ram node?

Alexandru Scvortov

Aug 5, 2010, 5:38:00 PM
to Vince Chang, rabbitmq...@lists.rabbitmq.com
On Thu, Aug 05, 2010 at 01:49:51PM -0700, Vince Chang wrote:
> 1. So if I start node A by itself, and then cluster B against A, then A becomes a disk node because I specified A in the rabbitmqctl command? Here are the steps
>
> Node1:
> start_app
> add_vhost host1
> add_user user pass
>
> Node2:
> stop_app
> reset
> cluster node1 <--- (is this the step that makes node A a disk node?)
> start_app
>
> Node3:
> stop_app
> reset
> cluster node1
> start_app

I've just tried that on the default branch.

All the nodes start up as disc nodes. The "cluster node1" step is the one
that turns *Node2* into a ram node; Node1 stays a disc node throughout.
In the third group of commands, you likewise turn *Node3* into a ram node.

So, the sequence is:
1. Start Node1 as a disc node
2. Cluster Node2 with Node1; Node2 becomes a ram node
3. Cluster Node3 with Node1; Node3 becomes a ram node
At the end, Node1 is disc; Node2 and Node3 are ram

> 2. To be clear: in a cluster of nodes A, B, and C, if I send a message that creates the queue Q1 on node A and node A then goes down, I will lose those messages on node A. Node A is down, but B and C should still be alive. If I send the same message to be queued on Q1, will Q1 be created on node B or node C? For me so far, that is not the case. When I send a message to be queued on Q1 (where the messages on Q1 on node A are now lost because node A is down), I do not see the queue getting created on either node B or node C. Is there something wrong with my steps?

I'm not quite sure what you mean by "message that creates the queue".

If you declare a queue Q1 on node A with queue.declare, then Q1 will
reside on node A. If node A goes down, all the messages on its queues
(including Q1) will become inaccessible. If Q1 was declared as durable,
then the messages published as persistent on it will become available
when node A comes back online.

In a cluster, when a node goes down, its queues become inaccessible.
They are *not* automatically created on other nodes.

Incidentally, which client are you using?

> 3. I am using RabbitMQ 1.7.2. Just to be clear there is _no way_ on 1.7.2 to determine whether or not a node is a disk node or ram node?

There's no way to do it with the provided tools.

If you're not squeamish about running a bit of Erlang, you could do
this:

% erl -sname mgmtsh -remsh rabbit@localhost

This gives you a remote Erlang shell named mgmtsh, connected to
rabbit@localhost

Then, running this should give you the disc nodes:
(rabbit@localhost)1> mnesia:table_info(rabbit_durable_exchange, disc_copies).

And running this should give you the ram nodes:
(rabbit@localhost)2> mnesia:table_info(rabbit_durable_exchange, ram_copies).

Vince Chang

Aug 5, 2010, 5:56:55 PM
to Alexandru Scvortov, rabbitmq...@lists.rabbitmq.com
Thanks for your answers!

#2. I mean that if I send a message that is meant for Q1, where Q1 was originally on node A (which is now down), will the queue be created on the other nodes B and C? It looks like your answer is no, just to confirm? I am using the 1.7.2 Java client. Is there a way to set up the cluster such that a queue (like Q1 in our case) will be created (automatically) on other nodes (such as node B and node C) if node A goes down? What happens when I make either (node B or node C) or both (node B and node C) disk nodes?

#3. Not sure if I should do anything beforehand, but I ran the command you provided and got this output:
$ erl -sname mgmtsh -remsh rabbit@localhost
Erlang R13B03 (erts-5.7.4) [source] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false]

*** ERROR: Shell process terminated! (^G to start new job) ***
{error_logger,{{2010,8,5},{14,52,59}},"~s~n",["Error in process <0.33.0> on node 'mgmtsh@ebony' with exit value: {badarg,[{erlang,list_to_existing_atom,[\"rabbit@ubuntu\"]},{dist_util,recv_challenge,1},{dist_util,handshake_we_started,1}]}\n"]}

=ERROR REPORT==== 5-Aug-2010::14:52:59 ===
Error in process <0.33.0> on node 'mgmtsh@ebony' with exit value: {badarg,[{erlang,list_to_existing_atom,["rabbit@ubuntu"]},{dist_util,recv_challenge,1},{dist_util,handshake_we_started,1}]}

FYI: I'm running Ubuntu and installed RabbitMQ 1.7.2 with sudo apt-get install rabbitmq-server.

-----Original Message-----
From: Alexandru Scvorţov [mailto:scv...@gmail.com] On Behalf Of Alexandru Scvortov
Sent: Thursday, August 05, 2010 2:38 PM
To: Vince Chang
Cc: rabbitmq...@lists.rabbitmq.com
Subject: Re: [rabbitmq-discuss] Disk node vs ram node

Alexandru Scvortov

Aug 5, 2010, 7:07:03 PM
to rabbitmq...@lists.rabbitmq.com
> #2. I mean that if I send a message that is meant for Q1, where Q1 was originally on node A (which is now down), will the queue be created on the other nodes B and C? It looks like your answer is no, just to confirm? I am using the 1.7.2 Java client. Is there a way to set up the cluster such that a queue (like Q1 in our case) will be created (automatically) on other nodes (such as node B and node C) if node A goes down? What happens when I make either (node B or node C) or both (node B and node C) disk nodes?

There's no automatic way to create missing queues inside a cluster in
RabbitMQ, as far as I know.

Whether a node is disc or ram refers to where the broker metadata
(queues, exchanges, bindings) is kept. Persistent messages are always
stored on disk. When a ram node goes down, it loses all its metadata.
If it's in a cluster, it will just connect to some other node and
recover the information. If the entire cluster goes down (say, power
outage), the metadata can only be recovered if at least one node was
disc. In general, there's not much sense in having a lot of disc nodes.
One or two should be enough. Ram nodes are faster.
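
To make the durable-queue / persistent-message side of this concrete, here is a
minimal sketch with the RabbitMQ Java client. It is written against the current
client API (the connection setup and queueDeclare signatures in the 1.7.x client
differ slightly), and the host name "nodeA" is just a placeholder:

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.MessageProperties;

public class PersistentPublish {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("nodeA");   // placeholder: the node the queue should live on

        Connection conn = factory.newConnection();
        Channel ch = conn.createChannel();

        // durable = true: the queue definition survives a restart of its home node
        ch.queueDeclare("Q1", true, false, false, null);

        // PERSISTENT_TEXT_PLAIN sets deliveryMode = 2, so the message is written
        // to disk and reappears on Q1 when the home node comes back online
        ch.basicPublish("", "Q1", MessageProperties.PERSISTENT_TEXT_PLAIN,
                        "hello".getBytes("UTF-8"));

        ch.close();
        conn.close();
    }
}

If nodeA is restarted, the durable queue and the persistent message on it come
back; a non-durable queue or a non-persistent message would not.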

> #3. Not sure if I should do anything beforehand, but I ran the command you provided and got this output:
> $ erl -sname mgmtsh -remsh rabbit@localhost
> Erlang R13B03 (erts-5.7.4) [source] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false]
>
> *** ERROR: Shell process terminated! (^G to start new job) ***
> {error_logger,{{2010,8,5},{14,52,59}},"~s~n",["Error in process <0.33.0> on node 'mgmtsh@ebony' with exit value: {badarg,[{erlang,list_to_existing_atom,[\"rabbit@ubuntu\"]},{dist_util,recv_challenge,1},{dist_util,handshake_we_started,1}]}\n"]}
>
> =ERROR REPORT==== 5-Aug-2010::14:52:59 ===
> Error in process <0.33.0> on node 'mgmtsh@ebony' with exit value: {badarg,[{erlang,list_to_existing_atom,["rabbit@ubuntu"]},{dist_util,recv_challenge,1},{dist_util,handshake_we_started,1}]}
>
> FYI: I'm running Ubuntu and installed RabbitMQ 1.7.2 with sudo apt-get install rabbitmq-server.

I assumed rabbit@localhost was your node. Try rabbit@ebony.

Dave Greggory

Aug 11, 2010, 9:17:40 AM
to Alexandru Scvortov, rabbitmq...@lists.rabbitmq.com
I'm a little concerned regarding what happens when a node that contains a certain queue in a cluster goes down.

On Aug 5, 2010, at 5:38 PM, Alexandru Scvortov <alex...@rabbitmq.com> wrote:

> If you declare a queue Q1 on node A with queue.declare, then Q1 will
> reside on node A. If node A goes down, all the messages on its queues
> (including Q1) will become inaccessible. If Q1 was declared as durable,
> then the messages published as persistent on it will become available
> when node A comes back online.
>
> In a cluster, when a node goes down, its queues become inaccessible.
> They are *not* automatically created on other nodes.

I'm using the Java client (1.8.1) and I automatically reconnect my producers/consumers to another node when one goes down. I do this by setting up a shutdown listener hook on the connection.
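
For reference, a minimal sketch of that kind of hook, using the Java client's
ShutdownListener (current API); reconnectToAnotherNode() is a hypothetical
application-side method:

import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ShutdownListener;
import com.rabbitmq.client.ShutdownSignalException;

class ReconnectHook {
    static void install(final Connection conn) {
        conn.addShutdownListener(new ShutdownListener() {
            public void shutdownCompleted(ShutdownSignalException cause) {
                // React only to broker/network failures, not to a clean close
                // initiated by the application itself.
                if (!cause.isInitiatedByApplication()) {
                    reconnectToAnotherNode();  // hypothetical: open a new connection to another node
                }
            }
        });
    }

    static void reconnectToAnotherNode() {
        // Application-specific: build a new Connection against the next node in the list.
    }
}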

When a node with a queue goes down, can I write something using the Java client that re-declares all known queues on the other node? Maybe have all the queue names (and params) registered somewhere in the app and, upon reconnect, passively re-declare them. If the passive declaration throws an exception, catch that and do an actual declaration. Would that work?
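
A sketch of that approach might look like the following; the KnownQueue record
and the registered-queue list are application-side bookkeeping assumed here, and
the channel calls use the current Java client API:

import java.io.IOException;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;

class KnownQueue {
    final String name;
    final boolean durable, exclusive, autoDelete;
    KnownQueue(String name, boolean durable, boolean exclusive, boolean autoDelete) {
        this.name = name; this.durable = durable;
        this.exclusive = exclusive; this.autoDelete = autoDelete;
    }
}

class QueueRecovery {
    // Run after reconnecting to a surviving node.
    static void redeclareAll(Connection conn, KnownQueue[] known) throws Exception {
        for (KnownQueue q : known) {
            Channel probe = conn.createChannel();
            try {
                probe.queueDeclarePassive(q.name);  // succeeds only if the queue already exists
                probe.close();
            } catch (IOException notFound) {
                // A failed passive declare closes the channel (404), so open a
                // fresh channel and do a real declaration.
                Channel ch = conn.createChannel();
                ch.queueDeclare(q.name, q.durable, q.exclusive, q.autoDelete, null);
                ch.close();
            }
        }
    }
}

(As the reply below points out, the fallback declaration only succeeds for
non-durable queues; re-declaring a durable queue whose home node is down fails
with a 404.)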

If that works, then what happens if we bring up the node that went down that had the queue originally? Would it play nice?

I'd like to find a solution without resorting to the pacemaker approach.

This is important not just for node failures but for other things such as upgrading or reconfiguring Rabbit without affecting any clients (our rabbits sit behind a round-robin load balancer).

Dave Greggory

Aug 12, 2010, 8:55:50 AM
to rabbitmq...@lists.rabbitmq.com
Maybe no one saw my questions below due to their placement in the original email. Anybody?

> When a node with a queue goes down, can I write something using the Java client that re-declares all known queues on the other node? Maybe have all the queue names (and params) registered somewhere in the app and, upon reconnect, passively re-declare them. If the passive declaration throws an exception, catch that and do an actual declaration. Would that work?
>
> If that works, then what happens if we bring up the node that went down that had the queue originally? Would it play nice?

Alexandru Scvorţov

Aug 12, 2010, 9:33:54 AM
to Dave Greggory, rabbitmq...@lists.rabbitmq.com
Hi Dave,

> When a node with a queue goes down, can I write something using the Java client that re-declares all known queues on the other node? Maybe have all the queue names (and params) registered somewhere in the app and, upon reconnect, passively re-declare them. If the passive declaration throws an exception, catch that and do an actual declaration. Would that work?

If you:
1) declare a durable queue on one of the clustered nodes,
2) take down that node,
3) declare the same queue on another node,
you get a "404: Not Found".

If the queue is not durable, then yes, that approach works.

As an aside, if you passively declare something and then immediately
declare it, the first step is redundant. There's no problem with
declaring the same thing repeatedly (assuming it's declared in exactly
the same way and it's reachable).
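
In code, that means the recovery loop sketched earlier can drop the passive
probe entirely; something like this (again using the hypothetical KnownQueue
bookkeeping) should be enough:

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;

class SimpleQueueRecovery {
    static void redeclareAll(Connection conn, KnownQueue[] known) throws Exception {
        Channel ch = conn.createChannel();
        for (KnownQueue q : known) {
            // Declaring is idempotent as long as the parameters match the
            // original declaration (and, for a durable queue, its home node is up).
            ch.queueDeclare(q.name, q.durable, q.exclusive, q.autoDelete, null);
        }
        ch.close();
    }
}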

Cheers,
Alex

Dave Greggory

Aug 13, 2010, 10:36:48 AM
to rabbitmq...@lists.rabbitmq.com

Ouch... so no way to ensure that there's something to catch messages when the
node with the original durable queue goes down?

There are several mentions throughout the site about out-of-the-box live
failover in a future release. How far out is that? 6 months? 1 year?

Not a big fan of the pacemaker approach due to the number of moving parts and
different packages (pacemaker/corosync/heartbeat/drbd) to maintain and manage.
And also haven't seen any indication of people actually using that setup in a
complex VM environment in production.


----- Original Message ----
From: Alexandru Scvorţov <alex...@rabbitmq.com>
To: Dave Greggory <davegr...@yahoo.com>
Cc: "rabbitmq...@lists.rabbitmq.com" <rabbitmq...@lists.rabbitmq.com>
Sent: Thu, August 12, 2010 9:33:54 AM
Subject: Re: [rabbitmq-discuss] Disk node vs ram node

Alexis Richardson

Aug 13, 2010, 10:40:14 AM
to Dave Greggory, rabbitmq...@lists.rabbitmq.com
Dave

One of the things on our todo list, as you can imagine, is to
understand how best to provide HA in a VM-based setting. Some VM
technologies provide useful failover/snapshot capabilities...

alexis

Dave Greggory

Aug 13, 2010, 12:45:39 PM
to Alexis Richardson, rabbitmq...@lists.rabbitmq.com
I understand, and I look forward to hearing about your plans for HA.