rabbitmq server Mnesia backup and restore


jayashree gn

Oct 6, 2014, 1:44:25 PM
to rabbitm...@googlegroups.com, si...@rabbitmq.com
Hi All,

I have been trying to back up and restore the RabbitMQ databases.
I went through a couple of messages in the forum where there were pointers to use hpadmin and the management API.

But I have failed to restore the Mnesia folder itself, because the RabbitMQ node names are different from the ones I backed up, and renaming the Mnesia folder made things worse: the service would not start at all.

Is there a better, more reliable method by which I can back up my existing cluster and restore it to a new one?
I have been struggling with this for a while now; any help will be greatly appreciated.

Thanks!

jayashree gn

Oct 10, 2014, 5:25:36 PM
to rabbitm...@googlegroups.com
Hi all,
I was able to get the server definitions backed up and restored to a new cluster, but I am stuck at getting the messages to restore.
Any ideas? Any pointers will help, as I am running out of options.
I tried to back up the Mnesia folder and restore msg_store_persistent, but every time I restart the broker the folder is wiped out.
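
(For reference, exporting and importing the definitions through the management API looks roughly like the following; the host names and the guest:guest credentials are placeholders, not the real ones.)

   # export definitions (users, vhosts, exchanges, queues, bindings, policies) from the old cluster
   curl -u guest:guest http://old-host:15672/api/definitions > definitions.json

   # import them into the new cluster
   curl -u guest:guest -H "Content-Type: application/json" \
        -X POST -d @definitions.json http://new-host:15672/api/definitions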

Has anyone been able to restore the messages to a new cluster?

Thanks in advance!!!

Jayashree

Michael Klishin

Oct 11, 2014, 5:19:18 AM
to jayashree gn, rabbitm...@googlegroups.com
On 11 October 2014 at 01:25:42, jayashree gn (jaish...@gmail.com) wrote:
> I was able to get the server definitions backed up and restored
> to a new cluster, but I am stuck at getting the messages to restore.
> Any ideas? Any pointers will help, as I am running out of options.
> I tried to back up the Mnesia folder and restore msg_store_persistent,
> but every time I restart the broker the folder is wiped out.
>
> Has anyone been able to restore the messages to a new cluster?

The node that you restore has to be the "first" node in the cluster (the others join it),
because that node's view of the world (its internal database state, which maintains the list of
queues) will be the initial state "adopted" by the other nodes.
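
In rabbitmqctl terms that looks roughly like this (rabbit@seed-host stands in for the restored node's name; a node that was previously part of another cluster may also need a rabbitmqctl reset before joining):

   # on the restored "seed" node
   rabbitmqctl start_app

   # on each of the other nodes
   rabbitmqctl stop_app
   rabbitmqctl join_cluster rabbit@seed-host
   rabbitmqctl start_app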
--
MK

Staff Software Engineer, Pivotal/RabbitMQ

jayashree gn

Oct 13, 2014, 11:24:31 AM
to Michael Klishin, rabbitm...@googlegroups.com
Thanks for the response!
But doing stop_app on all the cluster nodes and then trying to get the other nodes to join the first doesn't work very well.

Scenario:
1. Stopped all the cluster nodes (rabbitmqctl stop_app).
2. On the first node, restored the messages and tried to start the node (rabbitmqctl start_app).
3. The first node fails to start because the other nodes are stopped, and it does not assume they are down.

At this point I am running out of ideas; any help with this would help us take this to production. Thanks in advance!

Error. 

Log files (may contain more information):
   /var/log/rabbitmq/rab...@ec2-54-88-219-40.log
   /var/log/rabbitmq/rab...@ec2-54-88-219-40-sasl.log


=INFO REPORT==== 13-Oct-2014::15:17:23 ===
Starting RabbitMQ 3.3.1 on Erlang R14B04
Copyright (C) 2007-2014 GoPivotal, Inc.
Licensed under the MPL.  See http://www.rabbitmq.com/

=INFO REPORT==== 13-Oct-2014::15:17:23 ===
node           : rabbit@ec2-54-88-219-40
home dir       : /var/lib/rabbitmq
config file(s) : /etc/rabbitmq/rabbitmq.config
cookie hash    : AXRbPCxp7gfZ+OAqKZijEA==
log            : /var/log/rabbitmq/rab...@ec2-54-88-219-40.log
sasl log       : /var/log/rabbitmq/rab...@ec2-54-88-219-40-sasl.log
database dir   : /var/lib/rabbitmq/mnesia/rabbit@ec2-54-88-219-40

=INFO REPORT==== 13-Oct-2014::15:17:24 ===
Limiting to approx 924 file handles (829 sockets)

=INFO REPORT==== 13-Oct-2014::15:17:24 ===
Error description:
   {badmatch,{error,mnesia_not_running}}

Log files (may contain more information):
   /var/log/rabbitmq/rab...@ec2-54-88-219-40.log
   /var/log/rabbitmq/rab...@ec2-54-88-219-40-sasl.log

Stack trace:
   [{rabbit_mnesia,init_from_config,0},
    {rabbit_mnesia,init,0},
    {rabbit,'-run_boot_step/1-lc$^1/1-1-',1},
    {rabbit,run_boot_step,1},
    {rabbit,'-start/2-lc$^0/1-0-',1},
    {rabbit,start,2},
    {application_master,start_it_old,4}]


=INFO REPORT==== 13-Oct-2014::15:17:25 ===
Error description:
   {could_not_start,rabbit,
       {bad_return,
           {{rabbit,start,[normal,[]]},
            {'EXIT',
                {rabbit,failure_during_boot,
                    {badmatch,{error,mnesia_not_running}}}}}}}

Log files (may contain more information):
   /var/log/rabbitmq/rab...@ec2-54-88-219-40.log
   /var/log/rabbitmq/rab...@ec2-54-88-219-40-sasl.log


 



Michael Klishin

Oct 13, 2014, 11:48:27 AM
to jayashree gn, rabbitm...@googlegroups.com
On 13 October 2014 at 19:24:35, jayashree gn (jaish...@gmail.com) wrote:
> But doing stop_app on all the cluster nodes and then trying to get the
> other nodes to join the first doesn't work very well.
>
> Scenario:
> 1. Stopped all the cluster nodes (rabbitmqctl stop_app).
> 2. On the first node, restored the messages and tried to start the
> node (rabbitmqctl start_app).
> 3. The first node fails to start because the other nodes are stopped,
> and it does not assume they are down.

* Stop all but one node, and back up that node's DB directory.
* Restore it elsewhere.
* Have the other nodes join it.

This is not really different from, say, doing a cluster upgrade, as far
as the list of actions involved goes.
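
As a rough sketch, with the database path taken from the log earlier in this thread (everything else is adjustable):

   # on the one node left, stop the app so the database files are quiescent
   rabbitmqctl stop_app

   # archive its database directory
   tar -C /var/lib/rabbitmq/mnesia -czf /tmp/rabbit-db-backup.tar.gz ./rabbit@ec2-54-88-219-40

   # copy the archive to the new machine, unpack it into the new node's
   # database directory, start that node, then join the other nodes to it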

jayashree gn

Oct 13, 2014, 12:17:54 PM
to Michael Klishin, rabbitm...@googlegroups.com
I have been struggling with backing up its DB directory because the hostnames are not the same from one cluster to another.
For example, the cluster nodes are named something like rabbit@ec2-54-88-219-40, so restoring the Mnesia folder keeps the service from starting on a new cluster: the hostnames are not the same as the node names of the old, backed-up cluster.

So I am trying to back up the definitions and the queues separately; that way I don't have to deal a whole lot with the Mnesia folder itself.

But restoring messages has not been working.
Are there any other ways of restoring messages from one cluster to another?

Thank you!

Michael Klishin

Oct 13, 2014, 2:26:41 PM
to jayashree gn, Michael Klishin, rabbitm...@googlegroups.com
The node's own hostname should only be used in the directory name, so rename the directory before starting the seed node in the new cluster.
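
For example, reusing the database dir from the log earlier in the thread, with rabbit@new-host as a stand-in for the new node's name:

   # rename the restored database directory to match the new node's name
   mv /var/lib/rabbitmq/mnesia/rabbit@ec2-54-88-219-40 \
      /var/lib/rabbitmq/mnesia/rabbit@new-host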

MK

jayashree gn

Oct 13, 2014, 3:20:07 PM
to Michael Klishin, rabbitm...@googlegroups.com
I have never got that to work. I have tried that in every possible way: updating the cluster config file, permissions, renaming the folders under mnesia.

I run into this error when I start the service. It tries to connect to the nodes of the backed-up cluster; updating the cluster config file in the Mnesia folder did not fix this.




DIAGNOSTICS
===========

attempted to contact: ['rabbit@ec2-54-86-170-222','rabbit@ec2-54-86-182-49',
                       'rabbit@ec2-54-85-11-21']

rabbit@ec2-54-86-170-222:
  * unable to connect to epmd (port 4369) on ec2-54-86-170-222: nxdomain (non-existing domain)

rabbit@ec2-54-86-182-49:
  * unable to connect to epmd (port 4369) on ec2-54-86-182-49: nxdomain (non-existing domain)

rabbit@ec2-54-85-11-21:
  * unable to connect to epmd (port 4369) on ec2-54-85-11-21: nxdomain (non-existing domain)


current node details:
- node name: 'rabbit@ec2-54-172-139-243'
- home dir: /var/lib/rabbitmq
- cookie hash: AXRbPCxp7gfZ+OAqKZijEA==



=INFO REPORT==== 13-Oct-2014::19:17:13 ===
Error description:
   {could_not_start,rabbit,
       {bad_return,
           {{rabbit,start,[normal,[]]},
            {'EXIT',
                {rabbit,failure_during_boot,
                    {error,
                        {timeout_waiting_for_tables,
                            [rabbit_user,rabbit_user_permission,rabbit_vhost,
                             rabbit_listener,rabbit_durable_route,
                             rabbit_semi_durable_route,rabbit_route,
                             rabbit_reverse_route,rabbit_topic_trie_node,
                             rabbit_topic_trie_edge,rabbit_topic_trie_binding,
                             rabbit_durable_exchange,rabbit_exchange,
                             rabbit_exchange_serial,rabbit_runtime_parameters,
                             rabbit_durable_queue,rabbit_queue,gm_group,
                             mirrored_sup_childspec]}}}}}}}

Log files (may contain more information):
   /var/log/rabbitmq/rab...@ec2-54-172-139-243.log
   /var/log/rabbitmq/rab...@ec2-54-172-139-243-sasl.log

Michael Klishin

Oct 13, 2014, 5:26:53 PM
to jayashree gn, rabbitm...@googlegroups.com
Apparently hostnames are used in the schema. The best way to proceed is to have a stand-by cluster with all of its exchanges federated, and a message TTL of however many hours or days works for you.
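
A rough sketch of that setup, run on the stand-by cluster (the upstream name, URI, credentials and the "^federated\." pattern are placeholders; 86400000 ms is 24 hours):

   # enable federation and point an upstream at the live cluster
   rabbitmq-plugins enable rabbitmq_federation
   rabbitmqctl set_parameter federation-upstream live-cluster '{"uri":"amqp://user:pass@live-host"}'

   # federate the matching exchanges from the upstream
   rabbitmqctl set_policy --apply-to exchanges federate-live "^federated\." '{"federation-upstream-set":"all"}'

   # expire the stand-by copies after however long works for you (24 hours here)
   rabbitmqctl set_policy --apply-to queues standby-ttl "^federated\." '{"message-ttl":86400000}'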

A separate back-up tool is indeed a missing piece of functionality at the moment.

MK