"msg_store_persistent: rebuilding indices from scratch" is taking a lot of time

christophe molliere

unread,

Apr 29, 2015, 11:36:50 AM4/29/15

to rabbitm...@googlegroups.com

We had to restart a server after a reboot of the Virtual machine the Rabbitmq server is working on.
The server was properly shutdown :

and then at the restart

=INFO REPORT==== 29-Apr-2015::13:56:56 ===
Starting RabbitMQ 3.4.2 on Erlang 17.1
Copyright (C) 2007-2014 GoPivotal, Inc.
Licensed under the MPL. See http://www.rabbitmq.com/

=INFO REPORT==== 29-Apr-2015::13:56:56 ===
node           : nodename@machinename
home dir       : /home/rabbitmq
config file(s) : /PATHTOCONFIG/rabbitmq/rabbitmq.config
cookie hash    : UHdCqMlg8zjiF0qecJzwrQ==
log            : /PATHTOTHELOG/LOGNAME.log
sasl log       : /PATHTOTHELOG/LOGNAME-sasl.log
database dir   : /PATHTODATABASE

=INFO REPORT==== 29-Apr-2015::13:56:57 ===
Limiting to approx 64900 file handles (58408 sockets)

=INFO REPORT==== 29-Apr-2015::13:58:52 ===
Memory limit set to 13994MB of 19991MB total.

=INFO REPORT==== 29-Apr-2015::13:58:52 ===
Disk free limit set to 50MB

=INFO REPORT==== 29-Apr-2015::13:58:52 ===
msg_store_transient: using rabbit_msg_store_ets_index to provide index

=INFO REPORT==== 29-Apr-2015::13:58:52 ===
msg_store_persistent: using rabbit_msg_store_ets_index to provide index

=WARNING REPORT==== 29-Apr-2015::13:58:52 ===
msg_store_persistent: rebuilding indices from scratch

This step tooks 1 hour and 30 minutes to complete before to complete the start :
=INFO REPORT==== 29-Apr-2015::15:26:13 ===
started TCP Listener on 0.0.0.0:51012

=INFO REPORT==== 29-Apr-2015::15:26:14 ===
Management plugin started. Port: 52012

=INFO REPORT==== 29-Apr-2015::15:26:14 ===
Server startup complete; 8 plugins started.
* rabbitmq_management
* rabbitmq_web_dispatch
* webmachine
* mochiweb
* rabbitmq_auth_backend_ldap
* rabbitmq_management_agent
* amqp_client
* eldap

Could it be possible to avoid such "rebuilding indices from scratch" how to make sure this is not done each time we have to stop and start the server ?

Many thanks in advance

Michael Klishin

unread,

Apr 29, 2015, 11:49:07 AM4/29/15

to rabbitm...@googlegroups.com, christophe molliere

On 29 April 2015 at 18:36:51, christophe molliere (momole...@gmail.com) wrote:
> Could it be possible to avoid such "rebuilding indices from
> scratch" how to make sure this is not done each time we have to stop
> and start the server ?

Christophe,

Indices are only rebuilt after a non-clean shutdown (e.g. via kill or kill -9).
So simply shut down the server gracefully. How do you do that at the moment?
--
MK

Staff Software Engineer, Pivotal/RabbitMQ

christophe molliere

unread,

Apr 29, 2015, 12:07:09 PM4/29/15

to Michael Klishin, rabbitm...@googlegroups.com

thanks for your answer :

I simply run the following :

rabbitmqctl stop $RABBITMQ_PID_FILE

more shutdown_log
Stopping and halting node 'nodename@machinename' ...

should I use "

rabbitmq-server stop" command ?

Michael Klishin

unread,

Apr 29, 2015, 3:06:52 PM4/29/15

to christophe molliere, rabbitm...@googlegroups.com

On 29 April 2015 at 19:07:06, christophe molliere (momole...@gmail.com) wrote:
> I simply run the following :
>
> rabbitmqctl stop $RABBITMQ_PID_FILE
>
> more shutdown_log
> Stopping and halting node 'nodename@machinename' ...
>
> should I use "rabbitmq-server stop" command ?

rabbitmqctl stop is fine. This is what our Debian package uses in the init script uses.

How did you install RabbitMQ? What version? Can you post log files and (in case of the Debian package)
startup_{err,log}?

christophe molliere

unread,

Apr 29, 2015, 4:13:08 PM4/29/15

to Michael Klishin, rabbitm...@googlegroups.com

RABBIT MQ VERSION : RabbitMQ 3.4.2
ERLANG VERSION : Erlang 17.1

To install rabbit MQ we simply untar the package in a directory with the user rabbitmq

We then just set the appropriate environment variables in order to be able to start the server on the appropriate ports

We have 3 servers running on the same host with the same version

cat startup_log

              RabbitMQ 3.4.2. Copyright (C) 2007-2014 GoPivotal, Inc.
## ##      Licensed under the MPL. See http://www.rabbitmq.com/
## ##
########## Logs: /cltr/logs/m7-prod-fis-amq1/m7_fis_prod_rmq_standard_ixe.log
###### ##        /cltr/logs/m7-prod-fis-amq1/m7_fis_prod_rmq_standard-sasl_ixe.log
##########
              Starting broker... completed with 8 plugins.

If you want some logs please ask for it

Regards

Christophe

Michael Klishin

unread,

Apr 29, 2015, 4:32:26 PM4/29/15

to christophe molliere, rabbitm...@googlegroups.com

On 29 April 2015 at 23:13:06, christophe molliere (momole...@gmail.com) wrote:
> If you want some logs please ask for it

I believe I have ;)

christophe molliere

unread,

May 4, 2015, 4:13:57 AM5/4/15

to Michael Klishin, rabbitm...@googlegroups.com

Hello,

Here is the log file for Startup amd log file of the instance for the restart.

The "_err" logs are empty.

We could increase the log level on your request if this is needed

Christophe

RabbitMQ_log.log

shutdown_log.log

startup_log.log

Michael Klishin

unread,

May 4, 2015, 4:32:45 AM5/4/15

to christophe molliere, rabbitm...@googlegroups.com

On 4 May 2015 at 11:13:55, christophe molliere (momole...@gmail.com) wrote:
> Here is the log file for Startup amd log file of the instance for
> the restart.
> The "_err" logs are empty.

Hm, that's a pretty short log and it only mentions broker startup, not earlier shutdowns.

Can there be earlier rotated logs? *_err logs have stderr redirection (by the Debian package
scripts), they are not related to effective log severity.

christophe molliere

unread,

May 4, 2015, 4:42:32 AM5/4/15

to Michael Klishin, rabbitm...@googlegroups.com

- Yes, this is true we rotate the logs between stop and start

- "_err" are empty : this is the reason why I didn't post them

- log severity is for you having more information.

- I have put the log file of the previous stop

RabbitMQ_log1.log

christophe molliere

unread,

May 4, 2015, 5:54:49 AM5/4/15

to Michael Klishin, rabbitm...@googlegroups.com

Could it be related to the amount of routing keys present on the system ( about 87000) ?

Michael Klishin

unread,

May 4, 2015, 7:44:48 AM5/4/15

to christophe molliere, rabbitm...@googlegroups.com

On 4 May 2015 at 12:54:47, christophe molliere (momole...@gmail.com) wrote:

> Could it be related to the amount of routing keys present on the
> system ( about 87000) ?

No. When a node is shut down, can you find recovery.dets and/or clean.dot files under RabbitMQ's database path?

christophe molliere

unread,

May 5, 2015, 7:59:28 AM5/5/15

to Michael Klishin, rabbitm...@googlegroups.com

after a clean shutdown this is the content of the database dir:

ls -l
total 397688
-rw-rw-r-- 1 rabbitmq rabbitmq       164 May 5 12:45 DECISION_TAB.LOG
-rw-rw-r-- 1 rabbitmq rabbitmq       100 May 5 12:46 LATEST.LOG
-rw-rw-r-- 1 rabbitmq rabbitmq        95 May 5 12:39 cluster_nodes.config
drwxrwxr-x 2 rabbitmq rabbitmq      4096 May 5 12:46 msg_store_persistent
drwxrwxr-x 2 rabbitmq rabbitmq      4096 May 5 12:46 msg_store_transient
-rw-rw-r-- 1 rabbitmq rabbitmq        47 May 5 12:39 nodes_running_at_shutdown
-rw-rw-r-- 1 rabbitmq rabbitmq    853687 May 5 11:57 rabbit_durable_exchange.DCD
-rw-rw-r-- 1 rabbitmq rabbitmq       362 May 5 12:02 rabbit_durable_exchange.DCL
-rw-rw-r-- 1 rabbitmq rabbitmq      2518 May 5 12:24 rabbit_durable_queue.DCD
-rw-rw-r-- 1 rabbitmq rabbitmq       673 May 5 12:45 rabbit_durable_queue.DCL
-rw-rw-r-- 1 rabbitmq rabbitmq 405994703 May 5 11:57 rabbit_durable_route.DCD
-rw-rw-r-- 1 rabbitmq rabbitmq    271742 May 5 12:45 rabbit_durable_route.DCL
-rw-rw-r-- 1 rabbitmq rabbitmq       370 May 5 11:57 rabbit_runtime_parameters.DCD
-rw-rw-r-- 1 rabbitmq rabbitmq         4 May 5 11:58 rabbit_serial
-rw-rw-r-- 1 rabbitmq rabbitmq       334 May 5 11:58 rabbit_user.DCD
-rw-rw-r-- 1 rabbitmq rabbitmq       989 May 5 11:57 rabbit_user_permission.DCD
-rw-rw-r-- 1 rabbitmq rabbitmq       253 May 5 11:58 rabbit_vhost.DCD
-rw-rw-r-- 1 rabbitmq rabbitmq      5464 May 5 11:58 recovery.dets
-rw-rw-r-- 1 rabbitmq rabbitmq     25752 Nov 26 18:40 schema.DAT
-rw-rw-r-- 1 rabbitmq rabbitmq       272 Nov 24 18:03 schema_version

Michael Klishin

unread,

May 5, 2015, 8:01:09 AM5/5/15

to christophe molliere, rabbitm...@googlegroups.com

On 5 May 2015 at 14:59:25, christophe molliere (momole...@gmail.com) wrote:
> drwxrwxr-x 2 rabbitmq rabbitmq 4096 May 5 12:46 msg_store_persistent
> drwxrwxr-x 2 rabbitmq rabbitmq 4096 May 5 12:46 msg_store_transient

What is under these directories?

christophe molliere

unread,

May 5, 2015, 8:19:11 AM5/5/15

to Michael Klishin, rabbitm...@googlegroups.com

msg_store_persistent:
total 12
-rw-rw-r-- 1 rabbitmq rabbitmq 0 May 5 12:48 0.rdq
-rw-rw-r-- 1 rabbitmq rabbitmq 61 May 5 14:13 clean.dot
-rw-rw-r-- 1 rabbitmq rabbitmq 478 May 5 14:13 file_summary.ets
-rw-rw-r-- 1 rabbitmq rabbitmq 404 May 5 14:13 msg_store_index.ets

msg_store_transient:
total 12
-rw-rw-r-- 1 rabbitmq rabbitmq 0 May 5 12:48 0.rdq
-rw-rw-r-- 1 rabbitmq rabbitmq 61 May 5 14:13 clean.dot
-rw-rw-r-- 1 rabbitmq rabbitmq 478 May 5 14:13 file_summary.ets
-rw-rw-r-- 1 rabbitmq rabbitmq 404 May 5 14:13 msg_store_index.ets

Michael Klishin

unread,

May 5, 2015, 8:50:27 AM5/5/15

to christophe molliere, rabbitm...@googlegroups.com

On 5 May 2015 at 15:19:10, christophe molliere (momole...@gmail.com) wrote:
> msg_store_persistent:
> total 12
> -rw-rw-r-- 1 rabbitmq rabbitmq 0 May 5 12:48 0.rdq
> -rw-rw-r-- 1 rabbitmq rabbitmq 61 May 5 14:13 clean.dot
> -rw-rw-r-- 1 rabbitmq rabbitmq 478 May 5 14:13 file_summary.ets
> -rw-rw-r-- 1 rabbitmq rabbitmq 404 May 5 14:13 msg_store_index.ets
>
> msg_store_transient:
> total 12
> -rw-rw-r-- 1 rabbitmq rabbitmq 0 May 5 12:48 0.rdq
> -rw-rw-r-- 1 rabbitmq rabbitmq 61 May 5 14:13 clean.dot
> -rw-rw-r-- 1 rabbitmq rabbitmq 478 May 5 14:13 file_summary.ets
> -rw-rw-r-- 1 rabbitmq rabbitmq 404 May 5 14:13 msg_store_index.ets

So the clean shutdown files are all there.

Does it happen every single time? Can you try 3.5.x or at least 3.4.4?

Do you only see "rebuilding indices from scratch" or is there a "failed to recover index"
message earlier in the logs?

How many messages do you have on disk, would it be possible to archive the entire
database directory and send it to us off-list?

christophe molliere

unread,

May 5, 2015, 11:43:53 AM5/5/15

to Michael Klishin, rabbitm...@googlegroups.com

Does it happen every single time? Can you try 3.5.x or at least 3.4.4?

--> yes so far it happened all the time.
--> the migration should be possible on one of our system test environment but it might not be possible to migrate production

Do you only see "rebuilding indices from scratch" or is there a "failed to recover index"
message earlier in the logs?

--> So far we had always "rebuilding indices from scratch"
m7_fis_prod_rmq_standard_ixe.log:msg_store_persistent: rebuilding indices from scratch
rollover/m7_fis_prod_rmq_standard_ixe.log.1:msg_store_persistent: rebuilding indices from scratch
rollover/m7_fis_prod_rmq_standard_ixe.log.12:rebuilding indices from scratch
rollover/m7_fis_prod_rmq_standard_ixe.log.14:msg_store_persistent: rebuilding indices from scratch
rollover/m7_fis_prod_rmq_standard_ixe.log.2:msg_store_persistent: rebuilding indices from scratch
rollover/m7_fis_prod_rmq_standard_ixe.log.4:msg_store_persistent: rebuilding indices from scratch
rollover/m7_fis_prod_rmq_standard_ixe.log.9:msg_store_persistent: rebuilding indices from scratch

From time to time, we don't have the line "rebuilding indices from scratch".
But Anyway the restart is anyway taking 45 minutes to complete :

This was the case in production
This is as well the case on the System test 1 ( we manage to reproduce exactly the same behavior ):

=INFO REPORT==== 5-May-2015::14:13:54 ===
stopped TCP Listener on 0.0.0.0:5672

=INFO REPORT==== 5-May-2015::14:13:56 ===
Halting Erlang VM

=INFO REPORT==== 5-May-2015::14:16:56 ===
Starting RabbitMQ 3.3.4 on Erlang 17

Licensed under the MPL. See http://www.rabbitmq.com/

=INFO REPORT==== 5-May-2015::14:16:56 ===
node           : rabbit@m7shrdsyt1amq2
home dir       : /var/lib/rabbitmq
config file(s) : /etc/rabbitmq/rabbitmq.config
cookie hash    : BVzNakYXCG9o08ofTtxBgg==
log            : /var/log/rabbitmq/rabbit.log
sasl log       : /var/log/rabbitmq/rabbit-sasl.log
database dir   : /var/lib/rabbitmq/mnesia/rabbit

=INFO REPORT==== 5-May-2015::14:16:57 ===

Limiting to approx 64900 file handles (58408 sockets)

=INFO REPORT==== 5-May-2015::14:17:43 ===

Memory limit set to 13994MB of 19991MB total.

=INFO REPORT==== 5-May-2015::14:17:43 ===

Disk free limit set to 50MB

=INFO REPORT==== 5-May-2015::14:17:43 ===

msg_store_transient: using rabbit_msg_store_ets_index to provide index

=INFO REPORT==== 5-May-2015::14:17:43 ===

msg_store_persistent: using rabbit_msg_store_ets_index to provide index

=INFO REPORT==== 5-May-2015::14:59:03 ===
started TCP Listener on 0.0.0.0:5672

How many messages do you have on disk, would it be possible to archive the entire
database directory and send it to us off-list?

--> We don't have any messages on disk at all : all the queues are empty for the moment
--> I am quite not sure this will be possible but I need to ask on my side

Michael Klishin

unread,

May 5, 2015, 5:43:32 PM5/5/15

to christophe molliere, rabbitm...@googlegroups.com

On 5 May 2015 at 18:43:52, christophe molliere (momole...@gmail.com) wrote:
> --> We don't have any messages on disk at all : all the queues are
> empty for the moment

This and the fact that you have no data segment files in the message store directories
suggests the issue is with something else. The message logged is likely a red herring.

45 minutes is a lot of time. Please inspect iostat and vmstat output as the broker boots:
do you see high disk I/O? OS swapping? high CPU use?

We've seen DNS server timeouts slow down connections by several minutes, for example.
So I'm fairly convinced the issue is not in the message store.

christophe molliere

unread,

May 8, 2015, 10:57:31 AM5/8/15

to Michael Klishin, rabbitm...@googlegroups.com

We manage to test it with the version 3.5.1 :

- We have created the RabbitMQ instance with an empty mnesia database : no issue detected

- We have created the 5103966 Exchanges necessary for our application : this steps tooks quite a while ( around 3 hours ... )

- Then we attached a second node to the cluster : the node attached to the cluster recreate its mnesia database from scratch : this steps tooks around 2 hours and 30 minutes.

- When this was completed we performed a proper stop and start of this server. Here is the logs of the stop and start.

bash-4.1$ ./stop.sh

=INFO REPORT==== 8-May-2015::16:46:40 ===
Stopping RabbitMQ

=INFO REPORT==== 8-May-2015::16:46:40 ===
stopped TCP Listener on 0.0.0.0:51702

=SUPERVISOR REPORT==== 8-May-2015::16:46:46 ===
     Supervisor: {local,mnesia_kernel_sup}
     Context:    shutdown_error
     Reason:     shutdown
     Offender:   [{pid,<0.69.0>},
                  {name,mnesia_monitor},
                  {mfargs,{mnesia_monitor,start,[]}},
                  {restart_type,permanent},
                  {shutdown,3000},
                  {child_type,worker}]

=INFO REPORT==== 8-May-2015::16:46:46 ===
Stopped RabbitMQ application

=INFO REPORT==== 8-May-2015::16:46:46 ===
Halting Erlang VM

bash-4.1$ ./start.sh

=INFO REPORT==== 8-May-2015::16:47:22 ===
Starting RabbitMQ 3.5.1 on Erlang 17.1

Copyright (C) 2007-2014 GoPivotal, Inc.
Licensed under the MPL. See http://www.rabbitmq.com/

=INFO REPORT==== 8-May-2015::16:47:22 ===
node           : m7-systest1-apc-amq2@m7shrdsyt1amq2
home dir       : /var/lib/rabbitmq
config file(s) : /shrd/m7-systest1-apc-amq2/sbin/../etc/rabbitmq/rabbitmq.config
cookie hash    : BVzNakYXCG9o08ofTtxBgg==
log            : /shrd/logs/m7-systest1-apc-amq2/m7-systest1-apc-amq2.log
sasl log       : /shrd/logs/m7-systest1-apc-amq2/m7-systest1-apc-amq2.log
database dir   : /shrd/m7-systest1-apc-amq2/sbin/../var/lib/rabbitmq/mnesia/m7-systest1-apc-amq2

=INFO REPORT==== 8-May-2015::16:47:22 ===

Limiting to approx 64900 file handles (58408 sockets)

=INFO REPORT==== 8-May-2015::16:50:06 ===
Priority queues enabled, real BQ is rabbit_variable_queue

=INFO REPORT==== 8-May-2015::16:50:06 ===
Memory limit set to 16992MB of 19991MB total.

=INFO REPORT==== 8-May-2015::16:50:06 ===

Disk free limit set to 50MB

=INFO REPORT==== 8-May-2015::16:50:06 ===

msg_store_transient: using rabbit_msg_store_ets_index to provide index

=INFO REPORT==== 8-May-2015::16:50:06 ===

msg_store_persistent: using rabbit_msg_store_ets_index to provide index

=WARNING REPORT==== 8-May-2015::16:50:06 ===
msg_store_persistent: recovery terms differ from present
rebuilding indices from scratch

and the steps rebuilding the indices is taking 2 hours and 25 minutes

Michael Klishin

unread,

May 8, 2015, 12:28:55 PM5/8/15

to christophe molliere, rabbitm...@googlegroups.com

On 8 May 2015 at 17:57:29, christophe molliere (momole...@gmail.com) wrote:
> We have created the 5103966 Exchanges necessary for our application
> : this steps tooks quite a while ( around 3 hours ... )

I suspect that this is what takes a while during recovery: loading 5M exchange rows
and associated bindings.

Consider using fewer exchanges or make some of them transient.

Reply all

Reply to author

Forward