After crashing, RabbitMQ service fails to start unless persistent data is deleted


Sukosd Endre

Feb 23, 2016, 4:14:48 AM
to rabbitmq-users

Hi,


I'm using a single RabbitMQ instance (not a cluster), and all the queues declared are durable and all the messages sent are persistent.

I'm sending messages to RabbitMQ continuously, then (to simulate a crash) I kill the rabbitmq process and then start the RabbitMQ service again.


The problem I face is that after the second unexpected shutdown, the RabbitMQ service fails to start normally.

Even though rabbitmq-service.bat start returns this:

C:\Program Files\erl7.1\erts-7.1\bin\erlsrv: Service RabbitMQ started.

the service is not running. 

rabbitmqctl.bat status outputs:

Error: unable to connect to node 'rabbit@HCE-G971WY1': nodedown

Any suggestions as to why the service fails to start?


If I delete all persistence data (\AppData\Roaming\RabbitMQ\db), then RabbitMQ starts normally, but then all my messages and queues are lost, which is not what I want.


I'm using:

  • Windows 7
  • RabbitMQ 3.6.0 on Erlang 18.1

I've attached my AppData folder containing logs, erl_crash, mnesia and journal files.


Any help would be appreciated,

Endre

RabbitMQ_AppData.zip

Michael Klishin

Feb 23, 2016, 8:22:04 AM
to rabbitm...@googlegroups.com
RabbitMQ fails to read a queue index. Force killing the node can have this effect: there is a window of time during which some message store data is held only in RAM. We would be interested in making the message store more resilient to this if you can explain exactly how you conduct your test, but that's the general problem. If you have to test with a single node, shut it down normally and you should see the difference.




--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Sukosd Endre

Feb 23, 2016, 8:48:50 AM
to rabbitmq-users
Thanks for your answer.

I reran the tests using a graceful shutdown (rabbitmq-service.bat stop), and in this case RabbitMQ can, of course, start and everything works nicely.

However, when I just kill the process (to simulate a power outage) and crash RabbitMQ, it cannot start again.
To kill the RabbitMQ process I use WMIC (the Windows Management Instrumentation Command-line): wmic process where "CommandLine like '%rabbitmq%' and Name like 'erl.exe'" delete /nointeractive

Sukosd Endre

Feb 23, 2016, 8:58:30 AM
to rabbitmq-users
I forgot to mention that my goal is to achieve guaranteed message delivery.

So I use publisher confirms and message acknowledgements, which should make sure that no messages are lost and that acknowledged messages can be recovered after an unexpected shutdown. But this doesn't seem to be the case, or maybe I'm missing some configuration.

Michael Klishin

Feb 23, 2016, 9:01:43 AM
to rabbitm...@googlegroups.com
rabbit.queue_index_max_journal_entries controls how quickly the queue index will be flushed to disk. Note that values < 32 will probably have a substantial disk use and throughput impact with small (< 4KiB by default) messages.

On Tue, Feb 23, 2016 at 4:58 PM, Sukosd Endre <endre....@gmail.com> wrote:
I forgot to mention that my goal is to achieve guaranteed message delivery.

So I use publisher confirms and message acknowledgements, which should make sure that no messages are lost and that acknowledged messages can be recovered after an unexpected shutdown. But this doesn't seem to be the case, or maybe I'm missing some configuration.


Sukosd Endre

Feb 24, 2016, 4:24:48 AM
to rabbitmq-users
Okay, one more question to see if I understand this right.

Even though we use publisher confirms (which means the messages are persisted), sometimes the messages aren't persisted yet, and force killing the node corrupts the queue index (making RabbitMQ unable to start)? Is this true?
What is scary about this is the frequency at which it occurs: one out of three times in our test. (We use two queues and send 20,000 messages over a 10-minute period, during which we restart RabbitMQ three times.)

Michael Klishin

Feb 24, 2016, 4:37:41 AM
to rabbitm...@googlegroups.com, Sukosd Endre
On 24 February 2016 at 12:24:51, Sukosd Endre (endre....@gmail.com) wrote:
> Even though we use publisher confirms (which means the messages
> are persisted), sometimes the messages aren't persisted yet
> and force killing the node corrupts the queue index (making rabbitmq
> unable to start)? Is this true?

This is a lot more involved than your definition suggests.

Publisher confirms are sent when it is safe, as described in
http://rabbitmq.com/confirms.html

The definition of that varies between mirrored and non-mirrored queues, for example.

However, the message store and queue index have multiple tunable values.
You can disable message embedding into the index (the index can be rebuilt from
on-disk messages on boot, even though it takes time) by setting rabbit.queue_index_embed_msgs_below
to 0:

http://www.rabbitmq.com/persistence-conf.html
https://github.com/rabbitmq/rabbitmq-server/blob/master/docs/rabbitmq.config.example#L305
https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit.app.src#L38
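For illustration, a minimal rabbitmq.config sketch (the classic Erlang-term format used by 3.6.x releases) that applies this setting might look like the following; treat it as a sketch of where the key lives, not an official recommendation:

  [
    {rabbit, [
      %% never embed message payloads in the queue index;
      %% persistent message bodies always go to the message store instead
      {queue_index_embed_msgs_below, 0}
    ]}
  ].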

Lastly, lazy queues assume things should be moved to disk as soon as possible:
http://rabbitmq.com/lazy-queues.html

Like with any other system, it's a matter of throughput vs. safety.

Sukosd Endre

Feb 24, 2016, 10:31:38 AM
to rabbitmq-users, endre....@gmail.com
Thanks, this was really useful.

I'm using non-mirrored queues and tried setting rabbit.queue_index_embed_msgs_below to 0, but the failure still occurred pretty frequently.
What would you suggest if I want to go for absolute safety?

Michael Klishin

Feb 24, 2016, 10:44:57 AM
to rabbitm...@googlegroups.com, Sukosd Endre
On 24 February 2016 at 18:31:42, Sukosd Endre (endre....@gmail.com) wrote:
> What would you suggest if I want to go for absolute safety?

You can try setting rabbit.queue_index_max_journal_entries to 64 or something really low
like that.
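As a sketch (again assuming the classic rabbitmq.config format; this snippet is not from the thread itself), that would be:

  [
    {rabbit, [
      %% flush the queue index journal to disk after at most 64 entries
      {queue_index_max_journal_entries, 64}
    ]}
  ].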

Sukosd Endre

Feb 26, 2016, 9:50:31 AM
to rabbitmq-users, endre....@gmail.com
Thanks, this seems to work, and was exactly the config I was looking for.
I set rabbit.queue_index_max_journal_entries to 64 and reran the tests. The RabbitMQ service could now start normally, even after multiple force kills.

ez...@qumulo.com

Jul 23, 2018, 10:03:41 PM
to rabbitmq-users
From this thread I gather that by default RabbitMQ can corrupt my queues on a crash, which is disappointing… It means that in such a crash, I don't just lose messages that were "in flight" (not yet fully written) but may actually lose huge queues of messages (if my consumers are behind for some reason). (And even then, it sounds like I would have to develop my own automated system for deleting the corrupted files.)

In my experiments, setting queue_index_max_journal_entries to 1 has not produced the problem. Is this something I can count on, in order to get guaranteed delivery?

Thanks,
Ezra

Michael Klishin

Jul 23, 2018, 10:17:37 PM
to rabbitm...@googlegroups.com
queue_index_max_journal_entries controls how many messages are “in flight” in a queue index. It has no relation to whether your consumers are “behind”. Consumer
considerations are documented in [1]. I’m not sure how you arrived at these conclusions.

Setting the value to 1 is only suitable for low volume workloads.

Ezra Cooper

Jul 23, 2018, 11:44:05 PM
to rabbitm...@googlegroups.com
Yes, sorry, I made a big step in the middle there.

The corruption bug seems to mean that I have to delete an entire queue when it hits (is that right?). Therefore I want to minimize the likelihood of hitting the corruption. Setting queue_index_max_journal_entries to 1 (or 0?) seems to be a way of forcing the writes to be completed immediately, and thus not giving a window for the corruption to occur. My workload is not particularly low-volume, so I'd like a better solution.

Is there any better way to prevent the corruption?

Thanks,
Ezra



Michael Klishin

Jul 24, 2018, 7:42:48 AM
to rabbitm...@googlegroups.com
You cannot set queue_index_max_journal_entries to 0. I don’t even know what would happen if you do, but a buffer “to flush” with a size of 0 makes no sense.

There’s always a window of time in which something can fail before certain pieces of data are moved to disk, with RabbitMQ or not. The question is how long the window is and how likely the failure is to occur.

Like I said, a single-entry journal is extremely conservative and is only realistic for environments with low message rates.

Reducing the value to, say, 1024 and focusing on using Publisher Confirms [1] correctly in your publishers sounds like a better use of developer time to me.
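To make the publisher side concrete, a minimal sketch of publishing with confirms using the pika Python client (assuming pika 1.x and a BlockingConnection; the queue name and payload are placeholders, not from this thread):

  import pika

  # Sketch only: open a connection, enable publisher confirms, publish one
  # persistent message and react to a negative/unroutable outcome.
  connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
  channel = connection.channel()
  channel.queue_declare(queue='hello', durable=True)

  channel.confirm_delivery()  # put the channel into confirm mode

  try:
      channel.basic_publish(
          exchange='',
          routing_key='hello',
          body=b'payload',
          properties=pika.BasicProperties(delivery_mode=2),  # persistent
          mandatory=True)
      # In confirm mode, basic_publish returns only after the broker confirms.
  except (pika.exceptions.UnroutableError, pika.exceptions.NackError):
      # The broker could not route or refused the message; retry or log it here.
      pass

  connection.close()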

Ezra Cooper

Jul 24, 2018, 1:24:28 PM
to rabbitm...@googlegroups.com
On Tue, Jul 24, 2018 at 4:42 AM, Michael Klishin <mkli...@pivotal.io> wrote:
You cannot set queue_index_max_journal_entries to 0. I don’t even know what would happen if you do but a buffer “to flush” with the size of 0 makes no sense.

There’s always a window of time in which something can fail before certain pieces of data are moved to disk, with RabbitMQ or not. The question is how long the window is and how likely the failure is to occur.

I'm not concerned about losing the in-flight messages themselves. I'm concerned about the fact that the entire queue gets corrupted and needs to be deleted. The only reason I got interested in queue_index_max_journal_entries is that it seems like the corruption can only occur if there *are* in-flight messages. Let's put queue_index_max_journal_entries aside; I think that was a red herring, and I can't afford the super-low throughput.

If I understand correctly, Publisher Confirms won't help because even when the message is written to disk and confirmed to the publisher, the queue (on disk) can still be corrupted.

As an example of what I'm concerned about: let's say I'm processing 100k messages per hour, and I have a restart of RabbitMQ that loses the in-flight messages; let's say that's 1024 messages. Fine, that's life. But let's say I took my consumers out of service for an hour, so I've got 100k messages *persisted to disk in the queue*. Now the restart of RabbitMQ might lose 100k messages, if the corruption occurs (which seems pretty likely in my experiments; well more than 10% of the time). That's where my problem lies.

I would expect a system like this never to corrupt its own indexes, even if it's interrupted in the middle of an incoming operation.

If it matters, I'm on RabbitMQ 3.5.7.

Ezra

Michael Klishin

Jul 24, 2018, 1:34:35 PM
to rabbitm...@googlegroups.com
> If I understand correctly, Publisher Confirms won't help because even when the message is written to disk and confirmed to the publisher, the queue (on disk) can still be corrupted.

Again, I'm not sure how specifically you arrived at this conclusion based on this thread. We do see scenarios a few times a year where seemingly the same sequence of events
prevents a queue index from recovering. No one has been able to reliably reproduce this, and the vast majority of users never hit it.

There are other kinds of known failures that our team is aware of that are more problematic. Have you considered such "unknown unknowns"?
We are working on redesigning pretty much the entire distribution layer(s) for 3.8 and 4.0 to address most of them.

> I would expect a system like this never to corrupt its own indexes, even if it's interrupted in the middle of an incoming operation.

I assure you that the system tries to do that and so does our team. Unfortunately there are these things called
"hard to track down bugs" and "design decisions that have unforeseen consequences". There are plenty of them in distributed and highly concurrent systems
and it takes a while to iron them out.

If you don't find what RabbitMQ has to offer adequate, use something else.

One day node-local storage will be significantly redesigned to make it more resilient
to more failure scenarios, but this is a never-ending process.

There are data stores that I won't name that have the reputation of being super robust and
if you ask one of the core developers if it's possible to reproduce a data corruption, they always know of a scenario or two.
Yet their users usually don't run into those and sleep well at night assuming there are no failure scenarios their favorite tool cannot handle.



Ezra Cooper

Jul 27, 2018, 6:00:53 PM
to rabbitm...@googlegroups.com
Michael,

I don't think I stated my goal very clearly, which is to understand if I'm doing something really strange with Rabbit or if this corruption is expected. It sounds like you think corruption should be rare, which is good, while my tests have been turning it up pretty frequently. I'm still optimistic about getting to use Rabbit in my project, in production. I hope we can be allies in that!

Also, apologies for the delay; I needed some time to work up a standalone repro script (see attached). This script has a central loop where it chooses an action from (1) send a message, (2) try to receive a message, or (3) kill and restart rabbit. This script, in my experience (on an Ubuntu 14.04 Linux VM with RabbitMQ 3.5.7), can reproduce the problem quite quickly (see output below: in this run, it took 4 restarts and a handful of seconds).

Hopefully this script can help your team reproduce the issue; in any case maybe you can say what it is about the script that's atypical. I don't want to get one corruption out of every 4 restarts in production, but then the script is pretty harsh. One thing about the script that seems relevant is the time between starting it up and doing the next actions. If I insert a sleep of a few seconds after the line RabbitRestarter.wait_for_port('127.0.0.1', amqp_port), the test seems to get through orders of magnitude more restarts before corruption.
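For readers who don't have the attachment, a rough reconstruction of the kind of loop described above might look like this. This is not the attached rabbit_test.py; it assumes pika, a local rabbitmq-server on the PATH, and a fixed sleep instead of the script's RabbitRestarter.wait_for_port helper:

  import random
  import subprocess
  import time

  import pika

  def start_broker():
      # Crude stand-in for the script's restart helper: launch a broker in the
      # background and wait a few seconds for it to start accepting connections.
      proc = subprocess.Popen(['rabbitmq-server'])
      time.sleep(5)
      return proc

  broker = start_broker()
  sent = 0
  for iteration in range(1000):
      connection = pika.BlockingConnection(pika.ConnectionParameters('127.0.0.1'))
      channel = connection.channel()
      channel.queue_declare(queue='hello', durable=True)

      action = random.choice(['send', 'receive', 'restart'])
      if action == 'send':
          channel.basic_publish(
              exchange='', routing_key='hello', body=str(sent),
              properties=pika.BasicProperties(delivery_mode=2))  # persistent
          sent += 1
      elif action == 'receive':
          method, _properties, body = channel.basic_get(queue='hello')
          if method is not None:
              channel.basic_ack(method.delivery_tag)
              print('Received:', body.decode(), 'at iteration', iteration)
      else:
          connection.close()
          broker.kill()          # hard kill, simulating a crash / power loss
          broker.wait()
          broker = start_broker()
          continue
      connection.close()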

Thanks for any help you can offer,
Ezra

ezra@ezravm:~$ ~/rabbit_test.py
RabbitMQ node name: rabbit-16VQ99
 Sending message 0
RabbitMQ pid: 17970
Failed to connect; waiting and retrying

              RabbitMQ 3.5.7. Copyright (C) 2007-2015 Pivotal Software, Inc.
  ##  ##      Licensed under the MPL.  See http://www.rabbitmq.com/
  ##  ##
  ##########  Logs: /tmp/rabbitmq-server/rabbit-16VQ99/log/rabbit-16VQ99.log
  ######  ##        /tmp/rabbitmq-server/rabbit-16VQ99/log/rabbit-16VQ99-sasl.log
  ##########
              Starting broker...Failed to connect; waiting and retrying
 completed with 0 plugins.
Received: 0 at iteration 0
Received: 1 at iteration 1
Listing queues ...
hello 0
Killing rabbit for the 0th time
RabbitMQ node name: rabbit-16VQ99
RabbitMQ pid: 18206

              RabbitMQ 3.5.7. Copyright (C) 2007-2015 Pivotal Software, Inc.
  ##  ##      Licensed under the MPL.  See http://www.rabbitmq.com/
  ##  ##
  ##########  Logs: /tmp/rabbitmq-server/rabbit-16VQ99/log/rabbit-16VQ99.log
  ######  ##        /tmp/rabbitmq-server/rabbit-16VQ99/log/rabbit-16VQ99-sasl.log
  ##########
              Starting broker... completed with 0 plugins.
Listing queues ...
hello 0
Received: 2 at iteration 2
Received: 3 at iteration 3
Listing queues ...
hello 1
Killing rabbit for the 1th time
RabbitMQ node name: rabbit-16VQ99
RabbitMQ pid: 18538

              RabbitMQ 3.5.7. Copyright (C) 2007-2015 Pivotal Software, Inc.
  ##  ##      Licensed under the MPL.  See http://www.rabbitmq.com/
  ##  ##
  ##########  Logs: /tmp/rabbitmq-server/rabbit-16VQ99/log/rabbit-16VQ99.log
  ######  ##        /tmp/rabbitmq-server/rabbit-16VQ99/log/rabbit-16VQ99-sasl.log
  ##########
              Starting broker... completed with 0 plugins.
Listing queues ...
hello 1
Listing queues ...
hello 1
Killing rabbit for the 2th time
RabbitMQ node name: rabbit-16VQ99
RabbitMQ pid: 18870

              RabbitMQ 3.5.7. Copyright (C) 2007-2015 Pivotal Software, Inc.
  ##  ##      Licensed under the MPL.  See http://www.rabbitmq.com/
  ##  ##
  ##########  Logs: /tmp/rabbitmq-server/rabbit-16VQ99/log/rabbit-16VQ99.log
  ######  ##        /tmp/rabbitmq-server/rabbit-16VQ99/log/rabbit-16VQ99-sasl.log
  ##########
              Starting broker... completed with 0 plugins.
Listing queues ...
hello 1
Received: 4 at iteration 4
Received: 5 at iteration 5
Received: 6 at iteration 6
Listing queues ...
hello 3
Killing rabbit for the 3th time
RabbitMQ node name: rabbit-16VQ99
RabbitMQ pid: 19202

              RabbitMQ 3.5.7. Copyright (C) 2007-2015 Pivotal Software, Inc.
  ##  ##      Licensed under the MPL.  See http://www.rabbitmq.com/
  ##  ##
  ##########  Logs: /tmp/rabbitmq-server/rabbit-16VQ99/log/rabbit-16VQ99.log
  ######  ##        /tmp/rabbitmq-server/rabbit-16VQ99/log/rabbit-16VQ99-sasl.log
  ##########
              Starting broker...

BOOT FAILED
===========

Error description:
   {could_not_start,rabbit,
       {{badmatch,
            {error,
                {{{{case_clause,
                       {{true,
                            <<187,214,15,194,134,91,109,68,20,23,60,138,2,115,
                              246,16,0,0,0,0,0,0,0,0,0,0,0,158>>,
                            <<131,104,6,100,0,13,98,97,115,105,99,95,109,101,
                              115,115,97,103,101,104,4,100,0,8,114,101,115,
                              111,117,114,99,101,109,0,0,0,1,47,100,0,8,101,
                              120,99,104,97,110,103,101,109,0,0,0,0,108,0,0,0,
                              1,109,0,0,0,5,104,101,108,108,111,106,104,6,100,
                              0,7,99,111,110,116,101,110,116,97,60,100,0,4,
                              110,111,110,101,109,0,0,0,3,16,0,2,100,0,25,114,
                              97,98,98,105,116,95,102,114,97,109,105,110,103,
                              95,97,109,113,112,95,48,95,57,95,49,108,0,0,0,1,
                              109,0,0,0,158,97,98,99,100,101,102,103,104,105,
                              106,107,108,109,110,111,112,113,114,115,116,117,
                              118,119,120,121,122,97,98,99,100,101,102,103,
                              104,105,106,107,108,109,110,111,112,113,114,115,
                              116,117,118,119,120,121,122,97,98,99,100,101,
                              102,103,104,105,106,107,108,109,110,111,112,113,
                              114,115,116,117,118,119,120,121,122,97,98,99,
                              100,101,102,103,104,105,106,107,108,109,110,111,
                              112,113,114,115,116,117,118,119,120,121,122,97,
                              98,99,100,101,102,103,104,105,106,107,108,109,
                              110,111,112,113,114,115,116,117,118,119,120,121,
                              122,97,98,99,100,101,102,103,104,105,106,107,
                              108,109,110,111,112,113,114,115,116,117,118,119,
                              120,121,122,32,52,106,109,0,0,0,16,187,214,15,
                              194,134,91,109,68,20,23,60,138,2,115,246,16,100,
                              0,4,116,114,117,101>>},
                        no_del,no_ack}},
                   [{rabbit_queue_index,action_to_entry,3,
                        [{file,"src/rabbit_queue_index.erl"},{line,780}]},
                    {rabbit_queue_index,add_to_journal,3,
                        [{file,"src/rabbit_queue_index.erl"},{line,757}]},
                    {rabbit_queue_index,add_to_journal,3,
                        [{file,"src/rabbit_queue_index.erl"},{line,748}]},
                    {rabbit_queue_index,parse_journal_entries,2,
                        [{file,"src/rabbit_queue_index.erl"},{line,895}]},
                    {rabbit_queue_index,recover_journal,1,
                        [{file,"src/rabbit_queue_index.erl"},{line,869}]},
                    {rabbit_queue_index,scan_segments,3,
                        [{file,"src/rabbit_queue_index.erl"},{line,692}]},
                    {rabbit_queue_index,queue_index_walker_reader,2,
                        [{file,"src/rabbit_queue_index.erl"},{line,680}]},
                    {rabbit_queue_index,'-queue_index_walker/1-fun-0-',2,
                        [{file,"src/rabbit_queue_index.erl"},{line,661}]}]},
                  {gen_server2,call,[<0.199.0>,out,infinity]}},
                 {child,undefined,msg_store_persistent,
                     {rabbit_msg_store,start_link,
                         [msg_store_persistent,
                          "/tmp/rabbitmq-server/rabbit-16VQ99/mnesia",[],
                          {#Fun<rabbit_queue_index.2.95208117>,
                           {start,[{resource,<<"/">>,queue,<<"hello">>}]}}]},
                     transient,4294967295,worker,
                     [rabbit_msg_store]}}}},
        [{rabbit_variable_queue,start_msg_store,2,
             [{file,"src/rabbit_variable_queue.erl"},{line,452}]},
         {rabbit_variable_queue,start,1,
             [{file,"src/rabbit_variable_queue.erl"},{line,434}]},
         {rabbit_priority_queue,start,1,
             [{file,"src/rabbit_priority_queue.erl"},{line,90}]},
         {rabbit_amqqueue,recover,0,
             [{file,"src/rabbit_amqqueue.erl"},{line,214}]},
         {rabbit,recover,0,[{file,"src/rabbit.erl"},{line,665}]},
         {rabbit,'-run_step/2-lc$^1/1-1-',1,
             [{file,"src/rabbit.erl"},{line,561}]},
         {rabbit,run_step,2,[{file,"src/rabbit.erl"},{line,561}]},
         {rabbit,'-run_boot_steps/1-lc$^0/1-0-',1,
             [{file,"src/rabbit.erl"},{line,548}]}]}}

Log files (may contain more information):
   /tmp/rabbitmq-server/rabbit-16VQ99/log/rabbit-16VQ99.log
   /tmp/rabbitmq-server/rabbit-16VQ99/log/rabbit-16VQ99-sasl.log

{"init terminating in do_boot",{could_not_start,rabbit,{{badmatch,{error,{{{{case_clause,{{true,<<187,214,15,194,134,91,109,68,20,23,60,138,2,115,246,16,0,0,0,0,0,0,0,0,0,0,0,158>>,<<131,104,6,100,0,13,98,97,115,105,99,95,109,101,115,115,97,103,101,104,4,100,0,8,114,101,115,111,117,114,99,101,109,0,0,0,1,47,100,0,8,101,120,99,104,97,110,103,101,109,0,0,0,0,108,0,0,0,1,109,0,0,0,5,104,101,108,108,111,106,104,6,100,0,7,99,111,110,116,101,110,116,97,60,100,0,4,110,111,110,101,109,0,0,0,3,16,0,2,100,0,25,114,97,98,98,105,116,95,102,114,97,109,105,110,103,95,97,109,113,112,95,48,95,57,95,49,108,0,0,0,1,109,0,0,0,158,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,32,52,106,109,0,0,0,16,187,214,15,194,134,91,109,68,20,23,60,138,2,115,246,16,100,0,4,116,114,117,101>>},no_del,no_ack}},[{rabbit_queue_index,action_to_entry,3,[{file,"src/rabbit_queue_index.erl"},{line,780}]},{rabbit_queue_index,add_to_journal,3,[{file,"src/rabbit_queue_index.erl"},{line,757}]},{rabbit_queue_index,add_to_journal,3,[{file,"src/rabbit_queue_index.erl"},{line,748}]},{rabbit_queue_index,parse_journal_entries,2,[{file,"src/rabbit_queue_index.erl"},{line,895}]},{rabbit_queue_index,recover_journal,1,[{file,"src/rabbit_queue_index.erl"},{line,869}]},{rabbit_queue_index,scan_segments,3,[{file,"src/rabbit_queue_index.erl"},{line,692}]},{rabbit_queue_index,queue_index_walker_reader,2,[{file,"src/rabbit_queue_index.erl"},{line,680}]},{rabbit_queue_index,'-queue_index_walker/1-fun-0-',2,[{file,"src/rabbit_queue_index.erl"},{line,661}]}]},{gen_server2,call,[<0.199.0>,out,infinity]}},{child,undefined,msg_store_persistent,{rabbit_msg_store,start_link,[msg_store_persistent,"/tmp/rabbitmq-server/rabbit-16VQ99/mnesia",[],{#Fun<rabbit_queue_index.2.95208117>,{start,[{resource,<<"/">>,queue,<<"hello">>}]}}]},transient,4294967295,worker,[rabbit_msg_store]}}}},[{rabbit_variable_queue,start_msg_store,2,[{file,"src/rabbit_variable_queue.erl"},{line,452}]},{rabbit_variable_queue,start,1,[{file,"src/rabbit_variable_queue.erl"},{line,434}]},{rabbit_priority_queue,start,1,[{file,"src/rabbit_priority_queue.erl"},{line,90}]},{rabbit_amqqueue,recover,0,[{file,"src/rabbit_amqqueue.erl"},{line,214}]},{rabbit,recover,0,[{file,"src/rabbit.erl"},{line,665}]},{rabbit,'-run_step/2-lc$^1/1-1-',1,[{file,"src/rabbit.erl"},{line,561}]},{rabbit,run_step,2,[{file,"src/rabbit.erl"},{line,561}]},{rabbit,'-run_boot_steps/1-lc$^0/1-0-',1,[{file,"src/rabbit.erl"},{line,548}]}]}}}

rabbit_test.py

Michael Klishin

Jul 27, 2018, 8:53:03 PM
to rabbitm...@googlegroups.com
It can be that after SIGTERM'ing a node enough times in a loop it will fail to recover a queue index.

How realistic that workload is, I don't know. It would be interesting to run a similar test against
quorum queues with a write-ahead log coming in 3.8.