RabbitMQ 3.6.5 won't start, crashes. shutdown_error noproc


Alex Gkiouros

Jun 23, 2017, 7:14:27 AM
to rabbitmq-users
RabbitMQ, with around 8 million messages on it, crashed for no apparent reason.
This is an EC2 machine using EBS persistent storage.

Tried to restart the service, but it failed due to the 5 minute limit.
Tried sudo rabbitmq-server, which failed with SASL errors.

Got a backup of the mnesia directory, but it's 19 GB.

Any help appreciated.
Thanks!

https://pastebin.com/hnidSXGK - One of the Supervisor Reports in SASL.log
https://pastebin.com/rVYtvbSR - One of the Crash Reports in SASL.log

https://pastebin.com/ATPnH8Cd - One of the Error Reports in log
https://pastebin.com/uimvcpMe - Last shutdown_err that was recorded
https://pastebin.com/6qU6hp6P - RAM error prior to shutting down and upgrading 128GB to 256GB

Michael Klishin

Jun 23, 2017, 8:44:18 AM
to rabbitm...@googlegroups.com
The "noproc" exceptions mean that a generic pool of processes used by queue recovery
is depleted. It is not the root cause; something led those processes to fail earlier.

"shutdown_err" is not a reason for node shutdown. 

One of the pastebins simply says that rabbitmqctl could not connect to a node that's
not running (as far as the CLI tool could detect).

eheap_alloc is likely a result of an issue in Erlang/OTP (RabbitMQ does not allocate memory directly),
not a real need to allocate 77 GB, and it definitely does lead to VM termination.

If you post full logs we may be able to see what the root cause is.

I highly recommend upgrading to Erlang 19.3.6 and RabbitMQ 3.6.10 if you can.
Here's where you can learn more: http://www.rabbitmq.com/changelog.html.


--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-users+unsubscribe@googlegroups.com.
To post to this group, send email to rabbitmq-users@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Alex Gkiouros

Jun 23, 2017, 9:34:48 AM
to rabbitmq-users

Let me know if you need anything else Michael, thanks.
rabbit@liverabbit3.log
rabbit@liverabbit3-sasl.log

Alex Gkiouros

Jun 23, 2017, 10:02:11 AM
to rabbitmq-users
Forgot that one as well, thanks!
startup_log

Michael Klishin

Jun 23, 2017, 10:29:32 AM
to rabbitm...@googlegroups.com
The root cause is

    exception exit: {{case_clause,undefined},
                     [{rabbit_queue_index,add_segment_relseq_entry,3,
                          [{file,"src/rabbit_queue_index.erl"},{line,1091}]},
                      {rabbit_queue_index,parse_segment_entries,3,
                          [{file,"src/rabbit_queue_index.erl"},{line,1075}]},
                      {rabbit_queue_index,segment_entries_foldr,3,
                          [{file,"src/rabbit_queue_index.erl"},{line,1041}]},
                      {rabbit_queue_index,scan_segments,3,
                          [{file,"src/rabbit_queue_index.erl"},{line,677}]},
                      {rabbit_queue_index,queue_index_walker_reader,2,
                          [{file,"src/rabbit_queue_index.erl"},{line,664}]},
                      {rabbit_queue_index,'-queue_index_walker/1-fun-0-',2,
                          [{file,"src/rabbit_queue_index.erl"},{line,645}]},
                      {worker_pool_worker,handle_cast,2,
                          [{file,"src/worker_pool_worker.erl"},{line,121}]},
                      {gen_server2,handle_msg,2,
                          [{file,"src/gen_server2.erl"},{line,1032}]}]}

Long story short, a queue failed to recover its index after the VM terminated abruptly (as we've seen in the original post).
It's a long-standing issue (or group of issues) which we haven't managed to reproduce yet.
We have a few more fundamental solutions in mind for 3.8.0.

Unfortunately the node won't be able to recover. You need to reset it (as in `rabbitmqctl reset`)
or re-provision a new one from scratch.


Alex Gkiouros

Jun 23, 2017, 10:31:06 AM
to rabbitmq-users
So this means I'll lose all of the data currently in the queues, right? About 8 million messages...

Michael Klishin

Jun 23, 2017, 10:34:18 AM
to rabbitm...@googlegroups.com
Actually, you can remove all queue index files, but if your messages are < 4096 bytes
that would be not particularly different from doing a reset, since message bodies are embedded
into the index (by default).

With larger messages that go into the message store, the index will be recreated by performing
a sequential scan over the entire store on node start.
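
To make the size rule above concrete, here is a minimal Python sketch. It is only an illustration of the rule as described in this thread, not RabbitMQ code: the function names are hypothetical, and the 4096 byte default is, as far as I know, controlled by the `queue_index_embed_msgs_below` setting. How exactly the size is measured (body only, or body plus properties) is not covered here.

```python
# Illustration only: these helpers are hypothetical, not part of RabbitMQ.
EMBED_BELOW_DEFAULT = 4096  # bytes; the default threshold mentioned above


def stored_in_queue_index(message_size: int,
                          embed_below: int = EMBED_BELOW_DEFAULT) -> bool:
    """True if a message of this size is embedded in the queue index."""
    return message_size < embed_below


def survives_index_removal(message_size: int,
                           embed_below: int = EMBED_BELOW_DEFAULT) -> bool:
    """True if the message lives in the message store, so it can be
    recovered by a store scan after the index files are removed."""
    return not stored_in_queue_index(message_size, embed_below)


if __name__ == "__main__":
    for size in (700, 4095, 4096, 10_000):
        where = "queue index" if stored_in_queue_index(size) else "message store"
        print(f"{size:>6} bytes -> {where}")
```

In other words, with default settings a workload of small messages lives almost entirely in the index files, so removing them loses those messages.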

Michael Klishin

Jun 23, 2017, 10:35:23 AM
to rabbitm...@googlegroups.com
With a reset, you will lose everything on this node that wasn't mirrored.

See my earlier response about an alternative solution that will preserve
larger messages (assuming you use default message store settings).


Alex Gkiouros

Jun 23, 2017, 10:43:06 AM
to rabbitmq-users
It's not a cluster, there was nothing mirrored...

Michael Klishin

Jun 23, 2017, 10:51:33 AM
to rabbitm...@googlegroups.com
You can move

/var/lib/rabbitmq/mnesia/rabbit@{hostname}/queues/**/*.idx

files and the node will start with larger messages (in the message store) recovered.
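
One way to do the move while keeping the files around for later inspection is sketched below in Python. This is a rough sketch, not an official tool: the function name and the backup location are hypothetical, it assumes the node is stopped, and it only touches `*.idx` files.

```python
import shutil
from pathlib import Path


def move_index_files(queues_dir: Path, backup_dir: Path) -> list[Path]:
    """Move every *.idx file under queues_dir into backup_dir.

    The relative directory layout is preserved so files can be put back.
    Returns the new locations of the moved files.
    """
    q, b = queues_dir.resolve(), backup_dir.resolve()
    if b == q or q in b.parents:
        raise ValueError("backup_dir must be outside queues_dir")

    moved = []
    # sorted() materializes the listing before anything is moved.
    for src in sorted(queues_dir.rglob("*.idx")):
        if not src.is_file():
            continue
        dest = backup_dir / src.relative_to(queues_dir)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved
```

It could be called as `move_index_files(Path("/var/lib/rabbitmq/mnesia/rabbit@{hostname}/queues"), Path("/root/idx-backup"))`, where `/root/idx-backup` is just an example destination.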


Alex Gkiouros

Jun 23, 2017, 10:54:06 AM
to rabbitmq-users
All of the messages are 700 bytes in size...

Alex Gkiouros

Jun 23, 2017, 10:59:00 AM
to rabbitmq-users
mostly... will try that and get back to you.

Thanks a lot btw Michael

Alex Gkiouros

Jun 26, 2017, 8:37:53 AM
to rabbitmq-users
The error I received after I moved all *.idx files out of the mnesia folder:

=INFO REPORT==== 26-Jun-2017::12:36:11 ===
Error description:
   {could_not_start,rabbit,
       {{badmatch,
            {error,
                {{{function_clause,
                      [{rabbit_queue_index,journal_minus_segment1,
                           [{no_pub,del,no_ack},undefined],
                           [{file,"src/rabbit_queue_index.erl"},{line,1181}]},
                       {rabbit_queue_index,'-journal_minus_segment/3-fun-0-',
                           4,
                           [{file,"src/rabbit_queue_index.erl"},{line,1158}]},
                       {array,sparse_foldl_3,7,
                           [{file,"array.erl"},{line,1690}]},
                       {array,sparse_foldl_2,9,
                           [{file,"array.erl"},{line,1684}]},
                       {rabbit_queue_index,'-recover_journal/1-fun-0-',1,
                           [{file,"src/rabbit_queue_index.erl"},{line,865}]},
                       {lists,map,2,[{file,"lists.erl"},{line,1238}]},
                       {rabbit_queue_index,segment_map,2,
                           [{file,"src/rabbit_queue_index.erl"},{line,989}]},
                       {rabbit_queue_index,recover_journal,1,
                           [{file,"src/rabbit_queue_index.erl"},{line,856}]}]},
                  {gen_server2,call,[<0.274.0>,fork,infinity]}},
                 {child,undefined,msg_store_persistent,
                     {rabbit_msg_store,start_link,
                         [msg_store_persistent,
                          "/var/lib/rabbitmq/mnesia/rabbit@liverabbit3",[],
                          {#Fun<rabbit_queue_index.2.103862237>,
                           {start,
                               [{resource,<<"location">>,queue,
                                    <<"ha.v.b0f1112c783f4f9022e0dc45be55a721">>},
                                {resource,<<"location">>,queue,
                                    <<"ha.v.ed878d29990cafe49c2d669367190c71">>},
                                {resource,<<"location">>,queue,
                                    <<"ha.v.03e42ef0d0419cab237eab89b5544459">>},
                                {resource,<<"location">>,queue,
                                    <<"ha.v.cf1b8e2f030bbaa4239bff7e70f7cea6">>},
                                {resource,<<"location">>,queue,
                                    <<"ha.v.4d3eadf21fd0e9e385320d8419de9060">>},
                                {resource,<<"location">>,queue,
                                    <<"ha.v.4a2abfff4ed9c982a3a794b580719e2e">>},
                                {resource,<<"location">>,queue,
                                    <<"ha.v.e06a8f1e9149a99c3858afd9199fb1f7">>},
                                {resource,<<"location">>,queue,
                                    <<"ha.v.6d2f376d3791bbe89cdd11cce01b8c85">>},
                                {resource,<<"location">>,queue,
                                    <<"ha.v.635816481b482444dfdf7b04cfc7e9b8">>},
                                {resource,<<"location">>,queue,
                                    <<"ha.v.2c7f5de57d95ab17cda765d307309f86">>},
                                {resource,<<"location">>,queue,
                                    <<"ha.v.e106d2508d194641b28c972f2b9968cb">>},
                                {resource,<<"location">>,queue,
                                    <<"ha.v.82a4ba2090a3e4dc8c83c04f38984c3d">>},
                                {resource,<<"location">>,queue,
                                    <<"ha.v.86ecef73b9fe459e61950cd7bf02494f">>},
                                {resource,<<"location">>,queue,
                                    <<"ha.v.a2091c2038e846f4e61c934b58c554c1">>},
                                {resource,<<"location">>,queue,
                                    <<"ha.v.dea4c12ca78286a62c8c799e8adb00ac">>},
                                {resource,<<"location">>,queue,
                                    <<"ha.v.578e1105c71a72cbda15a5982136896d">>},
                                {resource,<<"location">>,queue,
                                    <<"ha.v.807dd671910f6ebc2e4a4a35625482a4">>},
                                {resource,<<"location">>,queue,
                                    <<"ha.v.e5e4d7e93b2fa0387d4b792bcacfa169">>},
                                {resource,<<"location">>,queue,
                                    <<"ha.v.62023fccc1f9b5a1b02d54f107eb92da">>},
                                {resource,<<"location">>,queue,
                                    <<"ha.v.ba41155dd83fe1153c6886d41bac0de4">>},
                                {resource,<<"location">>,queue,
                                    <<"ha.v.54b443b36b0c4f594a84504ded6fc9a3">>},
                                {resource,<<"location">>,queue,
                                    <<"ha.v.f8a9e36e2a71118f7c87be2353545e30">>},
                                {resource,<<"location">>,queue,
                                    <<"ha.v.8f67a605b136216e58ece4178ec44d19">>},
                                {resource,<<"location">>,queue,
                                    <<"ha.v.1d2309c155f3e57bcbdd6d200adf00e5">>},
                                {resource,<<"location">>,queue,
                                    <<"ha.z.912ab6d2a5589480ad97695a1ce32709">>},
                                {resource,<<"location">>,queue,
                                    <<"ha.v.cbe10c318db73618318cae0ce1b79b9b">>},
                                {resource,<<"location">>,queue,
                                    <<"ha.v.bc4dd8ab2be3db4c5825df718e7ac39f">>,
                                    '...'},
                                {resource,<<"location">>,queue,'...'},
                                {resource,<<"location">>,'...'},
                                {resource,'...'},
                                {'...'},
                                {'...'},
                                {'...'},
                                {'...'},
                                {'...'},
                                '...']}}]},
                     transient,30000,worker,
                     [rabbit_msg_store]}}}},
        [{rabbit_variable_queue,start_msg_store,2,
             [{file,"src/rabbit_variable_queue.erl"},{line,454}]},
         {rabbit_variable_queue,start,1,
             [{file,"src/rabbit_variable_queue.erl"},{line,436}]},
         {rabbit_priority_queue,start,1,
             [{file,"src/rabbit_priority_queue.erl"},{line,92}]},
         {rabbit_amqqueue,recover,0,
             [{file,"src/rabbit_amqqueue.erl"},{line,239}]},
         {rabbit,recover,0,[{file,"src/rabbit.erl"},{line,652}]},
         {rabbit_boot_steps,'-run_step/2-lc$^1/1-1-',1,
             [{file,"src/rabbit_boot_steps.erl"},{line,49}]},
         {rabbit_boot_steps,run_step,2,
             [{file,"src/rabbit_boot_steps.erl"},{line,49}]},
         {rabbit_boot_steps,'-run_boot_steps/1-lc$^0/1-0-',1,
             [{file,"src/rabbit_boot_steps.erl"},{line,26}]}]}}
Log files (may contain more information):
   /var/log/rabbitmq/rab...@liverabbit3.log
   /var/log/rabbitmq/rab...@liverabbit3-sasl.log

Thanks

Michael Klishin

Jun 26, 2017, 12:49:09 PM
to rabbitm...@googlegroups.com
It's still queue index recovery, so some index segment files were not moved.
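
A quick way to see what is still left behind is to list index-related files per queue directory. The Python sketch below is only a rough aid: the function name is hypothetical, and treating `journal.jif` as the index journal file is an assumption (the stack trace fails in `recover_journal`, but the file name is not stated in this thread).

```python
from pathlib import Path

# "*.idx" are the segment files discussed in this thread.
# "journal.jif" is an ASSUMPTION about the journal file name.
INDEX_PATTERNS = ("*.idx", "journal.jif")


def leftover_index_files(queues_dir: Path) -> dict[Path, list[str]]:
    """Map each queue directory to the index-related files still in it."""
    leftovers: dict[Path, list[str]] = {}
    for pattern in INDEX_PATTERNS:
        for path in queues_dir.rglob(pattern):
            if path.is_file():
                leftovers.setdefault(path.parent, []).append(path.name)
    # Sort for stable, readable output.
    return {d: sorted(names) for d, names in sorted(leftovers.items())}


if __name__ == "__main__":
    queues = Path("/var/lib/rabbitmq/mnesia/rabbit@liverabbit3/queues")
    if queues.is_dir():
        for directory, names in leftover_index_files(queues).items():
            print(directory, names)
```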


Alex Gkiouros

Jun 27, 2017, 4:36:13 AM
to rabbitmq-users
So I keep adding directories into queues one by one, and RabbitMQ sometimes starts and sometimes doesn't.
When it doesn't, I remove the directory and continue with the next one...

Today, as I was adding directories, RabbitMQ was starting normally each time. I tried stopping and starting it without touching queues (as a test), and then I got the following error.

=CRASH REPORT==== 27-Jun-2017::08:31:26 ===
  crasher:
    initial call: application_master:init/4
    pid: <0.171.0>
    registered_name: []
    exception exit: {bad_return,
                     {{rabbit,start,[normal,[]]},
                      {'EXIT',
                       {{badmatch,
                         {error,
                          {{{{case_clause,undefined},
                             [{rabbit_queue_index,add_segment_relseq_entry,3,
                               [{file,"sr..."},{line,1091}]},
                              {rabbit_queue_index,parse_segment_entries,3,
                               [{file,"s..."},{line,1075}]},
                              {rabbit_queue_index,segment_entries_foldr,3,
                               [{file,"..."},{line,1041}]},
                              {rabbit_queue_index,scan_segments,3,
                               [{file,"..."},{line,677}]},
                              {rabbit_queue_index,queue_index_walker_reader,
                               2,
                               [{file,"..."},{line,664}]},
                              {rabbit_queue_index,
                               '-queue_index_walker/1-fun-0-',2,
                               [{file,"..."},{line,645}]},
                              {worker_pool_worker,handle_cast,2,
                               [{file,"..."},{line,121}]},
                              {gen_server2,handle_msg,2,
                               [{file,"..."},{line,1032}]}]},
                            {gen_server2,call,[<0.274.0>,out,infinity]}},
                           {child,undefined,msg_store_persistent,
                            {rabbit_msg_store,start_link,
                             [msg_store_persistent,"/var/lib/rabbit...",[],
                              {#Fun<rabbit_queue_index.2.103862237>,
                               {start,
                                [{'...'},{'...'},{'...'},{'...'},'...']}}]},
                            transient,30000,worker,
                            [rabbit_msg_store]}}}},
                        [{rabbit_variable_queue,start_msg_store,2,
                          [{file,"src/rabbit_variable_queue.erl"},{line,454}]},
                         {rabbit_variable_queue,start,1,
                          [{file,"src/rabbit_variable_queue.er..."},
                           {line,436}]},
                         {rabbit_priority_queue,start,1,
                          [{file,"src/rabbit_priority_queue.e..."},{line,92}]},
                         {rabbit_amqqueue,recover,0,
                          [{file,"src/rabbit_amqqueue.erl"},{line,239}]},
                         {rabbit,recover,0,
                          [{file,"src/rabbit.erl"},{line,652}]},
                         {rabbit_boot_steps,'-run_step/2-lc$^1/1-1-',1,
                          [{file,"src/rabbit_boot_steps.er..."},{line,49}]},
                         {rabbit_boot_steps,run_step,2,
                          [{file,"src/rabbit_boot_steps.e..."},{line,49}]},
                         {rabbit_boot_steps,'-run_boot_steps/1-lc$^0/1-0-',1,
                          [{file,"src/rabbit_boot_steps...."},{line,26}]}]}}}}
      in function  application_master:init/4 (application_master.erl, line 134)
    ancestors: [<0.170.0>]
    messages: [{'EXIT',<0.172.0>,normal}]
    links: [<0.170.0>,<0.7.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 75113
    stack_size: 27
    reductions: 5842
  neighbours:


Any ideas?
Thanks

Michael Klishin

Jun 27, 2017, 4:40:45 AM
to rabbitm...@googlegroups.com
It's still queue index recovery (`rabbit_queue_index` is the module in question).


Alex Gkiouros

Jun 28, 2017, 9:29:40 AM
to rabbitmq-users
I really am getting messages recovered by copying some old queue folders into the new queue directory, firing up the service, letting it work, then stopping and repeating.
Thanks Michael.
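
For anyone who wants to script the same loop, here is a rough Python sketch of the procedure described above. It is not a tested recovery tool: the function name is hypothetical, and `node_starts` is a placeholder callback that you would have to write yourself (start the service, let it drain, stop it, and report whether it came up).

```python
import shutil
from pathlib import Path
from typing import Callable


def restore_queues_one_by_one(
    old_queues: Path,
    live_queues: Path,
    node_starts: Callable[[], bool],
) -> tuple[list[str], list[str]]:
    """Copy backed-up queue directories into the live queues directory one
    at a time, keeping only those the node can start with.

    node_starts is a hypothetical callback: it should start the node, let
    it work, stop it again, and return True if the start succeeded.
    Returns (restored, rejected) lists of directory names.
    """
    restored, rejected = [], []
    for src in sorted(p for p in old_queues.iterdir() if p.is_dir()):
        dest = live_queues / src.name
        if dest.exists():
            continue  # already present, leave it alone
        shutil.copytree(src, dest)
        if node_starts():
            restored.append(src.name)
        else:
            shutil.rmtree(dest)  # this one breaks recovery, take it out
            rejected.append(src.name)
    return restored, rejected
```

As seen earlier in the thread, a node that starts once may still fail on a later restart, so a successful pass here is not a guarantee.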