RabbitMQ instance crashes with badmatch error not_a_dets_file


Abhijit Kulkarni

Jul 6, 2015, 1:34:49 AM
to rabbitm...@googlegroups.com
When I started a RabbitMQ instance, it crashed. The log file reads as follows:

=INFO REPORT==== 6-Jul-2015::10:27:43 ===
Error description:
   {could_not_start,rabbit,
       {{badmatch,
            {error,
                {{{badmatch,
                      {error,
                          {not_a_dets_file,
                              "/var/lib/rabbitmq/mnesia/services_ord/recovery.dets"}}},
                  [{rabbit_recovery_terms,open_table,0,[]},
                   {rabbit_recovery_terms,init,1,[]},
                   {gen_server,init_it,6,[{file,"gen_server.erl"},{line,306}]},
                   {proc_lib,init_p_do_apply,3,
                       [{file,"proc_lib.erl"},{line,239}]}]},
                 {child,undefined,rabbit_recovery_terms,
                     {rabbit_recovery_terms,start_link,[]},
                     transient,4294967295,worker,
                     [rabbit_recovery_terms]}}}},
        [{rabbit_queue_index,start,1,[]},
         {rabbit_variable_queue,start,1,[]},
         {rabbit_priority_queue,start,1,[]},
         {rabbit_amqqueue,recover,0,[]},
         {rabbit,recover,0,[]},
         {rabbit,'-run_step/2-lc$^1/1-1-',1,[]},
         {rabbit,run_step,2,[]},
         {rabbit,'-run_boot_steps/1-lc$^0/1-0-',1,[]}]}}

The SASL log file reads as follows:

=CRASH REPORT==== 6-Jul-2015::10:35:23 ===
  crasher:
    initial call: application_master:init/4
    pid: <0.135.0>
    registered_name: []
    exception exit: {bad_return,
                     {{rabbit,start,[normal,[]]},
                      {'EXIT',
                       {{badmatch,
                         {error,
                          {{{badmatch,
                             {error,
                              {not_a_dets_file,
                               "/var/lib/rabbitmq/mnesia/services_ord/recovery.dets"}}},
                            [{rabbit_recovery_terms,open_table,0,[]},
                             {rabbit_recovery_terms,init,1,[]},
                             {gen_server,init_it,6,
                              [{file,"gen_server.erl"},{line,306}]},
                             {proc_lib,init_p_do_apply,3,
                              [{file,"proc_lib.erl"},{line,239}]}]},
                           {child,undefined,rabbit_recovery_terms,
                            {rabbit_recovery_terms,start_link,[]},
                            transient,4294967295,worker,
                            [rabbit_recovery_terms]}}}},
                        [{rabbit_queue_index,start,1,[]},
                         {rabbit_variable_queue,start,1,[]},
                         {rabbit_priority_queue,start,1,[]},
                         {rabbit_amqqueue,recover,0,[]},
                         {rabbit,recover,0,[]},
                         {rabbit,'-run_step/2-lc$^1/1-1-',1,[]},
                         {rabbit,run_step,2,[]},
                         {rabbit,'-run_boot_steps/1-lc$^0/1-0-',1,[]}]}}}}
      in function  application_master:init/4 (application_master.erl, line 133)
    ancestors: [<0.134.0>]
    messages: [{'EXIT',<0.136.0>,normal}]
    links: [<0.134.0>,<0.7.0>]
    dictionary: []

Since we needed the rabbitmq-server instance to be up immediately, we removed the node's folder from the mnesia directory and restarted the broker.

Can someone please tell me what went wrong here? And what could be the solution, apart from deleting the mnesia folder, which I don't think is a healthy resolution at all?

Regards,
Abhijit

Michael Klishin

Jul 6, 2015, 4:11:41 AM
to Abhijit Kulkarni, rabbitm...@googlegroups.com
On 6 Jul 2015 at 08:34:52, Abhijit Kulkarni (abhijitku...@gmail.com) wrote:
> =INFO REPORT==== 6-Jul-2015::10:27:43
> ===
> Error description:
> {could_not_start,rabbit,
> {{badmatch,
> {error,
> {{{badmatch,
> {error,
> {not_a_dets_file,
> "/var/lib/rabbitmq/mnesia/services_ord/recovery.dets"}}},

RabbitMQ could not read recovery.dets, which suggests it was corrupted or modified by another piece of software.

If the issue persists, remove that file and restart the service. 
--
MK

Staff Software Engineer, Pivotal/RabbitMQ
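
To find out which file to remove, the path can be pulled out of the error report. The following is a minimal sketch, not part of RabbitMQ: the function name `find_bad_dets_files` is made up for illustration, and it assumes the log has the same `{not_a_dets_file, "..."}` shape as the report quoted above.

```python
import re

# Matches the path that follows the not_a_dets_file atom in an Erlang error report.
# The report wraps lines, so whitespace (including newlines) may follow the comma.
NOT_A_DETS_FILE = re.compile(r'not_a_dets_file,\s*"([^"]+)"')


def find_bad_dets_files(log_text):
    """Return the distinct paths reported as not_a_dets_file, in order of appearance.

    Hypothetical helper for illustration only; it just scans text.
    """
    seen = []
    for path in NOT_A_DETS_FILE.findall(log_text):
        if path not in seen:
            seen.append(path)
    return seen


if __name__ == "__main__":
    # Excerpt copied from the log in this thread.
    sample = '''
                          {not_a_dets_file,
                              "/var/lib/rabbitmq/mnesia/services_ord/recovery.dets"}}},
    '''
    for path in find_bad_dets_files(sample):
        print(path)
```

The same path appears in both the main log and the SASL log, which is why the sketch returns each path only once.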


Michael Sander

Jun 19, 2016, 1:20:16 PM
to rabbitmq-users, abhijitku...@gmail.com
I just saw this too. Removing recovery.dets and restarting fixed the issue, but I am concerned about how I got into this state in the first place. It would be nice if RabbitMQ could detect this corruption automatically and either repair the file or delete it itself.

Michael Sander

Jun 19, 2016, 8:26:38 PM
to rabbitmq-users, abhijitku...@gmail.com
This just happened again today. I'm not sure of the cause. The recovery.dets file exists, but it is empty when I get this error.

ocr-proc-3:~$ sudo ls -l /var/lib/rabbitmq/mnesia/rabbit@ocr-proc-3/recovery.dets
-rw-r--r-- 1 rabbitmq rabbitmq 0 Jun 18 06:42 /var/lib/rabbitmq/mnesia/rabbit@ocr-proc-3/recovery.dets


Again, after deleting this file and restarting, things worked fine. Below is the output of sudo service rabbitmq-server status:

ocr-proc-3:~$ sudo service rabbitmq-server status
Status of node 'rabbit@ocr-proc-3' ...
[{pid,6959},
 {running_applications,
     [{rabbitmq_management,"RabbitMQ Management Console","3.6.2"},
      {rabbitmq_management_agent,"RabbitMQ Management Agent","3.6.2"},
      {rabbit,"RabbitMQ","3.6.2"},
      {mnesia,"MNESIA  CXC 138 12","4.13.3"},
      {rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.6.2"},
      {webmachine,"webmachine","1.10.3"},
      {mochiweb,"MochiMedia Web Server","2.13.1"},
      {ssl,"Erlang/OTP SSL application","7.3"},
      {public_key,"Public key infrastructure","1.1.1"},
      {crypto,"CRYPTO","3.6.3"},
      {amqp_client,"RabbitMQ AMQP Client","3.6.2"},
      {rabbit_common,[],"3.6.2"},
      {compiler,"ERTS  CXC 138 10","6.0.3"},
      {xmerl,"XML parser","1.3.10"},
      {os_mon,"CPO  CXC 138 46","2.4"},
      {syntax_tools,"Syntax tools","1.7"},
      {inets,"INETS  CXC 138 49","6.2"},
      {ranch,"Socket acceptor pool for TCP protocols.","1.2.1"},
      {asn1,"The Erlang ASN1 compiler version 4.0.2","4.0.2"},
      {sasl,"SASL  CXC 138 11","2.7"},
      {stdlib,"ERTS  CXC 138 10","2.8"},
      {kernel,"ERTS  CXC 138 10","4.2"}]},
 {os,{unix,linux}},
 {erlang_version,
     "Erlang/OTP 18 [erts-7.3] [source] [64-bit] [smp:32:32] [async-threads:512] [kernel-poll:true]\n"},
 {memory,
     [{total,79535568},
      {connection_readers,0},
      {connection_writers,0},
      {connection_channels,0},
      {connection_other,2712},
      {queue_procs,43168},
      {queue_slave_procs,0},
      {plugins,589144},
      {other_proc,20640832},
      {mnesia,78880},
      {mgmt_db,342856},
      {msg_index,126944},
      {other_ets,1435848},
      {binary,31232},
      {code,27723001},
      {atom,992409},
      {other_system,27528542}]},
 {alarms,[]},
 {listeners,[{clustering,25672,"::"},{amqp,5672,"::"}]},
 {vm_memory_high_watermark,0.4},
 {vm_memory_limit,12167703756},
 {disk_free_limit,50000000},
 {disk_free,4854685696},
 {file_descriptors,
     [{total_limit,32668},
      {total_used,3},
      {sockets_limit,29399},
      {sockets_used,0}]},
 {processes,[{limit,1048576},{used,257}]},
 {run_queue,0},
 {uptime,14},
 {kernel,{net_ticktime,60}}]

Michael Sander

Jun 19, 2016, 8:42:52 PM
to rabbitmq-users, abhijitku...@gmail.com
After more investigation, I believe this error can arise if RabbitMQ does not shut down properly (e.g., if the machine is powered off while RabbitMQ is running).

For me, this particular error appeared on a virtual machine after it was shut down. When I restarted the same machine, RabbitMQ would not start. In the SASL logs, I see the following errors during shutdown:

=SUPERVISOR REPORT==== 18-Jun-2016::06:10:00 ===
     Supervisor: {<0.7152.105>,rabbit_channel_sup}
     Context:    shutdown_error
     Reason:     killed
     Offender:   [{pid,<0.7192.105>},
                  {name,channel},
                  {mfargs,
                      {rabbit_channel,start_link,
                          [1,<0.6689.105>,<0.7190.105>,<0.6689.105>,
                           <<"127.0.0.1:39393 -> 127.0.0.1:5672">>,
                           rabbit_framing_amqp_0_9_1,
                           {user,<<"docketal">>,
                               [administrator],
                               [{rabbit_auth_backend_internal,none}]},
                           <<"docketalarm.com">>,
                           [{<<"connection.blocked">>,bool,true},
                            {<<"consumer_cancel_notify">>,bool,true}],
                           <0.7154.105>,<0.7193.105>]}},
                  {restart_type,intrinsic},
                  {shutdown,70000},
                  {child_type,worker}]


=SUPERVISOR REPORT==== 18-Jun-2016::06:29:45 ===
     Supervisor: {<0.29.0>,user_sup}
     Context:    child_terminated
     Reason:     enospc
     Offender:   [{pid,<0.30.0>},{mod,user_sup}]


=CRASH REPORT==== 18-Jun-2016::06:29:45 ===
  crasher:
    initial call: supervisor_bridge:user_sup/1
    pid: <0.29.0>
    registered_name: []
    exception exit: enospc
      in function  gen_server:terminate/7 (gen_server.erl, line 826)
    ancestors: [kernel_sup,<0.10.0>]
    messages: []
    links: [<0.11.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 376
    stack_size: 27
    reductions: 9304
  neighbours:

=SUPERVISOR REPORT==== 18-Jun-2016::06:29:45 ===
     Supervisor: {local,kernel_sup}
     Context:    child_terminated
     Reason:     enospc
     Offender:   [{pid,<0.29.0>},
                  {id,user},
                  {mfargs,{user_sup,start,[]}},
                  {restart_type,temporary},
                  {shutdown,2000},
                  {child_type,supervisor}]


=CRASH REPORT==== 18-Jun-2016::06:42:59 ===
  crasher:
    initial call: gen:init_it/7
    pid: <0.268.0>
    registered_name: msg_store_transient
    exception exit: {{badmatch,{error,enospc}},
                     [{rabbit_msg_store,terminate,2,
                          [{file,"src/rabbit_msg_store.erl"},{line,975}]},
                      {gen_server2,terminate,3,
                          [{file,"src/gen_server2.erl"},{line,1146}]},
                      {proc_lib,wake_up,3,
                          [{file,"proc_lib.erl"},{line,250}]}]}
      in function  gen_server2:terminate/3 (src/gen_server2.erl, line 1149)
    ancestors: [rabbit_sup,<0.139.0>]
    messages: [{'DOWN',#Ref<0.0.6029315.242562>,process,<0.26188.107>,
                          normal},
                  {'EXIT',<0.269.0>,normal},
                  {'DOWN',#Ref<0.0.6029315.242563>,process,<0.26192.107>,
                          normal}]
    links: [<0.142.0>,#Port<0.22679>]
    dictionary: [{credit_blocked,[]},
                  {{"/var/lib/rabbitmq/mnesia/rabbit@ocr-proc-3/msg_store_transient/...",
                    fhc_file},
                   {file,1,true}},
                  {fhc_age_tree,{1,
                                 {{-576422097843728985,#Ref<0.0.3.1030>},
                                  true,nil,nil}}},
                  {{#Ref<0.0.3.1030>,fhc_handle},
                   {handle,{file_descriptor,prim_file,{#Port<0.22679>,47}},
                           #Ref<0.0.3.1030>,4194424,true,1048606,1048576,
                           [<<0,0,0,0,0,0,0,22,143,180,202,112,85,247,147,220,
                              149,141,50,24,77,20,208,16,131,109,0,0,0,0,255>>,
                            <<0,0,0,0,0,0,0,22,245,106,143,75,166,80,208,216,
                              34,9,195,103,200,254,184,250,131,109,0,0,0,0,255>>,
                            <<0,0,0,0,0,0,0,22,145,39,112,144,46,18,238,3,153,
                              173,54,126,85,77,190,153,131,109,0,0,0,0,255>>,
                            <<0,0,0,0,0,0,0,22,216,1

Michael Klishin

Jun 20, 2016, 7:26:38 AM
to rabbitm...@googlegroups.com
ENOSPC = no space left on disk. The enospc reasons in the reports above mean the node's writes failed because the disk was full.
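
A quick way to watch for this condition is to check free space on the filesystem that holds the node's data directory. The following is a minimal sketch, not a RabbitMQ feature: the function name `has_free_space` is made up, the `/var/lib/rabbitmq` path is taken from the logs in this thread, and the threshold reuses the `disk_free_limit` value (50000000 bytes) from the status output above. A different threshold may suit your setup better.

```python
import os
import shutil

# Threshold in bytes; assumed here to match the disk_free_limit shown in the
# status output earlier in the thread.
DISK_FREE_LIMIT = 50_000_000


def has_free_space(path, minimum_bytes=DISK_FREE_LIMIT):
    """Return True if the filesystem holding `path` has at least `minimum_bytes` free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free >= minimum_bytes


if __name__ == "__main__":
    # Assumed data directory; fall back to the current directory if it does not exist.
    data_dir = "/var/lib/rabbitmq"
    if not os.path.isdir(data_dir):
        data_dir = "."
    if has_free_space(data_dir):
        print(f"{data_dir}: enough free disk space")
    else:
        print(f"{data_dir}: low on disk space, writes may fail with ENOSPC")
```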

--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To post to this group, send email to rabbitm...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Michael Phillips

Oct 13, 2017, 1:43:51 AM
to rabbitmq-users
This resolved my issue with the RabbitMQ service for SolarWinds not running. Thanks, Michael Klishin!

sibr...@newscape.me

Dec 27, 2017, 6:22:13 AM
to rabbitmq-users
Thanks, Michael. This resolved my issue with AlienVault OSSIM, where RabbitMQ failed to start.

can.l...@gmail.com

Jul 25, 2018, 2:51:58 AM
to rabbitmq-users
Is there any way to check recovery.dets? I want to delete recovery.dets if it is not OK before rabbitmq-server starts.


On Monday, July 6, 2015 at 1:34:49 PM UTC+8, Abhijit Kulkarni wrote:

Michael Klishin

Jul 25, 2018, 9:52:11 AM
to rabbitm...@googlegroups.com
You can delete the file but that will make the node perform more (potentially a lot more) work during recovery
since it won't know what has been done previously.
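
A pre-start check along these lines could cover the one failure mode actually reported in this thread, a zero-byte recovery.dets. The following is a minimal sketch under that assumption. The function name `quarantine_if_empty` is made up, and it renames the file instead of deleting it so it can still be inspected later. It does not detect a non-empty but corrupted file. For a stricter check, Erlang's standard library has `dets:is_dets_file/1`, which this sketch does not call.

```python
import os
import time


def quarantine_if_empty(dets_path):
    """Rename a zero-byte recovery file out of the way.

    Returns the new path if the file was moved, or None if nothing was done.
    Hypothetical helper: it only handles the empty-file case seen in this thread.
    """
    try:
        size = os.path.getsize(dets_path)
    except FileNotFoundError:
        return None  # no file, nothing to check
    if size > 0:
        return None  # non-empty: leave it for the node to validate on startup
    backup = f"{dets_path}.empty-{int(time.time())}"
    os.rename(dets_path, backup)
    return backup


if __name__ == "__main__":
    # Example path from this thread; the node name differs per installation.
    path = "/var/lib/rabbitmq/mnesia/rabbit@ocr-proc-3/recovery.dets"
    moved_to = quarantine_if_empty(path)
    if moved_to:
        print(f"moved empty file to {moved_to}")
    else:
        print("nothing to do")
```

This would need to run before rabbitmq-server starts and with permission to write to the node's data directory. As noted above, removing the file means the node may have more work to do during recovery.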
