I had a server that couldn't start because of a corrupt queue index file. The log showed:
=ERROR REPORT==== 1-May-2017::12:57:08 ===
** Generic server <0.171.0> terminating
** Last message in was {'$gen_cast',
                           {submit_async,
                               #Fun<rabbit_queue_index.32.38356666>}}
** When Server state == undefined
** Reason for termination ==
** {function_clause,
       [{rabbit_queue_index,journal_minus_segment1,
            [{{true,
               <<210,124,139,191,213,156,62,104,107,226,7,108,7,75,110,182,
                 0,0,0,0,0,0,0,0,0,0,36,160>>,
               <<>>},
              no_del,no_ack},
             {{true,
               <<210,124,139,191,213,156,62,104,107,226,7,108,7,75,110,182,
                 0,0,0,0,0,0,0,0,0,0,36,160>>,
               <<>>},
              del,no_ack}],
            []},
        {rabbit_queue_index,'-journal_minus_segment/3-fun-0-',4,[]},
        {array,sparse_foldl_3,7,[{file,"array.erl"},{line,1690}]},
        {array,sparse_foldl_2,9,[{file,"array.erl"},{line,1684}]},
        {rabbit_queue_index,'-recover_journal/1-fun-0-',1,[]},
        {lists,map,2,[{file,"lists.erl"},{line,1238}]},
        {rabbit_queue_index,segment_map,2,[]},
        {rabbit_queue_index,recover_journal,1,[]}]}
I was able to pin it down to a specific file with the help of strace:
sudo strace -fytT -e trace=file -o strace.log rabbitmq-server
In the strace log I found that it crashed while reading /var/lib/rabbitmq/mnesia/rabbit@cluster-node-01/queues/2EWRQ76TBN0CW0ZZUCSSOKWOU/115.idx.
After removing that file the server could boot again, without having to remove all queues and messages.
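
For anyone hitting the same crash, here is a rough sketch of how the strace log could be searched for the last queue index file touched before the crash. It is only a heuristic and rests on assumptions: that strace prints each path in double quotes (its usual output), that segment files end in .idx as in the path above, and that the last .idx file opened is the one that triggered the crash. The function name last_idx_path is made up for this example.

```python
import re
from typing import Iterable, Optional

# Assumption: strace prints the path argument of file syscalls in double quotes,
# and queue index segment files end in ".idx" (as in the path reported above).
IDX_PATH = re.compile(r'"([^"]+\.idx)"')


def last_idx_path(lines: Iterable[str]) -> Optional[str]:
    """Return the last quoted *.idx path seen in the given strace log lines.

    Hypothetical helper: the last index file opened is only a candidate for
    the corrupt one, not proof that it is.
    """
    last = None
    for line in lines:
        match = IDX_PATH.search(line)
        if match:
            last = match.group(1)
    return last
```

It can be run against the log from the command above with something like `last_idx_path(open("strace.log", errors="replace"))`. Moving the reported file aside instead of deleting it keeps a copy around for later inspection.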
It would be even better if RabbitMQ could detect and skip index files that are corrupt in this way.
RabbitMQ 3.5.7, Erlang 18.2
I've sent all related logs, files, and traces to support at rabbitmq.com, as they contain message data.