Invalid argument when RabbitMQ tries to delete snapshot - queue gets status down


Robert

May 23, 2023, 6:57:07 AM
to rabbitmq-users
Hi,

RabbitMQ sets a queue to status "down" when it tries to remove a snapshot. It retries the removal several times without success.

I pasted the start of the error log below. Does anyone have an idea what the problem could be? It says "invalid argument".




 ** Reason for termination = error:{bad_return_from_state_function,
                                    {error,
 ** Callback modules = [ra_server_proc]
 ** Stacktrace =
   crasher:
     initial call: ra_server_proc:init/1
                     {'DOWN',#Ref<0.4087736136.3453747202.14601>,process,
                          [{ra_snapshot,'-begin_snapshot/3-fun-0-',7,
                           {'%2F_QueueName1, QueueName2',
                            'rab...@rabbitmq-2.rabbitmq-headless.svc.cluster.local'}},
                          {opt,terminate},
                          {num_pending_commands,0},
                          {num_delayed_commands,0},
                          {election_timeout_set,false},
                          {ra_server_state,
                               {0,true},
                                 'rab...@rabbitmq-1.rabbitmq-headless.svc.cluster.local'} =>
                                 #{commit_index_sent => 391364,
                                   query_index => 0,status => normal},
                                 'rab...@rabbitmq-2.rabbitmq-headless.svc.cluster.local'} =>
                                   status => normal}},
                              {write_concurrency,
                               #Ref<0.4087736136.3359506433.54447>},
                             current_term => 7,
                              {'%2F_QueueName1, QueueName2',
                               'rab...@rabbitmq-2.rabbitmq-headless.svc.cluster.local'},
                             last_applied => 391364,
                             log =>
                             log_id =>
                              #{checkout_message_bytes => 56,
                                config =>
                                 #{consumer_strategy => competing,
                                   delivery_limit => undefined,
                                   expires => undefined,max_bytes => undefined,
                                   msg_ttl => undefined,
                                   release_cursor_interval => {2048,2048},
                                     <<"QueueName1, QueueName2">>}},
                                enqueue_message_bytes => 0,
                                num_consumers => 2,
                                num_in_memory_ready_messages => 0,
                                release_cursor_enqueue_counter => 1,
                                release_cursors => [],
                                single_active_consumer_id =>
                                 {<<"7d0f00ce-2850-435d-839e-8918d8d6ae0d">>,
                                  <0.7170.1128>},
                                single_active_num_waiting_consumers => 0,
                                smallest_raft_index => 391364,
                                type => rabbit_fifo},
                             max_pipeline_count => 4096,
                             system_config =>
                              #{data_dir =>
                                name => quorum_queues,
                                names =>
                                   directory => ra_directory,
                                   directory_rev => ra_directory_reverse,
                                   segment_writer => ra_log_segment_writer,
                                wal_garbage_collect => false,
                                wal_pre_allocate => false,
                             uid => <<"2F_AFAV6OVQJEUY8BQ">>,
                               'rab...@rabbitmq-2.rabbitmq-headless.svc.cluster.local'}}}]
                                     "delete file /bitnami/rabbitmq/mnesia/rab...@rabbitmq-2.rabbitmq-headless.svc.cluster.local/quorum/rab...@rabbitmq-2.rabbitmq-headless.svc.cluster.local/2F_AFAV6OVQJEUY8BQ/snapshots/0000000000000007_000000000005F8C3: invalid argument\n"}}
 ** Callback mode = [state_functions,state_enter]
 **  [{gen_statem,loop_state_callback_result,11,
                  [{file,"gen_statem.erl"},{line,1580}]},
      {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]
 ** Time-outs: {1,[{{timeout,tick},tick_timeout}]}
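
The snapshot directory named in the error, 0000000000000007_000000000005F8C3, looks like two zero-padded hexadecimal numbers. A minimal sketch, assuming (this is not confirmed anywhere in the thread) that the name has the form <term>_<raft index> in hex: decoding it gives term 7 and index 391363, which lines up with current_term => 7 and last_applied => 391364 in the crash report above. The helper name decode_snapshot_name is hypothetical.

```python
from __future__ import annotations


def decode_snapshot_name(name: str) -> tuple[int, int]:
    """Split a snapshot directory name into (term, raft_index).

    ASSUMPTION: the format is '<term hex>_<index hex>', both zero-padded
    to 16 hex digits. This reading is inferred from the log, not documented here.
    """
    term_hex, index_hex = name.split("_")
    return int(term_hex, 16), int(index_hex, 16)


print(decode_snapshot_name("0000000000000007_000000000005F8C3"))  # (7, 391363)
```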

Michal Kuratczyk

May 23, 2023, 7:05:52 AM
to rabbitm...@googlegroups.com
What version is this?

--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/rabbitmq-users/66dc25f4-d70c-46b2-9bd5-b2902124a218n%40googlegroups.com.


--
Michał
RabbitMQ team

Robert

May 23, 2023, 7:19:43 AM
to rabbitmq-users
Hi,

It's 3.11.4.

Thank you for taking the time to reply. I have run RabbitMQ for a couple of years without problems, so this is a new problem for me.

Best regards
Robert 

Michal Kuratczyk

May 23, 2023, 7:31:24 AM
to rabbitm...@googlegroups.com
Can you provide full logs from this timeframe? Ideally from all nodes. Thanks,



--
Michał
RabbitMQ team

Robert

May 23, 2023, 7:45:37 AM
to rabbitmq-users
Sure thing. Here is everything I've got. There are a lot of messages (around 1,200) in a very short time period.
rabbitlogs.txt

Michal Kuratczyk

May 23, 2023, 8:12:36 AM
to rabbitm...@googlegroups.com
What's the state of the node now? If it's still running, can you try `rabbitmqctl set_log_level debug` to see if more logs are produced?
It seems like a low-level disk operation failed. Any chance you ran out of space or something similar?
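
Whether the volume is running out of space can be checked quickly. A minimal Python sketch, assuming Python is available somewhere that can see the RabbitMQ data volume (for example a debug container; the RabbitMQ image itself may not ship Python): shutil.disk_usage reports total, used and free bytes for the filesystem that holds a path. The helper name report_free_space and the 10% threshold are illustrative assumptions, not RabbitMQ settings.

```python
import shutil


def report_free_space(path: str, min_free_fraction: float = 0.10) -> bool:
    """Print disk usage for the filesystem that contains `path`.

    Returns True when the free fraction is at least `min_free_fraction`
    (a hypothetical threshold chosen for illustration only).
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_fraction = usage.free / usage.total if usage.total else 0.0
    gib = 2**30
    print(f"{path}: {usage.free / gib:.1f} GiB free of "
          f"{usage.total / gib:.1f} GiB ({free_fraction:.0%})")
    return free_fraction >= min_free_fraction
```

For example, report_free_space("/bitnami/rabbitmq") would cover the path in the error, which lives under /bitnami/rabbitmq/mnesia.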

Can you show the file structure of bitnami/rabbitmq/mnesia/rab...@rabbitmq-2.rabbitmq-headless.svc.cluster.local/quorum/rab...@rabbitmq-2.rabbitmq-headless.svc.cluster.local/2F_AFAV6OVQJEUY8BQ/ (using `tree` or something similar)?
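
If `tree` is not installed in the container, a rough equivalent can be sketched with os.walk. This is a hypothetical helper (tree_lines is not part of any RabbitMQ tooling): it lists each directory with a trailing "/" and each file with its size in bytes, which is enough to see whether a snapshots directory exists and what it holds. The member directory path is truncated in the thread, so pass the real path on the node.

```python
from __future__ import annotations

import os


def tree_lines(root: str) -> list[str]:
    """Return an indented listing of `root`, roughly like the `tree` command.

    Directories end with '/', files show their size in bytes. A missing
    `root` yields only the root line, because os.walk yields nothing for it.
    """
    lines = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place makes os.walk visit subdirectories in order
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        if depth:
            lines.append("    " * depth + os.path.basename(dirpath) + "/")
        for name in sorted(filenames):
            # lstat so that a dangling symlink does not raise
            size = os.lstat(os.path.join(dirpath, name)).st_size
            lines.append("    " * (depth + 1) + f"{name} ({size} bytes)")
    return lines
```

Printing "\n".join(tree_lines(path)) gives a listing that can be pasted into a reply.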

Thanks,



--
Michał
RabbitMQ team

Robert

May 23, 2023, 8:48:45 AM
to rabbitmq-users
The node works fine now. I think you are right that it is something with memory. Thanks for the guidance. I will look into the infrastructure RabbitMQ runs on to see if I can find a memory-related problem. It runs on Kubernetes.

I have attached the file structure of a similar directory.

Thanks
filestructure_rabbitmq2.yml

kjnilsson

May 23, 2023, 9:33:34 AM
to rabbitmq-users
We're more thinking storage space than memory.

Could it be that the queue in question was deleted? The file structure you shared does not contain the queue member's internal ID (2F_AFAV6OVQJEUY8BQ).
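
This check can be done mechanically. The crash report gives the member's internal ID as uid => <<"2F_AFAV6OVQJEUY8BQ">>, and the failing path places that ID directly under the node's quorum directory. A minimal sketch, assuming you pass the real quorum directory (its full path is truncated in the thread): find_member_dirs is a hypothetical helper that returns every directory with that name, so an empty result would be consistent with the queue member's data having been removed.

```python
from __future__ import annotations

import os


def find_member_dirs(quorum_dir: str, uid: str) -> list[str]:
    """Return the paths of all directories under `quorum_dir` named `uid`."""
    matches = []
    for dirpath, dirnames, _filenames in os.walk(quorum_dir):
        if uid in dirnames:
            matches.append(os.path.join(dirpath, uid))
    return matches
```

Usage: find_member_dirs("<data dir>/quorum", "2F_AFAV6OVQJEUY8BQ"), with the placeholder replaced by the node's actual quorum directory.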
