Hi,
I have no steps to reproduce yet, but I had another crash. Here is the situation:
1. Using RabbitMQ 3.10.5 this time. Single RabbitMQ instance.
2. The logs are flooded with the following warnings, at a steady rate of about 20 messages per second:
2022-06-20 22:02:57.003759+00:00 [warning] <0.1109.0> rabbit_stream_coordinator: failed to get tail of member __n23-sensor_pressure_1650136879875999476 on rabbit@iotp in 18 Error: function_clause
2022-06-20 22:02:57.056537+00:00 [warning] <0.1110.0> rabbit_stream_coordinator: failed to get tail of member __n23-sensor_light_1650136878094380461 on rabbit@iotp in 16 Error: function_clause
2022-06-20 22:02:57.076811+00:00 [warning] <0.1111.0> rabbit_stream_coordinator: failed to get tail of member __n23-sensor_temperature_1650136883291908840 on rabbit@iotp in 17 Error: function_clause
2022-06-20 22:02:57.097473+00:00 [warning] <0.1112.0> rabbit_stream_coordinator: failed to get tail of member __n23-sensor_humidity_1650136876352549559 on rabbit@iotp in 18 Error: function_clause
3. The list_queues command shows no message count for the affected streams:
# rabbitmqctl list_queues
Timeout: 60.0 seconds ...
Listing queues for vhost / ...
name messages
n23-sensor/battery_level 131166
n23-sensor/accelerometer 58993
test/performance 0
n23-sensor/switch 99
n23-sensor/light
n23-sensor/humidity
n23-sensor/pressure
n23-sensor/temperature
4. CPU usage by the beam.smp process is at 368%, with each core at 50-80% utilization.
5. Finally, the disk usage of the stream directories (in MB):
# du -sm stream/*
12 stream/__n23-sensor_accelerometer_1650136610118548000
24 stream/__n23-sensor_battery_level_1650136874671313741
3202 stream/__n23-sensor_humidity_1650136876352549559
2782 stream/__n23-sensor_light_1650136878094380461
3170 stream/__n23-sensor_pressure_1650136879875999476
1 stream/__n23-sensor_switch_1650136881596498973
3258 stream/__n23-sensor_temperature_1650136883291908840
1 stream/__test_performance_1650130735857311698
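To pin down which streams are affected, the two symptoms above (the coordinator warnings and the missing message counts) can be cross-checked with a small script. This is only a rough sketch: the function names are made up for illustration, the regex is based solely on the warning format in the log excerpt above, and the queue parsing assumes queue names contain no whitespace.

```python
import re
from collections import Counter

# Based only on the warning format seen in the log excerpt (an assumption,
# not a documented RabbitMQ log format).
TAIL_FAILURE = re.compile(
    r"rabbit_stream_coordinator: failed to get tail of member (\S+) on (\S+)"
)


def count_tail_failures(log_lines):
    """Count 'failed to get tail' warnings per stream member name."""
    counts = Counter()
    for line in log_lines:
        match = TAIL_FAILURE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def queues_without_count(list_queues_output):
    """Return queue names that appear without a message count.

    Assumes each data row is 'name count' and that an affected queue is
    printed as a name alone, as in the list_queues output above.
    """
    missing = []
    for line in list_queues_output.splitlines():
        fields = line.split()
        # A single field means the messages column is empty for this queue.
        if len(fields) == 1:
            missing.append(fields[0])
    return missing
```

Feeding the log file and the saved `rabbitmqctl list_queues` output through these two functions should give the same four streams (light, humidity, pressure, temperature) if the symptoms are related.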
Any tips on how I can recover the streams and minimize data loss? My only method so far is to remove all data from the streams' directories.
I can keep my system in this state for a few days. If, by any chance, you have a patch to try so that RabbitMQ can recover automatically, please let me know.
Regards,
Artur