2020-04-01 18:50:13.415 [error] <0.337.0> CRASH REPORT Process <0.337.0> with 0 neighbours exited with reason: {{schema_integrity_check_failed,[{table_content_invalid,rabbit_durable_queue,{amqqueue,{resource,'_',queue,'_'},'_','_','_','_','_','_','_','_','_','_','_','_','_','_','_','_','_'},[{amqqueue,{resource,<<"/">>,queue,<<"mq_facts">>},true,false,none,[],<0.16806.0>,[],[],[],[{vhost,<<"/">>},{name,<<"queue-size-limit">>},{pattern,<<"^mq_facts$">>},{'apply-to',<<"all">>},{definition,[{<<"max-length-bytes">>,72000000000}]},{priority,0}],undefined,[],undefined,stopped,0,[],<<"/">>,#{user => <<"ro...">>},...}]}]},...} in application_master:init/4 line 138
It looped in that state, unable to start, until we moved its data files aside and started it with a fresh database.
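For reference, a minimal sketch of that reset (assumptions: a systemd-managed rabbitmq-server unit, run with root privileges, and the database dir shown in the startup banner below; adjust the path and unit name for your install):

    # Sketch of the "move the data aside and start fresh" reset described above.
    # Assumes a systemd-managed rabbitmq-server unit; run with root privileges.
    import shutil
    import subprocess
    import time

    # Node database directory, as reported in the startup banner further down
    DB_DIR = "/var/lib/rabbitmq/mnesia/rabbit@missionq-queueserver-experiment"

    # Stop the broker so nothing is writing to the mnesia files
    subprocess.run(["systemctl", "stop", "rabbitmq-server"], check=True)

    # Move the old database aside (kept for inspection) instead of deleting it
    shutil.move(DB_DIR, DB_DIR + ".corrupt-" + time.strftime("%Y%m%d-%H%M%S"))

    # Start again; RabbitMQ creates a fresh, empty database on boot
    subprocess.run(["systemctl", "start", "rabbitmq-server"], check=True)

Of course that throws away every durable queue and message the node had on disk, which is why we'd like to understand how the database got corrupted in the first place.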
The reboot was triggered by an ordinary "shutdown" command and the syslog shows systemd shutting down services in an orderly manner, including Rabbit. Rabbit logs around the time of the shutdown follow:
2020-04-01 18:49:42.719 [error] <0.28535.0> ** Connection attempt from disallowed node 'rabbitmqcli-19535-rabbit@missionq-queueserver-experiment' **
2020-04-01 18:49:47.218 [error] <0.28541.0> ** Connection attempt from disallowed node 'rabbitmqcli-19735-rabbit@missionq-queueserver-experiment' **
2020-04-01 18:49:47.227 [error] <0.28543.0> ** Connection attempt from disallowed node 'rabbitmqcli-19735-rabbit@missionq-queueserver-experiment' **
2020-04-01 18:49:47.240 [info] <0.65.0> SIGTERM received - shutting down
2020-04-01 18:49:47.243 [warning] <0.620.0> RabbitMQ HTTP listener registry could not find context rabbitmq_management_tls
2020-04-01 18:49:47.248 [info] <0.272.0> Peer discovery backend rabbit_peer_discovery_classic_config does not support registration, skipping unregistration.
2020-04-01 18:49:47.249 [info] <0.614.0> stopped TCP listener on [::]:5672
2020-04-01 18:49:47.249 [error] <0.16481.0> Error on AMQP connection <0.16481.0> (172.20.0.135:22526 -> 172.20.0.237:5672, vhost: '/', user: 'roger', state: running), channel 0:
operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
[lots of those repeat as connections close]
2020-04-01 18:50:11.505 [info] <0.8.0> Log file opened with Lager
2020-04-01 18:50:13.305 [info] <0.8.0> Feature flags: list of feature flags found:
2020-04-01 18:50:13.305 [info] <0.8.0> Feature flags: [ ] drop_unroutable_metric
2020-04-01 18:50:13.305 [info] <0.8.0> Feature flags: [ ] empty_basic_get_metric
2020-04-01 18:50:13.305 [info] <0.8.0> Feature flags: [ ] implicit_default_bindings
2020-04-01 18:50:13.305 [info] <0.8.0> Feature flags: [ ] quorum_queue
2020-04-01 18:50:13.305 [info] <0.8.0> Feature flags: [ ] virtual_host_metadata
2020-04-01 18:50:13.305 [info] <0.8.0> Feature flags: feature flag states written to disk: yes
2020-04-01 18:50:13.390 [info] <0.327.0> ra: meta data store initialised. 0 record(s) recovered
2020-04-01 18:50:13.391 [info] <0.334.0> WAL: recovering ["/var/lib/rabbitmq/mnesia/rabbit@missionq-queueserver-experiment/quorum/rabbit@missionq-queueserver-experiment/00000005.wal"]
2020-04-01 18:50:13.392 [info] <0.338.0>
Starting RabbitMQ 3.8.2 on Erlang 21.3.8.11
Copyright (c) 2007-2019 Pivotal Software, Inc.
Licensed under the MPL 1.1. Website: https://rabbitmq.com
2020-04-01 18:50:13.393 [info] <0.338.0>
node : rabbit@missionq-queueserver-experiment
home dir : /var/lib/rabbitmq
config file(s) : /etc/rabbitmq/rabbitmq.conf
cookie hash : hXNHjnSRJHlc6uC0fTYImQ==
log(s) : /var/log/rabbitmq/rab...@missionq-queueserver-experiment.log
: /var/log/rabbitmq/rabbit@missionq-queueserver-experiment_upgrade.log
database dir : /var/lib/rabbitmq/mnesia/rabbit@missionq-queueserver-experiment
2020-04-01 18:50:13.404 [info] <0.338.0> Running boot step pre_boot defined by app rabbit
2020-04-01 18:50:13.404 [info] <0.338.0> Running boot step rabbit_core_metrics defined by app rabbit
2020-04-01 18:50:13.404 [info] <0.338.0> Running boot step rabbit_alarm defined by app rabbit
2020-04-01 18:50:13.407 [info] <0.344.0> Memory high watermark set to 2861 MiB (3000000000 bytes) of 7680 MiB (8053833728 bytes) total
2020-04-01 18:50:13.409 [info] <0.346.0> Enabling free disk space monitoring
2020-04-01 18:50:13.409 [info] <0.346.0> Disk free limit set to 1000MB
2020-04-01 18:50:13.412 [info] <0.338.0> Running boot step code_server_cache defined by app rabbit
2020-04-01 18:50:13.412 [info] <0.338.0> Running boot step file_handle_cache defined by app rabbit
2020-04-01 18:50:13.412 [info] <0.349.0> Limiting to approx 32671 file handles (29401 sockets)
2020-04-01 18:50:13.412 [info] <0.350.0> FHC read buffering: OFF
2020-04-01 18:50:13.412 [info] <0.350.0> FHC write buffering: ON
2020-04-01 18:50:13.412 [info] <0.338.0> Running boot step worker_pool defined by app rabbit
2020-04-01 18:50:13.413 [info] <0.339.0> Will use 4 processes for default worker pool
2020-04-01 18:50:13.413 [info] <0.339.0> Starting worker pool 'worker_pool' with 4 processes in it
2020-04-01 18:50:13.413 [info] <0.338.0> Running boot step database defined by app rabbit
2020-04-01 18:50:13.414 [info] <0.338.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2020-04-01 18:50:13.415 [info] <0.338.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2020-04-01 18:50:13.415 [error] <0.337.0> CRASH REPORT Process <0.337.0> with 0 neighbours exited with reason: {{schema_integrity_check_failed,[{table_content_invalid,rabbit_durable_queue,{amqqueue,{resource,'_',queue,'_'},'_','_','_','_','_','_','_','_','_','_','_','_','_','_','_','_','_'},[{amqqueue,{resource,<<"/">>,queue,<<"mq_facts">>},true,false,none,[],<0.16806.0>,[],[],[],[{vhost,<<"/">>},{name,<<"queue-size-limit">>},{pattern,<<"^mq_facts$">>},{'apply-to',<<"all">>},{definition,[{<<"max-length-bytes">>,72000000000}]},{priority,0}],undefined,[],undefined,stopped,0,[],<<"/">>,#{user => <<"ro...">>},...}]}]},...} in application_master:init/4 line 138
2020-04-01 18:50:13.416 [info] <0.43.0> Application rabbit exited with reason: {{schema_integrity_check_failed,[{table_content_invalid,rabbit_durable_queue,{amqqueue,{resource,'_',queue,'_'},'_','_','_','_','_','_','_','_','_','_','_','_','_','_','_','_','_'},[{amqqueue,{resource,<<"/">>,queue,<<"mq_facts">>},true,false,none,[],<0.16806.0>,[],[],[],[{vhost,<<"/">>},{name,<<"queue-size-limit">>},{pattern,<<"^mq_facts$">>},{'apply-to',<<"all">>},{definition,[{<<"max-length-bytes">>,72000000000}]},{priority,0}],undefined,[],undefined,stopped,0,[],<<"/">>,#{user => <<"ro...">>},...}]}]},...}
Thanks for any help you can offer!
Ezra
Hi Ezra, You need to be sure that RabbitMQ is stopped gracefully before rebooting your server.
Hi Ezra, No matter how "write consistent" software tries to be, there is always the chance that a non-graceful stop results in corrupt data. This isn't unique to RabbitMQ, and yes, RabbitMQ writes data as consistently as possible.

schema_integrity_check_failed is an unusual error, and one that I would only expect to see if something happened during an upgrade. Was your reboot part of an upgrade process?

Thanks -
Luke
On Wednesday, April 15, 2020 at 9:48:52 AM UTC-7, Ezra Cooper wrote:

> On Tue, Apr 14, 2020 at 4:39 PM Luke Bakken <lba...@pivotal.io> wrote:
>> Hi Ezra, You need to be sure that RabbitMQ is stopped gracefully before rebooting your server.
>
> That's a big limitation, isn't it? I'm expecting the queue to persist across a power outage, or Amazon deciding to fail the instance (we're running on EC2 in production).
>
> If the answer here is "RabbitMQ doesn't really make an effort to keep its writes crash-consistent," then I can accept that, though it's a bit of a disappointment. It would actually help us a lot to know that for sure.
>
> We're using Ubuntu Xenial.
>
> Thanks for your follow-up,
> Ezra
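For the expectation in the quoted message that data should survive a power outage or an instance failure: below is a minimal sketch, using the Python pika client, of the client-side combination RabbitMQ documents for messages that are meant to survive a broker restart - a durable queue, persistent messages, and publisher confirms. The host, credentials, and message body are placeholders; the queue name is taken from the log above.

    import pika

    # Placeholder connection details - point these at your broker
    params = pika.ConnectionParameters(host="localhost")
    connection = pika.BlockingConnection(params)
    channel = connection.channel()

    # Durable queue: the queue definition survives a broker restart
    channel.queue_declare(queue="mq_facts", durable=True)

    # Publisher confirms: the broker acks a persistent message only after it
    # has been persisted; with BlockingConnection, basic_publish waits for
    # that ack and raises if the message is nacked or unroutable.
    channel.confirm_delivery()

    # delivery_mode=2 marks the message itself as persistent
    channel.basic_publish(
        exchange="",
        routing_key="mq_facts",
        body=b"example payload",
        properties=pika.BasicProperties(delivery_mode=2),
        mandatory=True,
    )

    connection.close()

Only messages confirmed this way are expected to be on disk when power is lost; anything unconfirmed, or published as transient, can legitimately disappear. Whether the schema itself should survive an unclean stop is the separate question this thread is about.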