We saw a peculiar issue upgrading from 3.7.12 -> 3.7.18.
This was conducted on a four node rabbit cluster v 3.7.12, no changes made to config files or settings prior to upgrade.
only 3 vhosts living on the broker, with no traffic. This is a testing/integration cluster.
- all nodes are up
- deployed 3.7.18 package to all nodes
- stopped 1 node
on startup:
09:50:29 sys-rabbit-defau:Waiting for RabbitMQ server to initialize...
2019-10-03 09:50:30.204 [info] <0.7.0> Log file opened with Lager
2019-10-03 09:50:34.499 [info] <0.7.0> Feature flags: list of feature flags found:
2019-10-03 09:50:34.499 [info] <0.7.0> Feature flags: feature flag states written to disk: yes
2019-10-03 09:50:34.775 [info] <0.42.0> Application mnesia exited with reason: stopped
2019-10-03 09:50:34.802 [error] <0.7.0>
Error description:
lists:foldl/3 line 1263
rabbit_mnesia:check_cluster_consistency/2 line 685
rabbit_mnesia:check_consistency/5 line 876
rabbit_mnesia:check_rabbit_consistency/2 line 945
rabbit_feature_flags:check_node_compatibility/2 line 1711
rabbit_feature_flags:exchange_feature_flags_from_unknown_apps/2 line 1901
rabbit_feature_flags:fetch_remote_feature_flags_from_apps_unknown_locally/2 line 1909
rabbit_feature_flags:query_remote_feature_flags/3 line 1845
error:{case_clause,feature_flags_file_not_set}
...
09:50:34 sys-rabbit:
09:50:34 sys-rabbit:BOOT FAILED
09:50:34 sys-rabbit:===========
09:50:34 sys-rabbit:
09:50:34 sys-rabbit:Error description:
09:50:34 sys-rabbit: lists:foldl/3 line 1263
09:50:34 sys-rabbit: rabbit_mnesia:check_cluster_consistency/2 line 685
09:50:34 sys-rabbit: rabbit_mnesia:check_consistency/5 line 876
09:50:34 sys-rabbit: rabbit_mnesia:check_rabbit_consistency/2 line 945
09:50:34 sys-rabbit: rabbit_feature_flags:check_node_compatibility/2 line 1711
09:50:34 sys-rabbit: rabbit_feature_flags:exchange_feature_flags_from_unknown_apps/2 line 1901
09:50:34 sys-rabbit: rabbit_feature_flags:fetch_remote_feature_flags_from_apps_unknown_locally/2 line 1909
09:50:34 sys-rabbit: rabbit_feature_flags:query_remote_feature_flags/3 line 1845
09:50:34 sys-rabbit:error:{case_clause,feature_flags_file_not_set}
...
09:50:35 sys-rabbit:{"init terminating in do_boot",{case_clause,feature_flags_file_not_set}}~
09:50:35 sys-rabbit:init terminating in do_boot ({case_clause,feature_flags_file_not_set})~
...
09:50:36 procmgr: sys-rabbit: end of task 57604 0.
So the 1st thing that jumped out is "feature_flags_file_not_set"
What the heck is the 'Feature Flags File'? Nothing about this in the upgrade doc.
We checked the source code and found in
rabbitmq-server a reference to RABBITMQ_FEATURE_FLAGS_FILE env var.
So in rabbitmq-env.conf, we set:
'export RABBITMQ_FEATURE_FLAGS_FILE="/FILE/PATH" '
and created an empty file "/FILE/PATH"
now we get error:
10:07:50 sys-rabbit-defau:Waiting for RabbitMQ server to initialize...
2019-10-03 10:07:50.858 [info] <0.7.0> Log file opened with Lager
2019-10-03 10:07:51.453 [error] <0.7.0>
Error description:
init:do_boot/3
init:start_em/1
rabbit:start_it/1 line 491
rabbit:'-boot/0-fun-0-'/0 line 338
rabbit_feature_flags:initialize_registry/1 line 781
rabbit_feature_flags:read_enabled_feature_flags_list/0 line 1285
rabbit_feature_flags:try_to_read_enabled_feature_flags_list/0 line 1299
error:{case_clause,{ok,[]}}
....
10:07:51 sys-rabbit:
10:07:51 sys-rabbit:BOOT FAILED
10:07:51 sys-rabbit:===========
10:07:51 sys-rabbit:
10:07:51 sys-rabbit:Error description:
10:07:51 sys-rabbit: init:do_boot/3
10:07:51 sys-rabbit: init:start_em/1
10:07:51 sys-rabbit: rabbit:start_it/1 line 491
10:07:51 sys-rabbit: rabbit:'-boot/0-fun-0-'/0 line 338
10:07:51 sys-rabbit: rabbit_feature_flags:initialize_registry/1 line 781
10:07:51 sys-rabbit: rabbit_feature_flags:read_enabled_feature_flags_list/0 line 1285
10:07:51 sys-rabbit: rabbit_feature_flags:try_to_read_enabled_feature_flags_list/0 line 1299
10:07:51 sys-rabbit:error:{case_clause,{ok,[]}}
...
10:07:52 sys-rabbit:{"init terminating in do_boot",{case_clause,{ok,[]}}}~
10:07:52 sys-rabbit:init terminating in do_boot ({case_clause,{ok,[]}})~
10:07:52 sys-rabbit:~
10:07:52 procmgr: sys-rabbit: end of task 84603 0.
Since there's nothing about the feature flag file in the docs, there no info about format or contents, more digging into the source.
%% If the file is missing, we consider the list of enabled
%% feature flags to be empty.
We delete the file, and get the same error as the 1st attempt:
10:19:21 sys-rabbit:Error description:
10:19:21 sys-rabbit: lists:foldl/3 line 1263
10:19:21 sys-rabbit: rabbit_mnesia:check_cluster_consistency/2 line 685
10:19:21 sys-rabbit: rabbit_mnesia:check_consistency/5 line 876
10:19:21 sys-rabbit: rabbit_mnesia:check_rabbit_consistency/2 line 945
10:19:21 sys-rabbit: rabbit_feature_flags:check_node_compatibility/2 line 1711
10:19:21 sys-rabbit: rabbit_feature_flags:exchange_feature_flags_from_unknown_apps/2 line 1901
10:19:21 sys-rabbit: rabbit_feature_flags:fetch_remote_feature_flags_from_apps_unknown_locally/2 line 1909
10:19:21 sys-rabbit: rabbit_feature_flags:query_remote_feature_flags/3 line 1845
10:19:21 sys-rabbit:error:{case_clause,feature_flags_file_not_set}
...
10:19:21 sys-rabbit:
10:19:22 sys-rabbit:{"init terminating in do_boot",{case_clause,feature_flags_file_not_set}}~
10:19:22 sys-rabbit:init terminating in do_boot ({case_clause,feature_flags_file_not_set})~
10:19:22 sys-rabbit:~
But this time we noticed "query_remote_feature_flags" hmmm interesting.
So now we run "rabbitmqctl eval 'application:set_env(rabbit, feature_flags_file, ["/FILE/PATH").'" on the other three nodes.
"/FILE/PATH" obviously a substitute for the actual path.
$ rabbitmqctl eval 'rpc:multicall(application, get_env, [rabbit, feature_flags_file]).'
{[{ok,'/FILE/PATH'},
{ok,'/FILE/PATH'},
{ok,'/FILE/PATH'}],
[]}
At this point, once we've set 'feature_flags_file' location in the running VM on the other nodes, and have set the export in our local rabbitmq-env.conf, (but no actual file exists), now the 1st node starts:
10:34:36 sys-rabbit: ## ##
10:34:36 sys-rabbit: ## ## RabbitMQ 3.7.18. Copyright (C) 2007-2019 Pivotal Software, Inc.
10:34:36 sys-rabbit: ###### ##
10:34:36 sys-rabbit: ########## Logs: REDACTED
10:34:36 sys-rabbit: REDACTED
10:34:36 sys-rabbit:
10:34:36 sys-rabbit: Starting broker...
We can easily recreate this. Please let us know what other details might be needed to help address the issue.