Schema database identity (the Mnesia cookie, not to be confused with the Erlang cookie) does not depend on the node name
directly. But since the data directory location includes the node name by default (you can override this), a hostname change
can lead a node to start with a *blank* database, which will be initialized from scratch and will have a different Mnesia cookie.
If such a node then discovers its peer and tries to cluster with it, they will have this conflict. However, nodes that start
with a blank database are no different from freshly reset ones, so they will simply sync the schema from a peer and “adopt” the
existing schema database identity.
So something else must be going on, or at least something similar but more nuanced.
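To see whether two nodes really disagree, the schema cookie can be read directly from Mnesia. A minimal diagnostic sketch, assuming rabbitmqctl can reach each node:

    # Print the schema database identity (the Mnesia cookie) of the local node;
    # run this on every node and compare the results.
    rabbitmqctl eval 'mnesia:table_info(schema, cookie).'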
On 29.05.2020, 20:51, rabbit...@googlegroups.com on behalf of Michael Klishin wrote:
Two or more nodes in your cluster use different schema database identifiers (this cannot be configured).
I don’t know how you can end up in this situation, but most likely something about the node data directory
changed between node restarts.
[1] is highly relevant as it demonstrates largely the same fundamental problem.
I’d also check that your nodes do not use the same **shared data directory** on NFS, as that is not supported
in any way and is a guaranteed way to get data corruption: nodes do not assume that their data directory
can be shared with other nodes.
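If a shared mount must be used for persistence, each node still needs its own private subtree. A hedged sketch, assuming the /var/lib/rabbitmq mount from this thread and a hypothetical per-node NODE_ID variable:

    # Give every node a private data directory under the shared mount.
    # NODE_ID is a hypothetical stable per-node identifier (node1, node2, ...).
    export RABBITMQ_MNESIA_BASE="/var/lib/rabbitmq/${NODE_ID}/mnesia"
    export RABBITMQ_LOG_BASE="/var/lib/rabbitmq/${NODE_ID}/log"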
On 29.05.2020, 18:26, rabbit...@googlegroups.com on behalf of Rasmus Larsson wrote:
With regard to this thread: https://groups.google.com/forum/#!topic/rabbitmq-users/GzdT9jepH9A
We are getting the following fatal errors when restarting a clustered node:
2020-05-29 12:51:49.697 [error] <0.200.0> Mnesia('rabbit@staging-rabbitmq-node1.example.private'): ** ERROR ** mnesia_event got {inconsistent_database, starting_partitioned_network, 'rabbit@staging-rabbitmq-node2.example.private'}
2020-05-29 12:51:49.697 [error] <0.200.0> Mnesia('rabbit@staging-rabbitmq-node1.example.private'): ** ERROR ** mnesia_event got {inconsistent_database, starting_partitioned_network, 'rabbit@staging-rabbitmq-node3.example.private'}
2020-05-29 12:51:49.905 [error] <0.200.0> Mnesia('rabbit@staging-rabbitmq-node1.example.private'): ** ERROR ** (core dumped to file: "/MnesiaCore.rabbit@staging-rabbitmq-node1.example.private_1590_756709_903879")
** FATAL ** Failed to merge schema: Bad cookie in table definition rabbit_runtime_parameters: 'rabbit@staging-rabbitmq-node1.example.private' = ...
A bit of background:
- Our nodes are launched on Fargate, so hostnames and IPs are dynamic. At startup we write a simple host alias pointing the logical node name at localhost (see the sketch after this list). The logical node name is also registered in AWS service discovery, which sets up an A record for the node so that other nodes in the cluster can resolve it. This part works.
- The instances persist their data via NFSv4 (AWS EFS mounted at /var/lib/rabbitmq) and successfully join the cluster on initial boot.
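A minimal sketch of that startup step, assuming a hypothetical logical name such as staging-rabbitmq-node1.example.private:

    # Hypothetical init step: alias the logical node name to localhost and
    # pin the RabbitMQ node name to it, so restarts keep the same node identity.
    LOGICAL_NAME="staging-rabbitmq-node1.example.private"   # assumed value
    echo "127.0.0.1 ${LOGICAL_NAME}" >> /etc/hosts
    export RABBITMQ_NODENAME="rabbit@${LOGICAL_NAME}"
    export RABBITMQ_USE_LONGNAME=true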
However, as mentioned above, when a node restarts it reads its state from EFS and then fails as shown. Running rm -rf /var/lib/rabbitmq/* during init (effectively a node reset) allows the node to rejoin successfully.
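For reference, the supported equivalent of that wipe, assuming the node is up enough to accept rabbitmqctl commands:

    # Return the node to a blank, just-installed state.
    rabbitmqctl stop_app
    rabbitmqctl reset        # use force_reset if the node cannot contact its peers
    rabbitmqctl start_app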
In this thread Michael talks about an "identity" (not the Erlang cookie): https://groups.google.com/d/msg/rabbitmq-users/DtofWINXppc/3wltfYntCwAJ
Is this identity affected by the "cluster name" and "internal cluster id"? Since these are set dynamically at startup when the cluster is "new", would explicitly setting them possibly help? And if not, is there anything else we can try?
Any help appreciated!
/Rasmus