What does it mean for a node to be "blacklisted"? (MariaDB Galera Cluster)


rma...@pivotal.io

Jan 30, 2015, 7:55:35 PM
to codersh...@googlegroups.com, cf-core-services-eng
Hello,

We are trying to understand how a 3-node MariaDB Galera cluster got into a bad state, where all nodes are non-primary and have marked their peers as partitioned. We know how to recover the cluster by bootstrapping, but would like to know how this came about.

In the logs, we see many messages like

 150131  0:45:59 [Note] WSREP: (801845e4-a8e2-11e4-b6f1-ba02aca81e90, 'tcp://0.0.0.0:4567') address 'tcp://10.85.15.107:4567' pointing to uuid 801845e4-a8e2-11e4-b6f1-ba02aca81e90 is blacklisted, skipping 

This message sometimes appears while the cluster is healthy, and is sometimes repeated for hundreds of lines when we are in a bad state. It seems that nodes occasionally blacklist themselves or other cluster members. There are other messages that point to potential networking issues, though we are not aware of any persistent connectivity issues in our environment.

For example: 

[ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at gcomm/src/pc.cpp:connect():141

and

150130 22:45:30 [Note] WSREP: (a7489639-97b5-11e4-b171-020823f89106, 'tcp://0.0.0.0:4567') reconnecting to 9e803b0f-a8d1-11e4-a798-d2a8cb4a589c (tcp://10.85.15.108:4567), attempt 0

Can someone explain how "blacklisting" happens? Overall, we are trying to learn whether this is a significant part of the failure or something we can overlook for now.

Thanks,

Raina Masand
Cloud Foundry Services Team

Mohd Zainal Abidin

Jan 30, 2015, 7:59:26 PM
to rma...@pivotal.io, cf-core-services-eng, codersh...@googlegroups.com

I have the same issue.


Elijah

Jan 30, 2015, 8:07:10 PM
to codersh...@googlegroups.com, cf-core-se...@pivotal.io
I had the same issue with a 3-node Percona cluster.

I read that in a healthy state, a node 'blacklisting' itself is normal: the node sees its own address and excludes itself from communication negotiations.

But I'd also like to know why 'blacklisting' of other node members occurs.

alexey.y...@galeracluster.com

Feb 1, 2015, 1:47:45 PM
to Elijah, codersh...@googlegroups.com, cf-core-se...@pivotal.io
At the moment a node should only blacklist its own address (when it is
given as part of the wsrep_cluster_address option). Are you sure you see
blacklisting of other nodes?
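
One quick way to check this against your own logs is sketched below. It is only a rough sketch: the regular expression is inferred from the single "is blacklisted" log line quoted earlier in this thread, so other Galera versions may format the message differently, and the function name `classify_blacklist_lines` is made up for illustration. It compares the UUID of the node writing the log (the first UUID in parentheses) with the UUID the blacklisted address points to.

```python
import re

# Pattern inferred from the one log line quoted in this thread (an assumption,
# not an official description of the Galera log format).
BLACKLIST_RE = re.compile(
    r"WSREP: \((?P<own>[0-9a-f-]{36}), '[^']*'\) "
    r"address '(?P<addr>[^']+)' pointing to uuid (?P<target>[0-9a-f-]{36}) "
    r"is blacklisted"
)


def classify_blacklist_lines(lines):
    """Split 'is blacklisted' messages into self-blacklisting and the rest.

    Returns (self_count, others), where others is a list of
    (address, uuid) pairs whose uuid differs from the logging node's uuid.
    """
    self_count = 0
    others = []
    for line in lines:
        match = BLACKLIST_RE.search(line)
        if match is None:
            continue  # not a blacklist message
        if match.group("own") == match.group("target"):
            self_count += 1  # the node is skipping its own address
        else:
            others.append((match.group("addr"), match.group("target")))
    return self_count, others


if __name__ == "__main__":
    import sys

    # Usage: python check_blacklist.py < mysql-error.log
    count, other_entries = classify_blacklist_lines(sys.stdin)
    print("self-blacklist messages:", count)
    for addr, uuid in other_entries:
        print("blacklisted other node:", addr, uuid)
```

If the second list comes back empty, every blacklist message in the log is a node skipping its own address.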

Elijah

Feb 2, 2015, 10:17:44 AM
to codersh...@googlegroups.com, elija...@gmail.com, cf-core-se...@pivotal.io
Just had another look at the logs. My apologies, you are right: it only blacklists itself. (It was late.)

Raina Masand

Feb 2, 2015, 11:57:27 AM
to Elijah, codersh...@googlegroups.com, cf-core-se...@pivotal.io, elija...@gmail.com
In this particular case, it was only blacklisting itself. I was wondering what this meant and whether it was a state that could be applied to other nodes. To confirm: the "blacklisted" state exists specifically to stop a node from connecting to itself as a cluster member, so a node would never blacklist others? If so, it sounds like we can ignore those logs and dig deeper to explain our cluster's state.

Thanks,
Raina
Cloud Foundry Services

