Which is more correct to use to determine the health of a Galera
node?
wsrep_local_state_comment : Synced (6)
or
wsrep_ready : ON
Thanks
The answer is that, when checking for error scenarios, you should
check the very thing you want to know. In this case you want to know
whether Galera is in a state where your application can read and write
data in tables. Your check should simply be "SELECT * FROM
someinnodbtable WHERE id=1;"
Three different results are possible:
- You get the row with id=1 (node is healthy)
- "Unknown command" error (node is online but Galera is not
connected/synced with the cluster)
- Connection error (node is not online)
It's as simple as that. Polling anything other than what you are
actually interested in is a failure waiting to happen. As an example,
the semantics of wsrep_local_state_comment changed in version 2.0, so
something that worked with version 1.0 wouldn't work anymore. The
above test always works because it tests the very thing you want to
know.
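
A minimal sketch of that check, to make the three outcomes concrete.
Everything here is an assumption for illustration: QueryError is a
hypothetical stand-in for whatever exception your driver raises,
run_query is whatever function executes SQL in your application, and
the result names are made up. The only real values are the MySQL
client error numbers for connection failures.

```python
from typing import Callable, Sequence

# MySQL client-side error numbers that mean the server could not be reached
# (2002/2003: cannot connect, 2006: server has gone away, 2013: lost connection).
CONNECTION_ERRNOS = {2002, 2003, 2006, 2013}


class QueryError(Exception):
    """Hypothetical stand-in for a driver exception that carries a MySQL errno."""

    def __init__(self, errno: int, message: str = "") -> None:
        super().__init__(message)
        self.errno = errno


def check_node(run_query: Callable[[str], Sequence]) -> str:
    """Classify a node by running the one query the application cares about."""
    try:
        rows = run_query("SELECT * FROM someinnodbtable WHERE id=1")
    except QueryError as exc:
        if exc.errno in CONNECTION_ERRNOS:
            return "offline"     # connection error: node is not online
        return "not_usable"      # server answered, but refused the query
    # The query ran; an empty result means the probe row itself is missing.
    return "healthy" if rows else "row_missing"
```

With a real driver you would catch its own exception type and read the
error number from it instead of using QueryError.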
henrik
In a sense Henrik is right: a cluster partition can happen at any time,
so even if your previous SHOW STATUS command showed wsrep_ready = ON,
your next query can easily return 'Unknown command'.
However, until we come out with an unambiguous error code, the 'Unknown
command' may really be an unknown command. So it makes sense to check
the wsrep status when your application gets such an error.
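
A rough sketch of that follow-up check, under stated assumptions: the
function names are hypothetical, and the status rows are assumed to
come from running SHOW STATUS LIKE 'wsrep_%' on the same connection
after the error. 1047 is the MySQL error number for 'Unknown command'.

```python
from typing import Dict, Iterable, Tuple

ER_UNKNOWN_COM_ERROR = 1047  # MySQL server error number for "Unknown command"


def status_from_rows(rows: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Turn (Variable_name, Value) rows from SHOW STATUS into a dict."""
    return {name: value for name, value in rows}


def explain_unknown_command(errno: int, status: Dict[str, str]) -> str:
    """Decide whether an 'Unknown command' error came from wsrep or not."""
    if errno != ER_UNKNOWN_COM_ERROR:
        return "other_error"
    if status.get("wsrep_ready", "").upper() == "ON":
        # The node reports ready, so the command really was unknown.
        return "genuine_unknown_command"
    return "node_not_ready"
```

Note that the race described above still applies: the status is read
after the failed query, so it only tells you what the node looked like
a moment later.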
Another situation where you may actually want to look at the
wsrep_local_state value is polling for a load balancer. Anything but 4
is a reason to divert connections to another node. The thing is that a
donor node will still have wsrep_ready = ON, but since it may be blocked
and is in any case busy with state transfer and short on I/O capacity,
it may be a good idea not to send new connections to it.
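
As an illustration only, the load balancer rule could be sketched like
this. The function name is hypothetical, and the status dict is assumed
to hold the values of SHOW STATUS LIKE 'wsrep_%' as strings.

```python
from typing import Mapping, Optional

SYNCED_STATE = 4  # the wsrep_local_state value named above


def accepts_new_connections(status: Mapping[str, Optional[str]]) -> bool:
    """Return True only when wsrep_local_state is 4; divert otherwise."""
    try:
        return int(status.get("wsrep_local_state", -1)) == SYNCED_STATE
    except (TypeError, ValueError):
        # Missing or non-numeric value: treat the node as not eligible.
        return False
```

A donor node would be skipped by this rule even though it reports
wsrep_ready = ON, which is the case described above.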
Henrik, what exactly do you mean by
> As an example, the semantics of wsrep_local_state_comment changed in
> 2.0 version so something that worked with 1.0 version wouldn't work
> anymore.
? I'm not aware of anything like that. It must be a bug then.
Regards,
Alex
--
Alexey Yurchenko, Codership Oy
The test wasn't done by me personally, and it was against XtraDB
Cluster, but it seems to me that previously the "Donor" state meant
that the node was blocked, whereas in 2.0, if you use xtrabackup, the
node is actually available, and when the node is a donor for an IST
operation it is fully available too. Hence the Donor status of this
variable doesn't tell us whether the node is usable or not.
It's a good example of why a load balancer or application should
handle the actual errors, not check some proxy value. If there is an
error, there is an error.
Your point that such checks also inherently include a race condition
is good too, although such designs typically accept that transactions
will be lost for some number of seconds until failover happens. (I.e.
it is a master-slave or primary-secondary paradigm, which imposes
limitations that Galera does not have.)
henrik