Hi,
It is very unlikely that this is an issue with the replication layer
(i.e. WAN/segments). If that were the case, Galera would be widely
unusable: since Galera does not know what it actually replicates, you
would see such behaviour in every cluster out there.
Much more likely it is a genuine DB inconsistency. If there are any
foreign keys involved, you're advised to upgrade to the latest release.
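
One rough way to look for such an inconsistency is to dump the values of
the unique key from each node and compare them. The sketch below only
does the comparison; how the key sets are fetched (for example by
selecting the 'sequence' column of main.documents on every node) is left
out, and the function name and node names are made up for illustration.

```python
def find_missing_keys(keys_by_node):
    """Return, per node, the keys that exist elsewhere but not on that node.

    keys_by_node maps a node name to the set of unique-key values found
    on that node. Nodes that are missing nothing are left out of the result.
    """
    all_keys = set().union(*keys_by_node.values())
    missing = {}
    for node, keys in keys_by_node.items():
        absent = all_keys - keys
        if absent:
            missing[node] = absent
    return missing


if __name__ == "__main__":
    # Hypothetical data: node1 lacks a row that node2 and node3 both have.
    key = "3729-3882600-01P2017-17040"
    print(find_missing_keys({
        "node1": set(),
        "node2": {key},
        "node3": {key},
    }))
```

A node that shows up in the result is a candidate for having diverged
from the rest of the cluster.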
Regards,
Alex
On 2017-10-16 12:27, Jorge Oliveira wrote:
> We have a 5 nodes cluster: 2 datacenters with 2 nodes each and 1
> datacenter with only 1 node.
>
> Sometimes an insert in a node would take down all nodes (except itself)
> with a "Node consistency compromized", bringing down the cluster.
> Looking at the log we could see the error:
>
> [ERROR] Slave SQL: Could not execute Write_rows event on table
> main.documents; Duplicate entry '3729-3882600-01P2017-17040' for key
> 'sequence', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the
> event's master log FIRST, end_log_pos 455, Error_code: 1062
>
> It seemed like nodes were accepting the same statement more than once
> (from different nodes) in that ms between certification and commit, and
> when applying the transaction by the second, it would raise the error.
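No. It is just that for some reason the master node for that transaction
didn't have that row while the rest of the nodes did, so the insert
succeeded on the master but hit the duplicate key when the other nodes
tried to apply it.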
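
To illustrate the scenario, here is a toy model, not Galera itself: each
"node" is an in-memory SQLite database with a unique 'sequence' column,
and "replication" is simply re-running the insert on the other nodes.
The function names, node names and table layout are assumptions made for
the example.

```python
import sqlite3


def make_node(sequences):
    """Create an in-memory stand-in for one node, preloaded with rows."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE documents (id INTEGER PRIMARY KEY, sequence TEXT UNIQUE)"
    )
    db.executemany(
        "INSERT INTO documents (sequence) VALUES (?)",
        [(s,) for s in sequences],
    )
    db.commit()
    return db


def replicate_insert(master, slaves, sequence):
    """Insert on the master, then apply the same row on every slave.

    Returns the names of the slaves where applying the row failed with a
    duplicate-key error. If the master already has the row, the insert
    raises sqlite3.IntegrityError there and nothing is applied elsewhere.
    """
    master.execute("INSERT INTO documents (sequence) VALUES (?)", (sequence,))
    master.commit()
    failed = []
    for name, db in slaves.items():
        try:
            db.execute(
                "INSERT INTO documents (sequence) VALUES (?)", (sequence,)
            )
            db.commit()
        except sqlite3.IntegrityError:
            # The row was already present here: the slave cannot apply it.
            failed.append(name)
    return failed


if __name__ == "__main__":
    key = "3729-3882600-01P2017-17040"
    master = make_node([])  # the master is missing the row
    slaves = {"node2": make_node([key]), "node3": make_node([key])}
    print(replicate_insert(master, slaves, key))
```

In this model every node that already holds the row fails to apply the
insert, while the node that was missing it is the only one left
standing, which matches the pattern you described.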