Yes, sorry, this is MariaDB 10.0.11 on Ubuntu 14.04. The slave connects to the keepalived VIP, and haproxy decides which node the connection is directed to. Running CHANGE MASTER TO MASTER_AUTO_POSITION=1 on the slave doesn't seem to work, maybe because it is not compatible with MariaDB 10.0.11? The slave reports the following in SHOW SLAVE STATUS.
However, after running for a little while, it makes its next connection to the master (I noticed this by seeing a change in the Master_server_id value, which was initially set to 1, 2 and 3 on the respective nodes, but was later changed to 1 on all nodes per suggestions). When it connects again, haproxy redirects it to the next node.
Hope this gives some more information on the setup and maybe some ideas on how I could go about solving this. Any ideas are more than welcome.
Just in case, do you have log_slave_updates=ON?
The problem comes when we want to do a standard master/slave replication from the cluster to an external slave. The slave is set up to connect to the VIP (I have also been testing with connecting directly to the haproxied IP), and Using_gtid is set to Slave_pos.
However, after some time, once the connection changes to a different node through haproxy, the following error occurs:
Got fatal error 1236 from master when reading data from binary log: 'Error: connecting slave requested to start from GTID 3-1-422, which is not in the master's binlog'
And the Slave_IO_State shows that it's no longer in sync.
I have run SELECT @@GLOBAL.gtid_slave_pos; on each node to check its current GTID, and they all return 1-1-2145. However, sometimes when I add a lot of data, that value differs on some nodes, which is why I think the slave gets confused.
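To make that comparison less error-prone than reading the values by eye, here is a minimal Python sketch. It assumes the output of SELECT @@GLOBAL.gtid_slave_pos; has already been collected from each node as a string. The node names and the differing value for node3 are hypothetical, and the function names are made up for illustration.

```python
def parse_gtid_pos(pos):
    """Split a GTID position such as '1-1-2145,2-3-420' into
    {domain_id: (server_id, seq_no)}."""
    result = {}
    for gtid in pos.split(","):
        gtid = gtid.strip()
        if not gtid:
            continue
        domain_id, server_id, seq_no = (int(part) for part in gtid.split("-"))
        result[domain_id] = (server_id, seq_no)
    return result


def diff_positions(positions):
    """Return the domains on which the nodes disagree, as
    {domain_id: {node: (server_id, seq_no) or None}}."""
    parsed = {node: parse_gtid_pos(pos) for node, pos in positions.items()}
    domains = set()
    for per_domain in parsed.values():
        domains.update(per_domain)
    mismatches = {}
    for domain_id in sorted(domains):
        per_node = {node: p.get(domain_id) for node, p in parsed.items()}
        if len(set(per_node.values())) > 1:
            mismatches[domain_id] = per_node
    return mismatches


# Hypothetical sample: node3 lags behind the other two nodes.
sample = {
    "node1": "1-1-2145",
    "node2": "1-1-2145",
    "node3": "1-1-2140",
}
print(diff_positions(sample))
```

An empty result means all nodes reported the same position for every domain at the time the values were collected.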
On the slave, after activating Using_gtid=Slave_pos, the following Gtid_IO_Pos appears: 1-1-2464,2-3-420,3-1-422
From what I have read, this should be somewhat correct, as the first value is the server ID. However, in the config I have specified that node 1 has server ID 1, node 2 has ID 2 and so on, and the same goes for gtid_domain_id. Is this the correct setup, or do the nodes need to have the same server-id or gtid_domain_id?
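For reference, the MariaDB documentation describes a GTID as three numbers in the order domain ID, server ID, sequence number, so the first value would be the replication domain and not the server ID. A small Python sketch that splits the position reported by the slave under that reading (the names Gtid and split_gtid_list are made up for illustration):

```python
from collections import namedtuple

# Field order follows the documented MariaDB GTID layout:
# domain ID, server ID, sequence number.
Gtid = namedtuple("Gtid", ["domain_id", "server_id", "seq_no"])


def split_gtid_list(value):
    """Turn '1-1-2464,2-3-420,3-1-422' into a list of Gtid tuples."""
    gtids = []
    for item in value.split(","):
        domain_id, server_id, seq_no = item.strip().split("-")
        gtids.append(Gtid(int(domain_id), int(server_id), int(seq_no)))
    return gtids


for g in split_gtid_list("1-1-2464,2-3-420,3-1-422"):
    print(f"domain={g.domain_id} server={g.server_id} seq={g.seq_no}")
```

Read this way, the slave is tracking three separate domains (1, 2 and 3), and the 3-1-422 named in the error message is the position it holds for domain 3.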
Surely there must be a good way to solve this? Is the system not built to handle an asynchronous slave replicating from one random node?
Hope to hear from someone soon.