I am trying to figure out when a Galera node is able to serve requests so the load balancer can direct traffic to it. Right now I use a check that reads the Galera status number, and if it is 4 (SYNCED) I consider the node able to serve requests.

The issue I am having is with the daily backup I run using xtrabackup. In the last 5 minutes of the backup, when everything gets locked and the transaction log is streamed, the other nodes seem to stop serving requests too, although nothing is logged about it or about whether they are affected by the locked node.
On Wed, Oct 10, 2012 at 11:30 PM, Ilias Bertsimas <award...@gmail.com> wrote:
> Hello,
> I am trying to figure out when a galera node is able to serve requests so
> the load balancer can direct traffic to it.
> Right now I am using a check to get the galera status number and if it is 4
> (SYNCED) I consider it able to serve requests.
This is an ok way to do it.
Percona XtraDB Cluster FAQ provides a different approach, both are good:
In theory the approach recommended by Percona is arguably better,
because then you'd be checking the thing you actually care about:
whether you can access tables in InnoDB. In theory you could have a
situation where the node claims to be synced to the cluster but is
malfunctioning in other ways, like due to an InnoDB bug or such. In
practice if that were the case, this would also affect Galera (if it
couldn't write to InnoDB tables anymore) and the cluster would quickly
become not-synced anyway.
I hope the above makes sense and I didn't confuse anyone :-)
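
A minimal sketch of how the two checks discussed above could be combined in a load balancer health check. It assumes the caller has already collected the output of SHOW STATUS LIKE 'wsrep_%' into a dictionary and supplies its own function that probes an InnoDB table. The names `node_can_serve`, `galera_is_synced` and `probe_innodb` are hypothetical and do not come from Galera or Percona.

```python
from typing import Callable, Mapping

SYNCED = 4  # wsrep_local_state value the thread treats as "able to serve"


def galera_is_synced(status: Mapping[str, str]) -> bool:
    """Return True when wsrep_local_state reports 4 (SYNCED)."""
    try:
        return int(status.get("wsrep_local_state", "")) == SYNCED
    except (ValueError, TypeError):
        # Missing or non-numeric value: treat the node as not synced.
        return False


def node_can_serve(status: Mapping[str, str],
                   probe_innodb: Callable[[], bool]) -> bool:
    """Check the Galera state first, then ask the caller's InnoDB probe."""
    if not galera_is_synced(status):
        return False
    try:
        # probe_innodb is an assumption: e.g. a SELECT against a small
        # InnoDB table, returning True on success.
        return bool(probe_innodb())
    except Exception:
        # Any failure of the probe counts as "not able to serve".
        return False
```

Passing the probe in as a callable keeps the sketch independent of any particular MySQL client library. Note that neither check detects a node that is SYNCED but blocked by a long-held lock, which is the situation described in the rest of this thread.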
> The issue I am having is when I run the daily backup using xtrabackup. In
> the last 5 mins of the backup when everything gets locked and the
> transaction log is streamed it seems the other nodes do not serve requests
> either although there is nothing logged about it and if they are affected by
> locked node.
This sounds like something is not quite right. Xtrabackup should not
lock for that long.
Could you provide a bit more detail around this? Like a sequence of
steps you do and what symptoms do you see. (If this might be an
xtrabackup issue, of course percona-discuss mailing list could be a
better place to ask.)
henrik
-- henrik.i...@avoinelama.fi
+358-40-8211286 skype: henrik.ingo irc: hingo
www.openlife.cc
Hi Henrik,

What you mentioned is pretty clear to me.

I have a 3-node Galera cluster, and on one of the nodes I run:

    innobackupex --galera-info --tmpdir=/shared/backup/galera-data/tmp \
      --user $SQL_USER --password=$SQL_PASSWORD --stream=tar \
      /shared/backup/galera-data/tmp | gzip - > ${BACKUP_DIR}/${THIS_SERVER}_bdd_all.tar.gz

Although the load balancer is not even actively using that node (it is a failover), the other nodes seem to stall serving requests for about 2 minutes when the backup is in its last phase of locking and flushing all tables to disk. All the nodes show as up on the load balancer for the whole procedure, even at the end when the node stays locked until the transaction log has been streamed. That log is quite big, as we have a heavy write workload.

Kind Regards,
Ilias.
Do you have some MyISAM tables then? Xtrabackup is of course only
non-blocking for InnoDB tables.
You can check this with
SHOW CREATE TABLE [table]
on specific tables, or
mysqldump --no-data --all-databases
for all tables.
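
As a rough illustration of the check above, the following sketch scans the text produced by mysqldump --no-data --all-databases (or by SHOW CREATE TABLE) and lists every table whose storage engine is not InnoDB. The function name `non_innodb_tables` is hypothetical, and the parsing assumes the usual dump layout in which each table definition ends with a line starting with ") ENGINE=...".

```python
import re

# "USE `shop`;" lines tell us which database the following tables belong to.
_USE = re.compile(r"USE\s+`?([^`;\s]+)`?", re.IGNORECASE)
# "CREATE TABLE `orders` (" opens a table definition.
_CREATE = re.compile(r"CREATE TABLE\s+(?:IF NOT EXISTS\s+)?`?([^`\s(]+)`?",
                     re.IGNORECASE)
# ") ENGINE=InnoDB DEFAULT CHARSET=..." closes it and names the engine.
_ENGINE = re.compile(r"\)\s*ENGINE\s*=\s*(\w+)", re.IGNORECASE)


def non_innodb_tables(ddl: str) -> dict:
    """Map "db.table" (or "table") to its engine for every non-InnoDB table."""
    found = {}
    database = None
    current = None
    for raw in ddl.splitlines():
        line = raw.strip()
        use = _USE.match(line)
        if use:
            database = use.group(1)
            continue
        create = _CREATE.search(line)
        if create:
            table = create.group(1)
            current = f"{database}.{table}" if database else table
            continue
        engine = _ENGINE.match(line)
        if engine and current is not None:
            if engine.group(1).lower() != "innodb":
                found[current] = engine.group(1)
            current = None
    return found
```

Tables in system schemas may show up in the result as well; the sketch does not filter them, so the output still needs a quick manual review.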
That the whole cluster blocks when there is a lock is actually correct
behavior (as far as I can tell). If you issue a LOCK TABLE then that
is what happens. Since the replication is synchronous, all nodes will
wait for the lock to be released.
On 2012-10-10 23:30, Ilias Bertsimas wrote:
> I am trying to figure out when a galera node is able to serve requests so
> the load balancer can direct traffic to it.
> Right now I am using a check to get the galera status number and if it is 4
> (SYNCED) I consider it able to serve requests.
> The issue I am having is when I run the daily backup using xtrabackup. In
> the last 5 mins of the backup when everything gets locked and the
> transaction log is streamed it seems the other nodes do not serve requests
> either although there is nothing logged about it and if they are affected
> by locked node.
Well, if the node is SYNCED, then it is... right, synced. And if you block it - it blocks the whole cluster indeed. (And yes, it appears xtrabackup can block for quite some time).
2) On the backed-up node, set wsrep_provider_options="gcs.fc_limit=1M" to relax flow control on this node. But it can get ugly in some other ways, so it is not recommended.
Regards,
Alex
-- Alexey Yurchenko,
Codership Oy, www.codership.com Skype: alexey.yurchenko, Phone: +358-400-516-011
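
Since the original report says nothing was logged while the other nodes stalled, one way to make the effect Alexey describes visible is to watch the flow control counters that Galera exposes through SHOW STATUS. The sketch below is only an illustration: it assumes the status values have already been read into a dictionary, and the function name `flow_control_warnings` and both default thresholds are arbitrary choices, not recommended values.

```python
from typing import List, Mapping


def flow_control_warnings(status: Mapping[str, str],
                          max_paused: float = 0.1,
                          max_recv_queue: int = 100) -> List[str]:
    """Return human-readable warnings when flow control looks active.

    wsrep_flow_control_paused is the fraction of time replication was
    paused by flow control; wsrep_local_recv_queue is the number of
    write-sets waiting to be applied on this node.
    """
    warnings = []
    paused = float(status.get("wsrep_flow_control_paused", 0.0))
    queue = int(status.get("wsrep_local_recv_queue", 0))
    if paused > max_paused:
        warnings.append(
            f"replication paused {paused:.0%} of the time by flow control")
    if queue > max_recv_queue:
        warnings.append(f"receive queue holds {queue} write-sets")
    return warnings
```

Running such a check on every node during the backup window should show whether the stall coincides with the locked node's receive queue growing and flow control pausing the cluster.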
There are no MyISAM tables in the DB, and what Alexey describes seems to be what is actually happening. Are there any scripts available for doing a backup like an SST transfer?
I am also considering using the --no-lock option of xtrabackup. According to its description, it skips the FLUSH TABLES WITH READ LOCK at the end, but it will not give you an accurate binlog position for the backup, which I do not need anyway. Are there any dangers with that option combined with Galera?
Probably you mainly plan to use your backup in case of disaster, when
you would have lost all data from all nodes in the cluster. In this
case you don't need the galera-info. (It is only useful if you'd want
to manually provision a node from a backup.)
Your assumption is correct. I do not need --galera-info either. If I need to set up a new node, I take a snapshot from another live one or let it provision itself automatically with a full SST.