This is a new maintenance RC in 2.x series which fixes three important bugs: * causal reads guarantee violation at cluster partitioning. * exclusive keys didn't produce dependency on shared keys. * IST was not working across different EC2 accessibility zones (resolved global DNS names to local IPs).
It also adds two interesting features: * wsrep_incoming_addresses status variable contains a comma-separated list of incoming (client) addresses in the cluster component. This can be used for example for polling by load balancer. * ability to specify several node addresses in gcomm:// URL. This is a successor to "infamous" wsrep_urls parameter for mysqld_safe and is a preferred way to provide several node addresses to Galera. (BEWARE! wsrep_urls takes precedence so make sure to eliminate it from my.cnf)
Since this release contains two new features, we decided to make a release candidate first to gain some feedback. Provided there are no complains this code will be promoted to final release in a couple of weeks.
> Since this release contains two new features, we decided to make a > release
> candidate first to gain some feedback. Provided there are no > complains this
> code will be promoted to final release in a couple of weeks.
So I lied! So what? Sue me?
While we were waiting for complaints about 2.2 RC1 (got none actually) we managed to fix another bug and achieve serious improvements in performance and memory footprint for some workloads. We didn't want it go to waste, so here's RC2, the last one, honest... pinky swear!
alexey.yurche...@codership.com> wrote:
> On 2012-09-29 23:03, Alexey Yurchenko wrote:
>> Since this release contains two new features, we decided to make a release
>> candidate first to gain some feedback. Provided there are no complains
>> this
>> code will be promoted to final release in a couple of weeks.
> So I lied! So what? Sue me?
> While we were waiting for complaints about 2.2 RC1 (got none actually) we
> managed to fix another bug and achieve serious improvements in performance
> and memory footprint for some workloads. We didn't want it go to waste, so
> here's RC2, the last one, honest... pinky swear!
> So does this release support an IST after crash ?
Vadim,
It all depends on mysqld. All galera needs is a seqno interval, once it gets it from mysqld, it will do IST. So there are no changes regarding IST here (except insignificant bug fixed).
Current mysql-wsrep LP head contains a version of mysqld_safe that first tries to recover seqno (this is done by making a dummy run of mysqld with --wsrep-recover option) and only then starts mysqld for real. So no voodoo there (but takes 1 minute longer to restart).
> On Sat, Oct 13, 2012 at 2:06 AM, Alex Yurchenko <
> alexey.yurche...@codership.com> wrote:
>> On 2012-09-29 23:03, Alexey Yurchenko wrote:
>>> Since this release contains two new features, we decided to make a >>> release
>>> candidate first to gain some feedback. Provided there are no >>> complains
>>> this
>>> code will be promoted to final release in a couple of weeks.
>> So I lied! So what? Sue me?
>> While we were waiting for complaints about 2.2 RC1 (got none >> actually) we
>> managed to fix another bug and achieve serious improvements in >> performance
>> and memory footprint for some workloads. We didn't want it go to >> waste, so
>> here's RC2, the last one, honest... pinky swear!
We have four nodes, in a full-stack cluster. And when we execute drop table on a big table, old data not being used anymore, then thread-count in mysql and process-count in apache goes up, it's like the existing processes takes longer time to perform, finally some http sessions are timing out.
We have 50 concurrent users, and four server each with 32GB of memory and 6x146GB 10.000 RPM i HP Proliant DL380 G5 with 2xQuadCore 2.5 Ghz Xeon ... and we run aprox. 8 Typo3 sites on this setup.
50 users and a mysql drop table shouldnt be a problem, _at all_ for this setup, or am i wrong? What to look for and where?
On Tue, Oct 16, 2012 at 3:28 PM, Mikkel Christensen <mik...@mikjaer.com> wrote:
> We have four nodes, in a full-stack cluster. And when we execute drop table
> on a big table, old data not being used anymore, then thread-count in mysql
> and process-count in apache goes up, it's like the existing processes takes
> longer time to perform, finally some http sessions are timing out.
> We have 50 concurrent users, and four server each with 32GB of memory and
> 6x146GB 10.000 RPM i HP Proliant DL380 G5 with 2xQuadCore 2.5 Ghz Xeon ...
> and we run aprox. 8 Typo3 sites on this setup.
> 50 users and a mysql drop table shouldnt be a problem, _at all_ for this
> setup, or am i wrong?
I agree.
> What to look for and where?
Check SHOW VARIABLES LIKE "wsrep_OSU_method"
If it is 'TOI' then what you see is by design: During any DDL the
cluster is locked. Even if for DROP TABLE of an unused table this is
not necessary, Galera cannot know that the table is unused.
To do DDL in a non-blocking way, you need to set wsrep_OSU_method="RSU"
the first to "drop table" we ran, caused heavy load, but the following 8-10 did'nt cause anything, what can cause it to lock up? I.e. what do we need to look for before deleting next time?
The table's werent used at all, so it hardly qualifies s a "schema upgrade" :-)
> On Tue, Oct 16, 2012 at 3:28 PM, Mikkel Christensen <mik...@mikjaer.com> wrote:
>> We have four nodes, in a full-stack cluster. And when we execute drop table
>> on a big table, old data not being used anymore, then thread-count in mysql
>> and process-count in apache goes up, it's like the existing processes takes
>> longer time to perform, finally some http sessions are timing out.
>> We have 50 concurrent users, and four server each with 32GB of memory and
>> 6x146GB 10.000 RPM i HP Proliant DL380 G5 with 2xQuadCore 2.5 Ghz Xeon ...
>> and we run aprox. 8 Typo3 sites on this setup.
>> 50 users and a mysql drop table shouldnt be a problem, _at all_ for this
>> setup, or am i wrong?
> I agree.
>> What to look for and where?
> Check SHOW VARIABLES LIKE "wsrep_OSU_method"
> If it is 'TOI' then what you see is by design: During any DDL the
> cluster is locked. Even if for DROP TABLE of an unused table this is
> not necessary, Galera cannot know that the table is unused.
> To do DDL in a non-blocking way, you need to set wsrep_OSU_method="RSU"
-- De bedste hilsener / Best Regards
Mikkel Christensen
Mikjaer ApS
Silkeborgvej 128
8000 rhus C
Tlf. +45 4440 1337
Hosting - High Availability - High Performance Hosting - Webshops
Linux - Unix - BSD - Firewalls - Sikkerhed - S gemaskineoptimering
Systemudvikling: Php, Perl - C - C+ - C# - Javascript - Smarty
MySQL - Delphi mv.
Vi anvender udelukkende professionelle folk, de ydelser vi ikke
umiddelbart kan levere vha. inhouse konsulenter l ser vi via vort
enorme netv rk af freelancere. P samme vis kan vi via s ster-
virksomheder, samarbejdspartnere og interne resourcer levere s
godt som alt inden for IT Udstyr, Hosting, Internet forbindelser,
Udvikling, Mobil kommunikation, Auditering, Fejlfinding, mv.
On Tue, Oct 16, 2012 at 4:52 PM, Mikkel Christensen <mik...@mikjaer.com> wrote:
> the first to "drop table" we ran, caused heavy load, but the following 8-10
> did'nt cause anything, what can cause it to lock up? I.e. what do we need to
> look for before deleting next time?
Well, if the table is large, I suppose it will take longer.
There is also some dynamic between the DDL statement and immediately
preceding transactions: They have to be applied (ie finished in all
respects) before the DDL is executed.
> The table's werent used at all, so it hardly qualifies s a "schema upgrade"
> :-)
Well, then it is up to you to set wsrep_OSU_method=RSU because Galera
of course can't know that.
henrik
-- henrik.i...@avoinelama.fi
+358-40-8211286 skype: henrik.ingo irc: hingo
www.openlife.cc
On Tue, Oct 16, 2012 at 9:26 PM, Mikkel Christensen <mik...@mikjaer.com> wrote:
> It's me again, it seems as we are having periodic locking issues, and it
> seems as "innotop" does'nt work with galera.
> Is it posible to get it working ? Or do you know another way of resolving
> locking issues?
Perhaps first, could you explain a bit more what symptoms you see as
"locking issues"? Do you get some errors (rollbacks, timeouts...) in
your application? Do you see queries blocked (waiting for a lock) a
long time?
What are the specific symptoms you are trying to fix? Please
copy-paste if you can.
I second to henrik, you need to get more information of the "locking
symptom" you are seeing.
BTW. innotop works on Galera Cluster, problem is somewhere else.
> On Tue, Oct 16, 2012 at 9:26 PM, Mikkel Christensen
> <mik...@mikjaer.com> wrote:
>> It's me again, it seems as we are having periodic locking issues, and it
>> seems as "innotop" does'nt work with galera.
>> Is it posible to get it working ? Or do you know another way of resolving
>> locking issues?
> Perhaps first, could you explain a bit more what symptoms you see as
> "locking issues"? Do you get some errors (rollbacks, timeouts...) in
> your application? Do you see queries blocked (waiting for a lock) a
> long time?
> What are the specific symptoms you are trying to fix? Please
> copy-paste if you can.
> On Tue, Oct 16, 2012 at 9:26 PM, Mikkel Christensen <mik...@mikjaer.com> wrote:
>> It's me again, it seems as we are having periodic locking issues, and it
>> seems as "innotop" does'nt work with galera.
>> Is it posible to get it working ? Or do you know another way of resolving
>> locking issues?
> Perhaps first, could you explain a bit more what symptoms you see as
> "locking issues"? Do you get some errors (rollbacks, timeouts...) in
> your application? Do you see queries blocked (waiting for a lock) a
> long time?
> What are the specific symptoms you are trying to fix? Please
> copy-paste if you can.
> henrik
It's hard to explain, but i'll try. We have four node full stack cluster running lots of customize typo3, sugarcrm, third party authentification providers (like openid) and with a singlesign on system spanning all of the portals. The code is made by a lot of different more or less skilled developers, so mildly speaking ... this is spagetti code.
To make things worse we took the datebase from a mixed myisam and innodb environment to a pureley innodb environment to accomodate galera.
Most of the times, right now for an instance 22:05 i the evening, the system is running smoothly with no problems what so ever. We tried emulating 2000 concurrent users and the servers would just take the beating and serve its content. But tommorow morning around 9 o' clock (when the customers starts to use the site, and the editors starts to upload new content) i expect it to happend again.
The first thing i se is that the apache process starts to grow in numbers, but the users on the site (according to google analytics) stays the same, short after the number of slow-queries starts to grow, and when i check my apache status i will se the same 400 apache processes idleing in the top of the list as if they are waiting for something.
Our MySQL profiler claims that no tables are locked, and we don't know how to determine excatly what is going on at that excact problem when it locks up. I have tried stracing the top-most apache process, but it gave me no clues as to what was wrong.
If i do nothing the systems stays locked up like this forever (15 minuttes, the boss would't let me go longer because of complaints from the customers) and after restarting apache on the four nodes everything goes back to normal within 30 seconds. If i kill the top 10 processes (which has lived the longest) on the apache status list everything goes back to normal within 60-120 seconds.
I also tried using mytop to se the processlist, but the top most query is to my knowledge the next query in line to be excecuted, not the current one ... and besides killing the top 10 queries in there did'nt make any difference.
- i really hope you can help?
- And while im at it, im really gratefull for the help i have gotten so far and the help im hoping to receive, i have learnt so much during this process and i have made a lot of experience and small tools which i am looking forward to contribute back to the community ... once i have proven it's worth with this project :-)
-- De bedste hilsener / Best Regards
Mikkel Christensen
Do you know if any of the apps in Apache have persistent connections
on or some kind of connection pool on? If yes, are you able to turn it
off?
Basically, I'm suspecting that the apps you have migrated from MyISAM
to InnoDB might behave poorly with transactions. They might do
something like BEGIN; SELECT ...; UPDATE ...; ... and then never
commit. For MyISAM this is not an issue because it doesn't keep any
transaction state, all queries will be executed alone and forgotten.
But poor InnoDB would now be forced to forever keep uncommitted
transaction state open. It could even be just one query/code path in
one of your applications that does this. Next time, instead of killing
the 10 oldest apache processes, you could try killing just the most
oldest one.
On Wed, Oct 17, 2012 at 11:15 PM, Mikkel Christensen <mik...@mikjaer.com> wrote:
> Den 17/10/12 09.11, Henrik Ingo skrev:
>> On Tue, Oct 16, 2012 at 9:26 PM, Mikkel Christensen <mik...@mikjaer.com>
>> wrote:
>>> It's me again, it seems as we are having periodic locking issues, and it
>>> seems as "innotop" does'nt work with galera.
>>> Is it posible to get it working ? Or do you know another way of resolving
>>> locking issues?
>> Perhaps first, could you explain a bit more what symptoms you see as
>> "locking issues"? Do you get some errors (rollbacks, timeouts...) in
>> your application? Do you see queries blocked (waiting for a lock) a
>> long time?
>> What are the specific symptoms you are trying to fix? Please
>> copy-paste if you can.
>> henrik
> It's hard to explain, but i'll try. We have four node full stack cluster
> running lots of customize typo3, sugarcrm, third party authentification
> providers (like openid) and with a singlesign on system spanning all of the
> portals. The code is made by a lot of different more or less skilled
> developers, so mildly speaking ... this is spagetti code.
> To make things worse we took the datebase from a mixed myisam and innodb
> environment to a pureley innodb environment to accomodate galera.
> Most of the times, right now for an instance 22:05 i the evening, the system
> is running smoothly with no problems what so ever. We tried emulating 2000
> concurrent users and the servers would just take the beating and serve its
> content. But tommorow morning around 9 o' clock (when the customers starts
> to use the site, and the editors starts to upload new content) i expect it
> to happend again.
> The first thing i se is that the apache process starts to grow in numbers,
> but the users on the site (according to google analytics) stays the same,
> short after the number of slow-queries starts to grow, and when i check my
> apache status i will se the same 400 apache processes idleing in the top of
> the list as if they are waiting for something.
> Our MySQL profiler claims that no tables are locked, and we don't know how
> to determine excatly what is going on at that excact problem when it locks
> up. I have tried stracing the top-most apache process, but it gave me no
> clues as to what was wrong.
> If i do nothing the systems stays locked up like this forever (15 minuttes,
> the boss would't let me go longer because of complaints from the customers)
> and after restarting apache on the four nodes everything goes back to normal
> within 30 seconds. If i kill the top 10 processes (which has lived the
> longest) on the apache status list everything goes back to normal within
> 60-120 seconds.
> I also tried using mytop to se the processlist, but the top most query is to
> my knowledge the next query in line to be excecuted, not the current one ...
> and besides killing the top 10 queries in there did'nt make any difference.
> - i really hope you can help?
> - And while im at it, im really gratefull for the help i have gotten so far
> and the help im hoping to receive, i have learnt so much during this process
> and i have made a lot of experience and small tools which i am looking
> forward to contribute back to the community ... once i have proven it's
> worth with this project :-)
> --
> De bedste hilsener / Best Regards
> Mikkel Christensen
mysql> SHOW GLOBAL VARIABLES LIKE "autocommit";
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| autocommit | ON |
+---------------+-------+
1 row in set (0.00 sec)
> When it is happening (or even slightly before, when you see apache
> processes starting to grow toward 400), could you do SHOW ENGINE
> INNODB STATUS and copy-paste the section "Transactions"? (If you don't
> want to send the queries here, send to me personally.)
Yea no problem, the database answers quickly enough. Iv'e send you the processlist as well.
Typo3 uses persistent connections, don't know about the rest. But i do know that they connect to "127.0.0.1" instead of "localhost", hence forcing the stream into a socket. Can this be a problem?
> Basically, I'm suspecting that the apps you have migrated from MyISAM
> to InnoDB might behave poorly with transactions. They might do
> something like BEGIN; SELECT ...; UPDATE ...; ... and then never
> commit. For MyISAM this is not an issue because it doesn't keep any
> transaction state, all queries will be executed alone and forgotten.
> But poor InnoDB would now be forced to forever keep uncommitted
> transaction state open. It could even be just one query/code path in
> one of your applications that does this. Next time, instead of killing
> the 10 oldest apache processes, you could try killing just the most
> oldest one.
We did that this morning (it's shown in the processlist file as well) and after a couple of minutes it worked swiftly. I asked the developpers to go over that query.
Thanks for the info you sent off-list. A few more questions:
Do you take daily backups? If yes, do you use mysqldump? If yes, any
chance it was running in the morning when this happened? Note, it
could have been running on another node than the one you sent the
processlist and innodb status, it will still block replication from
succeeding.
Just checking, some symptoms match exactly what I have experienced
with mysqldump. If it's not mysqldump, then there could still be some
long running transaction dangling on any of the four nodes that is
causing similar behavior.
Ie the following could cause this:
SET tx_isolation=read-committed; # this is default
BEGIN;
SELECT * FROM innodb_table;
(go and have coffee...)
It's a bit of a gray area for me wrt InnoDB locking, but DELETEs (and
I think INSERTs) take some locks which will make them block until the
above transaction is finished. In a Galera cluster they will of course
block cluster wide if there is such a transaction on any of the nodes,
because deletes and inserts have to be committed on all nodes.
Come to think of it, this could also be an issue in Galera lock
certification process rather than InnoDB itself.
> mysql> SHOW GLOBAL VARIABLES LIKE "autocommit";
> +---------------+-------+
> | Variable_name | Value |
> +---------------+-------+
> | autocommit | ON |
> +---------------+-------+
> 1 row in set (0.00 sec)
>> When it is happening (or even slightly before, when you see apache
>> processes starting to grow toward 400), could you do SHOW ENGINE
>> INNODB STATUS and copy-paste the section "Transactions"? (If you don't
>> want to send the queries here, send to me personally.)
> Yea no problem, the database answers quickly enough. Iv'e send you the
> processlist as well.
> Typo3 uses persistent connections, don't know about the rest. But i do know
> that they connect to "127.0.0.1" instead of "localhost", hence forcing the
> stream into a socket. Can this be a problem?
>> Basically, I'm suspecting that the apps you have migrated from MyISAM
>> to InnoDB might behave poorly with transactions. They might do
>> something like BEGIN; SELECT ...; UPDATE ...; ... and then never
>> commit. For MyISAM this is not an issue because it doesn't keep any
>> transaction state, all queries will be executed alone and forgotten.
>> But poor InnoDB would now be forced to forever keep uncommitted
>> transaction state open. It could even be just one query/code path in
>> one of your applications that does this. Next time, instead of killing
>> the 10 oldest apache processes, you could try killing just the most
>> oldest one.
> We did that this morning (it's shown in the processlist file as well) and
> after a couple of minutes it worked swiftly. I asked the developpers to go
> over that query.
>> Any input?
> Thanks for the info you sent off-list. A few more questions:
> Do you take daily backups? If yes, do you use mysqldump? If yes, any
> chance it was running in the morning when this happened? Note, it
> could have been running on another node than the one you sent the
> processlist and innodb status, it will still block replication from
> succeeding.
> Just checking, some symptoms match exactly what I have experienced
> with mysqldump. If it's not mysqldump, then there could still be some
> long running transaction dangling on any of the four nodes that is
> causing similar behavior.
> Ie the following could cause this:
> SET tx_isolation=read-committed; # this is default
> BEGIN;
> SELECT * FROM innodb_table;
> (go and have coffee...)
> It's a bit of a gray area for me wrt InnoDB locking, but DELETEs (and
> I think INSERTs) take some locks which will make them block until the
> above transaction is finished. In a Galera cluster they will of course
> block cluster wide if there is such a transaction on any of the nodes,
> because deletes and inserts have to be committed on all nodes.
> Come to think of it, this could also be an issue in Galera lock
> certification process rather than InnoDB itself.
> henrik
> On Thu, Oct 18, 2012 at 11:15 AM, Mikkel Christensen <mik...@mikjaer.com> wrote:
>> Den 17/10/12 22.49, Henrik Ingo skrev:
>>> Thanks for the explanation. I'll top post for brevity...
>>> When the system starts to get locked, can any user do anything, or is
>>> everything just locked?
>> Non-database driven websites works, they at almost no time.
>>> What is the value of SHOW GLOBAL VARIABLES LIKE "autocommit"?
>>> Related:
>>> http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html#s... >> mysql> SHOW GLOBAL VARIABLES LIKE "autocommit";
>> +---------------+-------+
>> | Variable_name | Value |
>> +---------------+-------+
>> | autocommit | ON |
>> +---------------+-------+
>> 1 row in set (0.00 sec)
>>> When it is happening (or even slightly before, when you see apache
>>> processes starting to grow toward 400), could you do SHOW ENGINE
>>> INNODB STATUS and copy-paste the section "Transactions"? (If you don't
>>> want to send the queries here, send to me personally.)
>> Yea no problem, the database answers quickly enough. Iv'e send you the
>> processlist as well.
>>> Related:
>>> http://www.mysqlperformanceblog.com/2006/07/17/show-innodb-status-wal... >>> http://mtocker.livejournal.com/38194.html >>> Do you know if any of the apps in Apache have persistent connections
>>> on or some kind of connection pool on? If yes, are you able to turn it
>>> off?
>> Typo3 uses persistent connections, don't know about the rest. But i do know
>> that they connect to "127.0.0.1" instead of "localhost", hence forcing the
>> stream into a socket. Can this be a problem?
>>> Basically, I'm suspecting that the apps you have migrated from MyISAM
>>> to InnoDB might behave poorly with transactions. They might do
>>> something like BEGIN; SELECT ...; UPDATE ...; ... and then never
>>> commit. For MyISAM this is not an issue because it doesn't keep any
>>> transaction state, all queries will be executed alone and forgotten.
>>> But poor InnoDB would now be forced to forever keep uncommitted
>>> transaction state open. It could even be just one query/code path in
>>> one of your applications that does this. Next time, instead of killing
>>> the 10 oldest apache processes, you could try killing just the most
>>> oldest one.
>> We did that this morning (it's shown in the processlist file as well) and
>> after a couple of minutes it worked swiftly. I asked the developpers to go
>> over that query.
If you use --single-transaction with mysqldump, then inserts and
deletes are blocked on that node. Not in general, but if you dump all
of your tables then eventually the insert/delete will hit a spot that
is locked by mysqldump.
This is InnoDB behavior and not related to Galera. To be honest, I'm
not sure if this affects other nodes in the Galera cluster, but it
would make sense if it does.
<haris.zukanovi...@gmail.com> wrote:
> I appologize for cutting in here...
> Does mysqldump block the replication and thus make the application using the
> DB hang until it finishes?
> On 18/10/12 12.08, Henrik Ingo wrote:
>>> Any input?
>> Thanks for the info you sent off-list. A few more questions:
>> Do you take daily backups? If yes, do you use mysqldump? If yes, any
>> chance it was running in the morning when this happened? Note, it
>> could have been running on another node than the one you sent the
>> processlist and innodb status, it will still block replication from
>> succeeding.
>> Just checking, some symptoms match exactly what I have experienced
>> with mysqldump. If it's not mysqldump, then there could still be some
>> long running transaction dangling on any of the four nodes that is
>> causing similar behavior.
>> Ie the following could cause this:
>> SET tx_isolation=read-committed; # this is default
>> BEGIN;
>> SELECT * FROM innodb_table;
>> (go and have coffee...)
>> It's a bit of a gray area for me wrt InnoDB locking, but DELETEs (and
>> I think INSERTs) take some locks which will make them block until the
>> above transaction is finished. In a Galera cluster they will of course
>> block cluster wide if there is such a transaction on any of the nodes,
>> because deletes and inserts have to be committed on all nodes.
>> Come to think of it, this could also be an issue in Galera lock
>> certification process rather than InnoDB itself.
>> henrik
>> On Thu, Oct 18, 2012 at 11:15 AM, Mikkel Christensen <mik...@mikjaer.com>
>> wrote:
>>> Den 17/10/12 22.49, Henrik Ingo skrev:
>>>> Thanks for the explanation. I'll top post for brevity...
>>>> When the system starts to get locked, can any user do anything, or is
>>>> everything just locked?
>>> Non-database driven websites works, they at almost no time.
>>>> What is the value of SHOW GLOBAL VARIABLES LIKE "autocommit"?
>>>> Related:
>>> mysql> SHOW GLOBAL VARIABLES LIKE "autocommit";
>>> +---------------+-------+
>>> | Variable_name | Value |
>>> +---------------+-------+
>>> | autocommit | ON |
>>> +---------------+-------+
>>> 1 row in set (0.00 sec)
>>>> When it is happening (or even slightly before, when you see apache
>>>> processes starting to grow toward 400), could you do SHOW ENGINE
>>>> INNODB STATUS and copy-paste the section "Transactions"? (If you don't
>>>> want to send the queries here, send to me personally.)
>>> Yea no problem, the database answers quickly enough. Iv'e send you the
>>> processlist as well.
>>> Typo3 uses persistent connections, don't know about the rest. But i do
>>> know
>>> that they connect to "127.0.0.1" instead of "localhost", hence forcing
>>> the
>>> stream into a socket. Can this be a problem?
>>>> Basically, I'm suspecting that the apps you have migrated from MyISAM
>>>> to InnoDB might behave poorly with transactions. They might do
>>>> something like BEGIN; SELECT ...; UPDATE ...; ... and then never
>>>> commit. For MyISAM this is not an issue because it doesn't keep any
>>>> transaction state, all queries will be executed alone and forgotten.
>>>> But poor InnoDB would now be forced to forever keep uncommitted
>>>> transaction state open. It could even be just one query/code path in
>>>> one of your applications that does this. Next time, instead of killing
>>>> the 10 oldest apache processes, you could try killing just the most
>>>> oldest one.
>>> We did that this morning (it's shown in the processlist file as well) and
>>> after a couple of minutes it worked swiftly. I asked the developpers to
>>> go
>>> over that query.
It turns out galera replication would block on deletes to a table that
didn't have a primary key. This caused all following transactions to
queue up, since commit order is preserved in replication. After adding
primary key to the table, the problem wasn't seen anymore.
This is a well known issue (I don't know if it is a bug, I know
Codership guys have made some fixes in this area though). But in any
case it is best practice to alway define explicit primary keys when
using Galera (it's a good idea anyway).
On Thu, Oct 18, 2012 at 1:08 PM, Henrik Ingo <henrik.i...@avoinelama.fi> wrote:
>> Any input?
> Thanks for the info you sent off-list. A few more questions:
> Do you take daily backups? If yes, do you use mysqldump? If yes, any
> chance it was running in the morning when this happened? Note, it
> could have been running on another node than the one you sent the
> processlist and innodb status, it will still block replication from
> succeeding.
> Just checking, some symptoms match exactly what I have experienced
> with mysqldump. If it's not mysqldump, then there could still be some
> long running transaction dangling on any of the four nodes that is
> causing similar behavior.
> Ie the following could cause this:
> SET tx_isolation=read-committed; # this is default
> BEGIN;
> SELECT * FROM innodb_table;
> (go and have coffee...)
> It's a bit of a gray area for me wrt InnoDB locking, but DELETEs (and
> I think INSERTs) take some locks which will make them block until the
> above transaction is finished. In a Galera cluster they will of course
> block cluster wide if there is such a transaction on any of the nodes,
> because deletes and inserts have to be committed on all nodes.
> Come to think of it, this could also be an issue in Galera lock
> certification process rather than InnoDB itself.
> henrik
> On Thu, Oct 18, 2012 at 11:15 AM, Mikkel Christensen <mik...@mikjaer.com> wrote:
>> Den 17/10/12 22.49, Henrik Ingo skrev:
>>> Thanks for the explanation. I'll top post for brevity...
>>> When the system starts to get locked, can any user do anything, or is
>>> everything just locked?
>> Non-database driven websites works, they at almost no time.
>>> What is the value of SHOW GLOBAL VARIABLES LIKE "autocommit"?
>>> Related:
>> mysql> SHOW GLOBAL VARIABLES LIKE "autocommit";
>> +---------------+-------+
>> | Variable_name | Value |
>> +---------------+-------+
>> | autocommit | ON |
>> +---------------+-------+
>> 1 row in set (0.00 sec)
>>> When it is happening (or even slightly before, when you see apache
>>> processes starting to grow toward 400), could you do SHOW ENGINE
>>> INNODB STATUS and copy-paste the section "Transactions"? (If you don't
>>> want to send the queries here, send to me personally.)
>> Yea no problem, the database answers quickly enough. Iv'e send you the
>> processlist as well.
>> Typo3 uses persistent connections, don't know about the rest. But i do know
>> that they connect to "127.0.0.1" instead of "localhost", hence forcing the
>> stream into a socket. Can this be a problem?
>>> Basically, I'm suspecting that the apps you have migrated from MyISAM
>>> to InnoDB might behave poorly with transactions. They might do
>>> something like BEGIN; SELECT ...; UPDATE ...; ... and then never
>>> commit. For MyISAM this is not an issue because it doesn't keep any
>>> transaction state, all queries will be executed alone and forgotten.
>>> But poor InnoDB would now be forced to forever keep uncommitted
>>> transaction state open. It could even be just one query/code path in
>>> one of your applications that does this. Next time, instead of killing
>>> the 10 oldest apache processes, you could try killing just the most
>>> oldest one.
>> We did that this morning (it's shown in the processlist file as well) and
>> after a couple of minutes it worked swiftly. I asked the developpers to go
>> over that query.
> It turns out galera replication would block on deletes to a table > that
I guess "block" is a too strong term for that. It does not block, it just takes much-much longer ;) And I suspect it is not Galera-specific, rather ROW-based replication issue. You just don't notice it that much in the native MySQL replication since it is fully asynchronous and most people still seem to use STATEMENT.
> didn't have a primary key. This caused all following transactions to
> queue up, since commit order is preserved in replication. After > adding
> primary key to the table, the problem wasn't seen anymore.
> This is a well known issue (I don't know if it is a bug, I know
> Codership guys have made some fixes in this area though). But in any
> case it is best practice to alway define explicit primary keys when
> using Galera (it's a good idea anyway).
> henrik
-- Alexey Yurchenko,
Codership Oy, www.codership.com Skype: alexey.yurchenko, Phone: +358-400-516-011