On 17/10/12 09.11, Henrik Ingo wrote:
> On Tue, Oct 16, 2012 at 9:26 PM, Mikkel Christensen <mik...@mikjaer.com> wrote:
>> It's me again. It seems we are having periodic locking issues, and it
>> seems that "innotop" doesn't work with Galera.
>>
>> Is it possible to get it working? Or do you know another way of resolving
>> locking issues?
> Perhaps first, could you explain a bit more what symptoms you see as
> "locking issues"? Do you get some errors (rollbacks, timeouts...) in
> your application? Do you see queries blocked (waiting for a lock) a
> long time?
>
> What are the specific symptoms you are trying to fix? Please
> copy-paste if you can.
>
> henrik
>
It's hard to explain, but I'll try. We have a four-node full-stack
cluster running lots of customized TYPO3, SugarCRM, third-party
authentication providers (like OpenID), and a single sign-on
system spanning all of the portals. The code was written by a lot of
different, more or less skilled developers, so to put it mildly ... this
is spaghetti code.
To make things worse, we took the database from a mixed MyISAM and InnoDB
environment to a purely InnoDB environment to accommodate Galera.
Most of the time (right now, for instance, at 22:05 in the evening) the
system runs smoothly with no problems whatsoever. We tried
emulating 2000 concurrent users and the servers would just take the
beating and serve their content. But tomorrow morning around 9 o'clock
(when the customers start to use the site and the editors start to
upload new content) I expect it to happen again.
The first thing I see is that the number of Apache processes starts to
grow, while the number of users on the site (according to Google
Analytics) stays the same. Shortly after, the number of slow queries
starts to grow, and when I check my Apache status I see the same 400
Apache processes idling at the top of the list as if they are waiting
for something.
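
A minimal sketch of how the worker build-up described above could be
logged over time, assuming mod_status is enabled and its machine-readable
output (the "?auto" variant of the status page) is reachable. Only the
parsing is shown; the sample text is made up for illustration, and how
the text is fetched (URL, interval) is left open.

```python
from collections import Counter


def parse_server_status(text):
    """Parse the plain-text output of Apache mod_status (?auto).

    Returns the busy/idle worker counts plus how many workers are in
    the "W" (sending reply) and "K" (keepalive) scoreboard states.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # skip lines that are not "Key: value"
            fields[key] = value

    board = Counter(fields.get("Scoreboard", ""))
    return {
        "busy": int(fields.get("BusyWorkers", 0)),
        "idle": int(fields.get("IdleWorkers", 0)),
        "sending_reply": board["W"],
        "keepalive": board["K"],
    }


# Made-up sample, not real output from the cluster described here.
sample = "BusyWorkers: 5\nIdleWorkers: 2\nScoreboard: WWWWK__...\n"
status = parse_server_status(sample)
print(status)
```

Logging these numbers every few seconds alongside a timestamp would show
whether the busy workers pile up before or after the slow queries start
to grow.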
Our MySQL profiler claims that no tables are locked, and we don't know
how to determine exactly what is going on at the exact moment when it
locks up. I have tried stracing the topmost Apache process, but it gave
me no clues as to what was wrong.
If I do nothing, the system stays locked up like this indefinitely (I
have waited at most 15 minutes; the boss wouldn't let me go longer
because of complaints from the customers), and after restarting Apache
on the four nodes everything goes back to normal within 30 seconds. If I
kill the top 10 processes (the ones that have lived the longest) on the
Apache status list, everything goes back to normal within 60-120 seconds.
I also tried using mytop to see the processlist, but the topmost query
is, to my knowledge, the next query in line to be executed, not the
current one ... and besides, killing the top 10 queries in there didn't
make any difference.
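
As an alternative to reading the processlist by eye, a snapshot taken
during a lock-up could be summarized with something like the sketch
below. It assumes the rows of SHOW FULL PROCESSLIST have already been
fetched into a list of dicts keyed by the column names (Id, Command,
Time, State, Info); the fetching itself and the sample rows are
hypothetical.

```python
from collections import Counter


def summarize_processlist(rows, top_n=10):
    """Summarize rows shaped like SHOW FULL PROCESSLIST output.

    Returns a count of active (non-Sleep) threads per State and the
    top_n active threads that have been in their state the longest.
    """
    active = [r for r in rows if r["Command"] != "Sleep"]
    by_state = Counter(r["State"] or "(none)" for r in active)
    longest = sorted(active, key=lambda r: r["Time"] or 0, reverse=True)
    return by_state, longest[:top_n]


# Made-up rows for illustration only.
sample = [
    {"Id": 11, "Command": "Sleep", "Time": 300, "State": "", "Info": None},
    {"Id": 12, "Command": "Query", "Time": 95, "State": "updating",
     "Info": "UPDATE ..."},
    {"Id": 13, "Command": "Query", "Time": 90, "State": "updating",
     "Info": "UPDATE ..."},
    {"Id": 14, "Command": "Query", "Time": 2, "State": "Sending data",
     "Info": "SELECT ..."},
]

by_state, longest = summarize_processlist(sample, top_n=2)
print(dict(by_state))
print([r["Id"] for r in longest])
```

If many threads sit in the same state with a large Time value, their
Info column shows which statements they are stuck on.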
- I really hope you can help.
- And while I'm at it, I'm really grateful for the help I have gotten so
far and the help I'm hoping to receive. I have learnt so much during this
process, and I have gained a lot of experience and built small tools
which I am looking forward to contributing back to the community ... once
I have proven their worth with this project :-)