Message from discussion
Is it safe to use Galera with Magento?
Date: Wed, 14 Nov 2012 04:40:00 -0800 (PST)
From: Ilias Bertsimas <award...@gmail.com>
To: codership-team@googlegroups.com
Cc: myeagleflies <thr...@globi.org>, henrik.i...@avoinelama.fi,
henrik.i...@avoinelama.fi
Message-Id: <a98063a6-d697-4ba9-a76f-5a61e993d691@googlegroups.com>
In-Reply-To: <da41cfd3-42b3-4d2b-8bb9-e7dc1e5cebdb@googlegroups.com>
References: <0b8fedcd-7a07-4bff-939a-5347dc8422fc@googlegroups.com>
<CAKHykevnECW2rv6Ji90OKq9atAf-bqQmsXTtyXusSGw4g=8-tw@mail.gmail.com>
<3aac75ec-00be-4951-96c2-e9813fb97b0c@googlegroups.com>
<1a1cc9ed-d177-4459-95b7-889983edb801@googlegroups.com>
<79bc44ba-e622-4078-b032-f10bfe5e05bf@googlegroups.com>
<5c93bf93-d597-49a4-a0d9-fe8beee7b7ac@googlegroups.com>
<cea134ec-3fde-48ab-aee9-7a1b5939df9c@googlegroups.com>
<684e665b-1164-4baa-b176-4447994fc61a@googlegroups.com>
<da41cfd3-42b3-4d2b-8bb9-e7dc1e5cebdb@googlegroups.com>
Subject: Re: [codership-team] Is it safe to use Galera with Magento?
MIME-Version: 1.0
Content-Type: multipart/mixed;
boundary="----=_Part_759_30502463.1352896800632"
------=_Part_759_30502463.1352896800632
Content-Type: multipart/alternative;
boundary="----=_Part_760_10295338.1352896800633"
------=_Part_760_10295338.1352896800633
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
I can see in node1's config:
    # Group communication system handle
    wsrep_cluster_address="gcomm://"
Did you change that to point to another cluster node before the shutdown?
If not, that may be the cause. An empty gcomm:// address tells a node to
bootstrap a new cluster instead of joining an existing one, so when you
started node1 again it did not connect to the cluster. It kept using the
UUID it had saved previously, but came up as the first node of a cluster
that the others are expected to join.
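
To make the check above concrete, here is a minimal Python sketch. It is
only an illustration under stated assumptions: the helper names
(cluster_address, is_bootstrap_address, differing_keys) are made up for
this example, the config is assumed to use plain key=value lines, and the
status values are assumed to have been copied by hand from
SHOW STATUS LIKE 'wsrep%' on each node. The peer address in the usage
note is hypothetical.

```python
# Sketch only: helper names are hypothetical, not part of Galera or MySQL.

def cluster_address(cnf_text):
    """Return the last wsrep_cluster_address value found in my.cnf-style text."""
    value = None
    for line in cnf_text.splitlines():
        line = line.strip()
        # Skip comments and anything that is not a key=value line.
        if line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        if key.strip().replace("-", "_") == "wsrep_cluster_address":
            value = raw.strip().strip("\"'")
    return value


def is_bootstrap_address(address):
    """True when a gcomm:// address lists no peers, e.g. 'gcomm://'."""
    if address is None:
        return False
    scheme, sep, rest = address.partition("://")
    # Ignore any '?option=value' suffix; only the peer list matters here.
    peers = rest.split("?", 1)[0]
    return scheme == "gcomm" and sep == "://" and peers.strip() == ""


def differing_keys(status_a, status_b):
    """List the cluster-level wsrep status keys that differ between two nodes."""
    keys = (
        "wsrep_cluster_state_uuid",
        "wsrep_cluster_conf_id",
        "wsrep_cluster_size",
    )
    return [k for k in keys if str(status_a.get(k)) != str(status_b.get(k))]
```

Calling is_bootstrap_address(cluster_address(text)) on node1's config as
quoted above returns True, while an address that names a peer, such as
"gcomm://10.0.0.2", returns False. Feeding differing_keys the conf_id and
size values quoted further down the thread reports both keys as
different, which is the symptom of the two nodes sitting in separate
clusters.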
On Wednesday, November 14, 2012 12:29:45 PM UTC, myeagleflies wrote:
>
> Please take a look at my config files http://pastebin.com/BVvwMB35
>
>
> To shed some more light, we performed a failover test a few days ago:
> - garb1 and galera node 1 were shut down, then tests were performed on
> Magento
> - garb1 and galera node 1 were powered on
> - garb2 and galera node 2 were shut down, then tests were performed on Magento
>
> I wonder if this could have caused the current split-brain scenario?
>
> On Wednesday, November 14, 2012 12:02:08 PM UTC, Ilias Bertsimas wrote:
>>
>> Node1 seems to be on its own. You can decide which node is more
>> consistent, clear the data on the other one, and make it join the
>> consistent node by taking a full SST.
>>
>> On Wednesday, November 14, 2012 11:55:39 AM UTC, myeagleflies wrote:
>>>
>>> OK. This is definitely a serious issue. What is the best way of fixing it?
>>> How can I troubleshoot garbd to see which node each one is connected to?
>>>
>>> On Wednesday, November 14, 2012 11:46:51 AM UTC, Ilias Bertsimas wrote:
>>>>
>>>> It seems you have 2 different clusters between those nodes:
>>>>
>>>> node1:
>>>>
>>>>     wsrep_cluster_conf_id 1
>>>>     wsrep_cluster_size 1
>>>>
>>>>
>>>> node2:
>>>>
>>>>     wsrep_cluster_conf_id 10
>>>>     wsrep_cluster_size 2
>>>>
>>>>
>>>> Each cluster has a different set of nodes and a different conf_id. The
>>>> first node split off and formed a cluster of its own, apparently
>>>> using the same UUID.
>>>>
>>>>
>>>>
>>>> On Wednesday, November 14, 2012 11:40:57 AM UTC, myeagleflies wrote:
>>>>>
>>>>> Please take a look http://pastebin.com/e5tZ0YL3
>>>>>
>>>>> On Wednesday, November 14, 2012 10:50:22 AM UTC, Ilias Bertsimas wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> That sounds weird. Could you please post the results of SHOW STATUS
>>>>>> LIKE 'WSREP%'; from both of your nodes?
>>>>>>
>>>>>> Kind Regards,
>>>>>> Ilias.
>>>>>>
>>>>>> On Wednesday, November 14, 2012 10:46:14 AM UTC, myeagleflies wrote:
>>>>>>>
>>>>>>> We started testing Galera + Magento 10 days ago. Our setup is 2
>>>>>>> Galera nodes and 2 garbd nodes. One of the Galera nodes has a virtual IP
>>>>>>> assigned to it via Pacemaker; this is our cluster IP. The Magento web
>>>>>>> servers communicate with the cluster IP, so one node is the de facto
>>>>>>> primary and the second Galera node is the secondary. We do not need to
>>>>>>> talk to both nodes at the same time. Projected traffic to our website
>>>>>>> is small. The main purpose of Galera is to allow easy failover.
>>>>>>>
>>>>>>> I found some issues recently:
>>>>>>> - I could not run 'USE magento' on one of the nodes. I googled it
>>>>>>> and someone suggested rebooting the node. The reboot seems to have fixed
>>>>>>> the issue; the node is now properly connected to the cluster.
>>>>>>> - There are differences in the data on the two nodes. Some tables contain
>>>>>>> more rows on one node than on the other, and one table is even
>>>>>>> missing on one of the nodes!
>>>>>>>
>>>>>>> The differences in data look scary. Is this typical? What is the
>>>>>>> recommended way of troubleshooting this issue? Is Magento not fully
>>>>>>> compatible with Galera?
>>>>>>>
>>>>>>> We do have some tables without primary keys. If DELETE operations
>>>>>>> were performed on those tables, that could explain the differences in
>>>>>>> data in those tables. However, it does not explain why one table is missing.
>>>>>>>
>>>>>>> I am quite puzzled and looking for advice.
>>>>>>>
>>>>>>> Thanks in advance!
>>>>>>>
>>>>>>
------=_Part_760_10295338.1352896800633--
------=_Part_759_30502463.1352896800632--