Is it safe to use Galera with Magento?
Received: by 10.59.6.72 with SMTP id cs8mr8782423ved.27.1352897481064;
Wed, 14 Nov 2012 04:51:21 -0800 (PST)
X-BeenThere: codership-team@googlegroups.com
Received: by 10.221.10.72 with SMTP id oz8ls389252vcb.5.gmail; Wed, 14 Nov
2012 04:51:20 -0800 (PST)
Received: by 10.52.97.101 with SMTP id dz5mr143439vdb.2.1352897480395;
Wed, 14 Nov 2012 04:51:20 -0800 (PST)
Date: Wed, 14 Nov 2012 04:51:20 -0800 (PST)
From: myeagleflies <thr...@globi.org>
To: codership-team@googlegroups.com
Cc: myeagleflies <thr...@globi.org>, henrik.i...@avoinelama.fi,
henrik.i...@avoinelama.fi
Message-Id: <8beeab50-b861-45b7-8d94-07dd84d66ab8@googlegroups.com>
In-Reply-To: <a98063a6-d697-4ba9-a76f-5a61e993d691@googlegroups.com>
References: <0b8fedcd-7a07-4bff-939a-5347dc8422fc@googlegroups.com>
<CAKHykevnECW2rv6Ji90OKq9atAf-bqQmsXTtyXusSGw4g=8-tw@mail.gmail.com>
<3aac75ec-00be-4951-96c2-e9813fb97b0c@googlegroups.com>
<1a1cc9ed-d177-4459-95b7-889983edb801@googlegroups.com>
<79bc44ba-e622-4078-b032-f10bfe5e05bf@googlegroups.com>
<5c93bf93-d597-49a4-a0d9-fe8beee7b7ac@googlegroups.com>
<cea134ec-3fde-48ab-aee9-7a1b5939df9c@googlegroups.com>
<684e665b-1164-4baa-b176-4447994fc61a@googlegroups.com>
<da41cfd3-42b3-4d2b-8bb9-e7dc1e5cebdb@googlegroups.com>
<a98063a6-d697-4ba9-a76f-5a61e993d691@googlegroups.com>
Subject: Re: [codership-team] Is it safe to use Galera with Magento?
MIME-Version: 1.0
Content-Type: multipart/mixed;
boundary="----=_Part_177_143812.1352897480144"
------=_Part_177_143812.1352897480144
Content-Type: multipart/alternative;
boundary="----=_Part_178_1132857.1352897480144"
------=_Part_178_1132857.1352897480144
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
No, I did not change it to point to another cluster node before shutdown. I
was not sure how failover happens, but I presumed that a node which is
brought up would be invited by the other nodes to join their cluster. From
what you are saying, it seems we really should monitor for situations when
one node goes down in order to prevent such split-brain scenarios.
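A minimal sketch of that kind of monitoring, assuming the wsrep status
variables have already been fetched from each node with SHOW STATUS LIKE
'wsrep%' and turned into dictionaries. The function name find_split_brain
and the expected_size parameter are hypothetical, and the expected size of
4 (2 galera nodes plus 2 garbd) is an assumption about how this setup
counts members:

```python
from typing import Dict, List

Status = Dict[str, str]  # one node's SHOW STATUS LIKE 'wsrep%' rows as name -> value


def find_split_brain(nodes: Dict[str, Status], expected_size: int) -> List[str]:
    """Return warnings about diverging cluster views; an empty list means the views agree."""
    warnings: List[str] = []

    # Nodes in the same cluster component should report the same conf_id.
    conf_ids = {name: status.get("wsrep_cluster_conf_id") for name, status in nodes.items()}
    if len(set(conf_ids.values())) > 1:
        warnings.append(f"nodes disagree on wsrep_cluster_conf_id: {conf_ids}")

    # A node that sees fewer members than expected may be partitioned or alone.
    for name, status in sorted(nodes.items()):
        size = int(status.get("wsrep_cluster_size", "0"))
        if size < expected_size:
            warnings.append(f"{name} sees only {size} of {expected_size} members")

    return warnings


if __name__ == "__main__":
    # Values as reported for node1 and node2 earlier in this thread.
    observed = {
        "node1": {"wsrep_cluster_conf_id": "1", "wsrep_cluster_size": "1"},
        "node2": {"wsrep_cluster_conf_id": "10", "wsrep_cluster_size": "2"},
    }
    for line in find_split_brain(observed, expected_size=4):
        print("WARNING:", line)
```

Something like this could run periodically and raise an alert as soon as
the nodes stop agreeing, rather than waiting until the data has diverged.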
BTW: Isn't garbd designed to prevent such events? Can I change the node
config so that it does not start if it can't find other nodes to join? I
think it would be better if a node which is going to be alone simply
refused to start.
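A rough sketch of a pre-start sanity check along those lines. It assumes
that an empty "gcomm://" value is what makes a node bootstrap a cluster of
its own, while a non-empty list makes it try to join the listed members
instead. The helper names and the host names node2.example and
node3.example are hypothetical:

```python
from typing import Iterable


def is_bootstrap_address(value: str) -> bool:
    """True if a wsrep_cluster_address value is the empty bootstrap form "gcomm://"."""
    return value.strip().strip("\"'") == "gcomm://"


def cluster_address(peers: Iterable[str]) -> str:
    """Build a wsrep_cluster_address value that always names at least one peer."""
    hosts = [peer.strip() for peer in peers if peer.strip()]
    if not hosts:
        # Refuse to produce the empty address that would start a new cluster.
        raise ValueError("no peers given; an empty gcomm:// would bootstrap a new cluster")
    return "gcomm://" + ",".join(hosts)


if __name__ == "__main__":
    current = '"gcomm://"'  # the value found in node1's config
    if is_bootstrap_address(current):
        print("node would bootstrap its own cluster on restart")
    print("suggested value:", cluster_address(["node2.example", "node3.example"]))
```

The check is deliberately strict: it only flags the exact empty form, so
any address that names a peer passes.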
On Wednesday, November 14, 2012 12:40:00 PM UTC, Ilias Bertsimas wrote:
>
> I can see in node1's config:
>
>
>     # Group communication system handle
>     wsrep_cluster_address="gcomm://"
>
>
> You did not change that to point to another cluster node before shutdown?
> If not, that may be the cause: when you started node1 again it did not
> connect to the cluster. It kept the saved UUID it previously had, but
> acted as the first node of a new cluster that the others would bootstrap
> from.
>
>
> On Wednesday, November 14, 2012 12:29:45 PM UTC, myeagleflies wrote:
>>
>> Please take a look at my config files: http://pastebin.com/BVvwMB35
>>
>>
>> To shed some more light, we performed a failover test a few days ago:
>> - garb1 and galera node 1 were shut down, then tests were performed on
>> Magento
>> - garb1 and galera node 1 were powered on
>> - garb2 and galera node 2 were shut down, then tests were performed on
>> Magento
>>
>> I wonder if this could have caused the current split-brain scenario?
>>
>> On Wednesday, November 14, 2012 12:02:08 PM UTC, Ilias Bertsimas wrote:
>>>
>>> Node1 seems to be on its own. You can decide which node is more
>>> consistent, clear the data of the other one, and make it join the
>>> consistent node by taking a full SST.
>>>
>>> On Wednesday, November 14, 2012 11:55:39 AM UTC, myeagleflies wrote:
>>>>
>>>> OK. This is definitely a serious issue. What is the best way of fixing
>>>> it? How can I troubleshoot garbd to see which node each one is
>>>> connected to?
>>>>
>>>> On Wednesday, November 14, 2012 11:46:51 AM UTC, Ilias Bertsimas wrote:
>>>>>
>>>>> It seems you have 2 different clusters across those nodes:
>>>>>
>>>>> node1:
>>>>>
>>>>>     wsrep_cluster_conf_id    1
>>>>>     wsrep_cluster_size       1
>>>>>
>>>>>
>>>>> node2:
>>>>>
>>>>>     wsrep_cluster_conf_id    10
>>>>>     wsrep_cluster_size       2
>>>>>
>>>>>
>>>>> Each one has different member nodes and a different conf_id. It seems
>>>>> the first one split off and formed a cluster of its own using the same
>>>>> UUID.
>>>>>
>>>>>
>>>>>
>>>>> On Wednesday, November 14, 2012 11:40:57 AM UTC, myeagleflies wrote:
>>>>>>
>>>>>> Please take a look: http://pastebin.com/e5tZ0YL3
>>>>>>
>>>>>> On Wednesday, November 14, 2012 10:50:22 AM UTC, Ilias Bertsimas
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> That sounds weird. Could you please post the results of SHOW STATUS
>>>>>>> LIKE 'WSREP%'; from both of your nodes?
>>>>>>>
>>>>>>> Kind Regards,
>>>>>>> Ilias.
>>>>>>>
>>>>>>> On Wednesday, November 14, 2012 10:46:14 AM UTC, myeagleflies wrote:
>>>>>>>>
>>>>>>>> We started testing Galera + Magento 10 days ago. Our setup is 2
>>>>>>>> galera nodes and 2 garbd nodes. One of the galera nodes has a virtual
>>>>>>>> IP assigned to it via pacemaker. This is our cluster IP. The Magento
>>>>>>>> webservers communicate with the cluster IP. This way one node is the
>>>>>>>> de facto primary and the second galera node is the secondary. We do
>>>>>>>> not need to talk to both nodes at the same time. Projected traffic to
>>>>>>>> our website is small. The main purpose of Galera is to allow for easy
>>>>>>>> failover.
>>>>>>>>
>>>>>>>> I found some issues recently:
>>>>>>>> - I could not run 'USE magento' on one of the nodes. I googled for it
>>>>>>>> and someone suggested rebooting the node. The reboot seems to have
>>>>>>>> fixed the issue, and the node is now properly connected to the cluster.
>>>>>>>> - There are differences in data between the two nodes. Some tables
>>>>>>>> contain more rows on one node than on the other, and there is even one
>>>>>>>> table which is missing on one of the nodes!
>>>>>>>>
>>>>>>>> The differences in data look scary. Is this typical? What is the
>>>>>>>> recommended way of troubleshooting this issue? Is Magento not fully
>>>>>>>> compatible with Galera?
>>>>>>>>
>>>>>>>> We do have some tables without primary keys. If DELETE operations
>>>>>>>> were performed on those tables, this could explain the differences in
>>>>>>>> data in those tables. However, it does not explain why one table is
>>>>>>>> missing.
>>>>>>>>
>>>>>>>> I am quite puzzled and looking for advice.
>>>>>>>>
>>>>>>>> Thanks in advance!
>>>>>>>>
>>>>>>>
------=_Part_178_1132857.1352897480144--
------=_Part_177_143812.1352897480144--