
ctdb relock file issues with glusterfs


patrick medina

Oct 8, 2012, 10:55:40 PM
Howdy samba folks,

I've been running into a lot of issues lately with ctdb's re-lock file and
glusterfs as the shared storage. When I started, I could get one or the
other node to become healthy, but at least one would complain it could not
lock the re-lock file. Now I'm at the point where neither node will become
healthy; both stay in a recovery loop. Just to be sure it was the re-lock
file, I commented it out in the config and both nodes became healthy.

I verified POSIX locking is enabled and ran ping_pong on both nodes
(919565 locks/sec on node 1, 924201 locks/sec on node 2).

Here is a snip from the log files that continuously loops until I stop the
service.

2012/10/08 20:24:57.165153 [recoverd:29685]: The rerecovery timeout has
elapsed. We now allow recoveries to trigger again.
2012/10/08 20:24:57.167774 [recoverd:29685]: server/ctdb_recoverd.c:2739
reclock child process returned error 5
2012/10/08 20:24:57.167840 [recoverd:29685]: server/ctdb_recoverd.c:2839
reclock child failed when checking file
2012/10/08 20:24:57.169273 [recoverd:29685]: Failed check_recovery_lock.
Force a recovery
2012/10/08 20:24:57.169288 [recoverd:29685]: server/ctdb_recoverd.c:1364
Starting do_recovery
2012/10/08 20:24:57.169297 [recoverd:29685]: Taking out recovery lock from
recovery daemon
2012/10/08 20:24:57.169306 [recoverd:29685]: Take the recovery lock
2012/10/08 20:24:57.174917 [recoverd:29685]: Recovery lock taken
successfully
2012/10/08 20:24:57.174931 [recoverd:29685]: ctdb_recovery_lock: Got
recovery lock on '/mnt/gluster/ctdb/lock'
2012/10/08 20:24:57.174999 [recoverd:29685]: Recovery lock taken
successfully by recovery daemon
2012/10/08 20:24:57.175011 [recoverd:29685]: server/ctdb_recoverd.c:1401
Recovery initiated due to problem with node 0
2012/10/08 20:24:57.175063 [recoverd:29685]: server/ctdb_recoverd.c:1426
Recovery - created remote databases
2012/10/08 20:24:57.175076 [recoverd:29685]: server/ctdb_recoverd.c:1433
Recovery - updated db priority for all databases
2012/10/08 20:24:57.175160 [29614]: Freeze priority 1
2012/10/08 20:24:57.176049 [29614]: Freeze priority 2
2012/10/08 20:24:57.177009 [29614]: Freeze priority 3
2012/10/08 20:24:57.177943 [29614]: server/ctdb_recover.c:724 Recovery mode
set to ACTIVE
2012/10/08 20:24:57.178041 [29614]: server/ctdb_recover.c:1080
startrecovery eventscript has been invoked
2012/10/08 20:24:57.322767 [recoverd:29685]: server/ctdb_recoverd.c:1470
Recovery - updated flags
2012/10/08 20:24:57.323054 [recoverd:29685]: server/ctdb_recoverd.c:1514
started transactions on all nodes
2012/10/08 20:24:57.323069 [recoverd:29685]: server/ctdb_recoverd.c:1527
Recovery - starting database commits
2012/10/08 20:24:57.323120 [recoverd:29685]: server/ctdb_recoverd.c:1539
Recovery - committed databases
2012/10/08 20:24:57.323224 [recoverd:29685]: server/ctdb_recoverd.c:1589
Recovery - updated vnnmap
2012/10/08 20:24:57.323275 [recoverd:29685]: server/ctdb_recoverd.c:1598
Recovery - updated recmaster
2012/10/08 20:24:57.323399 [recoverd:29685]: server/ctdb_recoverd.c:1615
Recovery - updated flags
2012/10/08 20:24:57.323431 [29614]: server/ctdb_recover.c:724 Recovery mode
set to NORMAL
2012/10/08 20:24:57.323455 [29614]: Thawing priority 1
2012/10/08 20:24:57.323465 [29614]: Release freeze handler for prio 1
2012/10/08 20:24:57.323494 [29614]: Thawing priority 2
2012/10/08 20:24:57.323504 [29614]: Release freeze handler for prio 2
2012/10/08 20:24:57.323517 [29614]: Thawing priority 3
2012/10/08 20:24:57.323527 [29614]: Release freeze handler for prio 3
2012/10/08 20:24:57.328926 [recoverd:29685]: server/ctdb_recoverd.c:1624
Recovery - disabled recovery mode
2012/10/08 20:24:57.329037 [recoverd:29685]: Failed to find node to cover
ip 10.100.55.198
2012/10/08 20:24:57.329161 [29614]: Forced running of eventscripts with
arguments ipreallocated
2012/10/08 20:24:57.471796 [29614]: Recovery has finished
2012/10/08 20:24:57.577986 [recoverd:29685]: server/ctdb_recoverd.c:1650
Recovery - finished the recovered event
2012/10/08 20:24:57.578053 [recoverd:29685]: server/ctdb_recoverd.c:1656
Recovery complete
2012/10/08 20:24:57.578063 [recoverd:29685]: Resetting ban count to 0 for
all nodes
2012/10/08 20:24:57.578071 [recoverd:29685]: Just finished a recovery. New
recoveries will now be supressed for the rerecovery timeout (10 seconds)
2012/10/08 20:24:57.758349 [29614]: server/ctdb_monitor.c:257 wait for
pending recoveries to end. Wait one more second.

Thanks in advance,
Gilbert

Amitay Isaacs

Oct 9, 2012, 1:32:12 AM
Hi Gilbert,
What version of CTDB are you using? Can you attach the log file from when
you notice CTDB continuously going into recovery? It would be useful
to get log files from all the nodes.

Thanks.

Amitay.

Volker Lendecke

Oct 9, 2012, 4:22:28 AM
On Tue, Oct 09, 2012 at 04:32:12PM +1100, Amitay Isaacs wrote:
> Hi Gilbert,
>
> On Tue, Oct 9, 2012 at 1:55 PM, patrick medina <pgmed...@gmail.com> wrote:
> > [...]
> > I verified posix locking is enabled, along with ping_pong on both nodes
> > (919565 locks/sec (node 1), 924201 locks/sec (node 2)).

On the same file? Wow, this is several orders of magnitude more than I
have seen with any other cluster filesystem so far. Either you are using
some VERY low-latency interconnect, glusterfs is doing some really good
magic, or fcntl locks are not being propagated. I am seriously impressed.

Volker

--
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:kon...@sernet.de

Martin Schwenke

Oct 9, 2012, 7:20:09 AM
On Tue, 9 Oct 2012 16:32:12 +1100, Amitay Isaacs <ami...@gmail.com>
wrote:

> On Tue, Oct 9, 2012 at 1:55 PM, patrick medina <pgmed...@gmail.com> wrote:
> > [...]

> What version of CTDB are you using? Can you attach the log file where
> you notice CTDB is continuously going in recovery? It would be useful
> to get log files from all the nodes.

Michael Adam and I took a look at this on the weekend. Gilbert sent me
some logs and this was happening:

ctdb_recovery_lock: Got recovery lock on '/mnt/gluster/ctdb/lock'

That seems to indicate that locking isn't working as expected...

peace & happiness,
martin

patrick medina

Oct 9, 2012, 6:10:28 PM
Afternoon/Morning Samba folks,

I finally made some progress this afternoon; let me explain what I found.

1. When I created the lock file, I had set it to chmod 777 (rwxrwxrwx).
Thinking about permissions, I recreated the lock file with rw-r--r--.
After doing this I am now able to bring one node to healthy at a time, but
the other node stays unhealthy. I can juggle which node is healthy: if I
shut down the ctdb service on the healthy node, the second node becomes
healthy.

The log file on the unhealthy node complains that the recovery lock file
is not locked:

2012/10/09 14:55:40.335328 [set_recmode:16493]: ctdb_recovery_lock: Got
recovery lock on '/mnt/gluster/ctdb/lock'
2012/10/09 14:55:40.335448 [set_recmode:16493]: ERROR: recovery lock file
/mnt/gluster/ctdb/lock not locked when recovering!


2. I created a new mount point on one of the nodes, so each node has a
unique mount to gluster. Depending on which node starts first, the
unhealthy node complains about the other's recovery lock location. How can
this be if each node has its own config file to go by?

Node1: CTDB_RECOVERY_LOCK="/mnt/fuse/ctdb/lock"
ctdb_recovery_lock: Unable to open /mnt/gluster/ctdb/lock - (No such file
or directory)


Node2: CTDB_RECOVERY_LOCK="/mnt/gluster/ctdb/lock"
ctdb_recovery_lock: Unable to open /mnt/fuse/ctdb/lock - (No such file or
directory)

Thanks again, I'm not sure where to troubleshoot next.

Regards,
Gilbert

Morten Bøhmer

Oct 9, 2012, 6:21:31 PM
Can confirm that I am experiencing the exact same issue.

Would love to be able to solve this .....


Morten

Michael Adam

Oct 10, 2012, 6:03:42 AM
Hi folks,

as indicated elsewhere already, before even trying to start and
debug ctdb, you should make sure that your cluster setup provides
correct posix fcntl byte range locks, by using the ping_pong
tool shipped with the ctdb package:

https://wiki.samba.org/index.php/Ping_pong

It is important to verify that the locks really reach "the other
node", i.e. there is real lock contention.

This can in particular be tested with the -rw option to
ping_pong: If you run "ping_pong -rw /path/to/file 3" on
one node and then "ping_pong -rw /path/to/file 3" on a second
node, you should see the "data increment" notice (going from "1"
to "2"), indicating that you now have two processes operating
on the same file. If this stays constant (at 1) then your gluster
setup does not provide sufficient fcntl byte range lock support.

Another way to verify this without "-rw" is to use a lock count that is
one too small: run "ping_pong /path/to/file 2" on one node and
then the same command on a second node. These should block and
not print positive lock rates. If instead both happily print positive
lock rates then your locks don't reach the other node and you
need to fix your setup...
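What ping_pong ultimately checks is that fcntl() byte-range locks conflict between separate processes. The sketch below (illustrative only, not ping_pong itself; `try_lock` and `contention_works` are hypothetical names) shows the same conflict semantics between two processes on a single host via fork; on a cluster filesystem the same conflict must also hold across nodes:

```python
import fcntl
import os
import tempfile

def try_lock(path: str):
    """Try to take a non-blocking exclusive POSIX record lock (fcntl
    F_SETLK under the hood) on the file. Returns the open fd on success,
    or None if another process already holds the lock."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd
    except OSError:
        os.close(fd)
        return None

def contention_works(path: str) -> bool:
    """Parent takes the lock; a forked child must then FAIL to take it.
    (POSIX record locks are per-process and are not inherited by fork.)
    Returns True when the lock is correctly enforced between processes."""
    fd = try_lock(path)
    assert fd is not None, "first locker should always succeed"
    pid = os.fork()
    if pid == 0:
        # Child: the lock must be refused while the parent holds it.
        os._exit(0 if try_lock(path) is None else 1)
    _, status = os.waitpid(pid, 0)
    os.close(fd)  # closing the fd also releases the record lock
    return os.WEXITSTATUS(status) == 0

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        print("lock contention enforced:", contention_works(f.name))
```

If the equivalent check fails between two cluster nodes rather than two local processes, the filesystem is not propagating fcntl locks, which is exactly the failure mode described in this thread.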

Cheers - Michael

patrick medina

Oct 10, 2012, 11:55:52 AM
Thanks Michael,

The way you explained ping_pong (going from "1" to "2") isn't explained
as well on the wiki, so I'll test and most likely verify that it will not
increment.

Cheers - Gil

Morten Bøhmer

Oct 10, 2012, 12:21:04 PM
I can also confirm that it does not increment when running with -rw.

I tried changing the underlying filesystem to XFS today, to avoid the bug mentioned here: http://joejulian.name/blog/glusterfs-bit-by-ext4-structure-change/

I don't know if that bug was a factor, but I tried anyway; same results.


Morten

Morten Bøhmer

Oct 12, 2012, 9:34:04 AM
Hi Patrick

Any luck with your setup yet?


I am now seriously looking into trying some other clusterfs to make ctdb work.


Morten

ronnie sahlberg

Oct 12, 2012, 3:06:39 PM
CTDB requires that you mount the filesystem at the same place on all
nodes. During recovery, the node that is elected recovery master will
make sure that all nodes are using the same file/path for the rec-lock
file.

This is to prevent user mistakes from creating issues, e.g. if you
changed the CTDB_RECOVERY_LOCK setting on just some, but not all, of the
nodes. This ensures that all nodes use the same setting.


As its mechanism for split-brain avoidance, ctdb uses fcntl() locking on
the rec-lock file as the arbitrator of which node is the recovery
master. This is why correct fcntl() locking behaviour in the underlying
cluster filesystem is important. I don't think your filesystem supports
working/correct fcntl() locking, so that is where these warning messages
come from.


You can still use ctdb on such filesystems, but you will have to
comment out the CTDB_RECOVERY_LOCK file from /etc/sysconfig/ctdb.
This disables the split-brain prevention completely, but allows you to
use filesystems with broken fcntl() locking.
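Concretely, the change being described is commenting out one line in the ctdb sysconfig file (path as given in this message; a sketch, using the lock path from earlier in the thread):

```
# /etc/sysconfig/ctdb
# Commented out: disables the recovery lock, and with it all
# split-brain protection.
# CTDB_RECOVERY_LOCK="/mnt/gluster/ctdb/lock"
```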

Additionally, if you disable this because of broken fcntl() locking, you
should probably also make sure NOT to export data via NFS, since the
lock manager in NFS will not work reliably.

Since you cannot safely use NFS at the same time as Samba with that
filesystem anyway, you can also change the Samba config to "posix
locking = no". This will make Samba do all file locking internally,
never propagating it to the kernel/filesystem, and may provide some
performance boost.
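As an smb.conf fragment, the setting mentioned here would look like this (a sketch; "posix locking" defaults to yes):

```
[global]
    # Keep byte-range locks internal to Samba; do not propagate
    # them to the (broken) cluster filesystem.
    posix locking = no
```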


regards
ronnie sahlberg

patrick medina

Oct 15, 2012, 11:56:39 AM
Morning Morten,

I have been out of the office since Thursday, but am back today and ready
to knock this out. I'll keep you posted on what I find later this
afternoon.

Cheers


Morten Bøhmer

Oct 15, 2012, 11:59:41 AM
Thank you.

For the heck of it I installed a couple of CentOS virtual servers and configured ctdb+glusterfs+xfs+samba; got it working, but without the reclock.

Not sure how important it is, but I guess time will show :)


Morten


Christopher R. Hertel

Oct 15, 2012, 2:49:37 PM
Morten, Patrick:

Please try your tests with the following option on mount:
--direct-io-mode=enable

Let us know whether that changes your test results.
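A sketch of how that option might be passed (assumptions: a FUSE client mount, with hypothetical server/volume names server1 and gv0; check the glusterfs documentation for your version):

```
# As a mount option:
mount -t glusterfs -o direct-io-mode=enable server1:/gv0 /mnt/gluster

# Or directly on the FUSE client:
glusterfs --direct-io-mode=enable --volfile-server=server1 --volfile-id=gv0 /mnt/gluster
```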

Thanks.

Chris -)-----
--
"Implementing CIFS - the Common Internet FileSystem" ISBN: 013047116X
Samba Team -- http://www.samba.org/ -)----- Christopher R. Hertel
jCIFS Team -- http://jcifs.samba.org/ -)----- ubiqx development, uninq.
ubiqx Team -- http://www.ubiqx.org/ -)----- c...@ubiqx.mn.org
OnLineBook -- http://ubiqx.org/cifs/ -)----- c...@ubiqx.org

Martin Schwenke

Oct 15, 2012, 3:30:27 PM
Hi Morten,

On Mon, 15 Oct 2012 15:59:41 +0000, Morten Bøhmer
<Morten...@pilaro.no> wrote:

> For the heck of it I installed a couple of Centos virtual servers and configure ctdb+glusterfs+xfs+samba, got it working, but without relock.
>
> Not sure how important it is, but I guess time will show :)

As others have pointed out, the reclock file is crucial unless you
have some other way of avoiding a split brain. You won't know you need
it until you get a split brain and something gets badly corrupted.

Additionally, as Ronnie mentioned, if the reclock file isn't working
then there's something wrong with your locking, so you have more than a
reclock problem...

peace & happiness,
martin

ronnie sahlberg

Oct 15, 2012, 4:16:22 PM
At least for larger clusters, say 5 or more nodes, we could implement
something like an "as long as floor(n/2)+1 nodes are connected" quorum.

That would make it safe to use without a reclock file on clusters with
5 or more nodes.
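The floor(n/2)+1 rule described here is a plain majority quorum: two disjoint sets of that size cannot both exist among n nodes, so at most one partition can stay active. A toy sketch (function name is illustrative, not ctdb code):

```python
def has_quorum(connected: int, total: int) -> bool:
    """Majority quorum: a partition may stay active only if it holds
    at least floor(n/2) + 1 of the n configured nodes."""
    return connected >= total // 2 + 1

# In a 5-node cluster, a 3-node partition keeps quorum and the 2-node
# side does not. In a 4-node cluster, an even 2/2 split leaves NEITHER
# side with quorum, which is one reason odd node counts are preferred.
```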

Morten Bøhmer

Oct 15, 2012, 6:07:35 PM
THANK YOU!!!


This did it for me :)

Ping_pong is now showing correct results


Morten

patrick medina

Oct 17, 2012, 5:01:44 PM
Yes, a very big thank you to Christopher and everyone else, that mount
command did the trick for me as well.

At first this didn't work, but I moved the location of the lock file to
the root of the gluster share (/mnt/gluster/ctdb/lock to /mnt/gluster/lock).
Now we're all healthy and happy!

Morten, was your lock file in the same location or did you have to move it
as well?

Regards,
PG

Christopher R. Hertel

Oct 17, 2012, 5:27:53 PM
I am still doing some testing on this, so I appreciate your feedback and
comments. It just happened that I was working on the same problem at the
same time. :)

Chris -)-----

Morten Bøhmer

Oct 17, 2012, 5:40:05 PM
Hi

I had my gluster volume area on /mnt/gluster/lock and mounted the gluster volume on /mnt/lock.

In other words, I made one gluster volume exclusively for ctdb.


Morten

Morten Bøhmer

Oct 17, 2012, 5:42:34 PM
I have done some testing: when I reboot one of my nodes, the other takes over. But when the rebooted node comes back online and ctdb begins doing its magic to bring it back up, both nodes go bad for a short moment.

Is this by design?

It does not seem like the share gets disconnected during this period.


Morten

patrick medina

Oct 17, 2012, 6:12:29 PM
I ran into this issue as well. The gluster folks said this was a bug and
will be corrected in the next release (3.3.1, I think). The only
workaround I found is to stop ctdb on the node you're going to reboot
first, then issue the reboot command.

PG

Christopher R. Hertel

Nov 5, 2012, 1:16:17 PM
On 10/17/2012, Morten Bøhmer wrote:
> Hi
>
> I had my gluster-volume area on /mnt/gluster/lock and mounted the
> glustervolume on /mnt/lock
>
> In other words I made one gluster volume exclusively for ctdb.

Morten, et. al.,

I have tried to learn a lot about CTDB over the years, and have managed
to forget most of it (this is why I typically write stuff down; it helps
me remember).

I spent a bit of time last week talking with active CTDB developers and I've
started to rebuild my understanding of the inner workings of CTDB.

Based on what you have above, I have a couple of questions for you:

- When you say you made a Gluster volume exclusively for CTDB, you mean
for the recovery lock file. Is that correct?
- What else (CTDB-related) are you storing in the Gluster volume you
created for CTDB?

I ask, because my understanding is that the TDB files that CTDB manages are
intended to be local only, and not shared. If your configuration is
otherwise, I'd be interested in knowing.

Chris -)-----