how to properly recreate DRBD mirrors in a two node cluster


Lucas, Sascha

Sep 19, 2016, 9:02:43 AM
to gan...@googlegroups.com
Hi,

given is a two node cluster (version stable-2.14/92392ef from git/source) and one node has failed (lost all its data).

The following steps went fine without problems:
- master failover with no-voting
- offline the dead node
- failover instances of the dead node
- repair/reinstall the node and readd it (with the same name)
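A rough sketch of those steps as commands, assuming node1.local is the surviving master candidate and node0.local is the failed node (names are taken from the transcript below; flags are the standard Ganeti ones):

```shell
# Run on the surviving node. Node names are illustrative.
DEAD=node0.local

# 1. Promote the surviving node to master without a quorum.
gnt-cluster master-failover --no-voting

# 2. Mark the dead node offline so jobs stop contacting it.
gnt-node modify --offline=yes "$DEAD"

# 3. Fail over the instances whose primary was the dead node.
gnt-node failover --ignore-consistency "$DEAD"

# 4. After repair/reinstall, re-add the node under its old name.
gnt-node add --readd "$DEAD"
```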

Now I want to recreate the DRBD mirrors of the instances. I tried "replace-disks" and "recreate-disks" but without success.

node1:~ # gnt-instance replace-disks -s test.vm
Mon Sep 19 14:04:41 2016 Replacing disk(s) 0 for instance 'test.vm'
Mon Sep 19 14:04:41 2016 Current primary node: node1.local
Mon Sep 19 14:04:41 2016 Current secondary node: node0.local
Mon Sep 19 14:04:41 2016 STEP 1/6 Check device existence
Mon Sep 19 14:04:41 2016 - INFO: Checking disk/0 on node1.local
Mon Sep 19 14:04:41 2016 - INFO: Checking disk/0 on node0.local
Mon Sep 19 14:04:41 2016 - INFO: Checking disk/0 on node1.local
Mon Sep 19 14:04:41 2016 - INFO: Checking disk/0 on node0.local
Failure: command execution error:
Can't find disk/0 on node node0.local: disk not found
Disks seem to be not properly activated. Try running activate-disks on the instance before using replace-disks.

node1:~ # gnt-instance recreate-disks test.vm
Failure: prerequisites not met for this operation:
error type: wrong_state, error details:
Instance 'test.vm' is marked to be up, cannot recreate disks

I already know about "modify --disk-template plain / drbd", but that requires downtime for the instances.

Is there a proper way to recreate the DRBD mirror without downtime and without a third node?

Thanks, Sascha.


Phil Regnauld

Sep 19, 2016, 9:40:12 AM
to gan...@googlegroups.com
Lucas, Sascha (Sascha.Lucas) writes:
> - failover instances of the dead node
> - repair/reinstall the node and readd it (with the same name)
>
> Now I want to recreate the DRBD mirrors of the instances. I tried "replace-disks" and "recreate-disks" but without success.
>
> node1:~ # gnt-instance replace-disks -s test.vm

Make the new node a new third node, not the old one, then replace-disks
to *that* node (replace-disks -n newnode)
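In command form, this suggestion looks roughly like the following (newnode.local is a placeholder for the freshly named node, test.vm is the instance from the original mail):

```shell
# Add the repaired machine under a NEW name/IP so Ganeti treats it
# as a third node (newnode.local is a placeholder).
gnt-node add newnode.local

# Rebuild the instance's mirror onto that node. This changes the
# secondary, so DRBD resyncs there while the instance stays up.
gnt-instance replace-disks -n newnode.local test.vm
```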


Ansgar Jazdzewski

Sep 19, 2016, 2:42:24 PM
to Ganeti
Hi,

if you can turn off the VMs for a moment, you can also change the
disk template to plain and back to drbd
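For reference, the template round-trip looks roughly like this (test.vm and node0.local are illustrative; note the instance must be down, which is exactly the downtime the original question wanted to avoid):

```shell
# Requires downtime: stop the instance before converting.
gnt-instance shutdown test.vm

# Drop the (broken) mirror by converting to plain disks...
gnt-instance modify -t plain test.vm

# ...then convert back to drbd, choosing the new secondary node.
gnt-instance modify -t drbd -n node0.local test.vm

gnt-instance startup test.vm
```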

hope it helps,
Ansgar

Ansgar Jazdzewski

Sep 19, 2016, 2:44:00 PM
to Ganeti
sorry, next time I'll read to the end of your mail before I write an answer ;-)

Jean-François Maeyhieux

Sep 20, 2016, 7:16:29 AM
to ganeti

> Is there a proper way to recreate the DRBD mirror without downtime and without a third node?


Only:
- add the fixed node back (same IP/FQDN)
- gnt-node modify -O no $NODE-FIXED
- gnt-cluster verify-disks

All the DRBD devices will start to sync back from the other node.
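Those three steps, spelled out as commands (the node name is a placeholder; --readd on gnt-node add is what re-registers a node under its previous identity):

```shell
# Placeholder name for the repaired node (same name/IP as before).
NODE_FIXED=node0.local

# Re-add the node under its old identity.
gnt-node add --readd "$NODE_FIXED"

# Clear the offline flag that was set during the outage.
gnt-node modify -O no "$NODE_FIXED"

# Let verify-disks find and reactivate the degraded DRBD pairs.
gnt-cluster verify-disks
```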

Jean-François Maeyhieux

Sep 20, 2016, 7:18:18 AM
to ganeti
You can check the drbd reconstruction with:
- watch -n1 'drbd-overview'
- watch -n1 'cat /proc/drbd'

Lucas, Sascha

Sep 21, 2016, 4:59:35 AM
to gan...@googlegroups.com
Hi Phil,

on Mon, 19. Sep 2016 15:40 Phil Regnauld wrote:

> Make the new node be a new third node, not the old one, them
> replace-disks
> to *that* node (replace-disks -n newnode)

I was afraid that this method is the only way to convince ganeti to create new secondary disks :-). I gave the repaired node another name/IP, added it to the cluster and "replace-disks -n newnode" works as expected. It's still a workaround, but doesn't annoy me.

Lucas, Sascha

Sep 21, 2016, 5:01:14 AM
to gan...@googlegroups.com
Hi Ansgar,

on Mon, 19. Sep 2016 20:44 'Ansgar Jazdzewski' wrote:

> sorry, next time i read until the end of your mail befor i write an
> answer ;-)

You're welcome :-).

Lucas, Sascha

Sep 21, 2016, 5:06:08 AM
to ganeti
Hi Jean-François,

on Tue, 20. Sep 2016 13:16 Jean-François Maeyhieux wrote:

> - gnt-cluster verify-disks

I think verify-disks only works if the secondary node still has the underlying LVs. In my case they were lost:

node0:~ # gnt-cluster verify-disks
Submitted jobs 1648
Waiting for job 1648 ...
Instance test.vm has missing logical volumes:
47401938-17ff-4abd-bcf2-ab157762d547 /dev/data/05d9da3d-3159-412c-a672-d89ed6173759.disk0_data
47401938-17ff-4abd-bcf2-ab157762d547 /dev/data/05d9da3d-3159-412c-a672-d89ed6173759.disk0_meta
You need to replace or recreate disks for all the above instances if this message persists after fixing broken nodes.

Phil Regnauld

Sep 21, 2016, 5:23:44 AM
to gan...@googlegroups.com
Lucas, Sascha (Sascha.Lucas) writes:
>
> > Make the new node be a new third node, not the old one, them
> > replace-disks
> > to *that* node (replace-disks -n newnode)
>
> I was afraid that this method is the only way to convince ganeti to create new secondary disks :-). I gave the repaired node another name/IP, added it to the cluster and "replace-disks -n newnode" works as expected. It's still a workaround, but doesn't annoy me.

Obviously it's a suboptimal scenario to have only two nodes in a
cluster, but it's good to hear that the solution worked.

Maybe we can discuss that at GanetiCon :D