Stuck with primary and secondary disks in *DEGRADED* status for an instance

John N.

Mar 21, 2015, 10:21:36 AM

to gan...@googlegroups.com

Hello,

I am currently rebooting my Ganeti nodes, and for that I live-migrated the instances away. While migrating the instances back to their original nodes after the reboot, I found one instance with both its primary and secondary disks in *DEGRADED* status, as you can see from the output of gnt-instance info:

      on primary: /dev/drbd22 (147:22) in sync, status *DEGRADED*
      on secondary: /dev/drbd2 (147:2) in sync, status *DEGRADED*

/dev/drbd22 looks like this:

22: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
    ns:0 nr:0 dw:81248 dr:228021 al:262 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:79068

/dev/drbd2 looks like this:

2: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:48

I tried verify-disks and activate-disks, but nothing works. I still can't migrate my instance back, and I get the following error message:

Sat Mar 21 15:18:52 2015 * checking disk consistency between source and target
Failure: command execution error:
Disk 0 is degraded or not fully synchronized on target node, aborting migration

As a last resort I tried replace-disks, but that did not work either and I got this error:

Failure: command execution error:
Node node1.domain.com has degraded storage, unsafe to replace disks for instance inst3.domain.com

What should I do now? Any ideas?

Best regards
John

sascha...@web.de

Mar 22, 2015, 5:38:18 AM

to gan...@googlegroups.com

Hi John,

it seems your instance is in DRBD split brain[1].
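The two status lines from the original post can be checked against that diagnosis with a small sketch. It is only a heuristic under stated assumptions: the function names and the rule itself (peers disconnected, at least one side StandAlone, each side still UpToDate locally) are illustrative and not part of Ganeti or DRBD. A StandAlone state can have other causes, so the kernel log on the nodes remains the more reliable place to confirm a split brain.

```python
import re

# Matches the first line of a /proc/drbd entry, for example:
# "22: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----"
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+ro:(?P<ro>\S+)\s+ds:(?P<ds>\S+)"
)


def parse_status(line):
    """Extract minor number, connection state, roles and disk states."""
    m = STATUS_RE.match(line)
    if m is None:
        raise ValueError("not a DRBD status line: %r" % line)
    local_role, peer_role = m.group("ro").split("/")
    local_disk, peer_disk = m.group("ds").split("/")
    return {
        "minor": int(m.group("minor")),
        "cs": m.group("cs"),
        "local_role": local_role,
        "peer_role": peer_role,
        "local_disk": local_disk,
        "peer_disk": peer_disk,
    }


def looks_like_split_brain(a, b):
    """Hypothetical heuristic, not an authoritative check.

    True when neither side is connected, at least one side is StandAlone,
    and both sides still consider their own data UpToDate.
    """
    states = {a["cs"], b["cs"]}
    disconnected = states <= {"StandAlone", "WFConnection"}
    both_up_to_date = a["local_disk"] == "UpToDate" and b["local_disk"] == "UpToDate"
    return disconnected and "StandAlone" in states and both_up_to_date


# The two status lines quoted in the original post.
primary = parse_status(
    "22: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----"
)
secondary = parse_status(
    "2: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r-----"
)
print(looks_like_split_brain(primary, secondary))
```

With the lines from the post, one side is waiting for a connection while the other has stopped trying, which is the pattern the heuristic looks for.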

I wonder which kind of replace-disks you tried (-p, -s or -n). According
to [2], -n should work.

If you decide to go with [1], remember that you can't use drbdadm;
drbdsetup will do the job instead.

Thanks, Sascha.

[1] https://drbd.linbit.com/users-guide/s-resolve-split-brain.html
[2] http://blkperl.github.io/split-brain-ganeti.html

John N.

Mar 22, 2015, 7:14:10 AM

to gan...@googlegroups.com

Hi Sascha,

Thanks for your help. Indeed it was a split brain situation.

After reading the link [2] you posted, I understood why my replace-disks -n did not work: I was passing the current secondary node as the new node, expecting Ganeti to delete the degraded disk and re-create it. So I simply named a different node with the -n option and it worked. I can still get my initial secondary node back by running replace-disks -n again, this time naming the initial secondary node, the one that had the degraded disk.
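The two-step sequence described above can be sketched as a small helper that only builds the command lines and does not run anything. The helper name is made up for illustration, and the node names node2.domain.com (original secondary) and node3.domain.com (spare node) are placeholders, since the thread does not say which nodes were involved; only inst3.domain.com comes from the error message.

```python
def replace_secondary_plan(instance, spare_node, original_secondary):
    """Build the two gnt-instance invocations as argv lists (nothing is executed).

    Step 1 moves the secondary to a spare node, which recreates the mirror.
    Step 2 moves it back to the original secondary node.
    """
    if spare_node == original_secondary:
        # Naming the current secondary as the new node is what failed in the thread.
        raise ValueError("the spare node must differ from the current secondary")
    return [
        ["gnt-instance", "replace-disks", "-n", spare_node, instance],
        ["gnt-instance", "replace-disks", "-n", original_secondary, instance],
    ]


# Placeholder node names; substitute the real ones for the cluster.
plan = replace_secondary_plan(
    "inst3.domain.com", "node3.domain.com", "node2.domain.com"
)
for argv in plan:
    print(" ".join(argv))
```

Each command should be allowed to finish, with the disks back in sync, before the next one is started.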

In fact the solution was quite easy, I just wasn't using it correctly ;)

Thanks again and have a nice Sunday!
John
