drbd both disk recovering degraded


David Sedeño

Nov 12, 2014, 5:23:40 AM
to gan...@googlegroups.com
Hi,

I have a machine that is running fine, but one of its disks is in this state:

     on primary: /dev/drbd23 (147:23) *RECOVERING*  0.00%, ETA unknown, status *DEGRADED*
     on secondary: /dev/drbd23 (147:23) *RECOVERING*  0.00%, ETA unknown, status *DEGRADED* *UNCERTAIN STATE*


I have tried running gnt-instance replace-disks -s instance, but it gives this error:

  Failure: command execution error:
  Node node1.mydomain.com has degraded storage, unsafe to replace disks for instance

How can I recover the DRBD device?

Regards,

David Sedeño

Aaron Karper

Nov 12, 2014, 5:38:05 AM
to gan...@googlegroups.com
This sounds like there is a problem with the node communication. Does 'gnt-instance activate-disks <instance name using /dev/drbd23>' help? That activates the disks on the necessary nodes and allows them to sync.
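To find which instance name to pass, one option is to search the output of gnt-instance info --all for the device. A minimal sketch, run here against sample text in the format quoted in this thread (the instance names are made up, and the real gnt-instance output layout may differ by Ganeti version):

```shell
# Sample 'gnt-instance info --all' output, trimmed to the lines we need.
# Names and layout are illustrative; taken from the format shown in this thread.
sample='Instance name: instA
    - disk/0: drbd, size 40.0G
      on primary: /dev/drbd7 (147:7) in sync, status ok
Instance name: instB
    - disk/0: drbd, size 40.0G
      on primary: /dev/drbd23 (147:23) *RECOVERING*  0.00%, ETA unknown, status *DEGRADED*'

# Remember the most recent "Instance name:" and print it whenever the
# wanted device appears in that instance's disk section.
match=$(echo "$sample" | awk '
  /^Instance name:/ { inst = $3 }
  /\/dev\/drbd23 /  { print inst }' | sort -u)
echo "$match"   # prints: instB
```

On a real cluster the same pipeline would be fed from `gnt-instance info --all` instead of the here-string.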

David Sedeño

Nov 12, 2014, 6:01:53 AM
to gan...@googlegroups.com
Hi Aaron,


On Wednesday, November 12, 2014 at 11:38:05 AM UTC+1, Aaron Karper wrote:
> This sounds like there is a problem with the node communication. Does 'gnt-instance activate-disks <instance name using /dev/drbd23>' help? That activates the disks on the necessary nodes and allows them to sync.

No, it seems that I am in the same state:

  # gnt-instance activate-disks instancename
  node1.mydomain.com:disk/0:/dev/drbd23

  # gnt-instance info instancename | grep drbd
  Disk template: drbd
    - disk/0: drbd, size 40.0G

      on primary: /dev/drbd23 (147:23) *RECOVERING*  0.00%, ETA unknown, status *DEGRADED*
      on secondary: /dev/drbd23 (147:23) *RECOVERING*  0.00%, ETA unknown, status *DEGRADED* *UNCERTAIN STATE*

I really don't know how the machine got into this state; I have other instances using DRBD and they are working fine.

Regards,

David

Aaron Karper

Nov 12, 2014, 7:36:07 AM
to gan...@googlegroups.com
Hey David,

this looks like a DRBD split brain (both nodes believe they are the DRBD primary for this device), which currently isn't fixable by Ganeti, but you should check your kernel logs to see whether DRBD detected it. It is usually caused by the connection between the nodes failing temporarily. What you probably want is to keep the data on the (Ganeti) primary of the instance, making it the (DRBD) primary of the device, and discard the changes on the (Ganeti) secondary. This guide describes how: http://www.hastexo.com/resources/hints-and-kinks/solve-drbd-split-brain-4-steps
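A split-brain victim usually shows up as cs:StandAlone in /proc/drbd once DRBD refuses to reconnect. Below is a sketch: the detection part is runnable against a sample /proc/drbd line, while the recovery commands are kept as comments because they act on a live cluster, the resource/device names are placeholders, and the --discard-my-data side loses its local changes (the exact drbdadm syntax also varies between DRBD 8.3 and 8.4):

```shell
# Sample /proc/drbd line for a split-brain victim (StandAlone, peer unknown).
proc_drbd='23: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-----'

# Detect minors that are StandAlone, a common split-brain symptom.
standalone=$(echo "$proc_drbd" | awk '/cs:StandAlone/ { sub(":", "", $1); print $1 }')
echo "StandAlone minors: $standalone"

# Manual recovery sketch (DO NOT run blindly; names are examples):
#   on the node whose changes you discard (the stale side):
#     drbdsetup /dev/drbd23 secondary
#     drbdsetup /dev/drbd23 disconnect
#     drbdadm connect --discard-my-data <resource>   # DRBD 8.4 syntax
#   on the surviving node, if it also shows StandAlone:
#     drbdadm connect <resource>
```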

David Sedeño

Nov 12, 2014, 11:10:14 AM
to gan...@googlegroups.com
Hi,

The guide you linked uses drbdadm, which needs the resource name, and I don't know how to get that from Ganeti.

Anyway, I shut down and started the VM again, and now DRBD has synced OK.

I have another machine with:

     on primary: /dev/drbd10 (147:10) in sync, status *DEGRADED*
     on secondary: /dev/drbd10 (147:10) in sync, status *DEGRADED*

and I solved this issue with (with the machine shut down):

* on secondary drbd:
drbdsetup /dev/drbd10 invalidate

* on ganeti:
gnt-instance activate-disks instancename
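To confirm the resync triggered by those two steps actually completed, one can check /proc/drbd on both nodes: the connection state should be Connected and the disk states UpToDate/UpToDate. A small runnable sketch against a sample healthy line (the minor number is just an example):

```shell
# Sample healthy /proc/drbd line after the resync has finished.
line='10: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----'

# A minor is healthy when it is connected and both disks are UpToDate.
case "$line" in
  *cs:Connected*ds:UpToDate/UpToDate*) health=ok ;;
  *)                                   health=degraded ;;
esac
echo "drbd10: $health"   # prints: drbd10: ok
```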

Hope that helps another user :)

Regards,
---
David Sedeño