drbd both disk recovering degraded


David Sedeño

Nov 12, 2014, 5:23:40 AM
to gan...@googlegroups.com
Hi,

I have a machine that is running fine, but one of its disks is in this state:

     on primary: /dev/drbd23 (147:23) *RECOVERING*  0.00%, ETA unknown, status *DEGRADED*
     on secondary: /dev/drbd23 (147:23) *RECOVERING*  0.00%, ETA unknown, status *DEGRADED* *UNCERTAIN STATE*


I have tried running gnt-instance replace-disks -s instance, but it gives this error:

  Failure: command execution error:
  Node node1.mydomain.com has degraded storage, unsafe to replace disks for instance

How can I recover the DRBD device?

Regards,

David Sedeño

Aaron Karper

Nov 12, 2014, 5:38:05 AM
to gan...@googlegroups.com
This sounds like there is a problem with the node communication. Does 'gnt-instance activate-disks <instance name using /dev/drbd23>' help? That activates the disks on the necessary nodes and allows them to sync.
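To find which instance name to pass, one option is to search the output of gnt-instance info --all for the device. A minimal sketch, run here against sample text in the format quoted in this thread (the instance names are made up, and the real gnt-instance output layout may differ by Ganeti version):

```shell
# Sample 'gnt-instance info --all' output, trimmed to the lines we need.
# Names and layout are illustrative; taken from the format shown in this thread.
sample='Instance name: instA
    - disk/0: drbd, size 40.0G
      on primary: /dev/drbd7 (147:7) in sync, status ok
Instance name: instB
    - disk/0: drbd, size 40.0G
      on primary: /dev/drbd23 (147:23) *RECOVERING*  0.00%, ETA unknown, status *DEGRADED*'

# Remember the most recent "Instance name:" and print it whenever the
# wanted device appears in that instance's disk section.
match=$(echo "$sample" | awk '
  /^Instance name:/ { inst = $3 }
  /\/dev\/drbd23 /  { print inst }' | sort -u)
echo "$match"   # prints: instB
```

On a real cluster the same pipeline would be fed from `gnt-instance info --all` instead of the here-string.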

David Sedeño

Nov 12, 2014, 6:01:53 AM
to gan...@googlegroups.com
Hi Aaron,


On Wednesday, November 12, 2014 at 11:38:05 AM UTC+1, Aaron Karper wrote:
> This sounds like there is a problem with the node communication. Does 'gnt-instance activate-disks <instance name using /dev/drbd23>' help? That activates the disks on the necessary nodes and allows them to sync.

No, it seems that I am in the same state:

  # gnt-instance activate-disks instancename
  node1.mydomain.com:disk/0:/dev/drbd23

  # gnt-instance info instancename | grep drbd
  Disk template: drbd
    - disk/0: drbd, size 40.0G

      on primary: /dev/drbd23 (147:23) *RECOVERING*  0.00%, ETA unknown, status *DEGRADED*
      on secondary: /dev/drbd23 (147:23) *RECOVERING*  0.00%, ETA unknown, status *DEGRADED* *UNCERTAIN STATE*

I really don't know how the machine got into this state; I have other instances using DRBD and they are working fine.

Regards,

David

Aaron Karper

Nov 12, 2014, 7:36:07 AM
to gan...@googlegroups.com
Hey David,

this looks like a DRBD split brain (both nodes believe they are the DRBD primary for this device), which currently isn't fixable by Ganeti, but you should check your kernel logs to see whether DRBD detected it. It is usually caused by the connection between the nodes failing temporarily. What you probably want is to keep the data on the (Ganeti) primary of the instance, making it the (DRBD) primary of the device, and discard the changes on the (Ganeti) secondary. This guide describes how: http://www.hastexo.com/resources/hints-and-kinks/solve-drbd-split-brain-4-steps
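A split-brain victim usually shows up as cs:StandAlone in /proc/drbd once DRBD refuses to reconnect. Below is a sketch: the detection part is runnable against a sample /proc/drbd line, while the recovery commands are kept as comments because they act on a live cluster, the resource/device names are placeholders, and the --discard-my-data side loses its local changes (the exact drbdadm syntax also varies between DRBD 8.3 and 8.4):

```shell
# Sample /proc/drbd line for a split-brain victim (StandAlone, peer unknown).
proc_drbd='23: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-----'

# Detect minors that are StandAlone, a common split-brain symptom.
standalone=$(echo "$proc_drbd" | awk '/cs:StandAlone/ { sub(":", "", $1); print $1 }')
echo "StandAlone minors: $standalone"

# Manual recovery sketch (DO NOT run blindly; names are examples):
#   on the node whose changes you discard (the stale side):
#     drbdsetup /dev/drbd23 secondary
#     drbdsetup /dev/drbd23 disconnect
#     drbdadm connect --discard-my-data <resource>   # DRBD 8.4 syntax
#   on the surviving node, if it also shows StandAlone:
#     drbdadm connect <resource>
```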

David Sedeño

Nov 12, 2014, 11:10:14 AM
to gan...@googlegroups.com
Hi,

The guide you linked uses drbdadm, which needs the resource name, and I don't know how to get that from Ganeti.

Anyway, I shut down and started the VM again, and now DRBD has synced OK.

I have another machine with:

     on primary: /dev/drbd10 (147:10) in sync, status *DEGRADED*
     on secondary: /dev/drbd10 (147:10) in sync, status *DEGRADED*

and I solved this issue with (with the machine shut down):

* on secondary drbd:
drbdsetup /dev/drbd10 invalidate

* on ganeti:
gnt-instance activate-disks instancename
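To confirm the resync triggered by those two steps actually completed, one can check /proc/drbd on both nodes: the connection state should be Connected and the disk states UpToDate/UpToDate. A small runnable sketch against a sample healthy line (the minor number is just an example):

```shell
# Sample healthy /proc/drbd line after the resync has finished.
line='10: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----'

# A minor is healthy when it is connected and both disks are UpToDate.
case "$line" in
  *cs:Connected*ds:UpToDate/UpToDate*) health=ok ;;
  *)                                   health=degraded ;;
esac
echo "drbd10: $health"   # prints: drbd10: ok
```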

Hope that helps another user :)

Regards,
---
David Sedeño