Both disks are marked as *DEGRADED*

Andreas

Mar 2, 2012, 9:49:49 AM
to ganeti
When I migrated an instance between the two nodes, the operation got a
connection timeout to the server it was supposed to migrate the instance
to, saying it couldn't copy /var/lib/ganeti/config.data.
Now I have an instance that has both disks marked as *DEGRADED*. This
has happened before, and sometimes it could be fixed by just stopping and
starting the instance, but not this time.
Could someone please tell me how to proceed to get the disks OK
again? I guess you would mark one of the disks as NOT degraded and
then resync the secondary disk?

Most of the documentation on this kind of issue only covers how to get
things working when one of the disks has failed, not both.

Best regards,
Andreas

Iustin Pop

Mar 2, 2012, 10:37:01 AM
to gan...@googlegroups.com

Before doing this, more information is needed.

Could you give:

- the output of gnt-instance info
- and the output of /proc/drbd on both nodes

If you know that (e.g.) on node A the data is still correct, then you
can just remove the LVs on node B, but I still would like to see
/proc/drbd to be sure.

iustin

Nilshar

Mar 5, 2012, 3:56:26 AM
to gan...@googlegroups.com
Hello,

I have the exact same problem: both disks are marked as *DEGRADED*.

Here are the outputs of gnt-instance info and /proc/drbd:


Disks:
  - disk/0: drbd8, size 7.0G
    access mode:  rw
    nodeA:        vs010, minor=0
    nodeB:        vs005, minor=0
    port:         11031
    auth key:     f4b46ac3fe895cb1686a5a2bf863744c1850a3ce
    on primary:   /dev/drbd0 (147:0) in sync, status *DEGRADED*
    on secondary: /dev/drbd0 (147:0) in sync, status *DEGRADED*
    child devices:
      - child 0: lvm, size 7.0G
        logical_id:   xenvg/5524725b-3b69-4134-af8f-582ea70da953.disk0_data
        on primary:   /dev/xenvg/5524725b-3b69-4134-af8f-582ea70da953.disk0_data (254:3)
        on secondary: /dev/xenvg/5524725b-3b69-4134-af8f-582ea70da953.disk0_data (254:5)
      - child 1: lvm, size 128M
        logical_id:   xenvg/5524725b-3b69-4134-af8f-582ea70da953.disk0_meta
        on primary:   /dev/xenvg/5524725b-3b69-4134-af8f-582ea70da953.disk0_meta (254:4)
        on secondary: /dev/xenvg/5524725b-3b69-4134-af8f-582ea70da953.disk0_meta (254:6)


vs005:~# cat /proc/drbd
version: 8.3.7 (api:88/proto:86-91)
srcversion: EE47D8BF18AC166BE219757
 0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r----
    ns:0 nr:7753624 dw:7753624 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

vs010:~# cat /proc/drbd
version: 8.3.7 (api:88/proto:86-91)
srcversion: EE47D8BF18AC166BE219757
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r----
    ns:7753624 nr:0 dw:10256844 dr:4347883 al:124 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:158200
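
For anyone comparing /proc/drbd output from two nodes by eye, here is a minimal Python sketch that pulls out the fields discussed in this thread (connection state `cs`, roles `ro`, disk states `ds`, and the out-of-sync counter `oos`). It assumes the DRBD 8.3 text layout shown above; the function name `parse_proc_drbd` and the dictionary keys are made up for illustration and are not part of DRBD or Ganeti.

```python
import re

# Per-device status line as printed by DRBD 8.3, e.g.
# " 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r----"
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)"
    r"\s+ro:(?P<ro_local>[^/\s]+)/(?P<ro_peer>\S+)"
    r"\s+ds:(?P<ds_local>[^/\s]+)/(?P<ds_peer>\S+)"
)
# The out-of-sync counter appears on one of the following counter lines.
OOS_RE = re.compile(r"\boos:(\d+)")


def parse_proc_drbd(text):
    """Return {minor: fields} for every device found in /proc/drbd text."""
    devices = {}
    current = None
    for line in text.splitlines():
        match = STATUS_RE.match(line)
        if match:
            current = match.groupdict()
            current["minor"] = int(current["minor"])
            current["oos"] = None  # filled in when a counter line is seen
            devices[current["minor"]] = current
        elif current is not None:
            oos = OOS_RE.search(line)
            if oos:
                current["oos"] = int(oos.group(1))
    return devices


if __name__ == "__main__":
    # Hypothetical usage: read the live file on a node.
    with open("/proc/drbd") as handle:
        for minor, fields in sorted(parse_proc_drbd(handle.read()).items()):
            print(minor, fields["cs"], fields["ro_local"], fields["ds_local"], fields["oos"])
```

The parser tolerates the counter line being wrapped by a mail client, since it searches each following line for `oos:` rather than relying on a fixed position.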

Iustin Pop

Mar 5, 2012, 5:58:19 AM
to gan...@googlegroups.com

Based on this 'StandAlone' status, I would guess there is a split-brain
issue. I would therefore invalidate the secondary, if the instance is
running fine on the primary:

- on vs005, run drbdsetup /dev/drbd0 invalidate

And then run 'gnt-instance activate-disks'.

Unfortunately Ganeti doesn't know how to automatically handle cases where
DRBD believes there's a split-brain issue.

regards,
iustin
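
The reasoning in the reply above (a 'StandAlone' connection state, with each side still claiming good data, hints at split brain, and the copy that is not running the instance is the one to invalidate) could be sketched roughly as below. This is only a heuristic mirroring that guess, not something DRBD or Ganeti provides; the function names and the input dictionary shape are invented for the example, and the DRBD kernel log remains the place to confirm a split brain.

```python
def suspect_split_brain(nodes):
    """Heuristic: nobody is connected, at least one side has gone
    StandAlone, and every side still reports its local disk as UpToDate.

    nodes maps a node name to {"cs": ..., "ds_local": ...}.
    """
    states = list(nodes.values())
    nobody_connected = all(s["cs"] in ("StandAlone", "WFConnection") for s in states)
    someone_gave_up = any(s["cs"] == "StandAlone" for s in states)
    all_claim_good_data = all(s["ds_local"] == "UpToDate" for s in states)
    return nobody_connected and someone_gave_up and all_claim_good_data


def nodes_to_invalidate(nodes, ganeti_primary):
    """Every node except the one the instance is running on."""
    if ganeti_primary not in nodes:
        raise KeyError("unknown primary node: %s" % ganeti_primary)
    return sorted(name for name in nodes if name != ganeti_primary)


if __name__ == "__main__":
    # States copied from the /proc/drbd output posted earlier in the thread.
    # Treating vs010 as the node running the instance is an assumption,
    # based on its DRBD role being Primary.
    reported = {
        "vs005": {"cs": "WFConnection", "ds_local": "UpToDate"},
        "vs010": {"cs": "StandAlone", "ds_local": "UpToDate"},
    }
    if suspect_split_brain(reported):
        print("invalidate on:", nodes_to_invalidate(reported, "vs010"))
```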

Nilshar

Mar 5, 2012, 7:04:41 AM
to gan...@googlegroups.com


Worked perfectly, thank you iustin.

Andreas

Mar 7, 2012, 4:32:33 AM
to ganeti
Here is the output from gnt-instance info and /proc/drbd.

I tried to invalidate the secondary but it just gave me: "/dev/drbd7:
State change failed: (-2) Refusing to be Primary without at least one
UpToDate disk".
When Ganeti says, for example, that sdx01 is the primary node for the
instance, does that mean the primary node is also the one that has the
UpToDate disk?
I was wondering whether I tried to invalidate the disk on the wrong node.


sdx01:~# gnt-instance info jira.test.com
Instance name: jira.test.com
UUID: ee8412ec-fc93-4799-8dd5-9b4d3b3ee31a
Serial number: 9
Creation time: 2012-01-27 13:39:36
Modification time: 2012-03-01 16:00:08
State: configured to be up, actual state is up
  Nodes:
    - primary: sdx02.test.com
    - secondaries: sdx01.test.com
  Operating system: debootstrap+default
  Allocated network port: 11080
  Hypervisor: kvm
    - acpi: default (True)
    - boot_order: default (disk)
    - cdrom_image_path: default ()
    - disk_cache: default (default)
    - disk_type: default (paravirtual)
    - initrd_path: default (/boot/initrd.img-2.6.32-bpo.5-amd64)
    - kernel_args: default (ro)
    - kernel_path: default (/boot/vmlinuz-2.6.32-bpo.5-amd64)
    - kvm_flag: default ()
    - migration_downtime: default (30)
    - nic_type: default (paravirtual)
    - root_path: default (/dev/vda1)
    - security_domain: default ()
    - security_model: default (none)
    - serial_console: default (True)
    - usb_mouse: default ()
    - use_chroot: default (False)
    - use_localtime: default (False)
    - vhost_net: default (False)
    - vnc_bind_address: default ()
    - vnc_password_file: default ()
    - vnc_tls: default (False)
    - vnc_x509_path: default ()
    - vnc_x509_verify: default (False)
  Hardware:
    - VCPUs: 4
    - memory: 3072MiB
    - NICs:
      - nic/0: MAC: aa:00:00:17:9b:a6, IP: None, mode: bridged, link: xen-br0
  Disks:
    - disk/0: drbd8, size 15.0G
      access mode:  rw
      nodeA:        sdx01.test.com, minor=7
      nodeB:        sdx02.test.com, minor=7
      port:         11081
      auth key:     8b512056d4d562b75cd1bd214fbf170f7468970c
      on primary:   /dev/drbd7 (147:7) in sync, status *DEGRADED*
      on secondary: /dev/drbd7 (147:7) in sync, status *DEGRADED*
      child devices:
        - child 0: lvm, size 15.0G
          logical_id:   vol_grp_virt/567c12a5-436a-40b0-a04d-30380dcde0ed.disk0_data
          on primary:   /dev/vol_grp_virt/567c12a5-436a-40b0-a04d-30380dcde0ed.disk0_data (254:14)
          on secondary: /dev/vol_grp_virt/567c12a5-436a-40b0-a04d-30380dcde0ed.disk0_data (254:14)
        - child 1: lvm, size 128M
          logical_id:   vol_grp_virt/567c12a5-436a-40b0-a04d-30380dcde0ed.disk0_meta
          on primary:   /dev/vol_grp_virt/567c12a5-436a-40b0-a04d-30380dcde0ed.disk0_meta (254:15)
          on secondary: /dev/vol_grp_virt/567c12a5-436a-40b0-a04d-30380dcde0ed.disk0_meta (254:15)


 7: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r----
    ns:1460 nr:0 dw:48202956 dr:1881460 al:734 bm:15 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

/a

Andreas

Mar 7, 2012, 4:34:20 AM
to ganeti
And this is the output from /proc/drbd on the secondary:

 7: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r----
    ns:16 nr:1460 dw:62111316 dr:2480732 al:321 bm:330 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:480404

/a

Guido Trotter

Mar 7, 2012, 4:37:46 AM
to gan...@googlegroups.com
On Wed, Mar 7, 2012 at 10:34, Andreas <andreas...@gmail.com> wrote:
> And this is the output from /proc/drbd on the secondary:
>
>  7: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r----
>    ns:16 nr:1460 dw:62111316 dr:2480732 al:321 bm:330 lo:0 pe:0 ua:0
> ap:0 ep:1 wo:b oos:480404
>

If they're both in primary/unknown (and the data on the primary is
correct and the instance there is running), I guess the best would be to
delete the data on the secondary (turn off the drbd device at least)
and then just run gnt-instance replace-disks to bring the mirroring up
again.

Regards,

Guido

Andreas

Mar 8, 2012, 4:25:58 AM
to ganeti
The thing is that I can't invalidate the disk on either the primary or
the secondary node; it only says:
"/dev/drbd6: State change failed: (-2) Refusing to be Primary without
at least one UpToDate disk"
But if you check /proc/drbd you can see that each node says it has an
UpToDate disk.
What is causing this?

/andreas


Guido Trotter

Mar 8, 2012, 5:23:33 AM
to gan...@googlegroups.com
On Thu, Mar 8, 2012 at 9:25 AM, Andreas <andreas...@gmail.com> wrote:
> The thing is that I can't invalidate the disk on either the primary or
> the secondary node, it only says:
> "/dev/drbd6: State change failed: (-2) Refusing to be Primary without
> at least one UpToDate disk"
> But if you check /proc/drbd you can see that one the nodes Says that
> it has a UpToDate disk on one of the nodes.
> What is causing this?
>

Yes, I don't think you can invalidate, since they are disconnected and
each of them thinks it is the only up-to-date copy.
That's why I suggested shutting down the one on the secondary (the
unused one) and just running replace-disks from the primary, which should
create a new secondary.

Regards,

Guido
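
As a rough illustration of why "invalidate" is refused here, the check behind the "(-2) Refusing to be Primary without at least one UpToDate disk" message can be modelled as below. This is a deliberately simplified model based on the wording of the error, not DRBD's actual state machine, and the function name is invented for the example.

```python
def invalidate_would_be_refused(ro_local, ds_peer):
    """Simplified model: invalidating marks the local disk as not
    UpToDate, so a device in the Primary role would be left with no
    UpToDate data unless its peer is known to hold some."""
    return ro_local == "Primary" and ds_peer != "UpToDate"


if __name__ == "__main__":
    # Andreas's secondary node reports ro:Primary/Unknown ds:UpToDate/DUnknown.
    print(invalidate_would_be_refused("Primary", "DUnknown"))
    # Nilshar's vs005 reported ro:Secondary/Unknown ds:UpToDate/DUnknown.
    print(invalidate_would_be_refused("Secondary", "DUnknown"))
```

Under this model the difference between the two cases in the thread is the local role: vs005 was in the Secondary role, so the invalidate went through, while both of Andreas's nodes report the Primary role.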

Thomas Rieschl

Mar 8, 2012, 5:46:12 AM
to gan...@googlegroups.com
just a question...
you've been playing with that degraded disk for almost a week now.
why don't you just export/backup the vm, delete the old one and recreate it?

...or create a new instance with --no-start, activate its disks, mount
source and destination, and tar the whole bunch over to the new instance:
tar c -C /mnt/source . | tar x -C /mnt/destination

:)

or do you want to "learn" drbd? .. or get gray hair? ;)


cheers,
thomas

Andreas

Mar 9, 2012, 6:14:24 AM
to ganeti
I know there are a couple of ways to get it up and running again,
but this happens to me very often and I want to know why it keeps
happening.
Do you all use a direct connection between the primary and secondary
node?

/a

Andreas

Mar 9, 2012, 6:24:13 AM
to ganeti
I tried this: I ran drbdsetup /dev/drbd6 disconnect, but nothing seemed
to happen; I can still see the disk when looking in /proc/drbd.
And trying to run replace-disks on the primary just gives me:

Fri Mar 9 12:19:39 2012 Replacing disk(s) 0 for wiki.test.com
Fri Mar 9 12:19:39 2012 STEP 1/6 Check device existence
Fri Mar 9 12:19:39 2012 - INFO: Checking disk/0 on sdx01.test.com
Fri Mar 9 12:19:39 2012 - INFO: Checking disk/0 on sdx02.test.com
Fri Mar 9 12:19:39 2012 - INFO: Checking volume groups
Fri Mar 9 12:19:39 2012 STEP 2/6 Check peer consistency
Fri Mar 9 12:19:39 2012 - INFO: Checking disk/0 consistency on node sdx01.test.com
Failure: command execution error:
Node sdx01.test.com has degraded storage, unsafe to replace disks for instance wiki.test.com

/a


Iustin Pop

Mar 9, 2012, 1:30:38 PM
to gan...@googlegroups.com
On Fri, Mar 09, 2012 at 03:24:13AM -0800, Andreas wrote:
> Tried this, did run drbdsetup /dev/drbd6 disconnect, nothing seemed to
> happen, can still see the disk when looking in /proc/drbd.
> And when trying to run replace-disks on the primary just gives me:

No, not disconnect. The device is _already_ disconnected.

You need to "drbdsetup /dev/drbd6 down", or at least "detach", and then
run replace-disks.

regards,
iustin
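
The two steps suggested above can be written down as a small dry-run helper that only prints the commands and where to run them, so they can be reviewed before anything touches a device. This is a sketch under assumptions: the `-s` flag (replace the disks on the current secondary) is not mentioned in the thread and should be checked against the gnt-instance man page for your Ganeti version, and the function names are invented for the example.

```python
import shlex


def recovery_plan(minor, instance, teardown="down"):
    """Build (where, argv) pairs; nothing is executed.

    teardown is "down" or "detach", the two options named in the reply.
    """
    if teardown not in ("down", "detach"):
        raise ValueError("teardown must be 'down' or 'detach'")
    device = "/dev/drbd%d" % minor
    return [
        # Step 1: take the stale DRBD device out of service on the secondary.
        ("secondary node", ["drbdsetup", device, teardown]),
        # Step 2: let Ganeti rebuild the mirror. The -s flag is an
        # assumption, not something stated in the thread.
        ("master node", ["gnt-instance", "replace-disks", "-s", instance]),
    ]


def render(plan):
    """Format the plan as printable, shell-quoted lines."""
    return [
        "[%s] %s" % (where, " ".join(shlex.quote(arg) for arg in argv))
        for where, argv in plan
    ]


if __name__ == "__main__":
    for line in render(recovery_plan(6, "wiki.test.com")):
        print(line)
```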

Iustin Pop

Mar 9, 2012, 1:31:17 PM
to gan...@googlegroups.com
On Fri, Mar 09, 2012 at 03:14:24AM -0800, Andreas wrote:
> I know there is a couple of solutions to get it up and running again,
> but this happens to me very often and I want to know Why this keeps
> happening.
> Do you all use direct connection between the primary and secondary
> node?

No. If this happens often, there must be some other issue at play. Do
you know under what conditions it happens, e.g. always during live
migration, or… ?

iustin

Andreas

Mar 12, 2012, 4:39:34 AM
to ganeti
Yeah, I tried that too, but something on the secondary is holding the
instance disks open. The primary node just says this when shutting down
the instance:
"Mon Mar 12 09:37:05 2012 - WARNING: Could not shutdown block device
disk/0 on node sdx02.test.com: drbd4: can't shutdown drbd device:
/dev/drbd4: State change failed: (-12) Device is held open by someone\n"
This goes for three different instances that I have that are DEGRADED.

/a

Andreas

Mar 12, 2012, 4:40:42 AM
to ganeti
I can see this happening during different tasks. No matter what I do, as
long as it involves the nodes talking to each other, I keep
getting timeouts to the secondary.

/a

Iustin Pop

Mar 12, 2012, 4:52:17 AM
to gan...@googlegroups.com
On Mon, Mar 12, 2012 at 01:40:42AM -0700, Andreas wrote:
> I can see this happening on different tasks, no matter what I do as
> long as it has to do with the nodes talking to each other, I keep
> getting timeouts to the secondary.

That is strange. Just timeouts should not result in split brain…

Anyway, even the timeouts are not good. I would investigate the
OS/network setup to make sure it's good.

iustin
