Before doing this, more information is needed.
Could you give:
- the output of gnt-instance info
- and the output of /proc/drbd on both nodes
If you know that (e.g.) on node A the data is still correct, then you
can just remove the LVs on node B, but I still would like to see
/proc/drbd to be sure.
iustin
I have the exact same problem: both disks are marked as *DEGRADED*.
Here are the outputs of gnt-instance info and /proc/drbd:
Disks:
  - disk/0: drbd8, size 7.0G
    access mode: rw
    nodeA: vs010, minor=0
    nodeB: vs005, minor=0
    port: 11031
    auth key: f4b46ac3fe895cb1686a5a2bf863744c1850a3ce
    on primary: /dev/drbd0 (147:0) in sync, status *DEGRADED*
    on secondary: /dev/drbd0 (147:0) in sync, status *DEGRADED*
    child devices:
      - child 0: lvm, size 7.0G
        logical_id: xenvg/5524725b-3b69-4134-af8f-582ea70da953.disk0_data
        on primary: /dev/xenvg/5524725b-3b69-4134-af8f-582ea70da953.disk0_data (254:3)
        on secondary: /dev/xenvg/5524725b-3b69-4134-af8f-582ea70da953.disk0_data (254:5)
      - child 1: lvm, size 128M
        logical_id: xenvg/5524725b-3b69-4134-af8f-582ea70da953.disk0_meta
        on primary: /dev/xenvg/5524725b-3b69-4134-af8f-582ea70da953.disk0_meta (254:4)
        on secondary: /dev/xenvg/5524725b-3b69-4134-af8f-582ea70da953.disk0_meta (254:6)
vs005:~# cat /proc/drbd
version: 8.3.7 (api:88/proto:86-91)
srcversion: EE47D8BF18AC166BE219757
 0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r----
    ns:0 nr:7753624 dw:7753624 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
vs010:~# cat /proc/drbd
version: 8.3.7 (api:88/proto:86-91)
srcversion: EE47D8BF18AC166BE219757
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r----
    ns:7753624 nr:0 dw:10256844 dr:4347883 al:124 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:158200
Based on this 'StandAlone' status, I would guess there is a split-brain
issue. I would therefore invalidate the secondary, if the instance is
running fine on the primary:
- on vs005, run drbdsetup /dev/drbd0 invalidate
And then run 'gnt-instance activate-disks'.
Unfortunately, Ganeti doesn't know how to automatically handle cases
where DRBD believes there's a split-brain issue.
regards,
iustin
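The check described above (reading the connection state out of
/proc/drbd on both nodes) can be sketched in Python. This is only an
illustration: the function names parse_proc_drbd and
suspect_split_brain are hypothetical, the sample text is the output
posted earlier in this thread, and the StandAlone/WFConnection rule is
a rough heuristic for "worth a closer look", not a definitive
split-brain test.

```python
import re

# /proc/drbd contents as posted in this thread (mail line wrapping kept).
VS005 = """\
version: 8.3.7 (api:88/proto:86-91)
srcversion: EE47D8BF18AC166BE219757
0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r----
ns:0 nr:7753624 dw:7753624 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1
wo:b oos:0
"""

VS010 = """\
version: 8.3.7 (api:88/proto:86-91)
srcversion: EE47D8BF18AC166BE219757
0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r----
ns:7753624 nr:0 dw:10256844 dr:4347883 al:124 bm:0 lo:0 pe:0 ua:0
ap:0 ep:1 wo:b oos:158200
"""


def parse_proc_drbd(text):
    """Return {minor: {field: value}} for each device in /proc/drbd text."""
    devices = {}
    current = None
    for line in text.splitlines():
        start = re.match(r"\s*(\d+):\s+(.*)", line)
        if start:
            # A line such as "0: cs:..." opens a new device entry.
            current = devices.setdefault(int(start.group(1)), {})
            rest = start.group(2)
        elif current is not None:
            # Continuation line holding counters (ns:, nr:, oos:, ...).
            rest = line
        else:
            # Header lines (version, srcversion) before the first device.
            continue
        for key, value in re.findall(r"(\w+):(\S+)", rest):
            current[key] = value
    return devices


def suspect_split_brain(dev_a, dev_b):
    """Heuristic (assumption): at least one side is StandAlone and
    neither side is actually connected to its peer."""
    states = {dev_a.get("cs"), dev_b.get("cs")}
    return "StandAlone" in states and states <= {"StandAlone", "WFConnection"}


if __name__ == "__main__":
    secondary = parse_proc_drbd(VS005)[0]
    primary = parse_proc_drbd(VS010)[0]
    print("vs005:", secondary["cs"], secondary["ro"], "oos", secondary["oos"])
    print("vs010:", primary["cs"], primary["ro"], "oos", primary["oos"])
    print("possible split brain:", suspect_split_brain(primary, secondary))
```

The oos counter is reported in KiB, so the 158200 on vs010 is roughly
154 MiB that the primary has written without the secondary seeing it.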
Worked perfectly, thank you iustin.
If they're both in primary/unknown (and the data on the primary is
correct and the instance there is running), I guess the best would be
to delete the data on the secondary (turn off the drbd device at least)
and then just run gnt-instance replace-disks to turn up the mirroring
again.
Regards,
Guido
Yes, I don't think you can invalidate, since they are disconnected and
each of them thinks it is the only up-to-date copy.
That's why I suggested shutting down the one on the secondary (the
unused one) and just running replace-disks from the primary, which
should create a new secondary.
Regards,
Guido
...or create a new instance, --no-start, activate disks, mount source
and destination and tar the whole bunch to the new instance:
tar c -C /mnt/source . | tar x -C /mnt/destination
:)
or do you want to "learn" drbd? .. or get gray hair? ;)
cheers,
thomas
No, not disconnect. The device is _already_ disconnected.
You need to "drbdsetup /dev/drbd6 down", or at least "detach", and then
run replace-disks.
regards,
iustin
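The order of operations discussed above (take the DRBD device down on
the secondary, then let Ganeti rebuild the mirror) can be written out
as a dry-run plan. This is a sketch under assumptions:
secondary_rebuild_plan is a hypothetical helper, the instance name
below is a placeholder because the thread never gives one, and the -s
(replace on the secondary) option of gnt-instance replace-disks is an
assumed choice of mode, so check the man page for your Ganeti version.
Nothing is executed; the commands are only printed.

```python
import shlex


def secondary_rebuild_plan(instance, secondary_node, drbd_device,
                           detach_only=False, replace_mode="-s"):
    """Return a list of (where_to_run, argv) pairs; nothing is executed."""
    # "down" tears the device down completely; "detach" only drops the
    # backing disk. Both were suggested in the thread.
    action = "detach" if detach_only else "down"
    return [
        (secondary_node, ["drbdsetup", drbd_device, action]),
        # replace_mode is an assumption, not something stated in the thread.
        ("master node", ["gnt-instance", "replace-disks", replace_mode, instance]),
    ]


if __name__ == "__main__":
    # "instance1.example.com" is a placeholder instance name.
    plan = secondary_rebuild_plan("instance1.example.com", "vs005", "/dev/drbd0")
    for where, argv in plan:
        print(f"[{where}] {shlex.join(argv)}")
```

Printing the plan first makes it easy to confirm that the drbdsetup
step targets the secondary (the unused copy) and not the node where the
instance is running.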
No. If this happens often, there must be some other issue at play. Do
you know under what conditions it happens, e.g. always during live
migration, or… ?
iustin
That is strange. Just timeouts should not result in split brain…
Anyway, even the timeouts are not good. I would investigate the
OS/network setup to make sure it's good.
iustin