How to resync a split-brain drbd volume?

candlerb

Nov 7, 2016, 1:05:21 PM

to ganeti

I have a two-node ganeti-2.11 cluster where one of the drbd instances is in a bad state:

# drbd-overview

0:??not-found?? Connected Primary/Secondary UpToDate/UpToDate C r-----

1:??not-found?? Connected Primary/Secondary UpToDate/UpToDate C r-----

2:??not-found?? Connected Primary/Secondary UpToDate/UpToDate C r-----

4:??not-found?? Connected Primary/Secondary UpToDate/UpToDate C r-----

5:??not-found?? Connected Primary/Secondary UpToDate/UpToDate C r-----

6:??not-found?? Connected Primary/Secondary UpToDate/UpToDate C r-----

7:??not-found?? Connected Primary/Secondary UpToDate/UpToDate C r-----

10:??not-found?? StandAlone Primary/Unknown UpToDate/DUnknown r-----

11:??not-found?? Connected Primary/Secondary UpToDate/UpToDate C r-----

The other side shows:

...

10:??not-found?? StandAlone Secondary/Unknown UpToDate/DUnknown

...
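With many minors in the listing, it can help to filter on the connection-state column rather than scan by eye. This is a minimal sketch, assuming the `drbd-overview` column layout shown above (minor:resource, connection state, roles, disk states); the function name `standalone_minors` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the minor number of every DRBD device whose
# connection state (second column) is StandAlone.
# Reads drbd-overview-style lines on stdin.
standalone_minors() {
    awk '$2 == "StandAlone" { split($1, f, ":"); print f[1] }'
}

# On a live node this would be used as (not run here):
#   drbd-overview | standalone_minors
```

For the output above this should print only minor 10 on each node.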

I have found the instance which uses /dev/drbd10. gnt-instance info shows:

Disk template: drbd

Disks:

- disk/0: drbd, size 20.0G

access mode: rw

nodeA: wrn-vm1.int.example.net, minor=10

nodeB: wrn-vm2.int.example.net, minor=10

port: 11026

auth key: 416bdb66767e7664133766f7e2b41d55af1ecdb0

on primary: /dev/drbd10 (147:10) in sync, status *DEGRADED*

on secondary: /dev/drbd10 (147:10) in sync, status *DEGRADED*

UUID: be8c8774-6727-4c82-b83a-ff51e5aca425

child devices:

- child 0: plain, size 20.0G

logical_id: xenvg/f50973f4-e40f-46fb-bee8-831cecaf051c.disk0_data

on primary: /dev/xenvg/f50973f4-e40f-46fb-bee8-831cecaf051c.disk0_data (253:25)

on secondary: /dev/xenvg/f50973f4-e40f-46fb-bee8-831cecaf051c.disk0_data (253:23)

UUID: 53b1b9f3-5d33-449f-b6c1-59a95d861379

- child 1: plain, size 128M

logical_id: xenvg/f50973f4-e40f-46fb-bee8-831cecaf051c.disk0_meta

on primary: /dev/xenvg/f50973f4-e40f-46fb-bee8-831cecaf051c.disk0_meta (253:26)

on secondary: /dev/xenvg/f50973f4-e40f-46fb-bee8-831cecaf051c.disk0_meta (253:24)

UUID: 02330058-838a-4ae3-ba6f-6085aa00bdf5

(Aside: I don't understand how it can be "in sync" and yet "degraded" at the same time...)

dmesg | grep drbd10 shows:

[9623265.691560] block drbd10: conn( StandAlone -> Unconnected )

[9623265.697449] block drbd10: Starting receiver thread (from drbd10_worker [4780])

[9623265.705032] block drbd10: receiver (re)started

[9623265.709782] block drbd10: conn( Unconnected -> WFConnection )

[9623266.214534] block drbd10: Handshake successful: Agreed network protocol version 96

[9623266.229363] block drbd10: Peer authenticated using 16 bytes of 'md5' HMAC

[9623266.236380] block drbd10: conn( WFConnection -> WFReportParams )

[9623266.242686] block drbd10: Starting asender thread (from drbd10_receiver [20302])

[9623266.250333] block drbd10: data-integrity-alg: <not-used>

[9623266.255853] block drbd10: drbd_sync_handshake:

[9623266.260498] block drbd10: self B5BE145BFFB86D7B:D66CEA14B365CF09:AA37400902A6071A:AA36400902A6071B bits:40099 flags:0

[9623266.271347] block drbd10: peer 6CD10A4C30600E74:D66CEA14B365CF09:AA37400902A6071B:AA36400902A6071B bits:11 flags:0

[9623266.281898] block drbd10: uuid_compare()=100 by rule 90

[9623266.287320] block drbd10: helper command: /bin/true initial-split-brain minor-10

[9623266.295266] block drbd10: helper command: /bin/true initial-split-brain minor-10 exit code 0 (0x0)

[9623266.304451] block drbd10: Split-Brain detected but unresolved, dropping connection!

[9623266.312319] block drbd10: helper command: /bin/true split-brain minor-10

[9623266.319519] block drbd10: helper command: /bin/true split-brain minor-10 exit code 0 (0x0)

[9623266.328089] block drbd10: conn( WFReportParams -> Disconnecting )

[9623266.334564] block drbd10: error receiving ReportState, l: 4!

[9623266.340521] block drbd10: asender terminated

[9623266.344991] block drbd10: Terminating drbd10_asender

[9623266.345037] block drbd10: Connection closed

[9623266.345043] block drbd10: conn( Disconnecting -> StandAlone )

[9623266.345055] block drbd10: receiver terminated

[9623266.345056] block drbd10: Terminating drbd10_receiver

OK, so it's a split brain. The instance is running on node1 (wrn-vm1), so I just want to force a resync so that node2 (wrn-vm2) ends up with an identical copy (i.e. sync primary to secondary). But how do I do this on a 2-node cluster?
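The `bits:` values in the handshake lines give a rough idea of how far the two sides have diverged. The arithmetic below is a sketch under two assumptions: that `bits:` counts out-of-sync bitmap bits, and that each bit covers 4 KiB (the granularity implied by DRBD's own "20 GB (5242880 bits)" message elsewhere in this thread). The function name `bits_to_kib` is hypothetical.

```shell
#!/bin/sh
# Convert a DRBD bitmap bit count to KiB, ASSUMING 4 KiB per bit.
bits_to_kib() {
    echo $(( $1 * 4 ))
}

bits_to_kib 40099   # "self" in the log above (the primary): 160396 KiB
bits_to_kib 11      # "peer" in the log above (the secondary): 44 KiB
```

Under those assumptions the primary has roughly 157 MiB of changes the secondary has not seen, while the secondary holds only about 44 KiB of its own, which is consistent with discarding the secondary's copy.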

I found https://blkperl.github.io/split-brain-ganeti.html which suggests moving the secondary to a third node. I don't have one.

I could convert to plain and back to drbd again, but that would involve shutting down the instance twice for an extended time, which I'd rather avoid.

I tried using gnt-instance replace-disks -s, but that didn't work:

root@wrn-vm1:~# gnt-instance replace-disks -s instance-name

Mon Nov 7 17:53:51 2016 Replacing disk(s) 0 for instance 'instance-name.int.example.net'

Mon Nov 7 17:53:51 2016 Current primary node: wrn-vm1.int.example.net

Mon Nov 7 17:53:51 2016 Current seconary node: wrn-vm2.int.example.net

Mon Nov 7 17:53:51 2016 STEP 1/6 Check device existence

Mon Nov 7 17:53:51 2016 - INFO: Checking disk/0 on wrn-vm1.int.example.net

Mon Nov 7 17:53:52 2016 - INFO: Checking disk/0 on wrn-vm2.int.example.net

Mon Nov 7 17:53:53 2016 - INFO: Checking volume groups

Mon Nov 7 17:53:54 2016 STEP 2/6 Check peer consistency

Mon Nov 7 17:53:54 2016 - INFO: Checking disk/0 consistency on node wrn-vm1.int.example.net

Failure: command execution error:

Node wrn-vm1.int.example.net has degraded storage, unsafe to replace disks for instance instance-name.int.example.net

There doesn't seem to be an option to force it. Using "-a" instead of "-s" it says there's no problem:

root@wrn-vm1:~# gnt-instance replace-disks -a instance-name

Mon Nov 7 17:59:00 2016 - INFO: Checking disk/0 on wrn-vm1.int.example.net

Mon Nov 7 17:59:02 2016 - INFO: Checking disk/0 on wrn-vm2.int.example.net

Mon Nov 7 17:59:04 2016 No disks need replacement for instance 'instance-name.int.example.net'

Any other clues for how to proceed?

Thanks,

Brian.

Phil Regnauld

Nov 7, 2016, 2:58:50 PM

to gan...@googlegroups.com

candlerb (b.candler) writes:
>
> I could convert to plain and back to drbd again, but that would involve
> shutting down the instance twice for an extended time, which I'd rather
> avoid.

Extended time ? About 1-2 minutes each time:

gnt-instance shutdown instance
gnt-instance modify -t plain instance (takes 10-20 secs)
gnt-instance modify -t drbd -n <secondary-node> --no-wait-for-sync instance
gnt-instance start instance

> Any other clues for how to proceed?

What does "active-disks" say ?

But I think the last time I helped someone fix this, it was
drbd->plain->drbd.

candlerb

Nov 7, 2016, 3:16:07 PM

to ganeti

> What does "active-disks" say ?

You mean "activate-disks" ? It said everything was OK.

Anyway, after digging around the drbdsetup manpage, I think I found the solution. On the secondary node I just did this:

# drbdsetup /dev/drbd10 invalidate

After a few minutes, everything kicked off. Primary shows:

10:??not-found?? SyncSource Primary/Secondary UpToDate/Inconsistent C r-----

[>...................] sync'ed: 9.7% (18500/20480)Mfinish: 0:18:50 speed: 16,748 (20,236) K/sec

Secondary shows:

10:??not-found?? SyncTarget Secondary/Primary Inconsistent/UpToDate

[=>..................] sync'ed: 10.4% (18364/20480)Mfinish: 0:29:07 speed: 10,752 (18,660) want: 5,840 K/sec
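To follow the resync from a script rather than by re-running the command by hand, the percentage can be pulled out of the progress line. A minimal sketch, assuming the progress-line format shown above; the function name `sync_percent` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract the "sync'ed" percentage from
# drbd-overview-style lines on stdin. Prints nothing if no device is syncing.
sync_percent() {
    sed -n "s/.*sync'ed: *\([0-9.]*\)%.*/\1/p"
}

# On a live node this would be used as (not run here):
#   drbd-overview | sync_percent
```

If several devices are resyncing at once this prints one percentage per device, so it would need to be combined with a filter on the minor number.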

dmesg shows:

[10033.664658] block drbd10: disk( UpToDate -> Inconsistent )

[10033.709663] block drbd10: bitmap WRITE of 160 pages took 3 jiffies

[10033.727035] block drbd10: 20 GB (5242880 bits) marked out-of-sync by on disk bit-map.

...

[10106.618678] block drbd10: drbd_sync_handshake:

[10106.623179] block drbd10: self 6CD10A4C30600E74:D66CEA14B365CF09:AA37400902A6071B:AA36400902A6071B bits:5242880 flags:0

[10106.634000] block drbd10: peer B5BE145BFFB86D7B:D66CEA14B365CF09:AA37400902A6071A:AA36400902A6071B bits:40216 flags:0

[10106.644726] block drbd10: uuid_compare()=100 by rule 90

[10106.649998] block drbd10: Becoming sync target due to disk states.

[10106.656225] block drbd10: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )

[10106.688508] block drbd10: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 15(1), total 15; compression: 100.0%

[10106.698942] block drbd10: conn( WFBitMapT -> WFSyncUUID )

[10106.783566] block drbd10: updated sync uuid D66DEA14B365CF08:0000000000000000:AA37400902A6071B:AA36400902A6071B

[10106.803809] block drbd10: helper command: /bin/true before-resync-target minor-10

[10106.811972] block drbd10: helper command: /bin/true before-resync-target minor-10 exit code 0 (0x0)

[10106.821109] block drbd10: conn( WFSyncUUID -> SyncTarget )

[10106.826733] block drbd10: Began resync as SyncTarget (will sync 20971520 KB [5242880 bits set]).

Cheers,

Brian.

Iustin Pop

Nov 7, 2016, 3:21:14 PM

to gan...@googlegroups.com

You can always manually force DRBD to resync (thus without any
downtime), but it's a bit more complex and potentially more dangerous.

If one wants to do it, look up the drbdsetup invalidate/invalidate-remote
commands. It's been a long time since I've done it, so I'm not entirely
sure, but I think a "drbdsetup invalidate <minor>" on the secondary and
then trying to re-activate the disks should do it. If not, try to
invalidate-remote on the primary as well.
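Because `invalidate` throws away the local copy, running it on the wrong node is the dangerous part. The steps above could be sketched as follows, with a role check added as a precaution. The check is not part of the original advice; `local_role` and `invalidate_if_secondary` are hypothetical names, the `drbdsetup` syntax is the one used elsewhere in this thread, and the parsing assumes the `drbd-overview` column layout shown earlier.

```shell
#!/bin/sh
# Print the local role (left of the slash in the roles column) for one
# minor, reading drbd-overview-style lines on stdin.
local_role() {
    awk -v minor="$1" '
        NF >= 3 {
            split($1, f, ":")
            if (f[1] == minor) { split($3, r, "/"); print r[1] }
        }
    '
}

# Run on the SECONDARY node only: discard the local copy of the data,
# but refuse if this node does not report itself as Secondary.
invalidate_if_secondary() {
    minor="$1"
    role=$(drbd-overview | local_role "$minor")
    if [ "$role" != "Secondary" ]; then
        echo "minor $minor is '$role' here, refusing to invalidate" >&2
        return 1
    fi
    drbdsetup "/dev/drbd$minor" invalidate
}

# Afterwards, on the master node (instance name is a placeholder):
#   gnt-instance activate-disks instance-name
```

On the secondary this would be called as `invalidate_if_secondary 10`. Nothing runs when the file is merely sourced, so the functions can be inspected before use.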

regards,
iustin

Phil Regnauld

Nov 7, 2016, 3:23:08 PM

to gan...@googlegroups.com

candlerb (b.candler) writes:
> > What does "active-disks" say ?
>
> You mean "activate-disks" ? It said everything was OK.

Yes, activate-disks, sorry. Ah ok, hadn't seen you trying
that, only "gnt-instance replace-disks -a instance-name"

> Anyway, after digging around the drbdsetup manpage, I think I found the
> solution. On the secondary node I just did this:
>
> # drbdsetup /dev/drbd10 invalidate

Ah neat. I see a thread on this from 2012 (although in that particular
case it didn't help):

https://groups.google.com/forum/#!topic/ganeti/7GsZ7zHIH1g
