DRBD 8.3 and "--c-plan-ahead" option

John McNally

Oct 9, 2018, 3:44:45 PM
to ganeti
Dear Ganeti Experts,

I recently followed a post by candlerb (https://github.com/ganeti/ganeti/issues/1229#issuecomment-399055393) on how to improve DRBD sync speed with 10-gigabit storage NICs. The suggested ganeti command is this:

gnt-cluster modify -D drbd:resync-rate=204800,disk-custom='--c-plan-ahead 0',net-custom='--max-buffers 16384 --max-epoch-size 16384'
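
As far as I understand them (corrections welcome), the pieces do roughly this:

resync-rate=204800          static resync rate, in KiB/s (about 200 MiB/s)
--c-plan-ahead 0            disables DRBD's dynamic resync-speed controller
--max-buffers 16384         more receive buffers on the data socket
--max-epoch-size 16384      more write requests allowed per epoch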

It works like a charm on three of my clusters (CentOS 7, ganeti 2.15.2, DRBD 8.4.11). However, there is a problem with an older cluster of mine (CentOS 6, ganeti 2.9.2, DRBD 8.3.16). Sadly, DRBD 8.4 is not available on CentOS 6.

Now, when I try to add a new instance on the older cluster, I get this error:

# gnt-instance add -n quinoa-a.intranet.psfc.coop:quinoa-b.intranet.psfc.coop -o raw-image+default -t drbd -s 20G --no-start -H kvm:vnc_bind_address=127.0.0.1 membership-dev.intranet.psfc.coop
Tue Oct  9 14:54:23 2018 * creating instance disks...
Tue Oct  9 14:54:31 2018  - WARNING: Device creation failed
Failure: command execution error:
Can't create block device <DRBD8(hosts=f35c5eef-c4b9-4674-aff4-1b3f1ee54c59/16-5dac1e82-3d53-4a65-ade1-9797cbf5ab77/16, port=11031, configured as 192.168.6.17:11031 192.168.6.16:11031, backend=<LogicalVolume(/dev/ganeti/d27164d0-5130-4716-bc6d-5f7b8be22f1b.disk0_data, not visible, size=20480m)>, metadev=<LogicalVolume(/dev/ganeti/d27164d0-5130-4716-bc6d-5f7b8be22f1b.disk0_meta, not visible, size=128m)>, visible as /dev/disk/0, size=20480m)> on node quinoa-b.intranet.psfc.coop for instance membership-dev.intranet.psfc.coop: Can't assemble device after creation, unusual event: drbd16: can't attach local disk: drbdsetup disk: unrecognized option '--c-plan-ahead'


I think the underlying problem is that DRBD 8.3 doesn't support the "--c-plan-ahead" option. This error does NOT occur on the clusters with DRBD 8.4, where the "--c-plan-ahead" option is in place. There, I can add new instances normally and the sync is nice and speedy.
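
For what it's worth, the installed DRBD version is easy to confirm on each node; the first line of /proc/drbd reports it:

# head -1 /proc/drbd

On the old cluster that shows 8.3.16; on the newer ones it shows 8.4.11, and drbdsetup there does accept "--c-plan-ahead".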

Furthermore, when I run "gnt-cluster verify" on the older cluster, I now see this:

# gnt-cluster verify
Submitted jobs 985332, 985333
Waiting for job 985332 ...
Tue Oct  9 15:11:01 2018 * Verifying cluster config
Tue Oct  9 15:11:01 2018 * Verifying cluster certificate files
Tue Oct  9 15:11:01 2018 * Verifying hypervisor parameters
Tue Oct  9 15:11:01 2018 * Verifying all nodes belong to an existing group
Waiting for job 985333 ...
Tue Oct  9 15:11:01 2018 * Verifying group 'default'
Tue Oct  9 15:11:01 2018 * Gathering data (2 nodes)
Tue Oct  9 15:11:04 2018 * Gathering disk information (2 nodes)
Tue Oct  9 15:11:20 2018 * Verifying configuration file consistency
Tue Oct  9 15:11:20 2018 * Verifying node status
Tue Oct  9 15:11:20 2018   - ERROR: cluster: ghost instance 'e1a36a0d-4c1f-47ac-90d6-ca42ee1d7324' in temporary DRBD map
Job 985333 has failed: Failure: command execution error:
Unknown instance: e1a36a0d-4c1f-47ac-90d6-ca42ee1d7324
1 job(s) failed while verifying the cluster.


Consequently, my questions are:

1. How do I "back out" of the "gnt-cluster modify" and safely remove the "--c-plan-ahead" option, restoring my previous DRBD options?

2. How do I clean up the DRBD "ghost instance" so "gnt-cluster verify" runs cleanly?

Thanks for your help.

Best,

John McNally

candlerb

Oct 10, 2018, 7:12:59 AM
to ganeti
gnt-cluster modify -D drbd:resync-rate=204800,disk-custom='',net-custom=''

ought to put it back.
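
To double-check that the change took, "gnt-cluster info" lists the cluster-wide disk parameters; the drbd section should show disk-custom and net-custom as empty again. Something like this (the grep context is just a guess at where that block lands in the output):

# gnt-cluster info | grep -A 12 'Disk parameters'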

I have no idea what the "temporary DRBD map" is, though. Also, the v2.15.2 code says "ghost disk", not "ghost instance":

$ grep -R 'ghost' .
./lib/cmdlib/cluster/verify.py:    @type ghost: boolean
./lib/cmdlib/cluster/verify.py:    @ivar ghost: whether this is a known node or not (config)
./lib/cmdlib/cluster/verify.py:      self.ghost = False
./lib/cmdlib/cluster/verify.py:      # the 'ghost node' construction in Exec() ensures that we have a
./lib/cmdlib/cluster/verify.py:      bad_snode = snode.ghost or snode.offline
./lib/cmdlib/cluster/verify.py:    # ... or ghost/non-vm_capable nodes
./lib/cmdlib/cluster/verify.py:      self._ErrorIf(node_image[node_uuid].ghost, constants.CV_EINSTANCEBADNODE,
./lib/cmdlib/cluster/verify.py:                    instance.name, "instance lives on ghost node %s",
./lib/cmdlib/cluster/verify.py:               "ghost disk '%s' in temporary DRBD map", disk_uuid)
./lib/cmdlib/cluster/verify.py:        # ghost disk should not be active, but otherwise we
./lib/cmdlib/cluster/verify.py:        # don't give double warnings (both ghost disk and
./lib/cmdlib/cluster/verify.py:          gnode.ghost = (nuuid not in self.all_node_info)

It looks like verify finds an unexpected DRBD entry via ComputeDRBDMap, at which point it descends into Haskell. If you can identify the stray DRBD device you might be able to remove it with drbdsetup; otherwise, at worst, you can reboot the node, although sadly the error doesn't say which node...
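
As an untested sketch: look in /proc/drbd on each node for a minor that no instance accounts for, then tear it down (N being that minor; note that 8.3's drbdsetup takes the device path, unlike 8.4):

# cat /proc/drbd
# drbdsetup /dev/drbdN down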

John McNally

Oct 10, 2018, 11:20:37 AM
to gan...@googlegroups.com
Candlerb,

Your suggestion worked for removing the "disk-custom" and "net-custom" DRBD options. Also, I cleaned up the "ghost instance" simply by restarting ganeti services on all nodes -- no need to reboot.
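
(Concretely, assuming the stock init script name on CentOS 6, that was just

# service ganeti restart

on each node, after which "gnt-cluster verify" came back clean.)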

Thanks for your super-speedy response!

Cheers,
 
_________________
John McNally
(718) 834-0549
jmcn...@acm.org
