We're on Ganeti 2.15. The master runs Ubuntu 16.04 LTS, but the two
nodes involved run Debian 8.5 with Ganeti 2.15
from backports.
The nodes were installed recently, and I'm seeing a funky error when trying
to convert an instance from the plain disk template to drbd.
- nodes are connected via a secondary 10G replication network
- the node config reflects this
- connectivity has been tested
On the master, I see:
root@ganeti5:/var/log/ganeti# gnt-instance modify -t drbd -n ganeti7 customer-server.xyz
Mon Dec 19 14:49:02 2016 Converting disk template from 'plain' to 'drbd'
Mon Dec 19 14:49:02 2016 Creating additional volumes...
Mon Dec 19 14:49:06 2016 Renaming original volumes...
Mon Dec 19 14:49:06 2016 Initializing DRBD devices...
Mon Dec 19 14:49:13 2016 - INFO: Waiting for instance customer-server.xyz to sync disks
Mon Dec 19 14:49:27 2016 - INFO: Instance customer-server.xyz's disks are in sync
Failure: command execution error:
There are some degraded disks for this instance, please cleanup manually
Now, this is a 400 GB instance, so I doubt the disks were replicated in 14 seconds :)
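To put numbers on that, here's a quick back-of-the-envelope check in Python. It assumes the nominal 10 Gbit/s line rate with zero protocol overhead and that every block of the 400 GB has to be copied during the initial resync; the timestamps are the "Waiting for instance" and "disks are in sync" lines from the output above.

```python
# Back-of-the-envelope check: could a 400 GB DRBD resync finish in ~14 s
# over a 10 Gbit/s replication link? Assumes full copy, no overhead.
from datetime import datetime

INSTANCE_SIZE_GB = 400   # instance disk size
LINK_GBIT_PER_S = 10     # nominal line rate of the replication network

# "Waiting for instance ... to sync disks" -> "disks are in sync"
start = datetime.strptime("14:49:13", "%H:%M:%S")
end = datetime.strptime("14:49:27", "%H:%M:%S")
observed_s = (end - start).total_seconds()

link_gbyte_per_s = LINK_GBIT_PER_S / 8            # 1.25 GB/s at best
min_sync_s = INSTANCE_SIZE_GB / link_gbyte_per_s  # best case at full line rate
implied_rate = INSTANCE_SIZE_GB / observed_s      # GB/s needed to finish in time

print(f"observed window: {observed_s:.0f} s")
print(f"best-case full sync at line rate: {min_sync_s:.0f} s")
print(f"implied throughput: {implied_rate:.1f} GB/s vs link max {link_gbyte_per_s} GB/s")
```

Even in the ideal case the copy needs around 320 seconds, so a real sync finishing in 14 would require more than 20 times what the link can carry.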
The error also looks incomplete: it gives no detail about which disks are degraded or why.
In jobs.log:
2016-12-19 14:49:13,884: job-1060193 pid=32037 INFO Waiting for instance customer-server.xyz to sync disks
2016-12-19 14:49:14,323: job-1060193 pid=32037 INFO Degraded disks found, 10 retries left
2016-12-19 14:49:15,688: job-1060193 pid=32037 INFO Degraded disks found, 9 retries left
2016-12-19 14:49:17,026: job-1060193 pid=32037 INFO Degraded disks found, 8 retries left
2016-12-19 14:49:18,349: job-1060193 pid=32037 INFO Degraded disks found, 7 retries left
2016-12-19 14:49:19,667: job-1060193 pid=32037 INFO Degraded disks found, 6 retries left
2016-12-19 14:49:21,080: job-1060193 pid=32037 INFO Degraded disks found, 5 retries left
2016-12-19 14:49:22,420: job-1060193 pid=32037 INFO Degraded disks found, 4 retries left
2016-12-19 14:49:23,754: job-1060193 pid=32037 INFO Degraded disks found, 3 retries left
2016-12-19 14:49:25,082: job-1060193 pid=32037 INFO Degraded disks found, 2 retries left
2016-12-19 14:49:26,447: job-1060193 pid=32037 INFO Degraded disks found, 1 retries left
2016-12-19 14:49:27,791: job-1060193 pid=32037 INFO Instance customer-server.xyz's disks are in sync
2016-12-19 14:49:28,288: job-1060193 pid=32037 ERROR Op 1/1: Caught exception in INSTANCE_SET_PARAMS(customer-server.xyz)
Traceback (most recent call last):
  File "/usr/share/ganeti/2.15/ganeti/jqueue/__init__.py", line 933, in _ExecOpCodeUnlocked
    timeout=timeout)
  File "/usr/share/ganeti/2.15/ganeti/jqueue/__init__.py", line 1227, in _WrapExecOpCode
    return execop_fn(op, *args, **kwargs)
  File "/usr/share/ganeti/2.15/ganeti/mcpu.py", line 697, in ExecOpCode
    calc_timeout)
  File "/usr/share/ganeti/2.15/ganeti/mcpu.py", line 624, in _LockAndExecLU
    pending=pending)
  File "/usr/share/ganeti/2.15/ganeti/mcpu.py", line 624, in _LockAndExecLU
    pending=pending)
  File "/usr/share/ganeti/2.15/ganeti/mcpu.py", line 624, in _LockAndExecLU
    pending=pending)
  File "/usr/share/ganeti/2.15/ganeti/mcpu.py", line 624, in _LockAndExecLU
    pending=pending)
  File "/usr/share/ganeti/2.15/ganeti/mcpu.py", line 631, in _LockAndExecLU
    result = self._LockAndExecLU(lu, level + 1, calc_timeout, pending=pending)
  File "/usr/share/ganeti/2.15/ganeti/mcpu.py", line 538, in _LockAndExecLU
    result = self._ExecLU(lu)
  File "/usr/share/ganeti/2.15/ganeti/mcpu.py", line 496, in _ExecLU
    result = _ProcessResult(submit_mj_fn, lu.op, lu.Exec(self.Log))
  File "/usr/share/ganeti/2.15/ganeti/cmdlib/instance_set_params.py", line 1879, in Exec
    self._DISK_CONVERSIONS[mode](self, feedback_fn)
  File "/usr/share/ganeti/2.15/ganeti/cmdlib/instance_set_params.py", line 1466, in _ConvertPlainToDrbd
    raise errors.OpExecError("There are some degraded disks for"
OpExecError: There are some degraded disks for this instance, please cleanup manually
2016-12-19 14:49:30,938: job-1060193 pid=32037 INFO Finished job 1060193, status = error
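Looking at the retry lines more closely, each retry comes roughly 1.3 seconds after the previous one, so the whole "Degraded disks found" loop lasted only about 12 seconds. The Python sketch below pulls that out of the excerpt. The regexes are my own assumptions about the jobs.log line format, based only on the lines pasted above, not anything Ganeti ships.

```python
# Sketch: measure how long the "Degraded disks found" retry loop lasted.
# Assumes the jobs.log line format exactly as pasted above.
import re
from datetime import datetime

LOG_EXCERPT = """\
2016-12-19 14:49:13,884: job-1060193 pid=32037 INFO Waiting for instance customer-server.xyz to sync disks
2016-12-19 14:49:14,323: job-1060193 pid=32037 INFO Degraded disks found, 10 retries left
2016-12-19 14:49:15,688: job-1060193 pid=32037 INFO Degraded disks found, 9 retries left
2016-12-19 14:49:17,026: job-1060193 pid=32037 INFO Degraded disks found, 8 retries left
2016-12-19 14:49:18,349: job-1060193 pid=32037 INFO Degraded disks found, 7 retries left
2016-12-19 14:49:19,667: job-1060193 pid=32037 INFO Degraded disks found, 6 retries left
2016-12-19 14:49:21,080: job-1060193 pid=32037 INFO Degraded disks found, 5 retries left
2016-12-19 14:49:22,420: job-1060193 pid=32037 INFO Degraded disks found, 4 retries left
2016-12-19 14:49:23,754: job-1060193 pid=32037 INFO Degraded disks found, 3 retries left
2016-12-19 14:49:25,082: job-1060193 pid=32037 INFO Degraded disks found, 2 retries left
2016-12-19 14:49:26,447: job-1060193 pid=32037 INFO Degraded disks found, 1 retries left
2016-12-19 14:49:27,791: job-1060193 pid=32037 INFO Instance customer-server.xyz's disks are in sync
"""

# timestamp, job id, pid, level, message
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}): "
    r"job-(?P<job>\d+) pid=\d+ (?P<level>\w+) (?P<msg>.*)$"
)
RETRY_RE = re.compile(r"Degraded disks found, (\d+) retries left")


def retry_timestamps(text):
    """Return (timestamp, retries_left) for each degraded-disk retry line."""
    out = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        r = RETRY_RE.search(m.group("msg"))
        if r:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
            out.append((ts, int(r.group(1))))
    return out


retries = retry_timestamps(LOG_EXCERPT)
gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(retries, retries[1:])]
window = (retries[-1][0] - retries[0][0]).total_seconds()

print(f"{len(retries)} retries, first={retries[0][1]} left, last={retries[-1][1]} left")
print(f"mean gap between retries: {sum(gaps) / len(gaps):.2f} s")
print(f"retry window: {window:.1f} s")
```

That comes out at ten retries over about 12 seconds, nowhere near long enough for a 400 GB resync.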
I checked node-daemon.log on both nodes. Apart from the DRBD setup and some LVM commands,
none of which report failures, there's nothing of note, so I'm not sure what the issue could be.