Can't Reboot an instance after master node failure (DRBD State Error)


Joe Esposito

Aug 16, 2010, 9:28:02 AM
to ganeti
Hi everyone,

First off, this is in a test environment, and not a crisis :) I'm
running through a bunch of test scenarios and hit a snag. All tests
are being run on a 2-node cluster (Dell R710, 24GB, 6+2 RAID 10)
running Ubuntu 10.04 x64 Server with Ganeti 2.1.6 compiled and
installed manually.

Hostnames are:
clusternode1 (master that was shut down to simulate failure)
clusternode2 (secondary that became master)
instance2 (working instance)
instance4 (broken instance!)
instance5 (working instance)

After shutting down the primary (master) node, I issued the following
commands on the secondary..

#gnt-cluster masterfailover --no-voting
#ganeti-masterd --no-voting
#gnt-node failover --ignore-consistency clusternode1

At this point instance2 and instance5 are working fine. Which brings
us to the problem.. instance4 doesn't want to boot!

#gnt-instance start --force instance4
Waiting for job 3763 for instance4.x.com...
Mon Aug 16 09:14:50 2010 - WARNING: Could not prepare block device
disk/0 on node clusternode1.x.com (is_primary=False, pass=1): Node is
marked offline
Mon Aug 16 09:14:50 2010 - WARNING: Could not shutdown block device
disk/0 on node clusternode1.x.com: Node is marked offline
Mon Aug 16 09:14:50 2010 - WARNING: Could not shutdown block device disk/0 on node clusternode2.x.com: drbd2: can't shutdown drbd device: /dev/drbd2: State change failed: (-12) Device is held open by someone\n
Job 3763 for instance4.x.com has failed: Failure: command execution
error:
Could not start instance: Hypervisor error: Failed to start instance
instance4.x.com

And here's my /proc/drbd
# cat /proc/drbd
version: 8.3.7 (api:88/proto:86-91)
GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by
root@clusternode2, 2010-08-11 22:43:21
0: cs:Unconfigured

2: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r----
    ns:0 nr:0 dw:0 dr:516 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
3: cs:Unconfigured


Now I think my problem is related to not having installed acpid on
the guests before starting this simulation, as referenced at
http://groups.google.com/group/ganeti/browse_thread/thread/a9bb8b0c7dc3f9c8.

Question is can I fix it, and of course, how?

Iustin Pop

Aug 16, 2010, 9:41:31 AM
to gan...@googlegroups.com

Being "held open" means that there is some process on clusternode2 that
keeps the drbd device active.
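
As a sketch of how one might track that holder down (assuming the standard Linux tools fuser, lsof and dmsetup are present on the node; the holders() helper name is made up here):

```shell
# List everything that could be holding a block device open:
# open file handles, mounts, and device-mapper (kpartx) mappings.
holders() {
  dev=$1
  echo "== open file handles on $dev =="
  fuser -vm "$dev" || true        # fuser prints its table on stderr
  lsof "$dev" 2>/dev/null || true
  echo "== mounts =="
  grep "$dev" /proc/mounts || echo "not mounted"
  echo "== device-mapper maps =="
  dmsetup deps 2>/dev/null | grep "$(basename "$dev")" || echo "no dm maps"
}

holders /dev/drbd2
```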

> Job 3763 for instance4.x.com has failed: Failure: command execution
> error:
> Could not start instance: Hypervisor error: Failed to start instance
> instance4.x.com
>
> And Here's my /proc/drbd
> # cat /proc/drbd
> version: 8.3.7 (api:88/proto:86-91)
> GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by
> root@clusternode2, 2010-08-11 22:43:21
> 0: cs:Unconfigured
>
> 2: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r----
> ns:0 nr:0 dw:0 dr:516 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:
> 0

Does a 'drbdsetup /dev/drbd2 down' work? I expect not, just checking…

> 3: cs:Unconfigured
>
>
> Now I think my problem is related to not having installed acpid, as
> referenced at http://groups.google.com/group/ganeti/browse_thread/thread/a9bb8b0c7dc3f9c8,
> on the guests before starting this simulation.

Not sure…

> Question is can I fix it, and of course, how?

You have to check what's the state on clusternode2. I think we lost some
error messages, "Failed to start instance …" should also give the
reason, but it's not reported properly. Can you check the node daemon
logs?

iustin

Guido Trotter

Aug 16, 2010, 9:45:44 AM
to gan...@googlegroups.com
On Mon, Aug 16, 2010 at 2:28 PM, Joe Esposito <j...@j-espo.com> wrote:

Hi,

> Hi everyone,
>
> First off, this is in a test environment, and not a crisis :)  I'm
> running through a bunch of test scenarios, and hit a snag.   All tests
> are being run on a  2 node cluster (Dell R710, 24GB 6+2 Raid 10)
> running Ubuntu 10.04 x64 Server with Ganeti 2.1.6 compiled and
> installed manually.
>
> Hostnames are:
> clusternode1  (master that was shutdown to simulate failure)
> clusternode2 (secondary that became master)
> instance2 (working node)
> instance4 (broken node!)
> instance5 (working node)
>
> After shutting down the primary (master) node, I issued the following
> commands on the secondary..
>
> #gnt-cluster masterfailover --no-voting
> #ganeti-masterd --no-voting
> #gnt-node failover --ignore-consistency clusternode1
>

Do you have the output of this? Anyway, after it clusternode2 is now
the primary for instance4, right?

> Mon Aug 16 09:14:50 2010  - WARNING: Could not shutdown block device
> disk/0 on node clusternode2.x.com: drbd2: can't shutdown drbd device: /
> dev/drbd2: State change failed: (-12) Device is held open by someone\n
> Job 3763 for instance4.x.com has failed: Failure: command execution
> error:

This seems to be the issue. Could you try to see if there is any
process trying to access /dev/drbd2, or if it's kpartx-mapped or
mounted on the second node? Anyway, that device needs to go down
before the restart can happen. (drbdsetup might help bringing it down,
too.)
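
A sketch of that cleanup sequence (drbd 8.3 command syntax, matching the commands used elsewhere in this thread; the run() wrapper and DRY_RUN switch are made up for safety and default to only printing the commands):

```shell
# Free /dev/drbd2 on the node before retrying the instance start.
# DRY_RUN defaults to 1: commands are printed, not executed.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

run umount /dev/drbd2           # in case it was left mounted on the node
run kpartx -d /dev/drbd2        # drop any partition mappings on top of it
run drbdsetup /dev/drbd2 down   # deconfigure the minor so it can be reassembled
```

Ganeti assembles the device itself on instance start, so once the minor is down a plain gnt-instance start should recreate it.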

> Now I think my problem is related to not having installed acpid, as
> referenced at http://groups.google.com/group/ganeti/browse_thread/thread/a9bb8b0c7dc3f9c8,
> on the guests before starting this simulation.

Ah, no, it's not. This is only required for a clean shutdown of KVM
guests, and in this case you wouldn't have gotten a clean shutdown
anyway. :)

Thanks,

Guido

Joe Esposito

Aug 16, 2010, 10:34:04 AM
to gan...@googlegroups.com
Thanks for the quick replies.. 

> Does a 'drbdsetup /dev/drbd2 down' work? I expect not, just checking…

Doesn't help..
#lsof | grep drbd2
drbd2_wor 8782   root  cwd       DIR                8,2     4096          2 /
drbd2_wor 8782   root  rtd       DIR                8,2     4096          2 /
drbd2_wor 8782   root  txt   unknown                                        /proc/8782/exe
drbd2_rec 8794   root  cwd       DIR                8,2     4096          2 /
drbd2_rec 8794   root  rtd       DIR                8,2     4096          2 /
drbd2_rec 8794   root  txt   unknown                                        /proc/8794/exe

#drbdsetup /dev/drbd2 status
<resource minor="2" cs="WFConnection" ro1="Primary" ro2="Unknown" ds1="UpToDate" ds2="DUnknown" />

#drbdsetup /dev/drbd2 down

#drbdsetup /dev/drbd2 status
<resource minor="2" cs="Unconfigured" /> 

# lsof | grep drbd2
<returns nothing>

 
> You have to check what's the state on clusternode2. I think we lost some
> error messages, "Failed to start instance …" should also give the
> reason, but it's not reported properly. Can you check the node daemon
> logs?

Well, "Failed to start instance" is the last line returned by gnt-instance..  Here's the section of the node daemon log..

<start dump>
2010-08-16 10:29:19,404: ganeti-noded pid=9184 INFO 10.0.0.72:42130 PUT /bridges_exist HTTP/1.0 200
2010-08-16 10:29:19,417: ganeti-noded pid=9185 INFO 10.0.0.72:42131 PUT /instance_info HTTP/1.0 200
2010-08-16 10:29:19,447: ganeti-noded pid=9186 INFO 10.0.0.72:42132 PUT /node_info HTTP/1.0 200
2010-08-16 10:29:19,461: ganeti-noded pid=9188 INFO 10.0.0.72:42133 PUT /hooks_runner HTTP/1.0 200
2010-08-16 10:29:19,585: ganeti-noded pid=9189 INFO 10.0.0.72:42134 PUT /blockdev_assemble HTTP/1.0 200
2010-08-16 10:29:19,691: ganeti-noded pid=9230 INFO 10.0.0.72:42135 PUT /blockdev_assemble HTTP/1.0 200
2010-08-16 10:29:19,773: ganeti-noded pid=9242 ERROR Hypervisor error: Failed to start instance instance4.redfiresolutions.com
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/dist-packages/ganeti/backend.py", line 1044, in StartInstance
    hyper.StartInstance(instance, block_devices)
  File "/usr/local/lib/python2.6/dist-packages/ganeti/hypervisor/hv_kvm.py", line 741, in StartInstance
    self._ExecuteKVMRuntime(instance, kvm_runtime)
  File "/usr/local/lib/python2.6/dist-packages/ganeti/hypervisor/hv_kvm.py", line 725, in _ExecuteKVMRuntime
    self._RunKVMCmd(name, kvm_cmd)
  File "/usr/local/lib/python2.6/dist-packages/ganeti/hypervisor/hv_kvm.py", line 634, in _RunKVMCmd
    raise errors.HypervisorError("Failed to start instance %s" % name)
HypervisorError: Failed to start instance instance4.redfiresolutions.com
2010-08-16 10:29:19,774: ganeti-noded pid=9242 INFO 10.0.0.72:42136 PUT /instance_start HTTP/1.0 200
2010-08-16 10:29:19,812: ganeti-noded pid=9283 ERROR drbd2: can't shutdown drbd device: /dev/drbd2: State change failed: (-12) Device is h

2010-08-16 10:29:19,833: ganeti-noded pid=9283 ERROR drbd2: can't shutdown drbd device: /dev/drbd2: State change failed: (-12) Device is h

2010-08-16 10:29:19,833: ganeti-noded pid=9283 INFO 10.0.0.72:42137 PUT /blockdev_shutdown HTTP/1.0 200
</end dump>

 Finally, yes, clusternode2 is the primary.. here's the output from gnt-node info
#gnt-node info 
Node name: clusternode1.x.com
  primary ip: 10.0.0.71
  secondary ip: 10.0.1.101
  master candidate: False
  drained: False
  offline: True
  primary for no instances
  secondary for instances:
Node name: clusternode2.x.com
  primary ip: 10.0.0.72
  secondary ip: 10.0.1.102
  master candidate: True
  drained: False
  offline: False
  primary for instances:
  secondary for no instances

Guido Trotter

Aug 16, 2010, 10:44:43 AM
to gan...@googlegroups.com
On Mon, Aug 16, 2010 at 3:34 PM, Joe Esposito <j...@j-espo.com> wrote:
> Doesn't help..
> #lsof | grep drbd2
> drbd2_wor 8782   root  cwd       DIR                8,2     4096          2
> /
> drbd2_wor 8782   root  rtd       DIR                8,2     4096          2
> /
> drbd2_wor 8782   root  txt   unknown
>  /proc/8782/exe
> drbd2_rec 8794   root  cwd       DIR                8,2     4096          2
> /
> drbd2_rec 8794   root  rtd       DIR                8,2     4096          2
> /
> drbd2_rec 8794   root  txt   unknown
>  /proc/8794/exe
> #drbdsetup /dev/drbd2 status
> <resource minor="2" cs="WFConnection" ro1="Primary" ro2="Unknown"
> ds1="UpToDate" ds2="DUnknown" />
> #drbdsetup /dev/drbd2 down
> #drbdsetup /dev/drbd2 status
> <resource minor="2" cs="Unconfigured" />
> # lsof | grep drbd2
> <returns nothing>
>

No, this is actually pretty good.

>   File "/usr/local/lib/python2.6/dist-packages/ganeti/hypervisor/hv_kvm.py",
> line 725, in _ExecuteKVMRuntime
>     self._RunKVMCmd(name, kvm_cmd)
>   File "/usr/local/lib/python2.6/dist-packages/ganeti/hypervisor/hv_kvm.py",
> line 634, in _RunKVMCmd
>     raise errors.HypervisorError("Failed to start instance %s" % name)
> HypervisorError: Failed to start instance instance4.redfiresolutions.com

Very insightful. So, your disks have no problems (except that they are
degraded and then fail to stop cleanly, which might be a minor bug in
the shutdown path when instance start fails). The real problem here is
that kvm can't start the instance. If you run noded on node2 with "-d"
(or -f -d for foreground) you'll see the actual command, and then you
can check why it's failing. Perhaps it's missing something that was
present on the old node? A kernel or initrd? Anyway, we have to
figure out why kvm fails.
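
One quick check along those lines, as a sketch (the kernel path here is Ganeti's default kvm kernel_path, /boot/vmlinuz-2.6-kvmU; a missing -kernel file makes kvm exit immediately):

```shell
# Verify that the files the kvm command line needs exist on the new
# primary node.
for f in /usr/bin/kvm /boot/vmlinuz-2.6-kvmU; do
  if [ -e "$f" ]; then echo "$f: present"; else echo "$f: MISSING"; fi
done
```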


>  Finally, yes, clusternode2 is the primary.. here's the output from gnt-node
> info
> #gnt-node info

Ok, thanks. Can we also see gnt-instance info instance4? There might
be some hv parameter that worked on the previous node but is not
working on this one.

Thanks,

Guido

Joe Esposito

Aug 16, 2010, 10:55:59 AM
to gan...@googlegroups.com
Here's the instance info.  I think I get what you were saying..  By starting the node daemon in debug/foreground mode, I'll be able to see the actual KVM commands.  I'll work on that next.

# gnt-instance info instance4.redfiresolutions.com 
Instance name: instance4.x.com
UUID: d12e8c14-6c5e-4b9f-b20f-7e49d57fefa9
Serial number: 13
Creation time: 2010-08-12 18:00:09
Modification time: 2010-08-16 09:04:33
State: configured to be up, actual state is down
  Nodes:
    - primary: clusternode2.x.com
    - secondaries: clusternode1.x.com
  Operating system: debootstrap+default
  Allocated network port: 11017
  Hypervisor: kvm
    - acpi: default (True)
    - boot_order: default (disk)
    - cdrom_image_path: default ()
    - disk_cache: default (default)
    - disk_type: default (paravirtual)
    - initrd_path: default ()
    - kernel_args: default (ro)
    - kernel_path: default (/boot/vmlinuz-2.6-kvmU)
    - kvm_flag: default ()
    - migration_downtime: default (30)
    - nic_type: default (paravirtual)
    - root_path: default (/dev/vda1)
    - security_domain: default ()
    - security_model: default (none)
    - serial_console: default (True)
    - usb_mouse: default ()
    - use_chroot: default (False)
    - use_localtime: default (False)
    - vhost_net: default (False)
    - vnc_bind_address: default ()
    - vnc_password_file: default ()
    - vnc_tls: default (False)
    - vnc_x509_path: default ()
    - vnc_x509_verify: default (False)
  Hardware:
    - VCPUs: 1
    - memory: 4096MiB
    - NICs:
      - nic/0: MAC: aa:00:00:cd:27:fb, IP: None, mode: bridged, link: br0
  Disks:
    - disk/0: drbd8, size 20.0G
      access mode: rw
      nodeA:       clusternode1.x.com, minor=2
      nodeB:       clusternode2.x.com, minor=2
      port:        11018
      auth key:    a02255cb5ea3054ade84fae891adc5d8307cfde1
      on primary:  /dev/drbd2 (147:2) in sync, status *DEGRADED*
      child devices:
        - child 0: lvm, size 20.0G
          logical_id: kvmvg/e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_data
          on primary: /dev/kvmvg/e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_data (251:2)
        - child 1: lvm, size 128M
          logical_id: kvmvg/e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_meta
          on primary: /dev/kvmvg/e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_meta (251:3)

Joe Esposito

Aug 16, 2010, 12:47:10 PM
to gan...@googlegroups.com
Here's the log from ganeti-noded -f -d


2010-08-16 12:43:34,519: ganeti-noded pid=11308 utils:1806 DEBUG Memory lock set
2010-08-16 12:43:34,519: ganeti-noded pid=11308 server:263 DEBUG Connection from 10.0.0.72:45125
2010-08-16 12:43:34,523: ganeti-noded pid=11308 server:174 DEBUG HTTP request: PUT /bridges_exist HTTP/1.0
2010-08-16 12:43:34,524: ganeti-noded pid=11308 __init__:801 DEBUG Message body: [['br0']]
2010-08-16 12:43:34,524: ganeti-noded pid=11308 server:324 DEBUG Handling request <ganeti.http.server._HttpServerRequest PUT /bridges_exist headers='Connection: close\r\nContent-Length: 10\r\nHost: 10.0.0.72:1811\r\nUser-Agent: Ganeti 2.1.6\r\n' body=[['br0']] at 0x1352c90>
2010-08-16 12:43:34,524: ganeti-noded pid=11308 server:378 INFO 10.0.0.72:45125 PUT /bridges_exist HTTP/1.0 200
2010-08-16 12:43:34,525: ganeti-noded pid=11308 server:296 DEBUG Disconnected 10.0.0.72:45125
2010-08-16 12:43:34,533: ganeti-noded pid=11309 utils:1806 DEBUG Memory lock set
2010-08-16 12:43:34,533: ganeti-noded pid=11309 server:263 DEBUG Connection from 10.0.0.72:45126
2010-08-16 12:43:34,537: ganeti-noded pid=11309 server:174 DEBUG HTTP request: PUT /instance_info HTTP/1.0
2010-08-16 12:43:34,537: ganeti-noded pid=11309 __init__:801 DEBUG Message body: ['instance4.x.com', 'kvm']
2010-08-16 12:43:34,538: ganeti-noded pid=11309 server:324 DEBUG Handling request <ganeti.http.server._HttpServerRequest PUT /instance_info headers='Connection: close\r\nContent-Length: 42\r\nHost: 10.0.0.72:1811\r\nUser-Agent: Ganeti 2.1.6\r\n' body=['instance4.x.com', 'kvm'] at 0x1352cd0>
2010-08-16 12:43:34,538: ganeti-noded pid=11309 server:378 INFO 10.0.0.72:45126 PUT /instance_info HTTP/1.0 200
2010-08-16 12:43:34,539: ganeti-noded pid=11309 server:296 DEBUG Disconnected 10.0.0.72:45126
2010-08-16 12:43:34,547: ganeti-noded pid=11310 utils:1806 DEBUG Memory lock set
2010-08-16 12:43:34,547: ganeti-noded pid=11310 server:263 DEBUG Connection from 10.0.0.72:45127
2010-08-16 12:43:34,551: ganeti-noded pid=11310 server:174 DEBUG HTTP request: PUT /node_info HTTP/1.0
2010-08-16 12:43:34,552: ganeti-noded pid=11310 __init__:801 DEBUG Message body: ['kvmvg', 'kvm']
2010-08-16 12:43:34,552: ganeti-noded pid=11310 server:324 DEBUG Handling request <ganeti.http.server._HttpServerRequest PUT /node_info headers='Connection: close\r\nContent-Length: 17\r\nHost: 10.0.0.72:1811\r\nUser-Agent: Ganeti 2.1.6\r\n' body=['kvmvg', 'kvm'] at 0x1352d50>
2010-08-16 12:43:34,552: ganeti-noded pid=11310 utils:181 DEBUG RunCmd 'vgs -ovg_size,vg_free,pv_count --noheadings --nosuffix --units=m --separator=: kvmvg'
2010-08-16 12:43:34,586: ganeti-noded pid=11310 server:378 INFO 10.0.0.72:45127 PUT /node_info HTTP/1.0 200
2010-08-16 12:43:34,587: ganeti-noded pid=11310 server:296 DEBUG Disconnected 10.0.0.72:45127
2010-08-16 12:43:34,596: ganeti-noded pid=11312 utils:1806 DEBUG Memory lock set
2010-08-16 12:43:34,596: ganeti-noded pid=11312 server:263 DEBUG Connection from 10.0.0.72:45128
2010-08-16 12:43:34,601: ganeti-noded pid=11312 server:174 DEBUG HTTP request: PUT /hooks_runner HTTP/1.0
2010-08-16 12:43:34,601: ganeti-noded pid=11312 __init__:801 DEBUG Message body: ['instance-start', 'pre', {'GANETI_INSTANCE_MEMORY': '4096', 'GANETI_INSTANCE_SECONDARIES': 'clusternode1.x.com', 'GANETI_INSTANCE_HV_cdrom_image_path': '', 'GANETI_INSTANCE_HV_security_domain': '', 'GANETI_INSTANCE_HV_nic_type': 'paravirtual', 'GANETI_INSTANCE_STATUS': 'up', 'GANETI_OP_TARGET': 'instance4.x.com', 'GANETI_INSTANCE_BE_auto_balance': 'True', 'GANETI_INSTANCE_HV_vnc_password_file': '', 'GANETI_INSTANCE_NIC0_LINK': 'br0', 'GANETI_HOOKS_VERSION': '2', 'GANETI_INSTANCE_HV_migration_bandwidth': '32', 'GANETI_INSTANCE_HV_security_model': 'none', 'GANETI_INSTANCE_HV_vnc_x509_verify': 'False', 'GANETI_INSTANCE_HV_migration_downtime': '30', 'GANETI_INSTANCE_BE_memory': '4096', 'GANETI_INSTANCE_HYPERVISOR': 'kvm', 'GANETI_MASTER': 'clusternode2.x.com', 'GANETI_INSTANCE_HV_vnc_tls': 'False', 'GANETI_HOOKS_PATH': 'instance-start', 'GANETI_INSTANCE_OS_TYPE': 'debootstrap+default', 'GANETI_INSTANCE_HV_use_localtime': 'False', 'GANETI_INSTANCE_HV_kvm_flag': '', 'GANETI_INSTANCE_HV_migration_port': '8102', 'GANETI_INSTANCE_NIC0_MODE': 'bridged', 'GANETI_CLUSTER': 'thecluster.x.com', 'GANETI_INSTANCE_DISK_TEMPLATE': 'drbd', 'GANETI_INSTANCE_HV_usb_mouse': '', 'GANETI_INSTANCE_HV_boot_order': 'disk', 'GANETI_FORCE': 'True', 'GANETI_INSTANCE_HV_kernel_args': 'ro', 'GANETI_OP_CODE': 'OP_INSTANCE_STARTUP', 'GANETI_INSTANCE_NIC_COUNT': '1', 'GANETI_INSTANCE_HV_vnc_bind_address': '', 'GANETI_INSTANCE_BE_vcpus': '1', 'GANETI_INSTANCE_VCPUS': '1', 'GANETI_INSTANCE_NIC0_BRIDGE': 'br0', 'GANETI_INSTANCE_DISK_COUNT': '1', 'PATH': '/sbin:/bin:/usr/sbin:/usr/bin', 'GANETI_INSTANCE_HV_disk_cache': 'default', 'GANETI_INSTANCE_DISK0_MODE': 'rw', 'GANETI_INSTANCE_HV_initrd_path': '', 'GANETI_INSTANCE_NIC0_MAC': 'aa:00:00:cd:27:fb', 'GANETI_DATA_DIR': '/var/lib/ganeti', 'GANETI_INSTANCE_NIC0_IP': '', 'GANETI_INSTANCE_HV_vnc_x509_path': '', 'GANETI_INSTANCE_HV_use_chroot': 'False', 
'GANETI_INSTANCE_DISK0_SIZE': '20480', 'GANETI_INSTANCE_HV_vhost_net': 'False', 'GANETI_HOOKS_PHASE': 'pre', 'GANETI_INSTANCE_HV_root_path': '/dev/vda1', 'GANETI_INSTANCE_PRIMARY': 'clusternode2.x.com', 'GANETI_INSTANCE_NAME': 'instance4.x.com', 'GANETI_INSTANCE_HV_serial_console': 'True', 'GANETI_INSTANCE_HV_acpi': 'True', 'GANETI_OBJECT_TYPE': 'INSTANCE', 'GANETI_INSTANCE_HV_kernel_path': '/boot/vmlinuz-2.6-kvmU', 'GANETI_INSTANCE_HV_disk_type': 'paravirtual'}]
2010-08-16 12:43:34,601: ganeti-noded pid=11312 server:324 DEBUG Handling request <ganeti.http.server._HttpServerRequest PUT /hooks_runner headers='Connection: close\r\nContent-Length: 2452\r\nHost: 10.0.0.72:1811\r\nUser-Agent: Ganeti 2.1.6\r\n' body=['instance-start', 'pre', {'GANETI_INSTANCE_MEMORY': '4096', 'GANETI_INSTANCE_SECONDARIES': 'clusternode1.x.com', 'GANETI_INSTANCE_HV_cdrom_image_path': '', 'GANETI_INSTANCE_HV_security_domain': '', 'GANETI_INSTANCE_HV_nic_type': 'paravirtual', 'GANETI_INSTANCE_STATUS': 'up', 'GANETI_OP_TARGET': 'instance4.x.com', 'GANETI_INSTANCE_BE_auto_balance': 'True', 'GANETI_INSTANCE_HV_vnc_password_file': '', 'GANETI_INSTANCE_NIC0_LINK': 'br0', 'GANETI_HOOKS_VERSION': '2', 'GANETI_INSTANCE_HV_migration_bandwidth': '32', 'GANETI_INSTANCE_HV_security_model': 'none', 'GANETI_INSTANCE_HV_vnc_x509_verify': 'False', 'GANETI_INSTANCE_HV_migration_downtime': '30', 'GANETI_INSTANCE_BE_memory': '4096', 'GANETI_INSTANCE_HYPERVISOR': 'kvm', 'GANETI_MASTER': 'clusternode2.x.com', 'GANETI_INSTANCE_HV_vnc_tls': 'False', 'GANETI_HOOKS_PATH': 'instance-start', 'GANETI_INSTANCE_OS_TYPE': 'debootstrap+default', 'GANETI_INSTANCE_HV_use_localtime': 'False', 'GANETI_INSTANCE_HV_kvm_flag': '', 'GANETI_INSTANCE_HV_migration_port': '8102', 'GANETI_INSTANCE_NIC0_MODE': 'bridged', 'GANETI_CLUSTER': 'thecluster.x.com', 'GANETI_INSTANCE_DISK_TEMPLATE': 'drbd', 'GANETI_INSTANCE_HV_usb_mouse': '', 'GANETI_INSTANCE_HV_boot_order': 'disk', 'GANETI_FORCE': 'True', 'GANETI_INSTANCE_HV_kernel_args': 'ro', 'GANETI_OP_CODE': 'OP_INSTANCE_STARTUP', 'GANETI_INSTANCE_NIC_COUNT': '1', 'GANETI_INSTANCE_HV_vnc_bind_address': '', 'GANETI_INSTANCE_BE_vcpus': '1', 'GANETI_INSTANCE_VCPUS': '1', 'GANETI_INSTANCE_NIC0_BRIDGE': 'br0', 'GANETI_INSTANCE_DISK_COUNT': '1', 'PATH': '/sbin:/bin:/usr/sbin:/usr/bin', 'GANETI_INSTANCE_HV_disk_cache': 'default', 'GANETI_INSTANCE_DISK0_MODE': 'rw', 'GANETI_INSTANCE_HV_initrd_path': '', 'GANETI_INSTANCE_NIC0_MAC': 'aa:00:00:cd:27:fb', 
'GANETI_DATA_DIR': '/var/lib/ganeti', 'GANETI_INSTANCE_NIC0_IP': '', 'GANETI_INSTANCE_HV_vnc_x509_path': '', 'GANETI_INSTANCE_HV_use_chroot': 'False', 'GANETI_INSTANCE_DISK0_SIZE': '20480', 'GANETI_INSTANCE_HV_vhost_net': 'False', 'GANETI_HOOKS_PHASE': 'pre', 'GANETI_INSTANCE_HV_root_path': '/dev/vda1', 'GANETI_INSTANCE_PRIMARY': 'clusternode2.x.com', 'GANETI_INSTANCE_NAME': 'instance4.x.com', 'GANETI_INSTANCE_HV_serial_console': 'True', 'GANETI_INSTANCE_HV_acpi': 'True', 'GANETI_OBJECT_TYPE': 'INSTANCE', 'GANETI_INSTANCE_HV_kernel_path': '/boot/vmlinuz-2.6-kvmU', 'GANETI_INSTANCE_HV_disk_type': 'paravirtual'}] at 0x1357150>
2010-08-16 12:43:34,602: ganeti-noded pid=11312 server:378 INFO 10.0.0.72:45128 PUT /hooks_runner HTTP/1.0 200
2010-08-16 12:43:34,602: ganeti-noded pid=11312 server:296 DEBUG Disconnected 10.0.0.72:45128
2010-08-16 12:43:34,611: ganeti-noded pid=11313 utils:1806 DEBUG Memory lock set
2010-08-16 12:43:34,611: ganeti-noded pid=11313 server:263 DEBUG Connection from 10.0.0.72:45129
2010-08-16 12:43:34,615: ganeti-noded pid=11313 server:174 DEBUG HTTP request: PUT /blockdev_assemble HTTP/1.0
2010-08-16 12:43:34,615: ganeti-noded pid=11313 __init__:801 DEBUG Message body: [{'logical_id': ['clusternode1.x.com', 'clusternode2.x.com', 11018, 2, 2, 'a02255cb5ea3054ade84fae891adc5d8307cfde1'], 'dev_type': 'drbd8', 'mode': 'rw', 'physical_id': ['10.0.1.102', 11018, '10.0.1.101', 11018, 2, 'a02255cb5ea3054ade84fae891adc5d8307cfde1'], 'children': [{'dev_type': 'lvm', 'logical_id': ['kvmvg', 'e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_data'], 'physical_id': ['kvmvg', 'e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_data'], 'size': 20480}, {'dev_type': 'lvm', 'logical_id': ['kvmvg', 'e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_meta'], 'physical_id': ['kvmvg', 'e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_meta'], 'size': 128}], 'iv_name': 'disk/0', 'size': 20480}, 'instance4.x.com', False]
2010-08-16 12:43:34,616: ganeti-noded pid=11313 server:324 DEBUG Handling request <ganeti.http.server._HttpServerRequest PUT /blockdev_assemble headers='Connection: close\r\nContent-Length: 758\r\nHost: 10.0.0.72:1811\r\nUser-Agent: Ganeti 2.1.6\r\n' body=[{'logical_id': ['clusternode1.x.com', 'clusternode2.x.com', 11018, 2, 2, 'a02255cb5ea3054ade84fae891adc5d8307cfde1'], 'dev_type': 'drbd8', 'mode': 'rw', 'physical_id': ['10.0.1.102', 11018, '10.0.1.101', 11018, 2, 'a02255cb5ea3054ade84fae891adc5d8307cfde1'], 'children': [{'dev_type': 'lvm', 'logical_id': ['kvmvg', 'e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_data'], 'physical_id': ['kvmvg', 'e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_data'], 'size': 20480}, {'dev_type': 'lvm', 'logical_id': ['kvmvg', 'e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_meta'], 'physical_id': ['kvmvg', 'e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_meta'], 'size': 128}], 'iv_name': 'disk/0', 'size': 20480}, 'instance4.x.com', False] at 0x1352d10>
2010-08-16 12:43:34,616: ganeti-noded pid=11313 utils:181 DEBUG RunCmd 'lvs --noheadings --separator=, --units=m --nosuffix -olv_attr,lv_kernel_major,lv_kernel_minor,vg_extent_size,stripes /dev/kvmvg/e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_data'
2010-08-16 12:43:34,628: ganeti-noded pid=11313 utils:181 DEBUG RunCmd 'lvchange -ay /dev/kvmvg/e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_data'
2010-08-16 12:43:34,638: ganeti-noded pid=11313 utils:181 DEBUG RunCmd 'lvs --noheadings --separator=, --units=m --nosuffix -olv_attr,lv_kernel_major,lv_kernel_minor,vg_extent_size,stripes /dev/kvmvg/e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_meta'
2010-08-16 12:43:34,648: ganeti-noded pid=11313 utils:181 DEBUG RunCmd 'lvchange -ay /dev/kvmvg/e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_meta'
2010-08-16 12:43:34,659: ganeti-noded pid=11313 utils:181 DEBUG RunCmd 'drbdsetup /dev/drbd2 show'
2010-08-16 12:43:34,707: ganeti-noded pid=11313 utils:181 DEBUG RunCmd 'drbdsetup /dev/drbd2 syncer -r 61440 --create-device'
2010-08-16 12:43:34,710: ganeti-noded pid=11313 server:378 INFO 10.0.0.72:45129 PUT /blockdev_assemble HTTP/1.0 200
2010-08-16 12:43:34,711: ganeti-noded pid=11313 server:296 DEBUG Disconnected 10.0.0.72:45129
2010-08-16 12:43:34,720: ganeti-noded pid=11321 utils:1806 DEBUG Memory lock set
2010-08-16 12:43:34,720: ganeti-noded pid=11321 server:263 DEBUG Connection from 10.0.0.72:45130
2010-08-16 12:43:34,724: ganeti-noded pid=11321 server:174 DEBUG HTTP request: PUT /blockdev_assemble HTTP/1.0
2010-08-16 12:43:34,724: ganeti-noded pid=11321 __init__:801 DEBUG Message body: [{'logical_id': ['clusternode1.x.com', 'clusternode2.x.com', 11018, 2, 2, 'a02255cb5ea3054ade84fae891adc5d8307cfde1'], 'dev_type': 'drbd8', 'mode': 'rw', 'physical_id': ['10.0.1.102', 11018, '10.0.1.101', 11018, 2, 'a02255cb5ea3054ade84fae891adc5d8307cfde1'], 'children': [{'dev_type': 'lvm', 'logical_id': ['kvmvg', 'e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_data'], 'physical_id': ['kvmvg', 'e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_data'], 'size': 20480}, {'dev_type': 'lvm', 'logical_id': ['kvmvg', 'e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_meta'], 'physical_id': ['kvmvg', 'e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_meta'], 'size': 128}], 'iv_name': 'disk/0', 'size': 20480}, 'instance4.x.com', True]
2010-08-16 12:43:34,725: ganeti-noded pid=11321 server:324 DEBUG Handling request <ganeti.http.server._HttpServerRequest PUT /blockdev_assemble headers='Connection: close\r\nContent-Length: 757\r\nHost: 10.0.0.72:1811\r\nUser-Agent: Ganeti 2.1.6\r\n' body=[{'logical_id': ['clusternode1.x.com', 'clusternode2.x.com', 11018, 2, 2, 'a02255cb5ea3054ade84fae891adc5d8307cfde1'], 'dev_type': 'drbd8', 'mode': 'rw', 'physical_id': ['10.0.1.102', 11018, '10.0.1.101', 11018, 2, 'a02255cb5ea3054ade84fae891adc5d8307cfde1'], 'children': [{'dev_type': 'lvm', 'logical_id': ['kvmvg', 'e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_data'], 'physical_id': ['kvmvg', 'e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_data'], 'size': 20480}, {'dev_type': 'lvm', 'logical_id': ['kvmvg', 'e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_meta'], 'physical_id': ['kvmvg', 'e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_meta'], 'size': 128}], 'iv_name': 'disk/0', 'size': 20480}, 'instance4.x.com', True] at 0x1352d10>
2010-08-16 12:43:34,725: ganeti-noded pid=11321 utils:181 DEBUG RunCmd 'lvs --noheadings --separator=, --units=m --nosuffix -olv_attr,lv_kernel_major,lv_kernel_minor,vg_extent_size,stripes /dev/kvmvg/e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_data'
2010-08-16 12:43:34,735: ganeti-noded pid=11321 utils:181 DEBUG RunCmd 'lvchange -ay /dev/kvmvg/e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_data'
2010-08-16 12:43:34,746: ganeti-noded pid=11321 utils:181 DEBUG RunCmd 'lvs --noheadings --separator=, --units=m --nosuffix -olv_attr,lv_kernel_major,lv_kernel_minor,vg_extent_size,stripes /dev/kvmvg/e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_meta'
2010-08-16 12:43:34,755: ganeti-noded pid=11321 utils:181 DEBUG RunCmd 'lvchange -ay /dev/kvmvg/e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_meta'
2010-08-16 12:43:34,766: ganeti-noded pid=11321 utils:181 DEBUG RunCmd 'drbdsetup /dev/drbd2 show'
2010-08-16 12:43:34,814: ganeti-noded pid=11321 utils:181 DEBUG RunCmd 'drbdsetup /dev/drbd2 syncer -r 61440 --create-device'
2010-08-16 12:43:34,817: ganeti-noded pid=11321 utils:181 DEBUG RunCmd 'drbdsetup /dev/drbd2 primary'
2010-08-16 12:43:34,820: ganeti-noded pid=11321 server:378 INFO 10.0.0.72:45130 PUT /blockdev_assemble HTTP/1.0 200
2010-08-16 12:43:34,821: ganeti-noded pid=11321 server:296 DEBUG Disconnected 10.0.0.72:45130
2010-08-16 12:43:34,830: ganeti-noded pid=11332 utils:1806 DEBUG Memory lock set
2010-08-16 12:43:34,831: ganeti-noded pid=11332 server:263 DEBUG Connection from 10.0.0.72:45131
2010-08-16 12:43:34,835: ganeti-noded pid=11332 server:174 DEBUG HTTP request: PUT /instance_start HTTP/1.0
2010-08-16 12:43:34,835: ganeti-noded pid=11332 __init__:801 DEBUG Message body: [{'network_port': 11017, 'primary_node': 'clusternode2.x.com', 'serial_no': 15, 'ctime': 1281650409.123771, 'hypervisor': 'kvm', 'disks': [{'logical_id': ['clusternode1.x.com', 'clusternode2.x.com', 11018, 2, 2, 'a02255cb5ea3054ade84fae891adc5d8307cfde1'], 'dev_type': 'drbd8', 'mode': 'rw', 'physical_id': ['10.0.1.102', 11018, '10.0.1.101', 11018, 2, 'a02255cb5ea3054ade84fae891adc5d8307cfde1'], 'children': [{'dev_type': 'lvm', 'logical_id': ['kvmvg', 'e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_data'], 'physical_id': ['kvmvg', 'e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_data'], 'size': 20480}, {'dev_type': 'lvm', 'logical_id': ['kvmvg', 'e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_meta'], 'physical_id': ['kvmvg', 'e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_meta'], 'size': 128}], 'iv_name': 'disk/0', 'size': 20480}], 'uuid': 'd12e8c14-6c5e-4b9f-b20f-7e49d57fefa9', 'admin_up': True, 'hvparams': {'nic_type': 'paravirtual', 'use_chroot': False, 'migration_port': 8102, 'vnc_x509_path': '', 'vnc_bind_address': '', 'usb_mouse': '', 'migration_downtime': 30, 'security_model': 'none', 'cdrom_image_path': '', 'boot_order': 'disk', 'vhost_net': False, 'disk_cache': 'default', 'kernel_path': '/boot/vmlinuz-2.6-kvmU', 'acpi': True, 'vnc_x509_verify': False, 'vnc_tls': False, 'use_localtime': False, 'security_domain': '', 'serial_console': True, 'kvm_flag': '', 'vnc_password_file': '', 'migration_bandwidth': 32, 'disk_type': 'paravirtual', 'kernel_args': 'ro', 'root_path': '/dev/vda1', 'initrd_path': ''}, 'nics': [{'mac': 'aa:00:00:cd:27:fb', 'nicparams': {'link': 'br0', 'mode': 'bridged'}}], 'disk_template': 'drbd', 'mtime': 1281972993.328512, 'os': 'debootstrap+default', 'beparams': {'auto_balance': True, 'vcpus': 1, 'memory': 4096}, 'name': 'instance4.x.com'}]
2010-08-16 12:43:34,835: ganeti-noded pid=11332 server:324 DEBUG Handling request <ganeti.http.server._HttpServerRequest PUT /instance_start headers='Connection: close\r\nContent-Length: 1838\r\nHost: 10.0.0.72:1811\r\nUser-Agent: Ganeti 2.1.6\r\n' body=[{'network_port': 11017, 'primary_node': 'clusternode2.x.com', 'serial_no': 15, 'ctime': 1281650409.123771, 'hypervisor': 'kvm', 'disks': [{'logical_id': ['clusternode1.x.com', 'clusternode2.x.com', 11018, 2, 2, 'a02255cb5ea3054ade84fae891adc5d8307cfde1'], 'dev_type': 'drbd8', 'mode': 'rw', 'physical_id': ['10.0.1.102', 11018, '10.0.1.101', 11018, 2, 'a02255cb5ea3054ade84fae891adc5d8307cfde1'], 'children': [{'dev_type': 'lvm', 'logical_id': ['kvmvg', 'e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_data'], 'physical_id': ['kvmvg', 'e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_data'], 'size': 20480}, {'dev_type': 'lvm', 'logical_id': ['kvmvg', 'e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_meta'], 'physical_id': ['kvmvg', 'e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_meta'], 'size': 128}], 'iv_name': 'disk/0', 'size': 20480}], 'uuid': 'd12e8c14-6c5e-4b9f-b20f-7e49d57fefa9', 'admin_up': True, 'hvparams': {'nic_type': 'paravirtual', 'use_chroot': False, 'migration_port': 8102, 'vnc_x509_path': '', 'vnc_bind_address': '', 'usb_mouse': '', 'migration_downtime': 30, 'security_model': 'none', 'cdrom_image_path': '', 'boot_order': 'disk', 'vhost_net': False, 'disk_cache': 'default', 'kernel_path': '/boot/vmlinuz-2.6-kvmU', 'acpi': True, 'vnc_x509_verify': False, 'vnc_tls': False, 'use_localtime': False, 'security_domain': '', 'serial_console': True, 'kvm_flag': '', 'vnc_password_file': '', 'migration_bandwidth': 32, 'disk_type': 'paravirtual', 'kernel_args': 'ro', 'root_path': '/dev/vda1', 'initrd_path': ''}, 'nics': [{'mac': 'aa:00:00:cd:27:fb', 'nicparams': {'link': 'br0', 'mode': 'bridged'}}], 'disk_template': 'drbd', 'mtime': 1281972993.328512, 'os': 'debootstrap+default', 'beparams': {'auto_balance': True, 'vcpus': 1, 'memory': 4096}, 'name': 'instance4.x.com'}] at 0x1352e50>
2010-08-16 12:43:34,836: ganeti-noded pid=11332 utils:181 DEBUG RunCmd 'lvs --noheadings --separator=, --units=m --nosuffix -olv_attr,lv_kernel_major,lv_kernel_minor,vg_extent_size,stripes /dev/kvmvg/e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_data'
2010-08-16 12:43:34,864: ganeti-noded pid=11332 utils:181 DEBUG RunCmd 'lvs --noheadings --separator=, --units=m --nosuffix -olv_attr,lv_kernel_major,lv_kernel_minor,vg_extent_size,stripes /dev/kvmvg/e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_meta'
2010-08-16 12:43:34,875: ganeti-noded pid=11332 utils:181 DEBUG RunCmd 'drbdsetup /dev/drbd2 primary'
2010-08-16 12:43:34,879: ganeti-noded pid=11332 utils:181 DEBUG RunCmd '/usr/bin/kvm -name instance4.x.com -m 4096 -smp 1 -pidfile /var/run/ganeti/kvm-hypervisor/pid/instance4.x.com -daemonize -boot c -drive file=/var/run/ganeti/instance-disks/instance4.x.com:0,format=raw,if=virtio,boot=on -kernel /boot/vmlinuz-2.6-kvmU -append root=/dev/vda1 ro console=ttyS0,38400 -nographic -monitor unix:/var/run/ganeti/kvm-hypervisor/ctrl/instance4.x.com.monitor,server,nowait -serial unix:/var/run/ganeti/kvm-hypervisor/ctrl/instance4.x.com.serial,server,nowait -net nic,vlan=0,macaddr=aa:00:00:cd:27:fb,model=virtio -net tap,vlan=0,script=/tmp/tmpAUtkCz'
2010-08-16 12:43:34,922: ganeti-noded pid=11332 backend:94 ERROR Hypervisor error: Failed to start instance instance4.x.com
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/dist-packages/ganeti/backend.py", line 1044, in StartInstance
    hyper.StartInstance(instance, block_devices)
  File "/usr/local/lib/python2.6/dist-packages/ganeti/hypervisor/hv_kvm.py", line 741, in StartInstance
    self._ExecuteKVMRuntime(instance, kvm_runtime)
  File "/usr/local/lib/python2.6/dist-packages/ganeti/hypervisor/hv_kvm.py", line 725, in _ExecuteKVMRuntime
    self._RunKVMCmd(name, kvm_cmd)
  File "/usr/local/lib/python2.6/dist-packages/ganeti/hypervisor/hv_kvm.py", line 634, in _RunKVMCmd
    raise errors.HypervisorError("Failed to start instance %s" % name)
HypervisorError: Failed to start instance instance4.x.com
2010-08-16 12:43:34,923: ganeti-noded pid=11332 server:378 INFO 10.0.0.72:45131 PUT /instance_start HTTP/1.0 200
2010-08-16 12:43:34,924: ganeti-noded pid=11332 server:296 DEBUG Disconnected 10.0.0.72:45131
2010-08-16 12:43:34,933: ganeti-noded pid=11371 utils:1806 DEBUG Memory lock set
2010-08-16 12:43:34,933: ganeti-noded pid=11371 server:263 DEBUG Connection from 10.0.0.72:45132
2010-08-16 12:43:34,937: ganeti-noded pid=11371 server:174 DEBUG HTTP request: PUT /blockdev_shutdown HTTP/1.0
2010-08-16 12:43:34,938: ganeti-noded pid=11371 __init__:801 DEBUG Message body: [{'logical_id': ['clusternode1.x.com', 'clusternode2.x.com', 11018, 2, 2, 'a02255cb5ea3054ade84fae891adc5d8307cfde1'], 'dev_type': 'drbd8', 'mode': 'rw', 'physical_id': ['10.0.1.102', 11018, '10.0.1.101', 11018, 2, 'a02255cb5ea3054ade84fae891adc5d8307cfde1'], 'children': [{'dev_type': 'lvm', 'logical_id': ['kvmvg', 'e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_data'], 'physical_id': ['kvmvg', 'e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_data'], 'size': 20480}, {'dev_type': 'lvm', 'logical_id': ['kvmvg', 'e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_meta'], 'physical_id': ['kvmvg', 'e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_meta'], 'size': 128}], 'iv_name': 'disk/0', 'size': 20480}]
2010-08-16 12:43:34,938: ganeti-noded pid=11371 server:324 DEBUG Handling request <ganeti.http.server._HttpServerRequest PUT /blockdev_shutdown headers='Connection: close\r\nContent-Length: 717\r\nHost: 10.0.0.72:1811\r\nUser-Agent: Ganeti 2.1.6\r\n' body=[{'logical_id': ['clusternode1.x.com', 'clusternode2.x.com', 11018, 2, 2, 'a02255cb5ea3054ade84fae891adc5d8307cfde1'], 'dev_type': 'drbd8', 'mode': 'rw', 'physical_id': ['10.0.1.102', 11018, '10.0.1.101', 11018, 2, 'a02255cb5ea3054ade84fae891adc5d8307cfde1'], 'children': [{'dev_type': 'lvm', 'logical_id': ['kvmvg', 'e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_data'], 'physical_id': ['kvmvg', 'e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_data'], 'size': 20480}, {'dev_type': 'lvm', 'logical_id': ['kvmvg', 'e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_meta'], 'physical_id': ['kvmvg', 'e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_meta'], 'size': 128}], 'iv_name': 'disk/0', 'size': 20480}] at 0x1352d10>
2010-08-16 12:43:34,938: ganeti-noded pid=11371 utils:181 DEBUG RunCmd 'lvs --noheadings --separator=, --units=m --nosuffix -olv_attr,lv_kernel_major,lv_kernel_minor,vg_extent_size,stripes /dev/kvmvg/e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_data'
2010-08-16 12:43:34,949: ganeti-noded pid=11371 utils:181 DEBUG RunCmd 'lvs --noheadings --separator=, --units=m --nosuffix -olv_attr,lv_kernel_major,lv_kernel_minor,vg_extent_size,stripes /dev/kvmvg/e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_meta'
2010-08-16 12:43:34,959: ganeti-noded pid=11371 utils:181 DEBUG RunCmd 'drbdsetup /dev/drbd2 down'
2010-08-16 12:43:35,040: ganeti-noded pid=11371 utils:136 DEBUG Command 'drbdsetup /dev/drbd2 down' failed (exited with exit code 11); output: /dev/drbd2: State change failed: (-12) Device is held open by someone

2010-08-16 12:43:35,040: ganeti-noded pid=11371 bdev:70 ERROR drbd2: can't shutdown drbd device: /dev/drbd2: State change failed: (-12) Device is held open by someone

2010-08-16 12:43:35,041: ganeti-noded pid=11371 utils:181 DEBUG RunCmd 'lvs --noheadings --separator=, --units=m --nosuffix -olv_attr,lv_kernel_major,lv_kernel_minor,vg_extent_size,stripes /dev/kvmvg/e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_data'
2010-08-16 12:43:35,052: ganeti-noded pid=11371 utils:181 DEBUG RunCmd 'lvs --noheadings --separator=, --units=m --nosuffix -olv_attr,lv_kernel_major,lv_kernel_minor,vg_extent_size,stripes /dev/kvmvg/e67cc690-7645-4566-b8e0-76d3734ea1ec.disk0_meta'
2010-08-16 12:43:35,061: ganeti-noded pid=11371 backend:96 ERROR drbd2: can't shutdown drbd device: /dev/drbd2: State change failed: (-12) Device is held open by someone

2010-08-16 12:43:35,062: ganeti-noded pid=11371 server:378 INFO 10.0.0.72:45132 PUT /blockdev_shutdown HTTP/1.0 200
2010-08-16 12:43:35,063: ganeti-noded pid=11371 server:296 DEBUG Disconnected 10.0.0.72:45132
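The "Device is held open by someone" failure in the log means some process — often a still-running KVM, or the kernel via a mount — has the DRBD minor open. Before retrying the shutdown it can help to ask what the holder is; a minimal sketch using the stock fuser/lsof tools (the device path is copied from the error above, and the existence guard makes this harmless on hosts without DRBD):

```shell
#!/bin/sh
# Sketch: find out what is holding a DRBD minor open before retrying
# "drbdsetup /dev/drbd2 down". /dev/drbd2 is taken from the error above.
DEV=/dev/drbd2
if [ -e "$DEV" ]; then
    fuser -v "$DEV" || true    # processes with the device open (psmisc)
    lsof "$DEV" 2>/dev/null    # alternative view, if lsof is installed
else
    echo "$DEV not present on this host"
fi
```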

Joe Esposito

Aug 16, 2010, 1:23:31 PM
to gan...@googlegroups.com
So I'm a bit confused now.  I turned off the server to add a new IPMI card (the stock IPMI interface from Dell doesn't allow console access!), and while poking around the BIOS I realized that Virtualization Technology wasn't enabled on this machine.  I turned it on, and after rebooting, the VM came up.

Not sure how that's related, or whether having the machine unplugged for a period of time helped more.  Odd...

Guido Trotter

Aug 16, 2010, 1:24:38 PM
to gan...@googlegroups.com
On Mon, Aug 16, 2010 at 5:47 PM, Joe Esposito <j...@j-espo.com> wrote:

> 2010-08-16 12:43:34,879: ganeti-noded pid=11332 utils:181 DEBUG RunCmd
> '/usr/bin/kvm -name instance4.x.com -m 4096 -smp 1 -pidfile
> /var/run/ganeti/kvm-hypervisor/pid/instance4.x.com -daemonize -boot c -drive
> file=/var/run/ganeti/instance-disks/instance4.x.com:0,format=raw,if=virtio,boot=on
> -kernel /boot/vmlinuz-2.6-kvmU -append root=/dev/vda1 ro console=ttyS0,38400
> -nographic -monitor
> unix:/var/run/ganeti/kvm-hypervisor/ctrl/instance4.x.com.monitor,server,nowait
> -serial
> unix:/var/run/ganeti/kvm-hypervisor/ctrl/instance4.x.com.serial,server,nowait
> -net nic,vlan=0,macaddr=aa:00:00:cd:27:fb,model=virtio -net
> tap,vlan=0,script=/tmp/tmpAUtkCz'

Ok, the problem is actually here. Now, kvm is slightly harder than xen
to debug, but what needs to be figured out is why it actually starts
but then immediately dies (apparently that's what's going on). What
you could try is to first shut down the instance and its disk, then
"gnt-instance activate-disks instance4", and then run a slight
variation of that command: remove the networking part (which is hard
to emulate; just pass -net none rather than the two -net options that
are passed) and use /dev/drbd2 rather than
"/var/run/ganeti/instance-disks/instance4.x.com:0" (which should be a
symlink to it). If you also don't pass -daemonize and remove the
-monitor and -serial options, kvm should start with the serial console
connected to the stdio of your terminal, and perhaps it'll tell you
what's wrong.
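Put together, the procedure described above looks roughly like this. This is a sketch only: the KVM flags are copied from the command line in the log, the gnt-instance calls assume the instance name used earlier in the thread, and the guard makes it a no-op on machines without Ganeti installed.

```shell
#!/bin/sh
# Sketch of the debugging procedure described above (assumptions noted
# in the lead-in). Guarded so it does nothing on hosts without Ganeti.
if ! command -v gnt-instance >/dev/null 2>&1; then
    echo "ganeti not installed; commands shown for reference only"
    exit 0
fi
gnt-instance shutdown instance4          # make sure the instance is down
gnt-instance activate-disks instance4    # bring up /dev/drbd2 without booting

# The command ganeti logged, minus -daemonize/-monitor/-serial and with
# networking stubbed out, so the guest console lands on this terminal:
/usr/bin/kvm -name instance4.x.com -m 4096 -smp 1 -boot c \
    -drive file=/dev/drbd2,format=raw,if=virtio,boot=on \
    -kernel /boot/vmlinuz-2.6-kvmU \
    -append "root=/dev/vda1 ro console=ttyS0,38400" \
    -nographic -net none
```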

Thanks,

Guido

Guido Trotter

Aug 16, 2010, 1:26:23 PM
to gan...@googlegroups.com
On Mon, Aug 16, 2010 at 6:23 PM, Joe Esposito <j...@j-espo.com> wrote:
> So a bit confused now.  I turned off the server to add a new IPMI card (the
> stock ipmi interface from Dell doesn't allow console access!), and while
> poking around the BIOS realized that Vritualization Technology wasn't
> enabled on this machine.  I turned it on, and after rebooting, the vm came
> up.
> Not sure how that's related, or if having the machine unplugged for a period
> of time helped more.  Odd..

Well, I suppose the virtualization bit being turned off is a good
reason for kvm to crash.
I'm wondering: didn't you say that other vms were running on the same
node, though?
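For reference, whether the virtualization bit is visible to the kernel can be checked from Linux without rebooting into the BIOS — a generic check, not Ganeti-specific:

```shell
#!/bin/sh
# Count CPUs advertising hardware virtualization extensions
# (Intel VT-x shows up as the 'vmx' cpuinfo flag, AMD-V as 'svm').
# A count of 0 usually means the feature is disabled in the BIOS,
# in which case /usr/bin/kvm cannot run accelerated guests.
count=$(grep -Ec 'vmx|svm' /proc/cpuinfo || true)
echo "cpus with virt extensions: $count"
```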

Thanks,

Guido

Joe Esposito

Aug 16, 2010, 1:28:56 PM
to gan...@googlegroups.com
Yes, as I said, I'm finding this odd :)  For a second I thought instance4 was the one I created with 2 vcpus, but alas it wasn't.  That was instance5, and it booted fine.

I'm in the process of learning how to fail back to clusternode1 and make it the master again. I'll try to recreate this after that.

Guido Trotter

Aug 16, 2010, 1:34:41 PM
to gan...@googlegroups.com
On Mon, Aug 16, 2010 at 6:28 PM, Joe Esposito <j...@j-espo.com> wrote:
> yes, As I said, I'm finding this odd :)  For a second I thought instance4
> was the one I created with 2 vpcus, but alas it wasn't.  That was instance5,
> and it booted fine.
> I'm in the process of learning how to fail back to clusternode1 and make it
> the master again.. I'll try and recreate this again after that.

By the way, which version of kvm are you using? On recent enough
versions you should enable the kvm_flag hypervisor parameter at the
cluster level (gnt-cluster modify -H kvm:kvm_flag=enabled).

Thanks,

Guido
