The vhost (and thus tun) module fails to be automagically loaded with ganeti 2.11.3 and qemu 2.1


chib...@gmail.com

Aug 13, 2014, 4:11:28 AM
to gan...@googlegroups.com

Hello,

The cluster is running Debian Jessie.
Today's upgrade included a new kernel (but still 3.14) and qemu 2.1 (up from 2.0).

When trying to migrate an instance from a non-upgraded node to a freshly rebooted, upgraded one, this happened:
---
Could not pre-migrate instance vm-01: Failed to accept instance: Failed to open /dev/net/tun
---

This of course persisted with failover:
---
Could not start instance vm-01 on node comp-02: Hypervisor error: Failed to open /dev/net/tun
---

Manually loading the vhost module on the target host (and thus, through its dependencies, all the other ones, including tun) fixed the issue.
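To check a node before migrating to it, a minimal Python sketch (not part of Ganeti) could compare /proc/modules against the modules expected to be loaded. The helper names and the module list (tun, vhost, vhost_net) are assumptions for illustration, and modules built into the kernel do not appear in /proc/modules at all.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text.

    Each non-empty line starts with the module name, followed by size,
    use count, dependants, state and load address.
    """
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def missing_modules(required, proc_modules_text):
    """Return the names from `required` that are not currently loaded."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in required if name not in loaded]


def check_host(required=("tun", "vhost", "vhost_net"), path="/proc/modules"):
    """Read the live module list and report what is missing.

    The default module names are an assumption, adjust them to the host.
    """
    with open(path) as handle:
        return missing_modules(required, handle.read())
```

Calling check_host() on the target node returns a list of module names; anything it reports can then be loaded with modprobe, as was done by hand here.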

The most likely suspect is the upgrade from qemu 2.0 to 2.1.
Whether qemu is no longer doing its job, or ganeti is failing to tell it (in some new syntax) to do its thing, is what I'm unsure of.

Any insights would be very welcome.

Christian

Helga Velroyen

Aug 13, 2014, 4:19:28 AM
to gan...@googlegroups.com
Hi!

Can you have a look at the logs to see if it is Ganeti that gets confused about it? You will probably find something in /var/log/ganeti/node-daemon.log on the nodes where the instances ran (or were supposed to run) and in /var/log/ganeti/master-daemon.log on the master node.

Cheers,
Helga
--
Helga Velroyen | Software Engineer | hel...@google.com | 

Google Germany GmbH
Dienerstr. 12
80331 München

Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Graham Law, Christine Elizabeth Flores

chib...@gmail.com

Aug 13, 2014, 10:58:15 AM
to gan...@googlegroups.com
Sure, but they don't show anything of significance, or I would have included them in the initial report:

node-daemon.log:
---
2014-08-13 16:15:02,569: ganeti-noded pid=3261 INFO RunCmd /usr/bin/kvm --help
2014-08-13 16:15:02,624: ganeti-noded pid=3261 INFO RunCmd /usr/bin/kvm -device '?'
2014-08-13 16:15:02,693: ganeti-noded pid=3261 ERROR Failed to accept instance: Failed to open /dev/net/tun
Traceback (most recent call last):
  File "/usr/share/ganeti/2.11/ganeti/backend.py", line 2061, in AcceptInstance
    hyper.AcceptInstance(instance, info, target)
  File "/usr/share/ganeti/2.11/ganeti/hypervisor/hv_kvm.py", line 2461, in AcceptInstance
    incoming=incoming_address)
  File "/usr/share/ganeti/2.11/ganeti/hypervisor/hv_kvm.py", line 1912, in _ExecuteKVMRuntime
    tapname, tapfd = _OpenTap(vnet_hdr=vnet_hdr)
  File "/usr/share/ganeti/2.11/ganeti/hypervisor/hv_kvm.py", line 329, in _OpenTap
    raise errors.HypervisorError("Failed to open /dev/net/tun")
HypervisorError: Failed to open /dev/net/tun
---

master-daemon.log:
---
2014-08-13 16:15:02,708: ganeti-masterd pid=595/Jq1/Job191/I_MIGRATE ERROR Instance pre-migration failed, trying to revert disk status: Failed to accept instance: Failed to open /dev/net/tun
2014-08-13 16:15:02,922: ganeti-masterd pid=595/Jq1/Job191 ERROR Op 1/1: Caught exception in INSTANCE_MIGRATE(vm-01)
Traceback (most recent call last):
  File "/usr/share/ganeti/2.11/ganeti/jqueue.py", line 1130, in _ExecOpCodeUnlocked
    timeout=timeout)
  File "/usr/share/ganeti/2.11/ganeti/jqueue.py", line 1441, in _WrapExecOpCode
    return execop_fn(op, *args, **kwargs)
  File "/usr/share/ganeti/2.11/ganeti/mcpu.py", line 538, in ExecOpCode
    calc_timeout)
  File "/usr/share/ganeti/2.11/ganeti/mcpu.py", line 464, in _LockAndExecLU
    result = self._LockAndExecLU(lu, level + 1, calc_timeout)
  File "/usr/share/ganeti/2.11/ganeti/mcpu.py", line 464, in _LockAndExecLU
    result = self._LockAndExecLU(lu, level + 1, calc_timeout)
  File "/usr/share/ganeti/2.11/ganeti/mcpu.py", line 473, in _LockAndExecLU
    result = self._LockAndExecLU(lu, level + 1, calc_timeout)
  File "/usr/share/ganeti/2.11/ganeti/mcpu.py", line 464, in _LockAndExecLU
    result = self._LockAndExecLU(lu, level + 1, calc_timeout)
  File "/usr/share/ganeti/2.11/ganeti/mcpu.py", line 464, in _LockAndExecLU
    result = self._LockAndExecLU(lu, level + 1, calc_timeout)
  File "/usr/share/ganeti/2.11/ganeti/mcpu.py", line 473, in _LockAndExecLU
    result = self._LockAndExecLU(lu, level + 1, calc_timeout)
  File "/usr/share/ganeti/2.11/ganeti/mcpu.py", line 412, in _LockAndExecLU
    result = self._ExecLU(lu)
  File "/usr/share/ganeti/2.11/ganeti/mcpu.py", line 379, in _ExecLU
    result = _ProcessResult(submit_mj_fn, lu.op, lu.Exec(self.Log))
  File "/usr/share/ganeti/2.11/ganeti/cmdlib/base.py", line 250, in Exec
    tl.Exec(feedback_fn)
  File "/usr/share/ganeti/2.11/ganeti/cmdlib/instance_migration.py", line 948, in Exec
    return self._ExecMigration()
  File "/usr/share/ganeti/2.11/ganeti/cmdlib/instance_migration.py", line 737, in _ExecMigration
    (self.instance.name, msg))
OpExecError: Could not pre-migrate instance vm-01: Failed to accept instance: Failed to open /dev/net/tun
---

Helga Velroyen

Aug 14, 2014, 3:46:49 AM
to gan...@googlegroups.com
Hi!

This looks like a KVM problem to me (Ganeti just bubbles up the error). Can you try to start/migrate the instance manually using KVM/qemu (not using the Ganeti commands)?

Cheers,
Helga

chib...@gmail.com

Aug 14, 2014, 5:44:55 AM
to gan...@googlegroups.com

Well, a quick and dirty try seems to confirm that the problem is with qemu, or at the very least it needs some flags/configuration it didn't need before.

On a qemu 2.1 node (with the respective modules not loaded) we see:
---
# qemu-system-x86_64 -enable-kvm -netdev type=tap,id=hotnic-0b1f487f-pci-5,fd=8,vhost=on
qemu-system-x86_64: -netdev type=tap,id=hotnic-0b1f487f-pci-5,fd=8,vhost=on: TUNGETIFF ioctl() failed: Bad file descriptor
TUNSETOFFLOAD ioctl() failed: Bad file descriptor
qemu-system-x86_64: -netdev type=tap,id=hotnic-0b1f487f-pci-5,fd=8,vhost=on: tap: open vhost char device failed: No such file or directory
qemu-system-x86_64: -netdev type=tap,id=hotnic-0b1f487f-pci-5,fd=8,vhost=on: Device 'tap' could not be initialized
#
---

On a 2.0 node without the modules loaded we get:
---
# qemu-system-x86_64 -enable-kvm -netdev type=tap,id=hotnic-0b1f487f-pci-5,fd=8,vhost=on
qemu-system-x86_64: -netdev type=tap,id=hotnic-0b1f487f-pci-5,fd=8,vhost=on: TUNGETIFF ioctl() failed: Bad file descriptor
TUNSETOFFLOAD ioctl() failed: Bad file descriptor
Warning: netdev hotnic-0b1f487f-pci-5 has no peer
---
The qemu console comes up and the modules are loaded.
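The 2.1 failure ("open vhost char device failed: No such file or directory") points at a missing device node rather than a bad option. A small Python sketch can report which character devices are absent on a host. The function name is hypothetical, and the path /dev/vhost-net is an assumption here, since the qemu output above does not name the vhost device it tried to open.

```python
import os
import stat


def missing_char_devices(paths):
    """Return the entries of `paths` that are absent or not character devices."""
    missing = []
    for path in paths:
        try:
            mode = os.stat(path).st_mode
        except OSError:
            # Path does not exist (or cannot be examined at all).
            missing.append(path)
            continue
        if not stat.S_ISCHR(mode):
            # Something is there, but it is not a character device node.
            missing.append(path)
    return missing


# Assumed device nodes for tap and vhost networking; adjust as needed.
DEVICE_NODES = ["/dev/net/tun", "/dev/vhost-net"]
```

Running missing_char_devices(DEVICE_NODES) on both a 2.0 and a 2.1 node before starting qemu would show whether the nodes differ from the outset. Note that a node being present does not by itself prove the backing module is loaded.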

Oh well, next stop Debian bug report for qemu.