I have a cluster with 21 nodes. Everything was working great for several months, but since yesterday I have had an issue with one of the nodes.
This particular node is slightly different from the others: it has 2 Xen bridges (I mention it because it is the only difference on that node).
On that node, 6 VMs were running fine: 5 on xen-br0 and one on xen-br1.
Yesterday I wanted to reboot one VM (running on xen-br0, and working just fine until then), and I hit this issue:
[2013-07-15 11:19:12 4591] DEBUG (XendDomainInfo:101) XendDomainInfo.create(['vm', ['name', 'agate115'], ['memory', 2868], ['on_poweroff', 'destroy'], ['on_reboot', 'restart'], ['on_crash', 'restart'], ['on_xend_start', 'ignore'], ['on_xend_stop', 'ignore'], ['vcpus', 4], ['oos', 1], ['image', ['linux', ['kernel', '/boot/vmlinuz-2.6.32-5-xen-amd64'], ['ramdisk', '/boot/initrd.img-2.6.32-5-xen-amd64'], ['root', '/dev/xvda1'], ['videoram', 4], ['args', 'ro'], ['tsc_mode', 0], ['nomigrate', 0]]], ['s3_integrity', 1], ['device', ['vbd', ['uname', 'phy:/var/run/ganeti/instance-disks/agate115:0'], ['dev', 'sda'], ['mode', 'w']]], ['device', ['vif', ['bridge', 'xen-br0'], ['mac', 'aa:00:00:47:7b:0f']]]])
[2013-07-15 11:19:12 4591] DEBUG (XendDomainInfo:2508) XendDomainInfo.constructDomain
[2013-07-15 11:19:12 4591] DEBUG (balloon:220) Balloon: 6406688 KiB free; need 16384; done.
[2013-07-15 11:19:12 4591] DEBUG (XendDomain:464) Adding Domain: 35
[2013-07-15 11:19:12 4591] DEBUG (XendDomainInfo:2818) XendDomainInfo.initDomain: 35 256
[2013-07-15 11:19:12 4591] DEBUG (XendDomainInfo:2845) _initDomain:shadow_memory=0x0, memory_static_max=0xb3400000, memory_static_min=0x0.
[2013-07-15 11:19:12 4591] INFO (image:182) buildDomain os=linux dom=35 vcpus=4
[2013-07-15 11:19:12 4591] DEBUG (image:721) domid = 35
[2013-07-15 11:19:12 4591] DEBUG (image:722) memsize = 2868
[2013-07-15 11:19:12 4591] DEBUG (image:723) image = /boot/vmlinuz-2.6.32-5-xen-amd64
[2013-07-15 11:19:12 4591] DEBUG (image:724) store_evtchn = 1
[2013-07-15 11:19:12 4591] DEBUG (image:725) console_evtchn = 2
[2013-07-15 11:19:12 4591] DEBUG (image:726) cmdline = root=/dev/xvda1 ro
[2013-07-15 11:19:12 4591] DEBUG (image:727) ramdisk = /boot/initrd.img-2.6.32-5-xen-amd64
[2013-07-15 11:19:12 4591] DEBUG (image:728) vcpus = 4
[2013-07-15 11:19:12 4591] DEBUG (image:729) features =
[2013-07-15 11:19:12 4591] DEBUG (image:730) flags = 0
[2013-07-15 11:19:12 4591] DEBUG (image:731) superpages = 0
[2013-07-15 11:19:13 4591] INFO (XendDomainInfo:2367) createDevice: vbd : {'uuid': 'bf55fa8e-d67c-bec5-201b-92753a38acd3', 'bootable': 1, 'driver': 'paravirtualised', 'dev': 'sda', 'uname': 'phy:/var/run/ganeti/instance-disks/agate115:0', 'mode': 'w'}
[2013-07-15 11:19:13 4591] DEBUG (DevController:95) DevController: writing {'virtual-device': '2048', 'device-type': 'disk', 'protocol': 'x86_64-abi', 'backend-id': '0', 'state': '1', 'backend': '/local/domain/0/backend/vbd/35/2048'} to /local/domain/35/device/vbd/2048.
[2013-07-15 11:19:13 4591] DEBUG (DevController:97) DevController: writing {'domain': 'agate115', 'frontend': '/local/domain/35/device/vbd/2048', 'uuid': 'bf55fa8e-d67c-bec5-201b-92753a38acd3', 'bootable': '1', 'dev': 'sda', 'state': '1', 'params': '/var/run/ganeti/instance-disks/agate115:0', 'mode': 'w', 'online': '1', 'frontend-id': '35', 'type': 'phy'} to /local/domain/0/backend/vbd/35/2048.
[2013-07-15 11:19:13 4591] INFO (XendDomainInfo:2367) createDevice: vif : {'bridge': 'xen-br0', 'mac': 'aa:00:00:47:7b:0f', 'uuid': 'b1cd862b-9457-3247-61a5-6204163860d9'}
[2013-07-15 11:19:13 4591] DEBUG (DevController:95) DevController: writing {'mac': 'aa:00:00:47:7b:0f', 'handle': '0', 'protocol': 'x86_64-abi', 'backend-id': '0', 'state': '1', 'backend': '/local/domain/0/backend/vif/35/0'} to /local/domain/35/device/vif/0.
[2013-07-15 11:19:13 4591] DEBUG (DevController:97) DevController: writing {'bridge': 'xen-br0', 'domain': 'agate115', 'handle': '0', 'uuid': 'b1cd862b-9457-3247-61a5-6204163860d9', 'script': '/etc/xen/scripts/vif-bridge', 'mac': 'aa:00:00:47:7b:0f', 'frontend-id': '35', 'state': '1', 'online': '1', 'frontend': '/local/domain/35/device/vif/0'} to /local/domain/0/backend/vif/35/0.
[2013-07-15 11:19:13 4591] DEBUG (XendDomainInfo:3400) Storing VM details: {'on_xend_stop': 'ignore', 'shadow_memory': '0', 'uuid': 'cb196e18-c809-2415-06e9-62818066bc79', 'on_reboot': 'restart', 'start_time': '1373879953.31', 'on_poweroff': 'destroy', 'bootloader_args': '', 'on_xend_start': 'ignore', 'on_crash': 'restart', 'xend/restart_count': '0', 'vcpus': '4', 'vcpu_avail': '15', 'bootloader': '', 'image': "(linux (kernel /boot/vmlinuz-2.6.32-5-xen-amd64) (ramdisk /boot/initrd.img-2.6.32-5-xen-amd64) (args 'root=/dev/xvda1 ro') (superpages 0) (tsc_mode 0) (videoram 4) (pci ()) (nomigrate 0) (notes (HV_START_LOW 18446603336221196288) (FEATURES '!writable_page_tables|pae_pgdir_above_4gb') (VIRT_BASE 18446744071562067968) (GUEST_VERSION 2.6) (PADDR_OFFSET 0) (GUEST_OS linux) (HYPERCALL_PAGE 18446744071578882048) (LOADER generic) (SUSPEND_CANCEL 1) (PAE_MODE yes) (ENTRY 18446744071584297472) (XEN_VERSION xen-3.0)))", 'name': 'agate115'}
[2013-07-15 11:19:13 4591] DEBUG (XendDomainInfo:1804) Storing domain details: {'console/ring-ref': '3773537', 'image/entry': '18446744071584297472', 'console/port': '2', 'cpu/3/availability': 'online', 'store/ring-ref': '3773538', 'image/loader': 'generic', 'vm': '/vm/cb196e18-c809-2415-06e9-62818066bc79', 'control/platform-feature-multiprocessor-suspend': '1', 'image/hv-start-low': '18446603336221196288', 'description': '', 'cpu/2/availability': 'online', 'cpu/1/availability': 'online', 'image/virt-base': '18446744071562067968', 'memory/target': '2936832', 'image/guest-version': '2.6', 'image/pae-mode': 'yes', 'image/guest-os': 'linux', 'console/limit': '1048576', 'image/paddr-offset': '0', 'image/hypercall-page': '18446744071578882048', 'image/suspend-cancel': '1', 'cpu/0/availability': 'online', 'image/features/pae-pgdir-above-4gb': '1', 'image/features/writable-page-tables': '0', 'console/type': 'xenconsoled', 'name': 'agate115', 'domid': '35', 'image/xen-version': 'xen-3.0', 'store/port': '1'}
[2013-07-15 11:19:13 4591] DEBUG (DevController:95) DevController: writing {'protocol': 'x86_64-abi', 'state': '1', 'backend-id': '0', 'backend': '/local/domain/0/backend/console/35/0'} to /local/domain/35/device/console/0.
[2013-07-15 11:19:13 4591] DEBUG (DevController:97) DevController: writing {'domain': 'agate115', 'frontend': '/local/domain/35/device/console/0', 'uuid': 'd8333596-906f-46d2-b304-ef7be6d85fc0', 'frontend-id': '35', 'state': '1', 'location': '2', 'online': '1', 'protocol': 'vt100'} to /local/domain/0/backend/console/35/0.
[2013-07-15 11:19:13 4591] DEBUG (XendDomainInfo:1891) XendDomainInfo.handleShutdownWatch
[2013-07-15 11:19:13 4591] DEBUG (DevController:139) Waiting for devices vif2.
[2013-07-15 11:19:13 4591] DEBUG (DevController:139) Waiting for devices vif.
[2013-07-15 11:19:13 4591] DEBUG (DevController:144) Waiting for 0.
[2013-07-15 11:19:13 4591] DEBUG (DevController:628) hotplugStatusCallback /local/domain/0/backend/vif/35/0/hotplug-status.
[2013-07-15 11:20:53 4591] DEBUG (XendDomainInfo:3053) XendDomainInfo.destroy: domid=35
[2013-07-15 11:20:54 4591] DEBUG (XendDomainInfo:2411) Destroying device model
[2013-07-15 11:20:54 4591] DEBUG (XendDomainInfo:2418) Releasing devices
[2013-07-15 11:20:54 4591] DEBUG (XendDomainInfo:2424) Removing vif/0
[2013-07-15 11:20:54 4591] DEBUG (XendDomainInfo:1286) XendDomainInfo.destroyDevice: deviceClass = vif, device = vif/0
[2013-07-15 11:20:54 4591] DEBUG (XendDomainInfo:2424) Removing vbd/2048
[2013-07-15 11:20:54 4591] DEBUG (XendDomainInfo:1286) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/2048
[2013-07-15 11:20:54 4591] DEBUG (XendDomainInfo:2424) Removing console/0
[2013-07-15 11:20:54 4591] DEBUG (XendDomainInfo:1286) XendDomainInfo.destroyDevice: deviceClass = console, device = console/0
[2013-07-15 11:20:54 4591] DEBUG (XendDomainInfo:2416) No device model
[2013-07-15 11:20:54 4591] DEBUG (XendDomainInfo:2418) Releasing devices
[2013-07-15 11:20:54 4591] DEBUG (XendDomainInfo:2424) Removing vif/0
[2013-07-15 11:20:54 4591] DEBUG (XendDomainInfo:1286) XendDomainInfo.destroyDevice: deviceClass = vif, device = vif/0
[2013-07-15 11:20:54 4591] DEBUG (XendDomainInfo:2424) Removing vbd/2048
[2013-07-15 11:20:54 4591] DEBUG (XendDomainInfo:1286) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/2048
But all the other instances on that node work just fine.
I've read a lot about that error, but none of what I found worked for me.
The instance is using the correct bridge, xen-br0 (not xen-br1).
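
One thing that stands out in the log above: nothing is written between the hotplugStatusCallback line for vif/35/0 (11:19:13) and XendDomainInfo.destroy (11:20:53), so xend seems to wait on the vif hotplug status for 100 seconds and then tear the domain down. Below is a minimal Python sketch to spot that kind of pause in a xend debug log. The function name largest_gap and the sample list are only for illustration (they are not part of Xen or Ganeti), and the timestamp format is assumed to be the one shown in the log above.

```python
import re
from datetime import datetime

# Timestamp at the start of each log line, e.g. "[2013-07-15 11:19:13 4591]".
# Assumption: every line of interest starts with this prefix.
STAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \d+\]")


def largest_gap(lines):
    """Return (gap_seconds, line_before, line_after) for the longest pause.

    Hypothetical helper for illustration; lines without a timestamp are skipped.
    """
    best = (0.0, None, None)
    prev_time, prev_line = None, None
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue
        now = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if prev_time is not None:
            gap = (now - prev_time).total_seconds()
            if gap > best[0]:
                best = (gap, prev_line, line)
        prev_time, prev_line = now, line
    return best


# Three lines copied from the log above.
sample = [
    "[2013-07-15 11:19:13 4591] DEBUG (DevController:628) hotplugStatusCallback /local/domain/0/backend/vif/35/0/hotplug-status.",
    "[2013-07-15 11:20:53 4591] DEBUG (XendDomainInfo:3053) XendDomainInfo.destroy: domid=35",
    "[2013-07-15 11:20:54 4591] DEBUG (XendDomainInfo:2411) Destroying device model",
]

gap, before, after = largest_gap(sample)
print(gap)     # seconds of silence
print(before)  # last line written before the pause
print(after)   # first line written after the pause
```

To run it against a real log, pass an open file object instead of the sample list; the function only iterates over lines.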
Nilshar.