2.11 instance fails to boot with stock wheezy kernel


Karsten Heymann

Nov 13, 2014, 12:11:19 PM
to gan...@googlegroups.com
Hi,

I'm running 2.11 from backports on debian wheezy and I just found out that I cannot start newly created instances with the stock debian wheezy kernel 3.2.0-24 anymore:

# gnt-instance add -t plain -s 1G -n node.lab.mydomain -o debootstrap+default --no-name-check --no-ip-check --no-wait-for-sync -H kvm:kernel_path=/boot/vmlinuz-3.2.0-4-amd64,initrd_path=/boot/initrd.img-3.2.0-4-amd64 test3.lab.mydomain
Thu Nov 13 18:04:24 2014 * disk 0, size 1.0G
Thu Nov 13 18:04:24 2014 * creating instance disks...
Thu Nov 13 18:04:25 2014 adding instance test3.lab.mydomain to cluster config
Thu Nov 13 18:04:26 2014 * running the instance OS create scripts...
Thu Nov 13 18:04:32 2014 * starting instance...

[    0.675605] device-mapper: ioctl: 4.22.0-ioctl (2011-10-19) initialised: dm-d...@redhat.com
[    0.678315]  vda: vda1
[    0.797043] ata2.00: ATAPI: QEMU DVD-ROM, 1.1.2, max UDMA/100
[    0.798752] ata2.00: configured for MWDMA2
[    0.800226] scsi 1:0:0:0: CD-ROM            QEMU     QEMU DVD-ROM     1.1. PQ: 0 ANSI: 5
[    0.815995] sr0: scsi3-mmc drive: 4x/4x cd/rw xa/form2 tray
[    0.817343] cdrom: Uniform CD-ROM driver Revision: 3.20
[    0.821904] sr 1:0:0:0: Attached scsi generic sg0 type 5
Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... Begin: Loading multipath modules ... [    0.896668] device-mapper: multipath: version 1.3.2 loaded
Success: loaded module dm-multipath.
Failure: failed to load module dm-emc.
done.
Begin: Discovering multipaths ... [    0.911857] device-mapper: multipath round-robin: version 1.0.0 loaded
done.
done.
Begin: Running /scripts/local-premount ... [    1.042726] Btrfs loaded
Scanning for Btrfs filesystems
done.
mount: Device or resource busy
Begin: Running /scripts/local-bottom ... done.
done.
Begin: Running /scripts/init-bottom ... mount: No such file or directory
done.
Could not copy file: No such file or directory
Target filesystem doesn't have requested /sbin/init.
No init found. Try passing init= bootarg.

If I boot the very same instance with, e.g., kernel 3.16-0.bpo.3-amd64 from wheezy-backports, it starts out of the box. Also, if I install grub-pc and kernel 3.2.0-4 inside the running instance (and empty the values of kernel_path and initrd_path), it boots fine. Honestly, I've run out of ideas about what could be wrong with our cluster. Has anyone experienced something similar and has an idea how to proceed?

Best regards,
Karsten

candlerb

Nov 14, 2014, 7:03:41 AM
to gan...@googlegroups.com
On Thursday, 13 November 2014 17:11:19 UTC, Karsten Heymann wrote:
 
I'm running 2.11 from backports on debian wheezy and I just found out that I cannot start newly created instances with the stock debian wheezy kernel 3.2.0-24 anymore:


You mean 3.2.0-4 I presume? Which exact version?

 
# gnt-instance add -t plain -s 1G -n node.lab.mydomain -o debootstrap+default --no-name-check --no-ip-check --no-wait-for-sync -H kvm:kernel_path=/boot/vmlinuz-3.2.0-4-amd64,initrd_path=/boot/initrd.img-3.2.0-4-amd64 test3.lab.mydomain


I can't replicate your problem.  I tried that command line on a 2-node cluster with this kernel:

# dpkg-query -s linux-image-3.2.0-4-amd64 | grep Version:
Version: 3.2.63-2

(just changing the -n option) and it was fine.


# gnt-instance console test3.lab.mydomain
...
Debian GNU/Linux 7 test3.lab.mydomain ttyS0

test3 login:


I then did an apt-get update and apt-get dist-upgrade and rebooted, so that I had

# dpkg-query -s linux-image-3.2.0-4-amd64 | grep Version:
Version: 3.2.63-2+deb7u1

The instance started again successfully.

My platform is Debian Wheezy, ganeti 2.11.6-1~bpo70+1 from backports.

I wonder if you have some customization either in your cluster-wide default settings, or your use of instance-debootstrap? I have:

# cat /etc/ganeti/instance-debootstrap/variants/default.conf
ARCH="amd64"
EXTRA_PKGS="acpi-support-base,console-tools,udev,linux-image-amd64,sudo,vim,grub-pc,bridge-utils,vlan,openssh-server"
PROXY="http://<local-apt-cacher-ng>:3142/"

Or perhaps you have a corrupted master image in cache?

# ls -l /var/cache/ganeti-instance-debootstrap/
total 330344
-rw------- 1 root root 338268160 Nov 14 11:48 cache-wheezy-amd64.tar

Try removing that, then creating your instance again (it will be slow the first time, as it has to re-download all the packages).

HTH,

Brian.

candlerb

Nov 14, 2014, 9:06:47 AM
to gan...@googlegroups.com
I think I found it.

Googling for the error "failed to load module dm-emc." I found an old report

Now, my nodes didn't have multipath-tools or multipath-tools-boot installed. If I install them, it adds /usr/share/initramfs-tools/hooks/multipath and rebuilds my initramfs, and then on booting the instance I get the same problem as you:

Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... Begin: Loading multipath modules ... [    1.928173] Refined TSC clocksource calibration: 2133.351 MHz.
[    1.936961] device-mapper: multipath: version 1.3.2 loaded
Success: loaded module dm-multipath.
modprobe: module dm-emc not found in modules.dep
Failure: failed to load module dm-emc.
done.
Begin: Discovering multipaths ... [    1.979627] device-mapper: multipath round-robin: version 1.0.0 loaded
done.
done.
Begin: Running /scripts/local-premount ... done.
[    2.197204] usb 1-1: new full-speed USB device number 2 using uhci_hcd
mount: mounting /dev/vda1 on /root failed: Device or resource busy
Begin: Running /scripts/local-bottom ... done.
done.
Begin: Running /scripts/init-bottom ... mount: mounting /dev on /root/dev failed: No such file or directory
done.
Target filesystem doesn't have requested /sbin/init.
No init found. Try passing init= bootarg.
[    2.277098] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[    2.287071] usbcore: registered new interface driver usbhid
[    2.291693] usbhid: USB HID core driver


BusyBox v1.20.2 (Debian 1:1.20.0-7) built-in shell (ash)
Enter 'help' for a list of built-in commands.

/bin/sh: can't access tty; job control turned off
(initramfs) [    2.444450] usb 1-1: New USB device found, idVendor=0627, idProduct=0001
[    2.451505] usb 1-1: New USB device strings: Mfr=1, Product=3, SerialNumber=5
[    2.457979] usb 1-1: Product: QEMU USB Tablet
[    2.461925] usb 1-1: Manufacturer: QEMU 1.1.2
[    2.466199] usb 1-1: SerialNumber: 42
[    2.486557] input: QEMU 1.1.2 QEMU USB Tablet as /devices/pci0000:00/0000:00:01.2/usb1/1-1/1-1:1.0/input/input1
[    2.494519] generic-usb 0003:0627:0001.0001: input,hidraw0: USB HID v0.01 Pointer [QEMU 1.1.2 QEMU USB Tablet] on usb-0000:00:01.2-1/input0

(initramfs)

The solution is just to:

    apt-get remove --purge multipath-tools multipath-tools-boot

However if you need multipath-tools on the node, then I suggest you make a new initramfs which doesn't have them, and use that one for booting the VMs.
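To tell whether a given initramfs is affected before pointing an instance at it, you can look at its contents. The following is only a minimal sketch: it assumes lsinitramfs (from initramfs-tools) is available on the node, it uses the initrd path from the gnt-instance command earlier in this thread as a default, and the helper name has_multipath is made up for illustration. It simply treats any file whose path contains "multipath" as a sign that the multipath boot scripts were included.

```shell
#!/bin/sh
# has_multipath (hypothetical helper): reads an initramfs file listing on
# stdin, one path per line, and succeeds if any path mentions multipath.
has_multipath() {
    grep -q multipath
}

# Assumed default: the initrd path used in the gnt-instance command above.
# Override by setting INITRD in the environment.
INITRD="${INITRD:-/boot/initrd.img-3.2.0-4-amd64}"

# Only inspect the image if lsinitramfs is installed and the file is readable.
if command -v lsinitramfs >/dev/null 2>&1 && [ -r "$INITRD" ]; then
    if lsinitramfs "$INITRD" | has_multipath; then
        echo "$INITRD includes multipath files"
    else
        echo "$INITRD has no multipath files"
    fi
fi
```

If the image turns out to include multipath files, a separately built initramfs without them could then be set for the instance through the initrd_path hypervisor parameter.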

Moral of the story: using the same kernel and/or initramfs for the host and the guest can cause problems :-)

Regards,

Brian.

Karsten Heymann

Nov 14, 2014, 9:56:12 AM
to gan...@googlegroups.com
Hi Brian,

awesome!

2014-11-14 15:06 GMT+01:00 candlerb <b.ca...@pobox.com>:
The solution is just to:

    apt-get remove --purge multipath-tools multipath-tools-boot

However if you need multipath-tools on the node, then I suggest you make a new initramfs which doesn't have them, and use that one for booting the VMs.

Moral of the story: using the same kernel and/or initramfs for the host and the guest can cause problems :-)

Brian.

You're totally right! We need multipath on our nodes, but we don't need to boot from multipath devices, so after an 'apt-get remove --purge multipath-tools-boot' the problem is gone.

I definitely owe you a $DRINK_OF_YOUR_CHOICE, should you ever come to Berlin.

Best regards,
Karsten

candlerb

Nov 15, 2014, 6:24:21 AM
to gan...@googlegroups.com

You're totally right! We need multipath on our nodes, but we don't need to boot from multipath devices, so after an 'apt-get remove --purge multipath-tools-boot' the problem is gone.


Excellent.

This could do with a bit more investigation: e.g. you could try creating a new instance, configuring it to boot from grub rather than from an external kernel/ramdisk, adding multipath-tools-boot to it, and seeing if it can still boot. If it can't, then you have a simple reproducible test case to report upstream to Debian.

Regards,

Brian.
