Problems starting domU on Debian Squeeze

RaSca

Aug 27, 2010, 6:49:24 AM
to ganeti
Hi all,
I've configured two Debian Squeeze nodes to run Xen (PVM) and Ganeti.
Everything seems fine; I created a VM with this command:

gnt-instance add -t drbd \
  -n debian-lenny-nodo1.cutaway.local:debian-lenny-nodo2.cutaway.local \
  --disk=0:size=500m --net=0 -o debootstrap+default vm1.cutaway.local

and it worked.
The problem is that when I try to boot the machine, it freezes while
loading device-mapper, as you can see from the console:

...
...
[ 0.149821] XENBUS: Device with no driver: device/vbd/2048
[ 0.149835] XENBUS: Device with no driver: device/vif/0
[ 0.149846] XENBUS: Device with no driver: device/console/0
[ 0.149881] /build/buildd-linux-2.6_2.6.32-20-i386-jrJt9q/linux-2.6-2.6.32/debian/build/source_i386_xen/drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
[ 0.149923] Initalizing network drop monitor service
[ 0.149988] Freeing unused kernel memory: 396k freed
[ 0.151720] Write protecting the kernel text: 2628k
[ 0.152269] Write protecting the kernel read-only data: 960k
[ 0.208457] udev: starting version 160
[ 0.330038] Initialising Xen virtual ethernet driver.
[ 0.373395] blkfront: xvda: barriers enabled
[ 0.375362] xvda: xvda1
[ 0.536412] md: raid1 personality registered for level 1
[ 0.542079] async_tx: api initialized (async)
[ 0.543939] xor: automatically using best checksumming function: pIII_sse
[ 0.560041] pIII_sse : 942.000 MB/sec
[ 0.560056] xor: using function: pIII_sse (942.000 MB/sec)
[ 0.628092] raid6: int32x1 799 MB/s
[ 0.696086] raid6: int32x2 958 MB/s
[ 0.764085] raid6: int32x4 835 MB/s
[ 0.832078] raid6: int32x8 669 MB/s
[ 0.900029] raid6: mmxx1 2201 MB/s
[ 0.968044] raid6: mmxx2 2655 MB/s
[ 1.036052] raid6: sse1x1 1351 MB/s
[ 1.104040] raid6: sse1x2 2211 MB/s
[ 1.172026] raid6: sse2x1 2344 MB/s
[ 1.240049] raid6: sse2x2 2898 MB/s
[ 1.240063] raid6: using algorithm sse2x2 (2898 MB/s)
[ 1.254443] md: raid6 personality registered for level 6
[ 1.254466] md: raid5 personality registered for level 5
[ 1.254477] md: raid4 personality registered for level 4
[ 1.295048] device-mapper: uevent: version 1.0.3
[ 1.296357] device-mapper: ioctl: 4.15.0-ioctl (2009-04-01) initialised: dm-d...@redhat.com

What might the problem be?

Here are the versions of the relevant packages, all of which, I repeat,
come from the Debian Squeeze repos:

linux-image-2.6.32-5-xen-686
xen-hypervisor-4.0-i386 (4.0.1~rc5-1)
xen-utils-4.0 (4.0.1~rc5-1)
drbd8-utils (8.3.7-2.1)

Thanks in advance,

--
RaSca
Mia Mamma Usa Linux: Nothing is impossible to understand, if you
explain it well!
ra...@miamammausalinux.org
http://www.miamammausalinux.org

Iustin Pop

Aug 27, 2010, 6:53:35 AM
to gan...@googlegroups.com
On Fri, Aug 27, 2010 at 12:49, RaSca <rasc...@gmail.com> wrote:
> Hi all,
> [...]
> The problem is that when I try to boot the machines it freezes while
> loading the device-mapper, as you can see from the console:
> [...]
> What the problem may be?

Are you sure that this is a frozen machine, rather than one that
simply doesn't have a getty on the console?

iustin

RaSca

Aug 27, 2010, 8:06:02 AM
to ganeti
On 27 Aug, 12:53, Iustin Pop <ius...@google.com> wrote:
[...]
> Are you sure that this is a frozen machine, rather than one that
> simply doesn't have a getty on the console?
> iustin

Hi,
I created the VM with this command:

gnt-instance add -t drbd \
  -n debian-lenny-nodo1.cutaway.local:debian-lenny-nodo2.cutaway.local \
  --disk=0:size=500m --net=0 -o debootstrap+default vm1.cutaway.local

So it is a plain debootstrap installation, nothing else. Here is the
information about the instance:

Instance name: vm1.cutaway.local
UUID: 67504091-617c-447f-bc2f-cb3596e8c61d
Serial number: 10
Creation time: 2010-08-27 12:14:49
Modification time: 2010-08-27 14:03:51
State: configured to be up, actual state is up
  Nodes:
    - primary: debian-lenny-nodo1.cutaway.local
    - secondaries: debian-lenny-nodo2.cutaway.local
  Operating system: debootstrap+default
  Allocated network port: None
  Hypervisor: xen-pvm
    - bootloader_args: default ()
    - bootloader_path: default ()
    - initrd_path: default (/boot/initrd-2.6-xenU)
    - kernel_args: default (xencons=tty console=tty)
    - kernel_path: default (/boot/vmlinuz-2.6-xenU)
    - root_path: default (/dev/sda1)
    - use_bootloader: default (False)
  Hardware:
    - VCPUs: 1
    - memory: 128MiB
    - NICs:
      - nic/0: MAC: aa:00:00:fd:a2:9a, IP: None, mode: bridged, link: xen-br0
  Disks:
    - disk/0: drbd8, size 500M
      access mode: rw
      nodeA: debian-lenny-nodo1.cutaway.local, minor=0
      nodeB: debian-lenny-nodo2.cutaway.local, minor=0
      port: 11009
      auth key: 0f5d2192db908e6b96651dd9b5cb71e1efb51d4a
      on primary: /dev/drbd0 (147:0) in sync, status ok
      on secondary: /dev/drbd0 (147:0) in sync, status ok
      child devices:
        - child 0: lvm, size 500M
          logical_id: xenvg/97181d91-638e-4e1c-be7b-717a8594f61f.disk0_data
          on primary: /dev/xenvg/97181d91-638e-4e1c-be7b-717a8594f61f.disk0_data (253:0)
          on secondary: /dev/xenvg/97181d91-638e-4e1c-be7b-717a8594f61f.disk0_data (253:0)
        - child 1: lvm, size 128M
          logical_id: xenvg/97181d91-638e-4e1c-be7b-717a8594f61f.disk0_meta
          on primary: /dev/xenvg/97181d91-638e-4e1c-be7b-717a8594f61f.disk0_meta (253:1)
          on secondary: /dev/xenvg/97181d91-638e-4e1c-be7b-717a8594f61f.disk0_meta (253:1)

Do you see anything wrong?

Thanks a lot!

Thomas Rieschl

Aug 27, 2010, 8:59:03 AM
to gan...@googlegroups.com
hi!

What's the console type in your domU?
Have a look at the /etc/inittab file and make sure you have "hvc0" on
your first getty line (and *not* tty1).

To change that, just shut down the instance, activate its disk, mount
it and edit the inittab file.
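A minimal sketch of that inittab edit, assuming the instance's root filesystem is already mounted on the primary node (the /mnt path is only an example) and that the first getty line has the stock Debian form "1:2345:respawn:/sbin/getty 38400 tty1". The function name is hypothetical; it keeps a backup and only touches the tty1 entry of the first getty.

```shell
# Hypothetical helper: point the first getty of a mounted domU at the
# Xen console (hvc0) instead of tty1. Takes the path to the inittab file.
set_hvc0_getty() {
    inittab="$1"
    # Keep a backup, then rewrite only the "1:" getty entry that uses tty1.
    cp -- "$inittab" "$inittab.bak" &&
    sed -i 's|^\(1:[0-9]*:respawn:/sbin/getty 38400\) tty1$|\1 hvc0|' "$inittab"
}

# Example (instance shut down, disk activated and mounted on /mnt):
# set_hvc0_getty /mnt/etc/inittab
```

The other getty lines (tty2 and up) are left alone; they are harmless on a PVM guest.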


hth

cheers,
thomas



RaSca

Aug 27, 2010, 9:17:55 AM
to ganeti
On 27 Aug, 14:59, Thomas Rieschl <tho...@rieschl.com> wrote:
> hi!
> What's the console type in your domU?
> Have a look at the /etc/inittab file and make sure, you have "hvc0" in
> your first getty console (and *not* tty1)
> To change that, just shutdown the instance and activate it's disk, mount
> it and change the inittab file.
> hth
> cheers,
> thomas

Hi Thomas,
I activated the disks like this:

# gnt-instance activate-disks vm1.cutaway.local
debian-lenny-nodo1.cutaway.local:disk/0:/dev/drbd0

But I can't mount the DRBD device:

# mount /dev/drbd0 /mnt/
mount: you must specify the filesystem type

or

# mount -t ext3 /dev/drbd0 /mnt/
mount: wrong fs type, bad option, bad superblock on /dev/drbd0,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so

What am I missing? Note that Ganeti is configured to use LVM,
but mounting the logical volume directly doesn't work either:

# mount /dev/xenvg/97181d91-638e-4e1c-be7b-717a8594f61f.disk0_data /mnt/
mount: /dev/mapper/xenvg-97181d91--638e--4e1c--be7b--717a8594f61f.disk0_data already mounted or /mnt/ busy

Thanks,

Iustin Pop

Aug 27, 2010, 9:26:26 AM
to gan...@googlegroups.com

The disks are partitioned; you need to activate the partition via
kpartx first. Please read the documentation at
http://ganeti-doc.googlecode.com/svn/ganeti-2.1/html/admin.html#accessing-an-instance-s-disks
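A rough sketch of that sequence, assuming the instance is shut down, its disks were activated with "gnt-instance activate-disks" on the primary node (which printed /dev/drbd0 in this thread), and kpartx names the first partition mapping "drbd0p1". Check the real mapping name in the "kpartx -l" output before mounting. The helper names and the DRY_RUN switch are hypothetical; with DRY_RUN set, the commands are only printed.

```shell
# Hypothetical wrapper: print commands when DRY_RUN is set, run them otherwise.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Mount the first partition of an activated, partitioned instance disk.
# $1: DRBD device reported by activate-disks (e.g. /dev/drbd0)
# $2: mount point (e.g. /mnt)
mount_instance_part1() {
    dev="$1"; mnt="$2"
    name=$(basename "$dev")
    run kpartx -l "$dev"                       # list the partition mappings first
    run kpartx -a "$dev"                       # create the /dev/mapper devices
    run mount "/dev/mapper/${name}p1" "$mnt"   # assumed mapping name for partition 1
}

# Undo everything before running "gnt-instance deactivate-disks".
umount_instance_part1() {
    dev="$1"; mnt="$2"
    run umount "$mnt"
    run kpartx -d "$dev"                       # remove the partition mappings
}
```

Running "DRY_RUN=1 mount_instance_part1 /dev/drbd0 /mnt" first shows exactly what would be executed.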

> Note that ganeti is configured to work with lvm,
> but is impossible to mount also the logical volume:
>
> # mount /dev/xenvg/97181d91-638e-4e1c-be7b-717a8594f61f.disk0_data /

Never *ever* do this, since you will introduce corruption in the secondary node.

regards,
iustin

Thomas Rieschl

Aug 27, 2010, 9:35:22 AM
to gan...@googlegroups.com
hi!

On 27.08.2010 15:17, RaSca wrote:
> On 27 Aug, 14:59, Thomas Rieschl <tho...@rieschl.com> wrote:
>> [...]
>
> Hi Thomas,
> I've activated the disks in this way:
>
> # gnt-instance activate-disks vm1.cutaway.local
> debian-lenny-nodo1.cutaway.local:disk/0:/dev/drbd0
>
> But I can't mount drbd device:
>
> # mount /dev/drbd0 /mnt/
> mount: you must specify the filesystem type
>
> or
>
> # mount -t ext3 /dev/drbd0 /mnt/
> mount: wrong fs type, bad option, bad superblock on /dev/drbd0,
> missing codepage or helper program, or other error
> In some cases useful info is found in syslog - try
> dmesg | tail or so

Hm, that's weird. Did you shut down the instance before activating the
disks? It looks like the device is in use. Also, you have to mount the
disk on debian-lenny-nodo1.cutaway.local (the disk is activated on the
primary node of the instance).

Try deactivating the disks, make sure the instance is shut down, and
then activate the disks again.
What does "tune2fs -l /dev/drbd0" say?

If nothing works, it could be a disk problem. If it's just a test
instance, you could remove it and create it again.

>
> What am I missing? Note that ganeti is configured to work with lvm,
> but is impossible to mount also the logical volume:
>
> # mount /dev/xenvg/97181d91-638e-4e1c-be7b-717a8594f61f.disk0_data /
> mnt/
> mount: /dev/mapper/xenvg-97181d91--638e--4e1c--
> be7b--717a8594f61f.disk0_data already mounted or /mnt/ busy

Be careful with that. You're not supposed to do it; you could destroy
the disks. If you have to mount the disk that way, mount it read-only
(mount -o ro ...).

cheers,
thomas


RaSca

Aug 27, 2010, 9:36:21 AM
to ganeti
On 27 Aug, 15:26, Iustin Pop <ius...@google.com> wrote:
[...]
> The disks are partitioned, you need to activate the partition via
> kpartx first. Please read the documentation at http://ganeti-doc.googlecode.com/svn/ganeti-2.1/html/admin.html#acces...

Wonderful, I'm in. Thanks. Just one more question: what is the right
way to create an instance with an unpartitioned disk? Is it possible?

> Never *ever* do this, since you will introduce corruption in the secondary node.
> regards,
> iustin

Sure, understood.

Thomas Rieschl

Aug 27, 2010, 9:45:39 AM
to gan...@googlegroups.com
hi

On 27.08.2010 15:26, Iustin Pop wrote:
> On Fri, Aug 27, 2010 at 15:17, RaSca <rasc...@gmail.com> wrote:

[...]


>>
>> What am I missing?
>
> The disks are partitioned, you need to activate the partition via
> kpartx first. Please read the documentation at
> http://ganeti-doc.googlecode.com/svn/ganeti-2.1/html/admin.html#accessing-an-instance-s-disks

[...]

Oh, sorry,
I did not know about that.
Is this new behavior in Ganeti 2?
Ganeti 1.2.9 just used whole-disk filesystems, right?


cheers,
thomas


Iustin Pop

Aug 27, 2010, 9:51:22 AM
to gan...@googlegroups.com

Not even read-only mounts are safe, since on journalled filesystems it
*might* trigger some journal replay.

To stay safe, just don't do it, or do it on a snapshot of the LV if
you just want to read data off it.
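A sketch of the snapshot approach, under these assumptions: it runs as root on the primary node, the volume group "xenvg" has free space for the snapshot's copy-on-write area (1G is an arbitrary example), the snapshot name "vm1-peek" is hypothetical, and the data LV carries the instance's partition table, so kpartx is needed on the snapshot too. Any journal replay then happens on the throwaway snapshot, never on the LV that DRBD uses. The same hypothetical DRY_RUN wrapper lets you print the steps before running them.

```shell
# Hypothetical wrapper: print commands when DRY_RUN is set, run them otherwise.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

VG=xenvg
LV=97181d91-638e-4e1c-be7b-717a8594f61f.disk0_data   # data LV from "gnt-instance info"
SNAP=vm1-peek                                         # hypothetical snapshot name

# Step 1: snapshot the data LV and expose the partitions inside the snapshot.
snap_create() {
    run lvcreate --snapshot --size 1G --name "$SNAP" "/dev/$VG/$LV"
    run kpartx -a "/dev/$VG/$SNAP"
    run kpartx -l "/dev/$VG/$SNAP"   # shows the /dev/mapper name of partition 1
}

# Step 2: mount that mapping read-only; $1 is the name printed by kpartx -l.
snap_mount() {
    run mount -o ro "/dev/mapper/$1" /mnt
}

# Step 3: clean up; the snapshot is discarded and the real LV is untouched.
snap_remove() {
    run umount /mnt
    run kpartx -d "/dev/$VG/$SNAP"
    run lvremove -f "/dev/$VG/$SNAP"
}
```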

cheers,
iustin

Iustin Pop

Aug 27, 2010, 9:52:38 AM
to gan...@googlegroups.com
On Fri, Aug 27, 2010 at 15:36, RaSca <rasc...@gmail.com> wrote:
> On 27 Aug, 15:26, Iustin Pop <ius...@google.com> wrote:
> [...]
>> The disks are partitioned, you need to activate the partition via
>> kpartx first. Please read the documentation at http://ganeti-doc.googlecode.com/svn/ganeti-2.1/html/admin.html#acces...
>
> Wonderful. I'm In. Thanks. Just another question: what is the right
> way to create an instance with a disk not partitioned? Is it possible?

Yes, just edit /etc/default/ganeti-instance-debootstrap (if I remember
the path correctly).

regards,
iustin

Iustin Pop

Aug 27, 2010, 9:53:56 AM
to gan...@googlegroups.com

It's actually a new behaviour in recent versions of
ganeti-instance-debootstrap. Ganeti per se doesn't care, except for
passing the root device to the instance (for Xen PVM).

(It might also be that it only works on Ganeti 2.)

So it depends on how you configure the OS scripts.

regards,
iustin

RaSca

Aug 27, 2010, 10:04:37 AM
to ganeti
On 27 Aug, 15:52, Iustin Pop <ius...@google.com> wrote:
[...]
> Yes, just edit /etc/default/ganeti-instance-debootstrap (if I remember
> the path correctly).
> regards,
> iustin

Correct. The relevant option is PARTITION_STYLE (currently set to "msdos").

Thanks guys!
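For reference, a sketch of switching new instances to whole-disk filesystems through that setting. It assumes the defaults file path Iustin mentioned and that PARTITION_STYLE accepts "none" (filesystem created directly on the disk) as well as "msdos" (one partition spanning the disk); the comments in the packaged defaults file list the allowed values, so check your copy. The helper name is hypothetical, and the change only affects instances created afterwards.

```shell
# Hypothetical helper: set PARTITION_STYLE in the OS definition's defaults
# file, replacing an existing (possibly commented-out) line or appending one.
set_partition_style() {
    file="$1"; style="$2"
    if grep -q '^#* *PARTITION_STYLE=' "$file"; then
        sed -i "s|^#* *PARTITION_STYLE=.*|PARTITION_STYLE=\"$style\"|" "$file"
    else
        printf 'PARTITION_STYLE="%s"\n' "$style" >> "$file"
    fi
}

# Example: whole-disk filesystems for new instances. Their root_path then has
# to name the whole disk rather than a partition.
# set_partition_style /etc/default/ganeti-instance-debootstrap none
```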

its.me

Sep 29, 2010, 4:13:28 PM
to ganeti
Hello RaSca,

Have you been able to solve your original problem (the domU stops
booting)?
I recently upgraded my Ganeti setup to Debian Squeeze (to be able
to run 2.6.32-xen kernels) and installed Xen 4.0.1.
I'm running PVM instances. When I start a (again, PVM) instance with
the older kernel (2.6.26-2-xen), it starts without a single issue.
When I start exactly the same instance with the 2.6.32-5-xen kernel, the
boot process stops at the same point where yours did:
[skipped]
[ 0.169418] NET: Registered protocol family 17
[ 0.169541] registered taskstats version 1
[ 0.169560] XENBUS: Device with no driver: device/vbd/2048
[ 0.169566] XENBUS: Device with no driver: device/vif/0
[ 0.169571] XENBUS: Device with no driver: device/console/0
[ 0.169594] /tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_xen/drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
[ 0.169618] Initalizing network drop monitor service
[ 0.169700] Freeing unused kernel memory: 596k freed
[ 0.170031] Write protecting the kernel read-only data: 4304k
[ 0.224354] udev: starting version 160
[ 0.319692] Initialising Xen virtual ethernet driver.
[ 0.347374] blkfront: xvda: barriers enabled
[ 0.348617] xvda: unknown partition table
[ 0.499258] md: raid1 personality registered for level 1
[ 0.529504] device-mapper: uevent: version 1.0.3
[ 0.530185] device-mapper: ioctl: 4.15.0-ioctl (2009-04-01) initialised: dm-d...@redhat.com

I edited the instance's "inittab" file to use "hvc0", with no effect
at all.
I also tried passing both "sda" and "xvda" as the disk device, again
with no effect.

The moment I change the kernel back to 2.6.26-2-xen, the same
instance boots and runs again without problems.

Thank you very much for your input.

Regards,
Gregory.
==

its.me

Oct 1, 2010, 3:12:31 PM
to ganeti
Hurray! The problem is solved, but the weirdness remains.

A close look at the instance's "/etc/fstab" showed that "xvda2" and
"xvda1" were used, so I replaced them with "sda" and "sdb".
Then I changed the "root_path" parameter in the Ganeti config (in
fact I modified the cluster's config, because all my instances start
with the same parameters) to "/dev/xvda": not "/dev/sda" as it was
originally, and not "/dev/xvda2" as in my "experimental" guest created
by "xen-create-image", but "/dev/xvda". The instance now boots
perfectly when started by Ganeti. Please also note that the cluster's
"root_path" must be set to "/dev/xvda" (even though Ganeti passes
"sda/b" to Xen and the instance itself used "sda/b").
Weird. I hope somebody can explain that, but my problem is solved now.
I'm back to experimenting with real-time (live) fail-over from one
node to another. I'm really hoping that the time-warp issue (time
moving backward) is gone now under 2.6.32.x (see my other thread).

I hope my findings help somebody.

Regards,
Gregory.
==

Guido Trotter

Oct 3, 2010, 11:40:43 AM
to gan...@googlegroups.com
On Fri, Oct 1, 2010 at 8:12 PM, its.me <gb200...@gmail.com> wrote:
> [...]
> Please also note that Ganeti cluster's "root_path" must be set to
> "/dev/xvda" (despite Ganeti passes to XEN "sda/b" and instance itself
> used "sda/b").
> Weird. I hope somebody can explain that, but my problem is solved now.
> [...]
>

It probably means that you have a filesystem directly on the virtual
disk, with no partitions inside it. What you did to fstab probably
didn't help, but it also doesn't prevent the machine from starting, as
the kernel already has a working root filesystem by then. What you
changed in Ganeti was the actual solution: pointing the kernel at the
device that actually holds your root filesystem. Had there been a
partition table inside the virtual disk, xvda1 would have been correct.
It all boils down to what your instance's OS scripts do at create time.

Regards,

Guido
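Guido's rule of thumb can be written down as a small lookup, shown here as a hypothetical helper: a filesystem directly on the disk needs the whole device as root_path, while an msdos-partitioned disk needs partition 1. The "xvd" device prefix is an assumption matching the 2.6.32 Squeeze domU kernels in this thread (Kinglok reports further down that "/dev/xvda1" works for his instances). The gnt-cluster and gnt-instance lines show where the value goes; with the hypothetical DRY_RUN switch they are only printed.

```shell
# Hypothetical wrapper: print commands when DRY_RUN is set, run them otherwise.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# Map the OS scripts' partition style to the root device the PVM kernel
# should mount. Assumes Xen block devices appear as xvda inside the domU.
root_path_for() {
    case "$1" in
        none)  echo /dev/xvda ;;    # filesystem directly on the virtual disk
        msdos) echo /dev/xvda1 ;;   # first partition of a partitioned disk
        *)     echo "unknown partition style: $1" >&2; return 1 ;;
    esac
}

# Cluster-wide default (what Gregory changed) ...
set_cluster_root_path() {
    rp=$(root_path_for "$1") || return 1
    run gnt-cluster modify -H "xen-pvm:root_path=$rp"
}

# ... or a per-instance override: $1 is the style, $2 the instance name.
set_instance_root_path() {
    rp=$(root_path_for "$1") || return 1
    run gnt-instance modify -H "root_path=$rp" "$2"
}
```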

its.me

Oct 3, 2010, 3:37:26 PM
to ganeti
Hi Guido,

Yes, you're correct. I don't have partitions: the FS spans the
"whole" virtual disk (intentionally).
I've also replaced "sda/b" with "xvda/b" in "/etc/fstab", and the
guest still boots perfectly. So, as you explained, the solution was
pointing the kernel at the right device (path) holding my root FS.

As for my original "time warping" problem, sad as it is to admit,
the issue is still there. I'm going to update my original thread about
it with more details.

Thank you.

Regards,
Gregory.

its.me

Oct 5, 2010, 11:08:34 PM
to ganeti
Yahoo! My 3-node proof-of-concept cluster is ready to grow into a
5-node production one. The only thing left is to upgrade my current
Ganeti 2.1.7 to the freshly released 2.2.0.

The time-warping problem is also gone as of now. The latest Squeeze
kernel (2.6.32-5 -> 2.6.32-23) works perfectly with Squeeze's "stock"
Xen 4.0.1 and PVM instances.

I've found that IPv6 should be disabled for quick and reliable
migration back and forth between Ganeti nodes. With IPv6 disabled, the
migration takes about 15 seconds in one direction and about 1-2
seconds on the way back. I'm using the standard, unmodified "xen" clock
source. There is no need to run NTP on the guests (just on the nodes,
but that is the usual thing). I've also tried the very latest Sid
kernel (2.6.32-24): the [PVM] instance's kernel panics miserably at the
end of migration, and then the only way to remove the failed instance
is "xm destroy"; Ganeti just sits waiting endlessly on the "gnt-instance
stop" command. Anyway, just don't use the Sid kernel; the latest one
from Squeeze does the job perfectly.

In case somebody is interested, I'm using:
- SunFire x2200M2 [quite cheap today] with dual quad-core CPUs,
  4 x 1 Gb NICs, 16 GB RAM and 2 x 1 TB SATA-II drives under mdadm
  (software RAID), for less critical deployments;
- SunFire x4100M2, similar to the above but with dual power supplies,
  hardware SAS-based RAID and LOM/IPMI on its own non-shared (5th) NIC,
  for nodes dedicated to instances running critical tasks.

I'm curious: is there any way to speed up [Ganeti/Xen] migration
from one node to another? I'm using bonded 2 x 100 Mbps links for the
cluster interfaces and 2 x 1 Gbps links for DRBD. Is it possible to
have Ganeti use/share the 1 Gbps links the same way DRBD uses them? We
have lots of relatively cheap 1G switches for secondary infrastructure
components (where DRBD lives), while our primary infrastructure still
uses 100 Mbps host ports because managed switches cost more.

Thank you.
Regards,
Gregory.
==

Kinglok

Nov 4, 2010, 6:25:04 AM
to ganeti
Thanks for the help.

I am using Debian Squeeze and Ganeti 2.1.6 (all from the Debian
repository).
Setting root_path to "/dev/xvda" doesn't work for me, but "/dev/xvda1" does.

I create the instance with this command:

gnt-instance add -t drbd \
  -n node1.example.com:node2.example.com \
  -o debootstrap+default \
  --disk 0:size=5g \
  --net=0:ip=192.168.0.10,bridge=eth0 \
  -B memory=512 \
  -H xen-pvm:kernel_path=/boot/vmlinuz-2.6-xenU,initrd_path=/boot/initrd-2.6-xenU,root_path=/dev/xvda1 \
  instance1.example.com

Kinglok