mkimg used to create gpt image, problem booting

Craig Rodrigues

Aug 22, 2014, 12:49:14 PM

Hi,

I did the following:

(1) Created a chroot from a src checkout/build with:
make installworld DESTDIR=/opt2/branches/test1
make installkernel DESTDIR=/opt2/branches/test1
make distribution DESTDIR=/opt2/branches/test1

(2) Created a UFS image, /tmp/file.img using makefs based on the
contents of /opt2/branches/test1

(3) Verified with bhyve that the UFS image booted properly.

(4) Created an image with:
cd /opt2/branches
mkimg -v -s gpt -b test1/boot/pmbr -p freebsd-boot:=test1/boot/gptboot \
    -p freebsd-ufs:=/tmp/file.img -o /tmp/foo1.img

(5) Tried to boot the image with qemu:

qemu-system-x86_64 -m 2048 -hda /tmp/foo1.img

SeaBIOS (version rel-1.7.4-0-g96917a8-20140203_153353-nilsson.home.kraxel.org)

iPXE (http://ipxe.org) 00:03.0 C900 PCI2.10 PnP
PMM+7FFC6110+7FF26110 C900

Booting from Hard Disk...

BTX loader 1.00 BTX version is 1.02
Consoles: internal video/keyboard
BIOS drive A: is disk0
BIOS drive C: is disk1
BIOS 639kB/2096120kB available memory

FreeBSD/x86 bootstrap loader, Revision 1.1
(ro...@dibbler.crodrigues.org, Wed Aug 20 21:58:27 PDT 2014)

can't load 'kernel'

Type '?' for a list of commands, 'help' for more detailed help.
OK

If I mdconfig the foo1.img disk image, and do a gpart show, I see:

=> 3 1784944 md0 GPT (872M)
3 32 1 freebsd-boot (16K)
35 1784912 2 freebsd-ufs (872M)

Any idea what I am doing wrong?

Thanks.

--
Craig
_______________________________________________
freebsd...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-curre...@freebsd.org"

Marcel Moolenaar

Aug 22, 2014, 4:45:26 PM

On Aug 22, 2014, at 9:49 AM, Craig Rodrigues <rod...@freebsd.org> wrote:

>
> (5) Tried to boot the image with qemu:
>
> qemu-system-x86_64 -m 2048 -hda /tmp/foo1.img

*snip*

> If I mdconfig the foo1.img disk image, and do a gpart show, I see:
>
> => 3 1784944 md0 GPT (872M)
> 3 32 1 freebsd-boot (16K)
> 35 1784912 2 freebsd-ufs (872M)
>
> Any idea what I am doing wrong?

To the best of my knowledge, qemu is the thing you're
doing wrong :-)

I have so far not been able to boot an image created
by mkimg with a FreeBSD-hosted qemu.
o VMware and VirtualBox are fine.
o A non-FreeBSD-hosted qemu also works fine.

If your host is running -current, make sure to set
MALLOC_CONF=junk:false. It improves qemu's behaviour on
FreeBSD for boot0/boot1.

HTH (probably not),

--
Marcel Moolenaar
mar...@xcllnt.net


Craig Rodrigues

Aug 23, 2014, 3:00:40 PM

On Fri, Aug 22, 2014 at 1:45 PM, Marcel Moolenaar <mar...@xcllnt.net> wrote:
>
>> If I mdconfig the foo1.img disk image, and do a gpart show, I see:
>>
>> => 3 1784944 md0 GPT (872M)
>> 3 32 1 freebsd-boot (16K)
>> 35 1784912 2 freebsd-ufs (872M)
>>
>> Any idea what I am doing wrong?
>
> To the best of my knowledge, qemu is the thing you're
> doing wrong :-)

Hi,

I transferred foo1.img to a Mac with VirtualBox, converted it to VMDK
with "VBoxManage convertfromraw --format VMDK", and tried to boot it
in VirtualBox.
I got the same error as in QEMU. It looks like /boot/loader runs,
but I cannot do "ls" to see the root file system.

I created another disk image with a GPT layout, but this time
using the FreeBSD bsdinstall inside a bhyve VM.

I noticed the following:

WORKING IMAGE BOOTS IN QEMU, CREATED WITH BSDINSTALL
=============================================
=> 34 10485693 md0 GPT (5.0G)
34 128 1 83bd6b9d-7f41-11dc-be0b-001560b84f0f (64K)
162 9959296 2 516e7cb6-6ecf-11d6-8ff8-00022d09712b (4.7G)
9959458 524288 3 516e7cb5-6ecf-11d6-8ff8-00022d09712b (256M)
10483746 1981 - free - (991K)

DOES NOT BOOT IN QEMU, CREATED WITH MKIMG
===================================================
=> 3 1784944 md1 GPT (872M)
3 32 1 83bd6b9d-7f41-11dc-be0b-001560b84f0f (16K)
35 1784912 2 516e7cb6-6ecf-11d6-8ff8-00022d09712b (872M)

I ran the following crazy experiment, just to see what would happen:

dd if=/dev/md1s2 of=/dev/md0s2 bs=8192

I then tried to boot the first image with QEMU, and it booted successfully,
with my UFS file system that I had previously created with makefs.

I'm not sure where to look for the problem. I notice that
in the non-working image, the offset starts at block 3,
while in the working image, the offset starts at block 34.

Is that enough to make things not boot?

Marcel Moolenaar

Aug 23, 2014, 3:11:18 PM

On Aug 23, 2014, at 12:00 PM, Craig Rodrigues <rod...@freebsd.org> wrote:
>
> I ran the following crazy experiment, just to see what would happen:
>
> dd if=/dev/md1s2 of=/dev/md0s2 bs=8192
>
> I then tried to boot the first image with QEMU, and it booted successfully,
> with my UFS file system that I had previously created with makefs.
>
> I'm not sure where to look for the problem. I notice that
> in the non-working image, the offset starts at block 3,
> while in the working image, the offset starts at block 34.
>
> Is that enough to make things not boot?

Could be. Try the -P option to mkimg. It sets the
underlying (unexposed) physical sector size while
still working with the visible 512-byte sectors.
The net effect is that, for the GPT scheme, things
get aligned to the physical sector size and the
image size is rounded as well.
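In arithmetic terms, the alignment effect of -P means rounding LBAs up to a multiple of the physical sector size. Below is a minimal C sketch of that rounding. It assumes 512-byte logical sectors, and `align_lba_up` is a hypothetical name, not a function in mkimg:

```c
#include <stdint.h>

/* Hypothetical helper (not mkimg's code): round an LBA up so that the
 * byte offset it addresses is a multiple of the physical sector size
 * passed to -P. Assumes phys_size is a non-zero multiple of log_size,
 * e.g. 4096 and 512. */
static uint64_t align_lba_up(uint64_t lba, uint32_t log_size, uint32_t phys_size)
{
    uint64_t per_phys = phys_size / log_size;   /* logical sectors per physical sector */

    return (lba + per_phys - 1) / per_phys * per_phys;
}
```

With the layout shown earlier, `align_lba_up(3, 512, 4096)` returns 8. So with `-P 4096` one would expect the first partition to start at LBA 8 rather than LBA 3.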

You can also try emitting vmdk directly to see if
that makes a difference. vmdk also has the side-
effect of rounding the image to the grain size.

--
Marcel Moolenaar
mar...@xcllnt.net


Craig Rodrigues

Aug 23, 2014, 4:49:50 PM

On Sat, Aug 23, 2014 at 12:11 PM, Marcel Moolenaar <mar...@xcllnt.net> wrote:
>
> Could be. Try the -P option to mkimg. It sets the
> underlying (unexposed) physical sector size while
> still working with the visible 512 bytes sectors.
> The net effect is that for the GPT scheme things
> get aligned to the physical sector size and that
> it also causes the image size to be rounded.
>
> You can also try emitting vmdk directly to see if
> that makes a difference. vmdk also has the side-
> effect of rounding the image to the grain size.

I tried the following experiments:

mkimg -v -f vmdk -s gpt -b test1/boot/pmbr \
    -p freebsd-boot:=test1/boot/gptboot -p freebsd-ufs:=/tmp/file.img \
    -o /tmp/foo1.vmdk

When I tried to boot the image in QEMU, I had the same problem as before.
It looks like the first partition still started at block 3.

I also tried adding the -P flag, with different values like 2048 and 4096.
I ran into the same problem.

Hmm.

Craig Rodrigues

Aug 23, 2014, 10:14:17 PM

Hi,

I did some more experiments, and found that after /boot/loader runs,
if I break into the loader prompt and type "lsdev", I would get this:

(1) GPT Disk image which boots under QEMU, made by bsdinstall
==================================================
View from loader
============
OK lsdev
cd devices:
disk devices:
disk0: BIOS drive A:
disk1: BIOS drive C:
disk1p1: FreeBSD boot
disk1p2: FreeBSD UFS
disk1p3: FreeBSD swap
pxe devices:

View from gpart, after we mdconfig the disk image
====================================
=> 34 10485693 md0 GPT (5.0G)
34 128 1 freebsd-boot (64K)
162 9959296 2 freebsd-ufs (4.7G)
9959458 524288 3 freebsd-swap (256M)
10483746 1981 - free - (991K)

(2) GPT Disk image which fails to boot under QEMU, made by mkimg
===================================================
View from loader
============
OK lsdev
cd devices:
disk devices:
disk0: BIOS drive A:
disk1: BIOS drive C:
pxe devices:

View from gpart, after we mdconfig the disk image
====================================
=> 3 1784944 md1 GPT (872M)
3 32 1 freebsd-boot (16K)
35 1784912 2 freebsd-ufs (872M)

This leads me to believe that there is logic in /boot/loader,
which is not in GEOM, that fails to parse the GPT produced by mkimg.

I did some further debugging inside the loader by doing the following.
-> I added "CFLAGS += -DPART_DEBUG" to sys/boot/common/Makefile.inc
-> I added DEBUG() statements all over sys/boot/common/part.c

I observed the following in the ptable_gptread() function in
sys/boot/common/part.c, in this section:

ent = (struct gpt_ent *)tbl;
size = MIN(hdr.hdr_entries * hdr.hdr_entsz,
    MAXTBLSZ * table->sectorsize);
for (i = 0; i < size / hdr.hdr_entsz; i++, ent++) {
    if (uuid_equal(&ent->ent_type, &gpt_uuid_unused, NULL))
        continue;
310 continue;

ent->ent_type is all 0's, which matches gpt_uuid_unused, so every entry
is skipped by the "continue" and the loop never adds the GPT partitions
to the list of partitions that the loader can access.
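The C sketch below shows why an all-zero type leaves the loop with nothing to add. It is a simplified model, not the loader's code. It assumes 128-byte entries and compares raw bytes instead of calling uuid_equal(). The helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GPT_ENT_SIZE 128   /* common on-disk entry size; assumed here */

/* The partition type GUID occupies the first 16 bytes of a GPT entry.
 * An all-zero type GUID marks the entry as unused. */
static int gpt_ent_unused(const uint8_t *ent)
{
    static const uint8_t zero_uuid[16];   /* static storage: zero-initialized */

    return memcmp(ent, zero_uuid, sizeof(zero_uuid)) == 0;
}

/* Model of the loop quoted above: skip unused entries and count the
 * ones that would be added to the loader's partition list. */
static unsigned gpt_count_used(const uint8_t *tbl, unsigned nentries)
{
    unsigned used = 0;

    for (unsigned i = 0; i < nentries; i++) {
        if (gpt_ent_unused(tbl + (size_t)i * GPT_ENT_SIZE))
            continue;                     /* same effect as the loader's "continue" */
        used++;
    }
    return used;
}
```

If the table buffer holds only zeroes, as observed here, `gpt_count_used` returns 0 no matter how many entries the header announces.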

I'm not familiar with the GPT format, nor am I familiar with the
gpt code inside the loader, and how it differs from GEOM.

Do you have any further ideas of where to hunt for the root cause of
the problem?

Thanks.

Andrey V. Elsukov

Aug 24, 2014, 5:11:44 AM

Yes, the problem is in the ptable_gptread() function. I'll commit the fix.

--
WBR, Andrey V. Elsukov


Marcel Moolenaar

Aug 24, 2014, 11:23:29 AM

On Aug 24, 2014, at 2:11 AM, Andrey V. Elsukov <bu7...@yandex.ru> wrote:

> On 24.08.2014 06:14, Craig Rodrigues wrote:

>> I did some further debugging inside the loader by doing the following.
>> -> I added "CFLAGS += -DPART_DEBUG" to sys/boot/common/Makefile.inc
>> -> I added DEBUG() statements all over sys/boot/common/part.c
>>
>> I observed that in sys/boot/common/part.c in the ptbl_gptread() function,
>> that in this section:
>>
>> 305 ent = (struct gpt_ent *)tbl;
>> 306 size = MIN(hdr.hdr_entries * hdr.hdr_entsz,
>> 307 MAXTBLSZ * table->sectorsize);
>> 308 for (i = 0; i < size / hdr.hdr_entsz; i++, ent++) {
>> 309 if (uuid_equal(&ent->ent_type, &gpt_uuid_unused, NULL))
>> 310 continue;
>>
>> ent->ent_type is all 0's, which matches gpt_uuid_unused, so it bails
>> out of the loop and never adds the gpt partitions to the list of partitions
>> that the loader can access.
>

> Yes, the problem is in the ptable_gptread() function. I'll commit the fix.
>

Actually, no. There is *a* problem in that function:
The function does not respect hdr.hdr_entsz when it
moves to the next entry. It simply uses "ent++", whose
stride is fixed by our definition of struct gpt_ent and
may not match the entry size used by the writer.
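To respect hdr_entsz, the loader has to step through the table by the header's entry size, not by sizeof(struct gpt_ent). Below is a minimal C sketch. `struct reader_ent` and `gpt_entry()` are hypothetical. The usage assumes a 256-byte on-disk entry size. That is a valid GPT value (128 times a power of two) and it differs from the reader's struct.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader-side entry type, deliberately smaller than what a
 * writer may put on disk. Only the leading type GUID is modeled. */
struct reader_ent {
    uint8_t type[16];
};

/* Return a pointer to entry i, stepping by the on-disk entry size taken
 * from the GPT header (hdr_entsz) instead of sizeof(struct reader_ent). */
static const struct reader_ent *
gpt_entry(const uint8_t *tbl, uint32_t i, uint32_t hdr_entsz)
{
    return (const struct reader_ent *)(const void *)(tbl + (size_t)i * hdr_entsz);
}
```

With an on-disk entry size of 256 bytes, `gpt_entry(tbl, 1, 256)` points at byte offset 256. By contrast, `ent++` on the 16-byte reader struct would land at offset 16.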

I don't see how the loader is responsible for *the*
problem. All I see in qemu is that the loader, when
it reads a sector, isn't getting the actual sector
data that's in the image.

Just do a ktrace on qemu and you'll see what I mean.
YMMV of course,

--
Marcel Moolenaar
mar...@xcllnt.net


Craig Rodrigues

Aug 24, 2014, 12:59:10 PM

On Sun, Aug 24, 2014 at 2:11 AM, Andrey V. Elsukov <bu7...@yandex.ru> wrote:
>
> Yes, the problem is in the ptable_gptread() function. I'll commit the fix.

Index: head/sys/boot/common/part.c
===================================================================
--- head/sys/boot/common/part.c (revision 270444)
+++ head/sys/boot/common/part.c (revision 270445)
@@ -254,8 +254,8 @@
table->sectorsize);
if (phdr != NULL) {
/* Read the primary GPT table. */
- size = MIN(MAXTBLSZ,
- phdr->hdr_entries * phdr->hdr_entsz / table->sectorsize);
+ size = MIN(MAXTBLSZ, (phdr->hdr_entries * phdr->hdr_entsz +
+ table->sectorsize - 1) / table->sectorsize);
if (dread(dev, tbl, size, phdr->hdr_lba_table) == 0 &&
gpt_checktbl(phdr, tbl, size * table->sectorsize,
table->sectors - 1) == 0) {

I can confirm that r270445 fixes the problem for me, where I can
now QEMU boot a GPT partitioned image created with mkimg.

I put some more debugging in the code, and found this:

(1) GPT IMAGE CREATED WITH MKIMG
===============================

phdr->hdr_entries = 2, phdr->hdr_entsz = 128, table->sectorsize = 512

(2) GPT IMAGE CREATED WITH BSDINSTALL
===================================

phdr->hdr_entries = 128, phdr->hdr_entsz = 128, table->sectorsize = 512
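The values above are enough to check the arithmetic behind r270445. Below is a C sketch of the old and new formulas. SKETCH_MAXTBLSZ is a placeholder cap, not the real MAXTBLSZ from part.c, and the function names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_MAXTBLSZ 64   /* placeholder cap; the real MAXTBLSZ may differ */

/* Before r270445: truncating division, so a table that is smaller than
 * one sector is read as 0 sectors. */
static uint64_t tbl_sectors_old(uint32_t entries, uint32_t entsz, uint32_t secsz)
{
    uint64_t n = (uint64_t)entries * entsz / secsz;

    return n < SKETCH_MAXTBLSZ ? n : SKETCH_MAXTBLSZ;
}

/* After r270445: round up to whole sectors. */
static uint64_t tbl_sectors_new(uint32_t entries, uint32_t entsz, uint32_t secsz)
{
    uint64_t n = ((uint64_t)entries * entsz + secsz - 1) / secsz;

    return n < SKETCH_MAXTBLSZ ? n : SKETCH_MAXTBLSZ;
}
```

Take the mkimg image: 2 entries of 128 bytes, 512-byte sectors. The old formula yields 0 sectors and the new one yields 1. For the bsdinstall image (128 entries), both formulas yield 32, which is why the bug stayed hidden.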

Does gpart create a fixed partition table with 128 entries?
That would explain a lot.

Also, in the gptboot man page, it mentions that gptboot can only boot
on systems with 128 partitions or less. This seems like an artificial
restriction.
Does the gptboot code really enforce this? Not that I have a system with more
than 128 partitions. :)

Should mkimg be changed to create a partition table with 128 entries
by default, to match older versions of FreeBSD which do not have this fix?

Thanks for the fix!

Marcel Moolenaar

Aug 24, 2014, 3:05:15 PM

On Aug 24, 2014, at 9:59 AM, Craig Rodrigues <rod...@freebsd.org> wrote:

> On Sun, Aug 24, 2014 at 2:11 AM, Andrey V. Elsukov <bu7...@yandex.ru> wrote:
>>
>> Yes, the problem is in the ptable_gptread() function. I'll commit the fix.
>

> Should mkimg be changed to create a partition table with 128 entries
> by default, to match older versions of FreeBSD which do not have this fix?

Maybe. 128 is the suggested default. It's not a hard lower
limit. Technically speaking, it's perfectly fine to create
just enough entries to fill a single sector. Then again,
code makes all kinds of assumptions or has all kinds of
bugs -- just like the logic in the loader apparently.

Having mkimg create a large table, even if it knows
up front that it doesn't have to, may prevent broken code
from tripping over, but it surely bloats the image for
no reason.
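Here is a C sketch that puts numbers on the trade-off. It computes the entry-array size in sectors and the smallest entry count that fills whole sectors. It assumes the sector size is a multiple of the entry size, and the function names are hypothetical.

```c
#include <stdint.h>

/* Sectors occupied by one copy of the GPT partition entry array. */
static uint32_t gpt_table_sectors(uint32_t entries, uint32_t entsz, uint32_t secsz)
{
    return (uint32_t)(((uint64_t)entries * entsz + secsz - 1) / secsz);
}

/* Smallest entry count >= nparts that exactly fills whole sectors,
 * i.e. "just enough entries to fill a single sector" for small nparts.
 * Assumes secsz is a multiple of entsz. */
static uint32_t gpt_entries_fill(uint32_t nparts, uint32_t entsz, uint32_t secsz)
{
    return gpt_table_sectors(nparts, entsz, secsz) * secsz / entsz;
}
```

Assume 512-byte sectors and 128-byte entries. 2 partitions round up to 4 entries in 1 sector, while the default 128 entries take 32 sectors. GPT keeps both a primary and a backup copy of the array, so that difference is paid twice.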

--
Marcel Moolenaar
mar...@xcllnt.net


Craig Rodrigues

Aug 24, 2014, 6:37:55 PM

I see what you are saying. If you have a new device
and do "gpart create -s GPT", then this value in sys/geom/part/g_part_gpt.c:

.gps_minent = 128,

causes the logic in the g_part_ctl_create() function in sys/geom/part/g_part.c
to set the number of entries to 128. The GPT scheme's create method
then creates the partition table with 128 empty entries.

"gpart create" doesn't know how many partitions you want, so it
needs to allocate some space up front for the partition table,
and then you can do "gpart add" to add the partitions later.

With mkimg, you know exactly how many partitions you are creating,
so you don't need to specify 128 as the number of entries.

Since only gpart was available for creating GPT partition tables, the side-effect
of always having 128 entries hid the bug in the loader.

Hopefully Andrey's fix can be backported to at least stable/9, because
the loader bug seems to have been there since at least 2012.

John-Mark Gurney

Aug 24, 2014, 7:31:43 PM

Though, w/ people dd'ing images onto disks, and using growfs to grow
as necessary, we might want to allocate a few more entries than the
minimum... I do agree that we want to keep sizes to a minimum...

> Since only gpart was available for creating GPT partitions, the side-effect
> of always having 128 partitions hid the bug in the loader.
>
> Hopefully Andrey's fix can be backported to at least stable/9, because
> the loader bug seems to have been there since at least 2012.

--

John-Mark Gurney Voice: +1 415 225 5579

"All that I will do, has been done, All that I have, has not."

Marcel Moolenaar

Aug 24, 2014, 7:55:08 PM

On Aug 24, 2014, at 4:31 PM, John-Mark Gurney <j...@funkthat.com> wrote:
>>
>> With mkimg, you know exactly how many partitions you are creating
>> , so you don't need to specify 128 as the number of partitions.
>
> Though, w/ people dd'ing images onto disks, and using growfs to grow
> as necessary, we might want to allocate a few more more than the
> minimum... I do agree that we want to keep sizes to a minimum...

One thing I can maybe do is simply fill the empty sectors
that are there because of alignment. If you add -P 4K to
mkimg, then the first partition will be 4K aligned and you
have about 5 sectors unused between the end of the GPT
table and the first sector of the first partition. I may
as well extend the table to cover those unused sectors.
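Extending the table over the alignment gap comes down to simple arithmetic. The C sketch below assumes the layout of the mkimg image above, with the entry array starting at LBA 2. It also assumes `-P 4K` has moved the first partition to LBA 8. The function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical: number of entsz-byte entries that fit if the entry
 * array starting at tbl_lba is extended up to, but not including,
 * the first partition's LBA. */
static uint32_t gpt_entries_to_fill_gap(uint64_t tbl_lba, uint64_t first_part_lba,
                                        uint32_t entsz, uint32_t secsz)
{
    uint64_t sectors = first_part_lba - tbl_lba;   /* table sector(s) + unused gap */

    return (uint32_t)(sectors * secsz / entsz);
}
```

`gpt_entries_to_fill_gap(2, 8, 128, 512)` gives 24. That covers the table's own sector plus the 5 unused ones, at 4 entries per sector.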

However, this is a pretty sideways way to end up with a
GPT table that has some extra room. Maybe having scheme-specific
options for this kind of thing is not bad. At
least EBR and APM have the same "problem", and the BSD
disk label can also be created with more than just 8
partitions.

Related:
o Is there a need to create images with empty space at
the end or in between partitions?
o Is there a need to create partitions with specific
indices (i.e. index 6 for a typical 'f' partition)?

Answers to these questions could help figure this out...

--
Marcel Moolenaar
mar...@xcllnt.net


Andrey V. Elsukov

Aug 25, 2014, 3:29:41 AM

On 24.08.2014 20:59, Craig Rodrigues wrote:
> Also, in the gptboot man page, it mentions that gptboot can only boot
> on systems with 128 partitions or less. This seems like an artificial
> restriction.
> Does the gptboot code really enforce this? Not that I have a system with more
> than 128 partitions. :)

It's because gptboot uses a static buffer to read and write the GPT table.
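A static buffer caps the table size at compile time. The C sketch below only illustrates that idea. The names and the 128 × 128-byte size are assumptions chosen to match the 128-partition limit in the man page, not gptboot's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size buffer: room for 128 entries of 128 bytes (16 KiB). */
#define SKETCH_MAX_ENTRIES 128
#define SKETCH_ENT_SIZE    128
static uint8_t gpt_table_buf[SKETCH_MAX_ENTRIES * SKETCH_ENT_SIZE];

/* Accept a table only if the header's entry array fits in the buffer. */
static int gpt_table_fits(uint32_t hdr_entries, uint32_t hdr_entsz)
{
    return (uint64_t)hdr_entries * hdr_entsz <= sizeof(gpt_table_buf);
}
```

Any header that announces more than 16 KiB of entries would have to be rejected or truncated by code built this way.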

--
WBR, Andrey V. Elsukov

Andrey V. Elsukov

Aug 25, 2014, 3:48:01 AM

On 25.08.2014 03:55, Marcel Moolenaar wrote:
>> Though, w/ people dd'ing images onto disks, and using growfs to grow
>> as necessary, we might want to allocate a few more more than the
>> minimum... I do agree that we want to keep sizes to a minimum...
>
> One thing I can maybe do is simply fill the empty sectors
> that are there because of alignment. If you add -P 4K to
> mkimg, then the first partition will by 4K aligned and you
> have about 5 sectors unused between the end of the GPT
> table and the first sector of the first partition. I may
> as well extend the table to cover those unued sectors.

IMHO, mkimg should behave like gpart and create images in a
gpart-compatible way. Some users may want to copy the partition layout
from the image to real hardware, and they will not be able to do so.

Also, FreeBSD 11.0 now uses a different first usable LBA: by default it is
4K-aligned. This creates some incompatibility with older versions.
You can't do `gpart restore` and get the same table you had on the older
system.

> However, this is a pretty side-ways way to end up with a
> GPT table that has some extra room. Maybe having scheme-
> specific options for this kind of thing is not bad. At
> least EBR and APM have the same "problem" and the BSD
> disk label can also be created with more than just 8
> partitions.

I have thought about implementing `gpart modify` or `gpart set` -n entries to
change the number of entries when possible (e.g. disklabel(8) can do
it, but gpart can't).
Also, in r228076 I changed the APM code to calculate the maximum number of
entries based on the available free space.

--
WBR, Andrey V. Elsukov

Marcel Moolenaar

Aug 25, 2014, 10:40:50 AM

On Aug 25, 2014, at 12:48 AM, Andrey V. Elsukov <bu7...@yandex.ru> wrote:

> On 25.08.2014 03:55, Marcel Moolenaar wrote:
>>> Though, w/ people dd'ing images onto disks, and using growfs to grow
>>> as necessary, we might want to allocate a few more more than the
>>> minimum... I do agree that we want to keep sizes to a minimum...
>>
>> One thing I can maybe do is simply fill the empty sectors
>> that are there because of alignment. If you add -P 4K to
>> mkimg, then the first partition will by 4K aligned and you
>> have about 5 sectors unused between the end of the GPT
>> table and the first sector of the first partition. I may
>> as well extend the table to cover those unued sectors.
>
> IMHO, mkimg should behave like gpart and create images in
> gpart-compatible way.

It already does. There's a difference between behaving
like something else and behaving exactly identically to
that something. The same applies to compatible. It is
not the same as identical.

There is no compatibility issue. mkimg follows the GPT
specification (modulo bugs) and gpart happily groks the
partition table.

> Some users may want to copy the partition layout
> from the image to real hardware and they will not be able to do it.

I'm inclined to say that generally speaking this is not
possible. The GPT has metadata in the first few sectors
*and* the last few sectors and LBAs of these sectors are
part of the metadata. You cannot blindly copy an image
onto a physical medium unless the image and the physical
medium are of exactly the same size. Odds are they are
not.

To reliably transfer or convert an image (e.g. RAW->VHD)
one must modify the image as part of the process. Not a
hard rule, but best to assume as a rule of thumb. This
seems to warrant a utility all by itself.

> Also, now FreeBSD 11.0 uses different first usable LBA. By default it is
> 4k aligned. And this creates some incompatibility with older versions.
> You can't do `gpart restore` and get the same table, as you had on older
> system.

It sounds like restore is broken then. The restore command
cannot ever assume anything about the GPT, including
the tool that created the GPT. In order to restore a
GPT, it must be properly backed up. The backup header
and table should suffice most of the time for that
purpose as they're a replica, but as soon as metadata is
missing and the restore command has to guess, things
will go wrong.

--
Marcel Moolenaar
mar...@xcllnt.net


Andrey V. Elsukov

Aug 25, 2014, 11:02:44 AM

On 25.08.2014 18:40, Marcel Moolenaar wrote:
>> Also, now FreeBSD 11.0 uses different first usable LBA. By default it is
>> 4k aligned. And this creates some incompatibility with older versions.
>> You can't do `gpart restore` and get the same table, as you had on older
>> system.
>
> It sounds restore is broken then. The restore command
> cannot ever assume anything about the GPT. Including
> the tool that created the GPT. In order to restore a
> GPT, it must be properly backed-up. The backup header
> and table should suffice most of the time for that
> purpose as it's a replica, but as soon as meta-data is
> missing and the restore command has to guess, things
> will go wrong.

`gpart restore` just issues a number of commands to geom_part(4) to create
a partition table similar to the one that was backed up. If your partition
table on the old system had a partition that starts at LBA 34, `gpart
create` is now unable to create such a partition table, because by default
the first usable LBA is 40.
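The gap between LBA 34 and LBA 40 follows from the on-disk layout plus 4K alignment. The C sketch below assumes the protective MBR sits at LBA 0, the header at LBA 1 and the entry array right after it. `first_usable_lba` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical: first usable LBA for a GPT whose entry array follows
 * the PMBR (LBA 0) and header (LBA 1), rounded up to align_bytes.
 * Assumes align_bytes is a non-zero multiple of secsz. */
static uint64_t first_usable_lba(uint32_t entries, uint32_t entsz,
                                 uint32_t secsz, uint32_t align_bytes)
{
    uint64_t tbl = ((uint64_t)entries * entsz + secsz - 1) / secsz;
    uint64_t lba = 2 + tbl;                  /* PMBR + header + entry array */
    uint64_t per = align_bytes / secsz;      /* sectors per alignment unit */

    return (lba + per - 1) / per * per;
}
```

With 128 entries and 512-byte sectors, this gives 34 unaligned and 40 with 4096-byte alignment. The 2-entry mkimg table gives 3, which matches the gpart output earlier in the thread.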

John-Mark Gurney

Aug 25, 2014, 1:38:40 PM

Andrey V. Elsukov wrote this message on Mon, Aug 25, 2014 at 19:02 +0400:
> On 25.08.2014 18:40, Marcel Moolenaar wrote:
> >> Also, now FreeBSD 11.0 uses different first usable LBA. By default it is
> >> 4k aligned. And this creates some incompatibility with older versions.
> >> You can't do `gpart restore` and get the same table, as you had on older
> >> system.
> >
> > It sounds restore is broken then. The restore command
> > cannot ever assume anything about the GPT. Including
> > the tool that created the GPT. In order to restore a
> > GPT, it must be properly backed-up. The backup header
> > and table should suffice most of the time for that
> > purpose as it's a replica, but as soon as meta-data is
> > missing and the restore command has to guess, things
> > will go wrong.
>
> `gpart restore` just uses a number of commands to geom_part(4) to create
> partition table similar to what was backed up. If your partition table
> on the old system had a partition that starts from LBA 34, now `gpart
> create` isn't able to create such partition table. Because by default
> the first usable LBA is 40.

Luckily, gpart restore won't work:
# gpart backup /dev/md0
GPT 4
1 freebsd-ufs 8 262144
# gpart restore md1 < /tmp/foob.gpt.back
gpart: entries '4': Invalid argument

So, we're somewhat safe; I guess gpart restore needs to learn how to
handle entries properly...

We should fix this, since other OSes might not use 128 for entries.
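For restore to handle entries, it first has to read them from the backup. Judging by the output above, the first line of a `gpart backup` dump holds the scheme name and the entry count. Below is a hypothetical C parser for just that line (not gpart's code).

```c
#include <stdio.h>

/* Hypothetical: parse a backup header line such as "GPT 4" into the
 * scheme name (at most 15 chars) and the entry count.
 * Returns 0 on success, -1 if the line does not match. */
static int parse_backup_header(const char *line, char scheme[16], unsigned *entries)
{
    if (sscanf(line, "%15s %u", scheme, entries) != 2)
        return -1;
    return 0;
}
```

On "GPT 4" it stores "GPT" and 4. A line without a count is rejected.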

--
John-Mark Gurney Voice: +1 415 225 5579

"All that I will do, has been done, All that I have, has not."