build_image failure


Kevin Johnson

May 26, 2015, 10:00:21 AM
to chromiu...@chromium.org
I'm attempting to run build_image for amd64-generic to load on a laptop, running with --board=amd64-generic and the dev options. I get this at the end:

 
 
plain floppy: device "/proc/3312/fd/3" busy (Resource temporarily unavailable):
Cannot initialize 'S:'
Bad target s:/ldlinux.sys
syslinux: failed to create ldlinux.sys
ERROR   : Tue May 26 09:52:08 EDT 2015
ERROR   : script called:  '--arch=amd64' '--to=/mnt/host/source/src/build/images/amd64-generic/R44-7071.0.2015_05_26_0944-a1/chromiumos_image.bin' '--from=/mnt/host/source/src/build/images/amd64-generic/R44-7071.0.2015_05_26_0944-a1/rootfs_dir/boot' '--vmlinuz=/mnt/host/source/src/build/images/amd64-generic/R44-7071.0.2015_05_26_0944-a1/rootfs_dir/boot/vmlinuz' '--to_offset=127926272' '--to_size=16777216' '--kernel_partition='/mnt/host/source/src/build/images/amd64-generic/R44-7071.0.2015_05_26_0944-a1/vmlinuz.image'' '--install_syslinux'
ERROR   : Backtrace:  (most recent call is last)
ERROR   :  update_bootloaders.sh:195:main(), called: die_err_trap  
ERROR   : 
ERROR   : Command failed:
ERROR   :   Command 'sudo syslinux -d /syslinux "${ESP_DEV}"' exited with nonzero code: 1
umount: /tmp/esp.8WCJNH: not mounted
WARNING : Initial unmount failed. Possibly crosbug.com/23443. Retrying
umount: /tmp/esp.8WCJNH: not mounted
INFO    : Unmounting image from /mnt/host/source/src/build/images/amd64-generic/R44-7071.0.2015_05_26_0944-a1/stateful_dir and /mnt/host/source/src/build/images/amd64-generic/R44-7071.0.2015_05_26_0944-a1/rootfs_dir
Cleaning up /usr/local symlinks for /mnt/host/source/src/build/images/amd64-generic/R44-7071.0.2015_05_26_0944-a1/stateful_dir/dev_image
ERROR   : Tue May 26 09:52:14 EDT 2015
ERROR   : script called:  '/mnt/host/source/src/build/images/amd64-generic/R44-7071.0.2015_05_26_0944-a1' 'chromiumos_image.bin' '--force_developer_mode'
ERROR   : Backtrace:  (most recent call is last)
ERROR   :  cros_make_image_bootable:432:main(), called: make_image_bootable '/mnt/host/source/src/build/images/amd64-generic/R44-7071.0.2015_05_26_0944-a1/chromiumos_image.bin' 
ERROR   :  cros_make_image_bootable:1:make_image_bootable(), called: die 'cros_make_image_bootable failed.' 
ERROR   : 
ERROR   : Error was:
ERROR   :   cros_make_image_bootable failed.


Any help would be appreciated. All permissions and environment settings seem reasonable. Building on a Fedora 21 system.


Marc Herbert

May 28, 2015, 1:58:07 PM
to chromiu...@chromium.org
Same here with Fedora 21 too.

If you download this patch: https://chromium-review.googlesource.com/#/q/Iad269ab19d923a8e1efb1222e309bbf88707d991

... then you can find and extract the complete and exact ./bin/cros_make_image_bootable call that reproduces this issue in 30 seconds instead of 10 minutes.

Marc Herbert

May 28, 2015, 3:49:16 PM
to chromiu...@chromium.org
(refined the subject)

Since the following patch works around this issue there is (again?) something nasty going on with the loop device.

Interestingly, a mere (but sledgehammer) strace -f from outside the cros_sdk also works around the issue. Race condition/Heisenbug?

Educated guess: the failure could be when mattrib forked by syslinux tries to flock(, LOCK_EX|LOCK_NB) /dev/loop3

diff --git a/update_bootloaders.sh b/update_bootloaders.sh
index c81dce76e03f..ad91e669bdcf 100755
--- a/update_bootloaders.sh
+++ b/update_bootloaders.sh
@@ -192,7 +192,9 @@ if [[ "${FLAGS_arch}" = "x86" || "${FLAGS_arch}" = "amd64" ]]; then
   # we cut over from rootfs booting (extlinux).
   if [[ ${FLAGS_install_syslinux} -eq ${FLAGS_TRUE} ]]; then
     safe_umount "${ESP_FS_DIR}"
-    sudo syslinux -d /syslinux "${ESP_DEV}"
+    sudo dd if="${ESP_DEV}" of=/tmp/loop_copy
+    sudo syslinux -d /syslinux /tmp/loop_copy
+    sudo dd of="${ESP_DEV}" if=/tmp/loop_copy
     # mount again for cleanup to free resource gracefully
     sudo mount -o ro "${ESP_DEV}" "${ESP_FS_DIR}"
   fi
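
For readers who want to try the same idea outside update_bootloaders.sh, here is a minimal sketch of the "work on a copy, then write it back" approach from the patch above. The helper name run_on_copy is hypothetical and this is not the actual crosutils change; it assumes GNU dd (for status=none) and uses mktemp instead of the fixed /tmp/loop_copy path.

```shell
#!/bin/bash
# Hypothetical helper (not the actual crosutils change): run a command
# against a temporary copy of a device or image, then write the result back.
# Usage: run_on_copy DEVICE COMMAND [ARGS...]
# The path of the temporary copy is appended as the last argument of COMMAND.
run_on_copy() {
  local dev="$1"
  shift
  local copy
  copy="$(mktemp)" || return 1

  local status=0
  # Copy the contents out, so COMMAND never opens the device itself.
  dd if="${dev}" of="${copy}" status=none || status=$?
  if [[ ${status} -eq 0 ]]; then
    "$@" "${copy}" || status=$?
  fi
  # Write back only if the copy and the command both succeeded.
  if [[ ${status} -eq 0 ]]; then
    dd if="${copy}" of="${dev}" conv=notrunc status=none || status=$?
  fi
  rm -f "${copy}"
  return "${status}"
}
```

Run as root against a real loop device, the call would look something like: run_on_copy "${ESP_DEV}" syslinux -d /syslinux. If the command fails, the device is left untouched.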

Marc Herbert

May 28, 2015, 4:57:55 PM
to chromiu...@chromium.org
How to reproduce the issue instantly:

cros_sdk
./mount_gpt_image.sh -i chromiumos_base_image.bin -m -e /tmp/e
mount # take note of the offset
./mount_gpt_image.sh -i chromiumos_base_image.bin -m -e /tmp/e -u
mount -o loop,offset=[see above] /mnt/host/source/src/build/images/${BOARD}/latest/chromiumos_base_image.bin /tmp/e
syslinux -d /syslinux /dev/loop0
=> plain floppy: device "/proc/9856/fd/3" busy (Resource temporarily unavailable):

Again, strace -f from outside the cros_sdk makes the issue go away.


Where to next?


Marc Herbert

May 28, 2015, 6:53:48 PM
to chromiu...@chromium.org
On Thursday, May 28, 2015 at 1:57:55 PM UTC-7, Marc Herbert wrote:
How to reproduce the issue instantly:

cros_sdk
./mount_gpt_image.sh -i chromiumos_base_image.bin -m -e /tmp/e
sudo losetup # take note of the offset
./mount_gpt_image.sh -i chromiumos_base_image.bin -m -e /tmp/e -u
sudo losetup -f -o [see above] /mnt/host/source/src/build/images/${BOARD}/latest/chromiumos_base_image.bin
syslinux -d /syslinux /dev/loop0
=> plain floppy: device "/proc/9856/fd/3" busy (Resource temporarily unavailable):


Replaced the "mount" calls above with "sudo losetup ...", sorry for the confusion.

I can now reproduce this outside cros_sdk. Pure syslinux+/dev/loop issue without any Chromium OS involvement of any kind. Pure Fedora 21 (kernel?) issue.

strace still makes the problem go away :-( Maybe just because it slows things down?

Clues anyone?

Mike Frysinger

May 28, 2015, 7:43:48 PM
to Marc Herbert, chromium-os-dev
maybe your DE has some udev/blockid probe code that runs in the background ?  we've had numerous problems with those in the past (and well i guess we continue to).

what if you put a `udevadm settle` after the mount ?  don't think that would work inside the chroot though.

most code we just put retries around it on the assumption that the host distro eventually finishes messing around and we can get back to business :(.
-mike
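
A minimal sketch of the "put retries around it" approach mentioned above. The helper name retry and its arguments are hypothetical, not something taken from the Chromium OS scripts.

```shell
#!/bin/bash
# Hypothetical helper: retry a command a few times with a pause in between.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for (( i = 1; i <= attempts; i++ )); do
    if "$@"; then
      return 0
    fi
    if (( i < attempts )); then
      echo "attempt ${i}/${attempts} failed: $*; retrying in ${delay}s" >&2
      sleep "${delay}"
    fi
  done
  # Every attempt failed.
  return 1
}
```

Something like: retry 5 1 sudo syslinux -d /syslinux "${ESP_DEV}". This only hides the race; it does not say who else is touching the device.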

--
--
Chromium OS Developers mailing list: chromiu...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-os-dev?hl=en


Marc Herbert

May 28, 2015, 8:35:41 PM
to chromiu...@chromium.org, marc.h...@gmail.com
2015-05-28 16:43 GMT-07:00 Mike Frysinger:

> maybe your DE has some udev/blockid probe code that runs in the background ? we've had numerous problems with those in the past (and well i guess we continue to).
>
> what if you put a `udevadm settle` after the mount ? don't think that would work inside the chroot though.
>
> most code we just put retries around it on the assumption that the host distro eventually finishes messing around and we can get back to business :(.


Thanks Mike but I think it may be different this time. When I wrote I could reproduce the issue outside cros_sdk, I meant this:


                                 syslinux -d /syslinux /dev/loop0 # FAILS
strace -f -o /dev/null syslinux -d /syslinux /dev/loop0 # WORKS
                                 syslinux -d /syslinux /dev/loop0 # FAILS
strace -f -o /dev/null syslinux -d /syslinux /dev/loop0 # WORKS
[ad libitum]


I mean: the above sequence calls syslinux *only*, which hopefully does not even realize it's working on a loopback device; or at least does not care about it and does not mess with it.

In the (unfortunately successful...) strace -f, I can see the syslinux parent process forking mattrib/mcopy/mmove 10 times in total. They're all symlinks to mtools. Each of these 10 times mtools does open(/proc/<syslinux-PID>/fd/3, O_RDWR) which is actually /dev/loop0, then grabs an flock(, LOCK_EX|LOCK_NB) before read()-ing. My guess is it's flock that fails when strace -f isn't here to "help".

I tried to substitute /usr/bin/mattrib with an indirection shell script calling /usr/bin/mattrib.real but then mtools complains mattrib.real is unknown. Bummer.

Marc Herbert

May 30, 2015, 4:39:51 AM
to chromiu...@chromium.org
On Thursday, 28 May 2015 12:49:16 UTC-7, Marc Herbert wrote:

> Interestingly, a mere (but sledgehammer) strace -f from outside the cros_sdk also works around the issue. Race condition/Heisenbug?
>
> Educated guess: the failure could be when mattrib forked by syslinux tries to flock(, LOCK_EX|LOCK_NB) /dev/loop3


All confirmed: inserting a 10ms sleep in mtools just before the flock() syscall makes the problem go away, whereas a 1ms sleep is not long enough to avoid it.

Removing LOCK_NB also makes it work.

syslinux forks and closes mattrib/mtools in rapid fire - apparently too fast for the loopback flock to be actually released.

According to the semantics in the flock man page I would tend to blame the kernel, thoughts?
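
To illustrate the difference between the two lock modes discussed above, here is a small sketch using the flock(1) command from util-linux on an ordinary temporary file. It is only an illustration under those assumptions: it does not touch a loop device and does not reproduce the mtools failure itself. File descriptor 9 and the variable names are arbitrary choices.

```shell
#!/bin/bash
# Sketch: non-blocking versus blocking flock on an ordinary temp file.
lockfile="$(mktemp)"

# Hold an exclusive lock in this shell through file descriptor 9.
exec 9>"${lockfile}"
flock -x 9

# Equivalent of LOCK_EX|LOCK_NB from another process: fails at once
# because the lock is still held.
if flock -x -n "${lockfile}" true; then
  nonblocking=acquired
else
  nonblocking=busy
fi

# Release the lock shortly afterwards, from the background.
( sleep 0.3; flock -u 9 ) &

# Equivalent of LOCK_EX without LOCK_NB: waits for the holder to let go.
# The 5 second timeout is only a safety net for this demo.
if flock -x -w 5 "${lockfile}" true; then
  blocking=acquired
else
  blocking=timeout
fi

wait
exec 9>&-
rm -f "${lockfile}"
echo "non-blocking: ${nonblocking}, blocking: ${blocking}"
```

A non-blocking caller gives up as soon as anybody else holds the lock, however briefly, which fits the observation that a short sleep or dropping LOCK_NB hides the problem.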

Daniel Charles

May 30, 2015, 1:05:25 PM
to chromiu...@chromium.org


On Thursday, May 28, 2015 at 12:49:16 PM UTC-7, Marc Herbert wrote:
> (refined the subject)
>
> Since the following patch works around this issue there is (again?) something nasty going on with the loop device.
>
> Interestingly, a mere (but sledgehammer) strace -f from outside the cros_sdk also works around the issue. Race condition/Heisenbug?
>
> Educated guess: the failure could be when mattrib forked by syslinux tries to flock(, LOCK_EX|LOCK_NB) /dev/loop3
>
> diff --git a/update_bootloaders.sh b/update_bootloaders.sh
> index c81dce76e03f..ad91e669bdcf 100755
> --- a/update_bootloaders.sh
> +++ b/update_bootloaders.sh
> @@ -192,7 +192,9 @@ if [[ "${FLAGS_arch}" = "x86" || "${FLAGS_arch}" = "amd64" ]]; then
>    # we cut over from rootfs booting (extlinux).
>    if [[ ${FLAGS_install_syslinux} -eq ${FLAGS_TRUE} ]]; then
>      safe_umount "${ESP_FS_DIR}"
> -    sudo syslinux -d /syslinux "${ESP_DEV}"
> +    sudo dd if="${ESP_DEV}" of=/tmp/loop_copy
> +    sudo syslinux -d /syslinux /tmp/loop_copy
> +    sudo dd of="${ESP_DEV}" if=/tmp/loop_copy
>      # mount again for cleanup to free resource gracefully
>      sudo mount -o ro "${ESP_DEV}" "${ESP_FS_DIR}"
>    fi


This patch does work on Fedora 22 as well. I suspected the device was still in use after umount, so I tried umount -d and losetup -d on the loop device (/dev/loop3 in my case), but different errors showed up. With losetup -d /dev/loop3, syslinux fails with "syslinux: short read".



-- 
Daniel 

Radhakrishna

Jul 28, 2015, 3:49:21 PM
to Chromium OS dev, kpj...@gmail.com
The suggested fix works on Arch Linux.

Marc Herbert

Jul 28, 2015, 5:38:15 PM
to Chromium OS dev, kpj...@gmail.com, radhakrish...@intel.com
On Tuesday, 28 July 2015 12:49:21 UTC-7, Radhakrishna wrote:
> The suggested fix works on Arch Linux.

It's a workaround :-)


I doubt Arch Linux and Fedora share a lot of kernel patches, so this could be a recent regression (assuming it's the kernel). So I think it's worth sharing the kernel versions that reproduce this and the ones that don't:

4.0.8-200.fc21.x86_64 # Failing

Sharing what I think is the simplest way to reproduce this issue:

dd </dev/zero >/tmp/loop4sys count=10k
mkfs.vfat /tmp/loop4sys
losetup -f /tmp/loop4sys
losetup -a
syslinux /dev/loopX # plain floppy: device "/proc/13518/fd/3" busy (Resource temporarily unavailable):

# cleanup 
losetup -d /dev/loopX
rm /tmp/loop4sys

By the way someone reproduced this issue with a USB memory stick; no loopback: https://bugzilla.redhat.com/show_bug.cgi?id=1235016
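
Since this is a race, a single run may well succeed. A minimal sketch for running a command many times and counting the failures follows; the helper name count_failures is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: run a command RUNS times and report how many failed.
# Usage: count_failures RUNS COMMAND [ARGS...]
count_failures() {
  local runs="$1"
  shift
  local i failures=0
  for (( i = 0; i < runs; i++ )); do
    # Output is discarded; only the exit status matters here.
    "$@" >/dev/null 2>&1 || failures=$(( failures + 1 ))
  done
  echo "${failures}/${runs} runs failed"
}
```

With the loop device from the steps above still set up, something like count_failures 20 syslinux /dev/loopX (as root) gives a rough idea of how often the race is lost on a given system.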



Radhakrishna

Jul 28, 2015, 8:10:17 PM
to Chromium OS dev, kpj...@gmail.com, marc.h...@gmail.com
The steps in the simplest reproducer above did not produce an error for me. Reverting the workaround patch in update_bootloaders.sh brought back the originally reported error when running bin/cros_make_image_bootable.

Mike Frysinger

Jul 29, 2015, 2:09:46 AM
to Marc Herbert, Chromium OS dev, kpj...@gmail.com, radhakrish...@intel.com
i tried to reproduce this a few times on diff systems and haven't been able to, but i don't have any Fedora setups ...
-mike

Joe Konno

Aug 4, 2015, 5:57:41 PM
to Chromium OS dev
I'm also seeing this on an ~amd64 Gentoo system. It hit me during the tail end of build_image two executions in a row, and then succeeded on the 3rd. I had just rebooted after an earlier failure-- I was lazy and thought I'd exhausted my loopback devices. Smells like a race where I lose most of the time.

Sanitized uname -a: Linux 4.1.2-gentoo #1 SMP

Cheers, hth

Dongseong Hwang

Aug 11, 2015, 8:02:24 AM
to Chromium OS dev
Ubuntu 15.04 has had the same problem since July 22. My kernel version is 3.19.0-25-generic.
I think most upstream Linux distributions suffer from this problem, so the workaround patch needs to land in the Chrome OS repository (even if it's temporary).

- DS

黃虎崎

Aug 14, 2015, 1:25:33 AM
to Chromium OS dev
I encountered the same problem on Debian stretch. My kernel version is 4.0.2-1 (2015-05-11) x86_64.
I applied the patch provided by Marc and it solves the issue.

On Tuesday, August 11, 2015 at 8:02:24 PM UTC+8, Dongseong Hwang wrote:

Mike Frysinger

Oct 2, 2015, 4:46:28 PM
to Daniel Charles, chromium-os-dev
this is being tracked in http://crbug.com/508713 now
-mike

Mike Frysinger

Oct 2, 2015, 4:46:42 PM
to 黃虎崎, Chromium OS dev
this is being tracked in http://crbug.com/508713 now
-mike


Marc Herbert

Oct 2, 2015, 11:31:16 PM
to Chromium OS dev, huk...@gmail.com
On Friday, October 2, 2015 at 1:46:42 PM UTC-7, Mike Frysinger wrote:
> this is being tracked in http://crbug.com/508713 now


A very useful link indeed, thanks!

In case you're too lazy/busy to go and read everything over there, quick summary: this race condition was not introduced in any new kernel version. It's a race condition between mtools and udev. It was introduced in udev version 214, after which udev tries to lock the (loopback) device(s). There does not seem to be any strong consensus about the "correct" fix yet.
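
A quick way to check whether a host falls on the affected side of that udev version boundary. This is a sketch only: the helper name udev_may_lock_devices is hypothetical, and 214 is simply the version quoted in the summary above, not something this script verifies.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the given udev version string starts with
# a number that is 214 or higher (the version named in the summary above).
# Returns 2 if the string does not start with a number.
udev_may_lock_devices() {
  local version="$1"
  [[ "${version}" =~ ^[0-9]+ ]] || return 2
  # Force base 10 so a leading zero is not parsed as octal.
  (( 10#${BASH_REMATCH[0]} >= 214 ))
}

# Report on the local system, if udevadm is installed.
if command -v udevadm >/dev/null 2>&1; then
  version="$(udevadm --version 2>/dev/null)"
  if udev_may_lock_devices "${version}"; then
    echo "udev ${version}: new enough to be affected by the race described above"
  else
    echo "udev '${version}': older than 214 or not recognized"
  fi
fi
```

On distributions where udev ships as part of systemd, udevadm --version prints the systemd version number.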


Marc Herbert

Nov 3, 2015, 12:32:57 PM
to Chromium OS dev
The conclusion of https://code.google.com/p/chromium/issues/detail?id=508713#c8 is that mtools is bad and should be fixed. For convenience in the mean time I've made available a cleaner, more robust and heavily tested version of the workaround below. Add it to your workspace creation script(s) with something as simple as this:

repo  start  build-workarounds   .
repo download --cherry-pick chromiumos/platform/crosutils 303962/1 # replace '/1' with the latest, too bad '/current' is not supported.


On Thursday, 28 May 2015 12:49:16 UTC-7, Marc Herbert wrote:

> Since the following patch works around this issue there is (again?) something nasty going on with the loop device.

Alessandro Barracane

Nov 4, 2015, 11:17:34 AM
to Chromium OS dev
Thanks, I had the same issue with Debian and the patch resolved it for me.
The image now builds without errors.

Robert Wolfe

Mar 17, 2016, 1:23:16 PM
to Chromium OS dev

On Tuesday, November 3, 2015 at 11:32:57 AM UTC-6, Marc Herbert wrote:
> The conclusion of https://code.google.com/p/chromium/issues/detail?id=508713#c8 is that mtools is bad and should be fixed. For convenience in the mean time I've made available a cleaner, more robust and heavily tested version of the workaround below. Add it to your workspace creation script(s) with something as simple as this:
>
> repo  start  build-workarounds   .

For some reason, this did not work for me.

> repo download --cherry-pick chromiumos/platform/crosutils 303962/1 # replace '/1' with the latest, too bad '/current' is not supported.

This, however, did.  Running the latest Ubuntu and kernel.

Robert Wolfe

Mar 17, 2016, 2:25:30 PM
to Chromium OS dev
I was able to successfully build an installer with this one change without any issues.  The only problem I seem to be having now is getting the installer to lock onto my WiFi network :)


Marc Herbert

Mar 17, 2016, 7:54:41 PM
to Chromium OS dev
On Thursday, March 17, 2016 at 10:23:16 AM UTC-7, Robert Wolfe wrote:

> On Tuesday, November 3, 2015 at 11:32:57 AM UTC-6, Marc Herbert wrote:
>> The conclusion of https://code.google.com/p/chromium/issues/detail?id=508713#c8 is that mtools is bad and should be fixed. For convenience in the mean time I've made available a cleaner, more robust and heavily tested version of the workaround below. Add it to your workspace creation script(s) with something as simple as this:
>>
>> repo download --cherry-pick chromiumos/platform/crosutils 303962/1 # replace '/1' with the latest, too bad '/current' is not supported.
>
> This, however, did.  Running the latest Ubuntu and kernel.


If you look at the number of stars right now, it looks like this issue is rare. Yet I have personally heard of five times as many people who experienced this race condition.
