ESOS booting problem

ch...@friedmann.com

Nov 10, 2014, 10:59:18 AM
to esos-...@googlegroups.com
I have been running ESOS for the last year with no issues. I decided to update to the latest release, so I loaded r713 onto my USB drive. Upon rebooting, I am now dropped into the base shell. I can usually reboot a couple of times and it will finally boot into ESOS.
I figured I might try the latest r719, but now no matter how many times I reboot, I always end up in the shell:

mount: can't read '/etc/fstab': No such file or directory
Something bad happened; attempting to drop into a shell...

I have tried re-loading the image, but no matter what I try, I get this.
Hardware: DL380 G5, 32 GB RAM, Smart Array P410 connected to a D2700 storage array.

Can you post the script that runs to remount root and boot into the live environment? I want to be able to troubleshoot what is going on.

Thanks

Chris

Marc Smith

Nov 10, 2014, 11:58:26 AM
to esos-...@googlegroups.com
Hi Chris,

Here is the initramfs script:

In your screen shot, I don't see the string "Initializing root file system..." so the script hasn't gotten to that point yet, which leaves these two lines as suspects:
mount -t tmpfs -o size=1536m tmpfs /mnt/root || rescue_shell
mount -o ro $(findfs LABEL=esos_root) /mnt/tmp || rescue_shell

In either case, in the initramfs image, there is no "/etc/fstab" file. I also just booted the latest official build of ESOS (r719 from download.esos-project.com/packages/trunk/) and do not receive that error. I would guess this line is the culprit: mount -o ro $(findfs LABEL=esos_root) /mnt/tmp || rescue_shell

I bet the 'findfs' tool is failing, which causes "$(findfs LABEL=esos_root)" to evaluate to "", which means that line is really evaluated like this: mount -o ro /mnt/tmp || rescue_shell

And not specifying the device in the mount command makes mount look at the '/etc/fstab' file, and since that file doesn't exist, the mount command fails. So the real issue here is why findfs can't resolve "LABEL=esos_root".
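
A minimal sketch of that argument-collapsing behavior, runnable in any POSIX shell without touching real devices. The helpers count_args and failing_findfs are hypothetical stand-ins, not part of the ESOS initramfs script:

```shell
# Hypothetical helper: print how many arguments a command would receive.
count_args() { echo "$#"; }

# Hypothetical stand-in for a findfs call that fails and prints nothing.
failing_findfs() { return 1; }

# Unquoted, as in the initramfs line: the empty result vanishes entirely,
# so only "-o ro /mnt/tmp" is passed and mount is left with one operand.
unquoted=$(count_args -o ro $(failing_findfs) /mnt/tmp)

# Quoted: the empty string survives as its own (empty) argument.
quoted=$(count_args -o ro "$(failing_findfs)" /mnt/tmp)

echo "unquoted: $unquoted, quoted: $quoted"
```

With a single operand, mount tries to look it up in /etc/fstab, which matches the error message in the original report.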


Are you building ESOS yourself? If not, I'd bet something is wrong with your USB flash drive. Try booting the same drive on another computer (desktop PC, laptop, whatever) and see what happens. Try a new / different brand or model USB flash drive and see what happens.


To help make this situation clearer in the future, I'm going to update the initramfs script so it runs findfs on the lines before to make sure it can resolve that file system label before proceeding. This will force it into a shell with an error message before attempting to mount.
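
One possible shape for such a check, as a sketch only and not the change that was actually committed. The function name mount_by_label and the FINDFS/MOUNT override variables are assumptions made for illustration:

```shell
# Hypothetical wrapper: resolve a label first, and only mount if that works.
# FINDFS and MOUNT are overridable so the logic can be exercised without root.
FINDFS=${FINDFS:-findfs}
MOUNT=${MOUNT:-mount}

mount_by_label() {
    label=$1
    target=$2
    # findfs prints the device node on success and exits non-zero otherwise.
    dev=$("$FINDFS" "LABEL=$label") || dev=""
    if [ -z "$dev" ]; then
        echo "Unable to resolve LABEL=$label" >&2
        return 1
    fi
    "$MOUNT" -o ro "$dev" "$target"
}
```

In the initramfs script this could be called as "mount_by_label esos_root /mnt/tmp || rescue_shell", so the shell is reached with a message that names the real problem instead of the misleading /etc/fstab error.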


--Marc

--
You received this message because you are subscribed to the Google Groups "esos-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to esos-users+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Curtis Grice

Nov 10, 2014, 12:10:34 PM
to esos-...@googlegroups.com
You're not the only one. I had the same issue with a PowerEdge 2950. When you get dropped into bash, try "dmesg | grep sd" and check what the boot drive shows up as. I'll bet it's not "sda".

^Just an educated guess^

I ended up rebuilding from scratch as this is a test box for me.
This ended up happening after using a different flash drive with ESOS installed.

Edit: I see your init searches for the disk label... hmm I don't know why it would get wonky.

Marc Smith

Nov 10, 2014, 12:15:45 PM
to esos-...@googlegroups.com
Instead of checking the kernel logs (dmesg) for the boot device node, look for issues... perhaps drive access issues/errors.
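
As a rough sketch of what that search could look like from the rescue shell: the function name filter_kernel_log is hypothetical, and the pattern is an illustrative guess rather than an exhaustive list of messages to look for.

```shell
# Hypothetical filter: keep kernel log lines that mention USB or look like
# errors. The pattern is an assumption; adjust it to what the logs contain.
filter_kernel_log() {
    grep -i -E 'usb|error|fail|timeout|reset'
}

# Usage on a live system:
#   dmesg | filter_kernel_log
```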


--Marc

Curtis Grice

Nov 10, 2014, 2:40:52 PM
to esos-...@googlegroups.com
The issue is back after a reboot. The USB drive checks out on my Ubuntu box, and I have tried two drives. When it drops into a shell, I have no SCSI devices listed in /dev (I have my array disconnected). Also, I am running a downloaded copy, not a personal build.

I cannot find any logs to dig through; however, if I can provide any information, I would be happy to help.



Marc Smith

Nov 10, 2014, 2:58:10 PM
to esos-...@googlegroups.com
Hmm... did this start happening with recent revisions? Like in the last month or so?

Your description jogged my memory a bit... the only thing I can think of that could possibly be the culprit is that there used to be a ten-second sleep in the initramfs script. It was only put there originally for aesthetics, to prevent printed messages from being clobbered by kernel messages, but perhaps it served another purpose: letting queued kernel events settle. With udev there is 'udevsettle', which blocks until all queued events have been handled. We may have been inadvertently getting the same effect from the sleep that was there previously.

As I type this, I believe this would likely be the culprit. The 'mdev -s' is called early on in the initramfs script, but if not all events have completed, the device nodes won't be created, which would cause 'findfs' to not return anything (exit 1). Yes, that's gotta be it.

I'll get something committed tonight... either adding the sleep back in, which is pretty lame, or coming up with a way to block until all events have settled.
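
A sketch of the second option, polling with a timeout instead of a fixed sleep. This is an illustration under assumptions, not what was committed; the names wait_for and probe_esos_root are hypothetical:

```shell
# Hypothetical helper: retry a command once per second until it succeeds
# or the timeout (in seconds) has elapsed.
wait_for() {
    timeout=$1
    shift
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Hypothetical probe: rescan for device nodes, then try to resolve the label.
probe_esos_root() {
    mdev -s
    findfs LABEL=esos_root
}

# Possible use in an initramfs script (rescue_shell is the script's own function):
#   wait_for 10 probe_esos_root || rescue_shell
```

The trade-off versus a plain sleep is that a fast system continues as soon as the label resolves, while a slow one still gets the full timeout before falling back to the shell.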

Thanks!


--Marc

ch...@friedmann.com

Nov 10, 2014, 3:41:47 PM
to esos-...@googlegroups.com
I believe you are on the right track. I was under the impression that things were happening faster than normal. I was previously running 673 (if I recall correctly). I wanted to move to 713 in order to try out FCoE with my QLogic QLE8262s.
I can confirm that /dev is not being populated with the USB devices, but I could run mdev -s manually and it would create the devices.

Instead of 10 seconds, can we try 5?

Marc Smith

Nov 10, 2014, 10:15:22 PM
to esos-...@googlegroups.com
I looked a bit and didn't see a more elegant solution, and after looking at the udevadm 'settle' command, I'm not sure that would do what we need anyhow... something to test when we move to udev in ESOS.

There is another setting that may be playing a role in all of this: delay_use

I tested this by setting "usb_storage.delay_use=60" in GRUB at boot time (hit 'E' to temporarily edit the GRUB boot options when you see the GRUB screen). This delays the use of USB mass storage devices (by 60 seconds in this example). I was able to reproduce your situation when I set this. The default for this option is '1'... perhaps '1' is too much for your hardware combination, and that 1-second delay is causing the USB drive to show up only after 'mdev -s' has already run. If you have a minute, try setting that option to '0' and see what happens.
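
To confirm which value a given boot actually used, something like the following sketch may help. The function name delay_use_from_cmdline is hypothetical, and it matches only the underscore spelling of the option used above:

```shell
# Hypothetical helper: print the usb_storage.delay_use value found in a
# kernel command line string, or nothing if the option is not present.
delay_use_from_cmdline() {
    for word in $1; do
        case $word in
            usb_storage.delay_use=*) echo "${word#usb_storage.delay_use=}" ;;
        esac
    done
}

# On a running system the command line is available in /proc/cmdline:
#   delay_use_from_cmdline "$(cat /proc/cmdline)"
# Where the kernel exposes it, the value in effect can also be read from sysfs:
#   cat /sys/module/usb_storage/parameters/delay_use
```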

Either way, I'm adding a sleep back into the initramfs script... 5 seconds this time. I'll test this in the morning and then get it committed.


--Marc

Marc Smith

Nov 11, 2014, 5:05:56 PM
to esos-...@googlegroups.com
This change has been committed (r721) and will be posted to the downloads area this evening!


--Marc

ch...@friedmann.com

Nov 11, 2014, 10:31:08 PM
to esos-...@googlegroups.com
I just installed 721 and so far everything is looking good. I have rebooted both nodes a couple of times with no errors.

Thank you

Curtis Grice

Nov 12, 2014, 7:11:45 PM
to esos-...@googlegroups.com
I am still having issues. I'll try checking out the source from SVN and tweaking the init.

