Should root device be listed in /etc/fstab?

Don Cross

Jul 29, 2018, 12:29:08 PM

to men...@lists.mender.io

I'm creating a non-Yocto integration of Mender and U-Boot for a Raspberry Pi project. I noticed I had an entry for the device from which the root filesystem is mounted in my /etc/fstab file:

/dev/mmcblk0p3 / ext4 defaults,noatime 0 2

Of course, this is wrong half the time after Mender updates the system, because my root partition alternates between /dev/mmcblk0p3 and /dev/mmcblk0p4. I discovered this because the way I was creating the root filesystem image (inside the Mender artifact) caused the filesystem check that runs at boot time to think there was corruption.

When I booted from partition 4, I would see mention of filesystem errors being corrected, but then the boot proceeded normally. However, on the next Mender update, booting from partition 3 caused the following to be printed on the screen:

Checking file systems...
WARNING:
File system errors were found and have been corrected,
but the nature of the errors require this system to be rebooted.
After you press enter, this system will be rebooted.
Press Enter to continue...

Needless to say, this is very unhelpful behavior on a system that must work correctly without a keyboard or monitor attached! The system is broken until someone comes along and manually cycles the power. I repeated this several times: it never happens when booting from partition 4, and it always happens when booting from partition 3.

I did fix the thing that was causing the alleged corruption. (For those who are curious, doing "mkfs.ext4 -d <dir> ..." to copy files from <dir> into the newly created filesystem was the culprit. I had to add a separate step where I loop-mount the filesystem and copy the files using "cp -a ...". I still don't understand why the filesystem check complained in the former procedure. Running "e2fsck -nf" on the raw image shows no problems.)

I tried deleting the line from /etc/fstab that mentions the "/" mount point. All the problems went away, even when I re-introduce the corruption on purpose. The system boots up and works perfectly on every Mender upgrade. I am happy about that part.

But now I'm concerned that real corruption on the root partition will be ignored and cause problems. For example, suppose something is writing to the SD card and the system loses power in the middle. Seems like it would be better to scan and fix the root filesystem with each boot. But it must never ask someone to press Enter!

Therefore, what are the correct contents of /etc/fstab on a Mender system? Should the root device be excluded as I have done? If it should be present, how do you prevent that "Press Enter" prompt, and how do you handle the toggling partition on each Mender update?

Thanks,

Don

Drew Moseley

Jul 29, 2018, 12:36:16 PM

to men...@lists.mender.io

Hi Don,

We've been experimenting with that lately, and it seems that if you change fstab to reference /dev/root instead, it all "just works". The assumption is that this is kernel magic that simply refers to whatever rootfs is specified on the kernel command line.

Drew

--
Drew Moseley | Technical Solutions Architect | (+1) 480-797-0552 | https://mender.io

Northern.tech AS | @northerntechHQ | @drewmoseley

Vladimir Bashkirtsev

Jul 29, 2018, 12:37:42 PM

to men...@lists.mender.io

Just use:

/dev/root / ext4 defaults 1 1

instead.

Notably, the noatime option is not needed, as your root partition should normally be read-only.

Vladimir Bashkirtsev

Jul 29, 2018, 12:40:38 PM

to men...@lists.mender.io

It is worth noting that it is impossible to do

mount /dev/root /

The Linux kernel does not actually make /dev/root point to the actual block device acting as the rootfs. It is merely a placeholder for fstab and mtab.
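
Because /dev/root is only a placeholder, a script that needs the real block device has to look it up itself. Here is a minimal sketch, assuming the kernel was booted with a plain root=/dev/... argument. The function name root_from_cmdline is made up for illustration, and values such as root=PARTUUID=... are returned as-is without being resolved to a device node.

```shell
#!/bin/sh
# root_from_cmdline CMDLINE
# Print the value of the root= argument found in a kernel command line.
# If root= appears more than once, the last occurrence is printed.
# Returns non-zero when no root= argument is present.
root_from_cmdline() {
    root=
    for arg in $1; do
        case $arg in
            root=*) root=${arg#root=} ;;
        esac
    done
    [ -n "$root" ] && echo "$root"
}
```

On a running system this would be called as root_from_cmdline "$(cat /proc/cmdline)".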

Vladimir Bashkirtsev

Jul 29, 2018, 12:49:50 PM

to men...@lists.mender.io

Regarding your concerns about a power failure midway through writing to the SD card: don't worry about it, as Mender writes the incoming update to the inactive partition, then reads it back to validate it, and only then reboots to use the new update. So a power failure in the middle of an update will only appear as a failed update in the Mender UI. Then you just retry it.

Regarding the corruption you got: I have seen it before. It is caused by caching in the Linux system - not everything gets written before you start to use it. We ended up mounting our images through guestmount because it can work directly on a whole SD card image. Or you may use a loop device in the case of a single partition.

Don Cross

Jul 29, 2018, 3:51:33 PM

to men...@lists.mender.io

On Sun, Jul 29, 2018 at 12:36 PM Drew Moseley <drew.m...@northern.tech> wrote:

> Hi Don,
>
> We've been experimenting with that lately, and it seems that if you change fstab to reference /dev/root instead, it all "just works". The assumption is that this is kernel magic that simply refers to whatever rootfs is specified on the kernel command line.
>
> Drew

Hi Drew,

I tried putting /dev/root as the device name for "/" in /etc/fstab but it doesn't work. I get the following at boot:

Checking file systems...fsck.ext4: No such file or directory while trying to open /dev/root
FAILURE
File system errors were encountered that could not be fixed automatically.
This system cannot continue to boot and will therefore be halted
until those errors are fixed manually by a System Administrator.
After you press Enter, this system will be halted and powered off.

Even if the user power cycles (already a usability failure), the next boot takes you right back to this problem. So this effectively bricks the system from the user's point of view.

I'm trying to figure out how this works on your system. Maybe you have disabled automatic filesystem checks at boot? I would like to leave them enabled.

I'm curious... even if using a fake device name like /dev/root did work, why would it be better than not having any entry at all for "/" in /etc/fstab? My system seems to be working fine without it. I tell the kernel which device to mount as / using U-Boot environment variables (essentially, "root=/dev/mmcblk0p${mender_boot_part}"), i.e., on the kernel command line as you mention. That seems to be all it needs to mount the / directory. I can't see any reason it needs to appear at all in /etc/fstab. Am I missing something?
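
For context on the boot-time check: the sixth field of an fstab entry (fs_passno) is what tells fsck whether to check a filesystem, and a value of 0 or a missing field means the filesystem is skipped. The following is a small sketch, not taken from anyone's setup in this thread, that prints the device and fsck pass number of the "/" entry in an fstab-style file. The function name root_fstab_entry is hypothetical.

```shell
#!/bin/sh
# root_fstab_entry FSTAB_FILE
# Print "<device> <fs_passno>" for the entry whose mount point is "/".
# Comment lines are ignored; a missing sixth field is reported as 0,
# which is the value fsck treats as "do not check".
root_fstab_entry() {
    awk '$1 !~ /^#/ && $2 == "/" { print $1, ($6 == "" ? 0 : $6) }' "$1"
}
```

Run against the entry from the first message, this reports /dev/mmcblk0p3 with pass number 2, so that entry is checked at boot. No output at all means there is no "/" entry, which is the situation after the line is deleted.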

Thanks,

Don

Don Cross

Jul 29, 2018, 4:09:09 PM

to men...@lists.mender.io

Hi Vladimir,

Thanks for taking the time to write.

On Sun, Jul 29, 2018 at 12:49 PM Vladimir Bashkirtsev <vbashk...@gmail.com> wrote:

> Regarding your concerns about a power failure midway through writing to the SD card: don't worry about it, as Mender writes the incoming update to the inactive partition, then reads it back to validate it, and only then reboots to use the new update. So a power failure in the middle of an update will only appear as a failed update in the Mender UI. Then you just retry it.

Just for clarification, I wasn't concerned about Mender updates to the inactive partition. I was thinking about any partial writes to the active partition getting interrupted by a power failure. It would be good to have some kind of automatic correction of the filesystem when the system again has power.

> Regarding the corruption you got: I have seen it before. It is caused by caching in the Linux system - not everything gets written before you start to use it. We ended up mounting our images through guestmount because it can work directly on a whole SD card image. Or you may use a loop device in the case of a single partition.

I could understand if it was because I was writing to the SD card, but I'm not. I am using a loop device to create just the one partition, used for making the .mender file for upgrade. Here is an outline of how I create the rootfs partition:

Use dd to copy /dev/zero to a file until it has the desired size of the rootfs partition (3495 MB).

Use mkfs.ext4 to format the file with the correct filesystem, with -d option to copy from my template filesystem.

I disabled mkfs.ext4 lazy-write options like this: -E lazy_itable_init=0,lazy_journal_init=0

All of this is happening on my host machine's hard drive, not the SD card. So it doesn't matter whether the data has been flushed to disk. The file is input for the mender-artifact utility. Then I upload the resulting Mender artifact file to Hosted Mender and it goes through the normal upgrade process.

The above procedure causes the filesystem check (fsck.ext4) on Raspbian to complain about corruption. It "repairs" the corruption and forces a reboot, which looks like an upgrade failure to Mender/U-Boot (bootcount > bootlimit), unless I leave out the "/" line from /etc/fstab. Leaving out the "/" line causes everything to work fine as far as I can tell.

If I use a separate "cp -a" step to copy the files instead of "-d" option in mkfs.ext4, there is never any "corruption" reported. My best guess is the corruption is a false positive due to some incompatibility between the version of mkfs.ext4 on my host machine and the filesystem check on Raspbian.
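
The loop-mount variant of the procedure can be sketched roughly as follows. This is an illustration under assumptions, not the exact script used here: the function names, the mount point /mnt/rootfs-build, and the DRY_RUN switch (which prints the commands instead of running them, since mount needs root) are all invented for the example.

```shell
#!/bin/sh
# run CMD...: execute a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# build_rootfs_image TEMPLATE_DIR IMAGE SIZE_MB [MOUNT_POINT]
# Create an empty ext4 image, loop-mount it, and copy the template tree
# into it with cp -a instead of using mkfs.ext4 -d.
build_rootfs_image() {
    template_dir=$1
    image=$2
    size_mb=$3
    mnt=${4:-/mnt/rootfs-build}   # hypothetical mount point

    run dd if=/dev/zero of="$image" bs=1M count="$size_mb" || return 1
    run mkfs.ext4 -F -E lazy_itable_init=0,lazy_journal_init=0 "$image" || return 1
    run mkdir -p "$mnt" || return 1
    run mount -o loop "$image" "$mnt" || return 1

    # Copy the contents of the template (including dotfiles), then always
    # unmount, even if the copy failed.
    run cp -a "$template_dir/." "$mnt/"
    status=$?
    run umount "$mnt" || status=1
    run sync
    return "$status"
}
```

For the partition size mentioned above, the call would look like build_rootfs_image ./template rootfs.ext4 3495, run as root.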

The only thing I don't like about leaving out "/" is that there is no filesystem check at boot. But I have to omit the line because of the silly "press Enter" behavior. I know this is complicated/confusing... I've been immersed in it for the last 4 days!

Don

Don Cross

Jul 29, 2018, 4:14:47 PM

to men...@lists.mender.io

On Sun, Jul 29, 2018 at 12:37 PM Vladimir Bashkirtsev <vbashk...@gmail.com> wrote:

> Notably, the noatime option is not needed, as your root partition should normally be read-only.

Hi again Vladimir,

Now this part is interesting. I know Mender and my own software are supposed to write persistent stuff to /data, not /. But is it really possible to mount / as read-only? That would indeed address most of my concerns about filesystem corruption, so long as it doesn't break something else.

Don

Vladimir Bashkirtsev

Jul 29, 2018, 5:03:57 PM

to men...@lists.mender.io

I did not realize that you are trying to write to the active partition in the field. Yes, A/B partitions should be used read-only. To mount the root read-only you need to pass "ro" in the kernel parameters, use /dev/root as a stub (as explained above), and not perform any fs checks on the A/B partitions (a waste of time on a slow embedded device). If some of your software requires, say, a file at /etc/conf_file, then make it a symlink in your image pointing to the /data partition.
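
The symlink step could be done while staging the image, along these lines. This is only a sketch: the function name relocate_to_data and the idea of two staging directories (one for the rootfs tree, one for the /data tree) are assumptions made for the example.

```shell
#!/bin/sh
# relocate_to_data ROOT_STAGING DATA_STAGING REL_PATH
# Move a file out of the staged root filesystem into the staged /data tree
# and leave a symlink behind, so the read-only root points at writable data.
# REL_PATH is relative to /, for example etc/conf_file.
relocate_to_data() {
    root=$1
    data=$2
    path=$3

    mkdir -p "$data/$(dirname "$path")" &&
    mv "$root/$path" "$data/$path" &&
    # The link target is the path as seen on the device, not on the build host,
    # so the link dangles until the image runs with /data mounted.
    ln -s "/data/$path" "$root/$path"
}
```

For example, relocate_to_data ./rootfs ./data etc/conf_file leaves ./rootfs/etc/conf_file as a symlink to /data/etc/conf_file.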

--

Vladimir Bashkirtsev

Jul 29, 2018, 5:16:17 PM

to men...@lists.mender.io

I do understand that you use mkfs.ext4 with the -d option. But that's exactly what I found to fail because of caching issues. And I am not talking about SD cards - just a plain HDD. mkfs.ext4 uses lower-level calls to populate the file system, and they are not flushed immediately (in fact they may be delayed considerably). If you try to use a file containing an ext4 filesystem created this way within a few seconds (in my tests, even after 30 seconds) by means of fopen, it is likely not to have all writes completed - hence a corrupted image. When I saw it for the first time I was stunned, but looking through the e2fsprogs code makes things fairly clear: they bypass many system layers to make creation of the file system efficient. A loop mount is a much better way, as umount will definitely flush every outstanding write.

--

Vladimir Bashkirtsev

Jul 29, 2018, 5:21:59 PM

to men...@lists.mender.io

And BTW: since you are using a Raspberry Pi, don't forget to arm the watchdog in U-Boot. It will reset your board if a new artifact does not start within 15 seconds, and the system will roll back to the previous image automatically. My images boot on a Raspberry Pi 3 Model B in around three seconds: plenty of time left for systemd to kick the watchdog.

Don Cross

Jul 29, 2018, 5:56:32 PM

to men...@lists.mender.io

On Sun, Jul 29, 2018 at 5:21 PM Vladimir Bashkirtsev <vbashk...@gmail.com> wrote:

> And BTW: since you are using a Raspberry Pi, don't forget to arm the watchdog in U-Boot. It will reset your board if a new artifact does not start within 15 seconds, and the system will roll back to the previous image automatically. My images boot on a Raspberry Pi 3 Model B in around three seconds: plenty of time left for systemd to kick the watchdog.

Thanks, Vladimir! Your comments are very helpful.
I am interested in using the watchdog. I did not know about that. However, my implementation uses sysvinit, not systemd. Is there an existing watchdog feature for that case? Is this part of U-Boot? If I have to, I would rather write my own watchdog than switch to systemd, at this point in the project.

On Mon, 30 Jul 2018, 05:16 Vladimir Bashkirtsev <vbashk...@gmail.com> wrote:
> I do understand that you use mkfs.ext4 with the -d option. But that's exactly what I found to fail because of caching issues. And I am not talking about SD cards - just a plain HDD. mkfs.ext4 uses lower-level calls to populate the file system, and they are not flushed immediately (in fact they may be delayed considerably). If you try to use a file containing an ext4 filesystem created this way within a few seconds (in my tests, even after 30 seconds) by means of fopen, it is likely not to have all writes completed - hence a corrupted image. When I saw it for the first time I was stunned, but looking through the e2fsprogs code makes things fairly clear: they bypass many system layers to make creation of the file system efficient. A loop mount is a much better way, as umount will definitely flush every outstanding write.

Oh, wow, I did not know about that either. So maybe what I saw was genuine corruption after all.

Don

Vladimir Bashkirtsev

Jul 29, 2018, 9:09:38 PM

to men...@lists.mender.io

To get a watchdog going on the Raspberry Pi you need three things:

1. Enable the Broadcom watchdog in the U-Boot configuration.

2. Enable the Broadcom watchdog in the kernel you run. The plain vanilla Raspbian kernel already has it.

3. A userland daemon (watchdog - what an unexpected choice of name!) that pokes the watchdog device at least once every 15 seconds (the recommended interval is half of the watchdog maximum - 8 seconds for the Raspberry Pi). All that daemon does is write the string "15" (number of seconds till reset) to /dev/watchdog every 8 seconds - so you can implement it yourself by calling

echo 15 > /dev/watchdog

sleep 8

in a loop.
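
A slightly more complete version of that loop might look like the sketch below. It keeps the device open between pings rather than reopening it on every iteration, relying on the fact that any write to the device counts as a ping. The variable names, the function name watchdog_loop, and the optional ping limit (there only so the loop can be tried against an ordinary file) are assumptions for the example.

```shell
#!/bin/sh
# Device to ping and seconds between pings; both can be overridden.
WATCHDOG_DEV=${WATCHDOG_DEV:-/dev/watchdog}
PING_INTERVAL=${PING_INTERVAL:-8}

# watchdog_loop [MAX_PINGS]
# Open the watchdog device once and write to it every PING_INTERVAL seconds.
# With no argument (or 0) the loop runs until a write fails.
watchdog_loop() {
    max=${1:-0}
    n=0
    exec 3> "$WATCHDOG_DEV" || return 1
    while echo ping >&3; do
        n=$((n + 1))
        if [ "$max" -gt 0 ] && [ "$n" -ge "$max" ]; then
            break
        fi
        sleep "$PING_INTERVAL"
    done
    # Closing the descriptor here does not write anything further, so on a
    # real watchdog device the countdown is expected to keep running.
    exec 3>&-
}
```

On the target, an init script would simply call watchdog_loop in the background.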

What it gives you:

1. U-Boot starts the watchdog just before handing over control to the kernel.

2. If the userland daemon does not start kicking /dev/watchdog within 15 seconds, the board will be reset the hard way. Test it by disabling the watchdog daemon.

3. If your system becomes unusable (test it with a fork bomb), it will get a reset within 15 seconds and come back to a known working state (provided your image is unchanged and good).

--

Don Cross

Jul 31, 2018, 4:04:20 PM

to men...@lists.mender.io

Hi again Vladimir,

On Sun, Jul 29, 2018 at 9:09 PM Vladimir Bashkirtsev <vbashk...@gmail.com> wrote:

> To get a watchdog going on the Raspberry Pi you need three things:
> 1. Enable the Broadcom watchdog in the U-Boot configuration.

This is the part I'm having a hard time figuring out how to do. I guessed I was supposed to add something to my uboot/configs/rpi_3_32b_defconfig, but I can't figure out what. I guessed this, but it doesn't seem to do anything:

CONFIG_HW_WATCHDOG=y

> 2. Enable the Broadcom watchdog in the kernel you run. The plain vanilla Raspbian kernel already has it.

Yes, I confirmed this is already working. In my existing configuration, the system has created

/dev/watchdog

/dev/watchdog0

> 3. A userland daemon (watchdog - what an unexpected choice of name!) that pokes the watchdog device at least once every 15 seconds (the recommended interval is half of the watchdog maximum - 8 seconds for the Raspberry Pi). All that daemon does is write the string "15" (number of seconds till reset) to /dev/watchdog every 8 seconds - so you can implement it yourself by calling
>
> echo 15 > /dev/watchdog
> sleep 8
>
> in a loop.

Some helpful notes for anyone else studying this for the first time:

You can do this to disable the watchdog once enabled:

echo "V" > /dev/watchdog

Any other character written right before closing the file will cause countdown to start again. I don't think it matters what else you write. For example, even "" will cause the same behavior as "15". I tried "60" for example, but the timeout was still 15 seconds before rebooting. Here is a sample program in which it keeps /dev/watchdog open continuously, writing a '\0' every 10 seconds:

https://github.com/torvalds/linux/blob/master/samples/watchdog/watchdog-simple.c

- Don

Vladimir Bashkirtsev

Aug 1, 2018, 1:50:01 AM

to men...@lists.mender.io

The correct option to turn on the watchdog for the Raspberry Pi is CONFIG_BCM2835_WDT (both in U-Boot and in the kernel).

The watchdog counter in the Raspberry Pi is a 4-bit counter - hence 15 seconds is the maximum. When you try to set it to 60, it uses the maximum possible - 15 seconds. Anything less should use shorter periods.

--

Don Cross

Aug 1, 2018, 1:34:47 PM

to men...@lists.mender.io

On Wed, Aug 1, 2018 at 1:50 AM Vladimir Bashkirtsev <vbashk...@gmail.com> wrote:

> The correct option to turn on the watchdog for the Raspberry Pi is CONFIG_BCM2835_WDT (both in U-Boot and in the kernel).

Hi everyone,

Note I am targeting a Raspberry Pi 3 Model B+, but using 32-bit kernel and code. I believe the Pi 3B+ is based on Broadcom 2837 chipset, not 2835. Regardless, I added the following to the file configs/rpi_3_32b_defconfig:

CONFIG_BCM2835_WDT=y

The resulting u-boot.bin executable got a little bigger, so something changed in the build. But I can't tell any difference in behavior. I disabled my userland program for pinging the watchdog periodically. I expected when I booted up that the system would soon reboot itself. But that did not happen. It just sits there. However, if I enter the following command from the root user, I do see a reboot 15 seconds later:

echo "" > /dev/watchdog

So I know I am getting really close.

Is what I did with my defconfig file above correct? And is there something else I need to do in order to get an automatic reboot if the boot process fails to reach my userland watchdog pinger in time? I can't tell if U-Boot isn't enabling the watchdog countdown, or if the kernel is coming along and canceling it when it creates the /dev/watchdog device, or if some other unwanted thing is happening.

Thanks,

Don

Vladimir Bashkirtsev

Aug 1, 2018, 2:17:47 PM

to men...@lists.mender.io

The kernel stops the watchdog as soon as it loads the watchdog driver. As it is now, your U-Boot watchdog will kick in if the kernel fails to boot. To test this, replace your kernel file with non-executable rubbish. If the kernel starts, it considers that all is good and cancels the watchdog: further protection has to be provided by your userland daemon.

As you can see, there is still a gap between the kernel taking over and the watchdog daemon starting. To close this gap you need to tell your kernel driver not to stop the watchdog at all. How to do that depends on your kernel:

1. If you compile the kernel yourself, you need to enable CONFIG_WATCHDOG_NOWAYOUT.

2. If your kernel was compiled by someone else and the watchdog driver is compiled as a module, you need to pass nowayout=1 as a module parameter.

3. If the watchdog driver is compiled into your kernel, you need to pass watchdog.nowayout=1 on the kernel command line. Note that you need to put the correct module name before .nowayout=1 - I don't know what this module is called for the Raspberry Pi because I use option 1 and my kernel is compiled with nowayout=1. I guess you can look in the driver source itself.
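
For option 3, the parameter has to end up on the kernel command line. Below is a sketch of a helper that appends a parameter to a single-line command-line file (such as a cmdline.txt) only if it is not already there. The function name add_kernel_param is invented, and the module name bcm2835_wdt used in the usage note is an assumption based on the CONFIG_BCM2835_WDT option; check the driver source before relying on it. If the command line is assembled in U-Boot's bootargs instead, the same parameter would need to be added there.

```shell
#!/bin/sh
# add_kernel_param FILE PARAM
# Append PARAM to the first line of FILE unless it is already present
# as a whole, space-separated word. FILE is rewritten as a single line.
add_kernel_param() {
    file=$1
    param=$2

    # Split the line into one word per line and look for an exact match.
    if tr ' ' '\n' < "$file" | grep -qxF -- "$param"; then
        return 0
    fi

    line=$(head -n 1 "$file")
    printf '%s %s\n' "$line" "$param" > "$file"
}
```

A call might look like add_kernel_param /boot/cmdline.txt bcm2835_wdt.nowayout=1, with both the path and the module name to be verified on the actual system.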

--

Vladimir Bashkirtsev

Aug 1, 2018, 2:20:03 PM

to men...@lists.mender.io

Regarding chipset numbering: the watchdog is the same across all Raspberry Pis, and the 2835 is the mother of them all. So for the Raspberry Pi 3 Model B it is still BCM2835.

--

Vladimir Bashkirtsev

Aug 1, 2018, 2:30:44 PM

to men...@lists.mender.io

Or an even better way to resolve your issue, if you can rebuild your kernel:

config WATCHDOG_HANDLE_BOOT_ENABLED
	bool "Update boot-enabled watchdog until userspace takes over"
	default y
	help
	  The default watchdog behaviour (which you get if you say Y here) is
	  to ping watchdog devices that were enabled before the driver has
	  been loaded until control is taken over from userspace using the
	  /dev/watchdog file. If you say N here, the kernel will not update
	  the watchdog on its own. Thus if your userspace does not start fast
	  enough your device will reboot.

--

Don Cross

Aug 1, 2018, 10:14:41 PM

to men...@lists.mender.io

On Wed, Aug 1, 2018 at 2:30 PM Vladimir Bashkirtsev <vbashk...@gmail.com> wrote:

> Or an even better way to resolve your issue, if you can rebuild your kernel:
>
> config WATCHDOG_HANDLE_BOOT_ENABLED
> 	bool "Update boot-enabled watchdog until userspace takes over"
> 	default y
> 	help
> 	  The default watchdog behaviour (which you get if you say Y here) is
> 	  to ping watchdog devices that were enabled before the driver has
> 	  been loaded until control is taken over from userspace using the
> 	  /dev/watchdog file. If you say N here, the kernel will not update
> 	  the watchdog on its own. Thus if your userspace does not start fast
> 	  enough your device will reboot.
>
> On Thu, 2 Aug 2018, 01:34 Don Cross <cosin...@gmail.com> wrote:
> > On Wed, Aug 1, 2018 at 1:50 AM Vladimir Bashkirtsev <vbashk...@gmail.com> wrote:
> > > The correct option to turn on the watchdog for the Raspberry Pi is CONFIG_BCM2835_WDT (both in U-Boot and in the kernel).

Thank you very much for your help, Vladimir!

I got everything working the way I want. I did end up building my own version of the Raspbian kernel instead of using the pre-built one. It was easier than I thought, following instructions at:

https://www.raspberrypi.org/documentation/linux/kernel/building.md

To prevent the kernel from disarming the watchdog and to make it impossible to disarm once armed, I patched the .config file using the following commands between 'make bcm2709_defconfig' and compiling the kernel:

sed -i 's/# CONFIG_WATCHDOG_NOWAYOUT is not set/CONFIG_WATCHDOG_NOWAYOUT=y/g' .config

sed -i 's/CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED=y/# CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED is not set/g' .config
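
Since sed exits successfully even when its pattern matches nothing, it can be worth confirming the result before starting a long kernel build. The following is a small sketch of such a check; the function name check_watchdog_config is made up for the example, and it only looks at the two options discussed here.

```shell
#!/bin/sh
# check_watchdog_config CONFIG_FILE
# Succeed only if CONFIG_WATCHDOG_NOWAYOUT is enabled and
# CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED is not enabled in the given .config.
check_watchdog_config() {
    config=$1

    if ! grep -qx 'CONFIG_WATCHDOG_NOWAYOUT=y' "$config"; then
        echo "CONFIG_WATCHDOG_NOWAYOUT is not enabled" >&2
        return 1
    fi
    if grep -qx 'CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED=y' "$config"; then
        echo "CONFIG_WATCHDOG_HANDLE_BOOT_ENabled is still enabled" >&2
        return 1
    fi
    return 0
}
```

Running check_watchdog_config .config right after the two sed commands would catch a pattern that silently failed to match.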

Now if I don't run my own watchdog pinger program, the Pi keeps rebooting itself as expected.

Don

Vladimir Bashkirtsev

Aug 2, 2018, 1:23:09 AM

to men...@lists.mender.io

You're welcome! But note that CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED should be 'is not set' to prevent the kernel from fiddling with the watchdog at all. Apparently bcm2709_defconfig already has it switched off. So all you need is CONFIG_WATCHDOG_NOWAYOUT.

The kernel should kick the watchdog only if there are hardware devices that take too long to initialize, and generally that is not the case on embedded devices.

--
