Hi Christian,
On 09.07.24 17:12, Christian Eggers wrote:
> During investigation of random update failures on my embedded
> device (bare nand with UBI), I found out that there is a race condition
> with (e)udevd (presumably also systemd), which prevents swupdate's
> ubivol handler from successfully updating an UBI volume.
>
> I get the following error messages:
> swupdate: RUN [install_single_image] : Found installer for stream
> ubipartition
> swupdate: RUN [install_single_image] : Found installer for stream
> lios-fitimage-ubifs-secureboot-orbiter-secureboot.fitimg ubivol
^-- I know this device..
> swupdate: RUN [resize_volume] : Removed UBI Volume kernel0
> swupdate: RUN [resize_volume] : Created dynamic UBI volume kernel0 of
> 2737669 bytes (old size 2793472)
> swupdate: FAILURE ERROR ubivol_handler.c : update_volume : 230 : cannot
> start volume "/dev/ubi0_2" update
> swupdate: RUN [install_single_image] : Installer for ubivol not successful !
>
> I use the "auto-resize" property for adjusting the size of my kernel/rootfs
> UBI volumes. Particularly for production images (read-only, no spare blocks)
> there is a high probability that the number of required UBI logical
> erase blocks
> changes between different update images. In this case the ubivol handler
> resizes the volume by first deleting and then recreating it. After that,
> the /UBI volume update/ procedure is used for writing the image data
> to the volume (as it would also be done if no resizing were necessary).
>
> The problem here is, that ubi_update_start() / UBI_IOCVOLUP tries
> to gain exclusive access to the UBI volume, which is only possible if
> swupdate
> is the only process which has currently a file descriptor open on this
> volume.
This is a generic issue: it happens whenever something else is consuming
a device that must be updated. For example, we cannot update a filesystem
that is already mounted without "freezing" it, because any further access
to the filesystem will probably produce I/O errors once the underlying
block device is rewritten. A similar problem, to be closer to you, came
up with the Alexa 35/65, where a lot of firmware had to be updated
(multiple microcontrollers) and the update conflicted with the running
application. The "progress" interface is also meant for this: an
application can listen on it, detect that an update was triggered, and
stop using devices (storage or not) until the update has finished.
SWUpdate mostly requires exclusive use of whatever is being updated.
>
> But the creation of the "new"/"resized" UBI volume is also recognized by
> (e)udevd,
> which in turn runs its internal 'blkid' on the new UBI volume. Since
> systemd pull request 13360
> <https://github.com/systemd/systemd/pull/13360>, the rule file
> rules/60-persistent-storage.rules
> <https://github.com/systemd/systemd/pull/13360/files#diff-d4543987bc7831db6120f9f6f16ba21f6f0cb133d408449426343eac2d6954c1>
> isn't skipped for UBI devices anymore. This change was taken by eudev
> 3.2.12 (PR 220 <https://github.com/eudev-project/eudev/pull/220>)
> which in turn went into OpenEmbedded in Nanbield (or Scarthgap for LTS
> users like me).
The same issue happens with udisks2, which tries to mount any device it
finds: while SWUpdate is updating a stand-by partition or volume,
udisks2 takes over and mounts it, and that access can be dangerous.
But I would say this is first of all an integration issue: if the update
follows the dual-copy principle (I presume this is your case), it must
be ensured that nothing defined as "stand-by" is touched by the running
software, including udev rules. This of course means more work, and a
.bbappend that replaces some rules should be added.
>
> Currently I am unsure what the best solution would be. At first I'll try
> to disallow
> udevd to run its 'blkid' on UBI volumes (why is this run on non block
> devices at all?)
> Unfortunately there is no way for requesting exclusive access to UBI
> volumes in
> a "blocking" way, so it simply fails only because udevd is accessing it
> simultaneously
> (for only a few milliseconds).
But you could disable it altogether, right?
> Maybe it would also make sense if the
> ubivol handler
> tries again a few times if ubi_update_start() returns EBUSY.
We could, but there are plenty of cases like this; the udisks2 example
belongs to the same category. It is more a system integration matter
than an issue in SWUpdate.
However, I do not see any drawback in running ubi_update_start() in a
bounded loop (say, just a few attempts with a pause of one second in
between) if the returned error is EBUSY. So yes: it is not a general
solution, but it does not hurt...
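To make the idea concrete, here is a minimal sketch of such a bounded
retry in C. It is not the actual ubivol handler code: retry_while_busy(),
start_update_fn and fake_start() are hypothetical names introduced only
for illustration. In the handler, the callback would wrap the real
ubi_update_start() call; here a stand-in that reports EBUSY a given
number of times takes its place so the loop can be tried without UBI
hardware.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical callback: returns 0 on success, -1 with errno set on failure. */
typedef int (*start_update_fn)(void *ctx);

/*
 * Call start() up to max_attempts times, pausing pause_sec seconds between
 * attempts, but only while the failure is EBUSY. Any other error is
 * returned immediately with errno left as set by start().
 */
static int retry_while_busy(start_update_fn start, void *ctx,
                            int max_attempts, unsigned int pause_sec)
{
    int attempt;

    assert(max_attempts > 0);

    for (attempt = 1; attempt <= max_attempts; attempt++) {
        if (start(ctx) == 0)
            return 0;
        if (errno != EBUSY)
            return -1;          /* not a transient "busy" condition */
        if (attempt < max_attempts)
            sleep(pause_sec);   /* give the other opener time to close */
    }

    errno = EBUSY;              /* still busy after the last attempt */
    return -1;
}

/*
 * Stand-in for the real update start (assumption, for illustration only):
 * ctx points to an int counting how many more times to report EBUSY.
 */
static int fake_start(void *ctx)
{
    int *busy_left = ctx;

    if (*busy_left > 0) {
        (*busy_left)--;
        errno = EBUSY;
        return -1;
    }
    return 0;
}
```

With a volume that is busy for two attempts, a call such as
retry_while_busy(fake_start, &busy_left, 5, 1) succeeds on the third
attempt; if the volume stays busy for all attempts, the caller still
gets -1 with errno set to EBUSY, as it would today.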
> This would
> make swupdate
> more robust if such a problem would be reintroduced in udevd/systemd in
> future or
> somebody uses an affected version.
Well, it is very difficult to foresee what systemd will do in the future...
Best regards,
Stefano