ubivol: race condition between volume creation/resizing and udevd


Christian Eggers

Jul 9, 2024, 11:31:13 AM
to swupdate
While investigating random update failures on my embedded device
(bare NAND with UBI), I found a race condition with (e)udevd
(presumably also systemd) that prevents swupdate's ubivol handler
from successfully updating a UBI volume.

I get the following error messages:
swupdate: RUN [install_single_image] : Found installer for stream  ubipartition
swupdate: RUN [install_single_image] : Found installer for stream lios-fitimage-ubifs-secureboot-orbiter-secureboot.fitimg ubivol
swupdate: RUN [resize_volume] : Removed UBI Volume kernel0
swupdate: RUN [resize_volume] : Created dynamic UBI volume kernel0 of 2737669 bytes (old size 2793472)
swupdate: FAILURE ERROR ubivol_handler.c : update_volume : 230 : cannot start volume "/dev/ubi0_2" update
swupdate: RUN [install_single_image] : Installer for ubivol not successful !


I use the "auto-resize" property to adjust the size of my kernel/rootfs
UBI volumes. Particularly for production images (read-only, no spare blocks),
there is a high probability that the number of required UBI logical erase blocks
changes between update images. In this case, the ubivol handler
resizes the volume by first deleting and then recreating it. After that,
the UBI volume update procedure is used to write the image data
to the volume (just as it would be if no resizing were necessary).
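As a rough illustration of why the LEB count matters: a volume holds a whole number of LEBs, so the count an image needs is its size divided by the LEB size, rounded up. With no spare blocks, any growth or shrinkage that crosses a LEB boundary changes that count. The sketch below assumes, as described above, that the handler only resizes when the count differs from the volume's current one; the function names and the 126976-byte LEB size used in the example are hypothetical, not taken from swupdate.

```c
/* Number of LEBs needed to hold `bytes` of data, rounded up. */
static long long lebs_needed(long long bytes, long long leb_size)
{
    return (bytes + leb_size - 1) / leb_size;
}

/* Assumed criterion (hypothetical helper): delete and recreate the
 * volume only when the image needs a different number of LEBs than
 * the volume currently has. */
static int needs_resize(long long current_lebs, long long image_bytes,
                        long long leb_size)
{
    return lebs_needed(image_bytes, leb_size) != current_lebs;
}
```

With a 126976-byte LEB, a 300000-byte image needs 3 LEBs, so a volume that currently has 2 LEBs would be recreated, while a 250000-byte image (2 LEBs) would go straight to the volume update.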

The problem is that ubi_update_start() / UBI_IOCVOLUP tries to gain
exclusive access to the UBI volume, which is only possible if swupdate
is the only process that currently has a file descriptor open on this volume.
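The failure mode can be pictured with a toy model (not the kernel code): every open descriptor counts as a user of the volume, and starting a volume update on one's own descriptor only succeeds while that is the only user. To our understanding the kernel's UBI character-device code applies a check of this kind and returns EBUSY otherwise; the struct and function names below are hypothetical.

```c
#include <errno.h>

/* Toy stand-in for a UBI volume: counts open file descriptors. */
struct toy_volume {
    int users;
};

static void toy_open(struct toy_volume *v)  { v->users++; }
static void toy_close(struct toy_volume *v) { v->users--; }

/* Models UBI_IOCVOLUP issued on a descriptor the caller already holds:
 * it needs exclusive access, so any second user makes it fail. */
static int toy_update_start(const struct toy_volume *v)
{
    return v->users > 1 ? -EBUSY : 0;
}
```

If udevd's short-lived read-only probe opens the freshly created volume between swupdate's open and the update start, the call yields -EBUSY; a few milliseconds later, after the probe has closed its descriptor, the same call would succeed. That timing dependence is why the failures look random.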

But the creation of the "new"/"resized" UBI volume is also detected by (e)udevd,
which in turn runs its internal 'blkid' on the new UBI volume. Since systemd pull
request 13360 <https://github.com/systemd/systemd/pull/13360>, the rule file
rules/60-persistent-storage.rules is no longer skipped for UBI devices. This
change was adopted by eudev 3.2.12 (PR 220 <https://github.com/eudev-project/eudev/pull/220>),
which in turn went into OpenEmbedded in Nanbield (or Scarthgap for LTS users like me).

Currently I am unsure what the best solution would be. First, I'll try to prevent
udevd from running its 'blkid' on UBI volumes (why is this run on non-block
devices at all?). Unfortunately, there is no way to request exclusive access to
UBI volumes in a "blocking" way, so the update simply fails just because udevd
is accessing the volume at the same time (for only a few milliseconds). Maybe it
would also make sense for the ubivol handler to retry a few times if
ubi_update_start() returns EBUSY. This would make swupdate more robust if such a
problem were reintroduced in udevd/systemd in the future, or if somebody uses an
affected version.

regards,
Christian

Stefano Babic

Jul 9, 2024, 11:59:28 AM
to Christian Eggers, swupdate
Hi Christian,

On 09.07.24 17:12, Christian Eggers wrote:
> While investigating random update failures on my embedded device
> (bare NAND with UBI), I found a race condition with (e)udevd
> (presumably also systemd) that prevents swupdate's ubivol handler
> from successfully updating a UBI volume.
>
> I get the following error messages:
> swupdate: RUN [install_single_image] : Found installer for stream  ubipartition
> swupdate: RUN [install_single_image] : Found installer for stream lios-fitimage-ubifs-secureboot-orbiter-secureboot.fitimg ubivol
^-- I know this device..

> swupdate: RUN [resize_volume] : Removed UBI Volume kernel0
> swupdate: RUN [resize_volume] : Created dynamic UBI volume kernel0 of 2737669 bytes (old size 2793472)
> swupdate: FAILURE ERROR ubivol_handler.c : update_volume : 230 : cannot start volume "/dev/ubi0_2" update
> swupdate: RUN [install_single_image] : Installer for ubivol not successful !
>
> I use the "auto-resize" property to adjust the size of my kernel/rootfs
> UBI volumes. Particularly for production images (read-only, no spare
> blocks), there is a high probability that the number of required UBI
> logical erase blocks changes between update images. In this case, the
> ubivol handler resizes the volume by first deleting and then recreating
> it. After that, the UBI volume update procedure is used to write the
> image data to the volume (just as it would be if no resizing were
> necessary).
>
> The problem is that ubi_update_start() / UBI_IOCVOLUP tries to gain
> exclusive access to the UBI volume, which is only possible if swupdate
> is the only process that currently has a file descriptor open on this
> volume.

This is a generic issue and happens every time there is a consumer of a
device that must be updated. For example, we cannot update a filesystem
we have already mounted without "freezing" it, because any further access
to the filesystem will probably produce I/O errors if we update the
underlying block device. A similar problem, closer to home for you, occurred
with the Alexa 35/65, where a lot of firmware has to be updated (multiple
microcontrollers) and the update conflicts with the running application. The
"progress" interface is also intended for this: an application can listen
to it, detect that an update was triggered, and stop using devices (storage
or otherwise) until the update has finished. SWUpdate mostly requires
exclusive use of whatever is being updated.
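A minimal sketch of the application side of that pattern, assuming the application receives a stream of update status values from the progress interface. The enum below is a hypothetical stand-in: SWUpdate's real progress messages and status names live in its own headers and may differ, and receiving them (socket handling) is left out.

```c
#include <stdbool.h>

/* Hypothetical status values; not SWUpdate's actual enum. */
enum update_status { ST_IDLE, ST_START, ST_RUN, ST_SUCCESS, ST_FAILURE, ST_DONE };

struct device_guard {
    bool devices_released; /* true while the app keeps its hands off */
};

/* Called for every progress message; returns whether the application
 * may currently use the devices targeted by the update. */
static bool on_progress(struct device_guard *g, enum update_status s)
{
    switch (s) {
    case ST_START:
    case ST_RUN:
        g->devices_released = true;  /* update running: stop using them */
        break;
    case ST_SUCCESS:
    case ST_FAILURE:
    case ST_DONE:
        g->devices_released = false; /* update finished: resume */
        break;
    default:
        break;                       /* other states change nothing */
    }
    return !g->devices_released;
}
```

Keeping the decision in one small state function makes it easy to call from whatever loop reads the progress messages.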

>
> But the creation of the "new"/"resized" UBI volume is also detected by
> (e)udevd, which in turn runs its internal 'blkid' on the new UBI volume.
> Since systemd pull request 13360
> <https://github.com/systemd/systemd/pull/13360>, the rule file
> rules/60-persistent-storage.rules
> <https://github.com/systemd/systemd/pull/13360/files#diff-d4543987bc7831db6120f9f6f16ba21f6f0cb133d408449426343eac2d6954c1>
> is no longer skipped for UBI devices. This change was adopted by eudev
> 3.2.12 (PR 220 <https://github.com/eudev-project/eudev/pull/220>), which
> in turn went into OpenEmbedded in Nanbield (or Scarthgap for LTS users
> like me).

The same issue happens with udisks2, which tries to mount any device it
finds: while SWUpdate is updating a stand-by partition or volume,
udisks2 takes over and mounts it, and that access can be dangerous.

But I would say this is first of all an integration issue: if the update
follows the dual-copy principle (I presume this is your case), it must be
ensured that nothing defined as "stand-by" is touched by the running
software, including udev rules. Of course this means more work, and a
.bbappend replacing some rules should be added.
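One integration-side safeguard in this spirit, sketched under assumptions: before writing a stand-by volume, check whether anything (for example udisks2) has mounted it. The hypothetical helper below scans a mount table such as /proc/self/mounts with the getmntent() API from glibc/musl. UBIFS mounts may list their source as ubiX:name rather than /dev/ubiX_Y, so a real check may need to test both spellings, and it does not catch short-lived opens such as udevd's blkid probe.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if `device` appears as a mount source in the table at
 * `mtab_path` (e.g. "/proc/self/mounts"), 0 if not, -1 on error. */
static int is_device_mounted(const char *mtab_path, const char *device)
{
    FILE *f = setmntent(mtab_path, "r");
    struct mntent *ent;
    int found = 0;

    if (!f)
        return -1;
    while ((ent = getmntent(f)) != NULL) {
        if (strcmp(ent->mnt_fsname, device) == 0) {
            found = 1;
            break;
        }
    }
    endmntent(f);
    return found;
}
```

Taking the table path as a parameter keeps the helper testable against a fixture file instead of the live system.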

>
> Currently I am unsure what the best solution would be. First, I'll try
> to prevent udevd from running its 'blkid' on UBI volumes (why is this
> run on non-block devices at all?). Unfortunately, there is no way to
> request exclusive access to UBI volumes in a "blocking" way, so the
> update simply fails just because udevd is accessing the volume at the
> same time (for only a few milliseconds).

But you could disable it entirely, right?

> Maybe it would also make sense for the ubivol handler to retry a few
> times if ubi_update_start() returns EBUSY.

We could, but there are plenty of cases like this. The udisks2 example
belongs to the same category. It is more a system integration issue than
an issue in SWUpdate.

However, I do not see any drawback in running ubi_update_start() in a
bounded loop (say, with a one-second pause and just a few attempts) if
the returned error is EBUSY. So yes, I do not think it is a general
solution, but it does not hurt...
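A sketch of the bounded EBUSY retry discussed here, not the patch that was eventually written. It assumes the retried call follows the usual libubi/ioctl(2) convention of returning -1 with errno set; the start_fn type, the retry_on_ebusy() name and the busy_fake test double are hypothetical. In the ubivol handler, start would wrap ubi_update_start() on the already opened volume, with something like a 1000 ms delay and a handful of attempts as suggested above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for nanosleep() in strict modes */
#endif
#include <errno.h>
#include <time.h>

/* Hypothetical callback type: returns 0 on success, -1 with errno set
 * on failure (the convention of ioctl(2)-based calls). */
typedef int (*start_fn)(void *ctx);

/* Call start() up to max_attempts times, sleeping delay_ms between
 * attempts, but keep retrying only while it fails with EBUSY. Success
 * or any other error is returned immediately. */
static int retry_on_ebusy(start_fn start, void *ctx, int max_attempts,
                          long delay_ms)
{
    int ret = -1;

    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        errno = 0;
        ret = start(ctx);
        if (ret == 0 || errno != EBUSY)
            return ret;
        if (attempt < max_attempts) {
            struct timespec ts = {
                .tv_sec = delay_ms / 1000,
                .tv_nsec = (delay_ms % 1000) * 1000000L,
            };
            nanosleep(&ts, NULL);
        }
    }
    return ret; /* errno is still EBUSY here */
}

/* Test double: reports EBUSY for the first busy_calls_left calls, as if
 * udevd were still holding the volume, then succeeds. */
struct busy_fake {
    int busy_calls_left;
    int calls;
};

static int fake_start(void *ctx)
{
    struct busy_fake *f = ctx;

    f->calls++;
    if (f->busy_calls_left > 0) {
        f->busy_calls_left--;
        errno = EBUSY;
        return -1;
    }
    return 0;
}
```

With a one-second delay and, say, five attempts, the worst case adds about four seconds before a genuinely stuck volume is reported as EBUSY, which is the trade-off behind keeping the loop bounded.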


> This would make swupdate more robust if such a problem were
> reintroduced in udevd/systemd in the future, or if somebody uses an
> affected version.

Well, it is very difficult to foresee what systemd will do in the future...

Best regards,
Stefano

Christian Eggers

Jul 9, 2024, 12:25:53 PM
to swupdate
Hi Stefano,

On Tuesday, July 9, 2024 at 17:59:28 UTC+2, Stefano Babic wrote:
> Hi Christian,
>
> On 09.07.24 17:12, Christian Eggers wrote:
> > swupdate: RUN [install_single_image] : Found installer for stream lios-fitimage-ubifs-secureboot-orbiter-secureboot.fitimg ubivol
> ^-- I know this device..
It seems the world is a small village. So don't hesitate to send
questions (if any).
From the perspective of our own embedded application, this isn't
an issue (as we use A/B updates). To be honest, I have already tried to get
rid of udevd, as our application doesn't need it (devtmpfs/libudev is
sufficient for us, but NetworkManager really requires the udev daemon).

> > But the creation of the "new"/"resized" UBI volume is also detected by
> > (e)udevd, which in turn runs its internal 'blkid' on the new UBI volume.
> > [...]
>
> The same issue happens with udisks2, which tries to mount any device it
> finds: while SWUpdate is updating a stand-by partition or volume,
> udisks2 takes over and mounts it, and that access can be dangerous.
>
> But I would say this is first of all an integration issue: if the update
> follows the dual-copy principle (I presume this is your case), it must be
> ensured that nothing defined as "stand-by" is touched by the running
> software, including udev rules. Of course this means more work, and a
> .bbappend replacing some rules should be added.

I fully agree! The "new" udevd problem hit me via a bug report after upgrading
to the latest Yocto.

> > Currently I am unsure what the best solution would be. [...]
> > Unfortunately, there is no way to request exclusive access to UBI
> > volumes in a "blocking" way, so the update simply fails just because
> > udevd is accessing the volume at the same time (for only a few
> > milliseconds).
>
> But you could disable it entirely, right?

I think so. I had just tried to revert the offending systemd commit
entirely, but after reading the full history, it looks like it was added
intentionally <https://github.com/systemd/systemd/pull/20100>
(udevd's 'blkid' also supports UBIFS).

> > Maybe it would also make sense for the ubivol handler to retry a few
> > times if ubi_update_start() returns EBUSY.
>
> We could, but there are plenty of cases like this. The udisks2 example
> belongs to the same category. It is more a system integration issue than
> an issue in SWUpdate.
>
> However, I do not see any drawback in running ubi_update_start() in a
> bounded loop (say, with a one-second pause and just a few attempts) if
> the returned error is EBUSY. So yes, I do not think it is a general
> solution, but it does not hurt...

It should at least help with udevd/systemd, which opens the UBI volume
read-only for quite a short period of time. As it took me some time to
get to the actual root of the problem, implementing a workaround
in swupdate may help other users.

I'll try to prepare and send patches tomorrow. Actually, I hit this problem
with v2023.12; it looks like there is another problem with the latest release:
all volumes are "resized" to 0 (which means they are deleted).

> > This would make swupdate more robust if such a problem were
> > reintroduced in udevd/systemd in the future, or if somebody uses an
> > affected version.
>
> Well, it is very difficult to foresee what systemd will do in the future...

Best regards,
Christian

Stefano Babic

Jul 9, 2024, 3:24:04 PM
to Christian Eggers, swupdate
Hi Christian,
> [...] it looks like it has been added with intent
> <https://github.com/systemd/systemd/pull/20100>
> (udevd's 'blkid' also supports UBIFS).
>
> I'll try to prepare and send patches tomorrow.

It is just a few lines of code. I have prepared the patch, but I'd ask you
to test it and add your Tested-by.

> Actually, I hit this problem with v2023.12; it looks like there is
> another problem with the latest release:

Does "latest release" mean 2024.05, or still 2023.12?

Best regards,
Stefano


> All volumes are "resized" to 0 (which means they are deleted).