SWupdate failure converting from UTF-8 to current locale


Pierluigi Passaro

Oct 9, 2021, 7:02:01 PM
to swup...@googlegroups.com
Hi All,
Sorry for the long email; I'm trying to summarize one week of investigation.

With Yocto Hardknott we are experiencing problems decompressing a tar.gz archive.
The rootfs includes two files containing UTF-8 characters, but the default locale settings are just empty.
This seems to create a problem with libarchive.
This behavior has been known since 2015 and is described here:
https://github.com/libarchive/libarchive/issues/587
It is also present in other libarchive-based projects and can apparently be solved by locally setting a valid UTF-8 locale, like here:
https://github.com/libarchive/libarchive/issues/1535#issuecomment-846498686
In my case, when I invoke the command
    swupdate -i myupdate.swu
I get the error
    SWUPDATE failed [0] ERROR handlers/archive_handler.c : extract : 110 : archive_read_next_header(): Pathname can't be converted from UTF-8 to current locale. 
If I set the locale from the command line
    export LC_ALL=en_US.utf8
and then I invoke the update, everything works great as with previous Yocto releases.
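
Since the error message is about a pathname, one way to find the affected entries is to list the archive and keep only the names that contain bytes outside printable ASCII. The sketch below is only an illustration: `non_ascii_names` is a hypothetical helper name, and it assumes the archive listing arrives on stdin.

```shell
# Hypothetical helper (not part of SWUpdate): print every input line that
# contains a byte outside printable ASCII (0x20-0x7E).
# LC_ALL=C makes awk compare single bytes, so the check itself does not
# depend on a UTF-8 locale being installed.
non_ascii_names() {
    LC_ALL=C awk '/[^ -~]/'
}

# Example (assumes the tar.gz has already been extracted from the .swu):
#   tar tzf myupdate.tar.gz | non_ascii_names
```

Names printed by this filter are the ones that need a conversion from UTF-8 and are therefore candidates for the failure.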

I'm not familiar with the whole architecture of SWUpdate, but does it make sense to expect the archive_handler.c code to set a valid UTF-8 locale locally before starting the decompression?
Alternatively, should a UTF-8 locale be considered a mandatory system requirement for using SWUpdate?

I've also tested the patch suggested here:
https://github.com/libarchive/libarchive/pull/626/commits/f18bdc1f9af1168a9338dad6d879c094efc900fc
but this led to an apparently working state that actually discards all files containing UTF-8 characters.

Any hints on this condition?

Thanks
Best Regards
Pier

Otavio Salvador

Oct 9, 2021, 7:06:32 PM
to Pierluigi Passaro, swup...@googlegroups.com
Hello,

On Sat, Oct 9, 2021 at 20:02, Pierluigi Passaro
<pierl...@variscite.com> wrote:
> I get the error
> SWUPDATE failed [0] ERROR handlers/archive_handler.c : extract : 110 : archive_read_next_header(): Pathname can't be converted from UTF-8 to current locale.
> If I set the locale from the command line
> export LC_ALL=en_US.utf8
> and then I invoke the update, everything works great as with previous Yocto releases.

I am not sure it is exactly the same case, but it looks like an issue we
fixed in our UpdateHub agent. We hit this exact issue on systems
without locale support, and in our case we fixed our Rust library to
handle it [1].

1. https://github.com/OSSystems/compress-tools-rs/commit/f4f06685c702dae1233110c5da4659c73c133ab3

This might help with adding a similar mechanism to SWUpdate. Hope it helps.

--
Otavio Salvador O.S. Systems
http://www.ossystems.com.br http://code.ossystems.com.br
Mobile: +55 (53) 9 9981-7854 Mobile: +1 (347) 903-9750

James Hilliard

Oct 10, 2021, 4:42:54 AM
to Pierluigi Passaro, swup...@googlegroups.com
On Sat, Oct 9, 2021 at 5:02 PM Pierluigi Passaro
<pierl...@variscite.com> wrote:
>
> Hi All,
> sorry for the long email, I'm trying to summarize one week of investigations.
>
> With Yocto Hardknott we are experiencing problems decompressing tar.gz archive.
> The rootfs include (2) files containing UTF8 chars, but the default "locale" settings are just empty.
> This seems to create a problem with libarchive.
> This behavior is known since 2015 and described here
> https://github.com/libarchive/libarchive/issues/587
> This behavior is also present in other libarchive based projects and apparently can be solved by locally setting a valid UTF8 locale, like here
> https://github.com/libarchive/libarchive/issues/1535#issuecomment-846498686
> In my case, when I invoke the command
> swupdate -i myupdate.swu
> I get the error
> SWUPDATE failed [0] ERROR handlers/archive_handler.c : extract : 110 : archive_read_next_header(): Pathname can't be converted from UTF-8 to current locale.
> If I set the locale from the command line
> export LC_ALL=en_US.utf8

I'm using LC_CTYPE=en_US.UTF-8 which seems to work for me at least. I
recall hitting weird issues when trying to use LC_ALL.
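
Setting only LC_CTYPE, and only for the one process, can be sketched as below. `with_ctype` is a hypothetical helper name, and clearing LC_ALL is an assumption made here because a non-empty LC_ALL takes precedence over LC_CTYPE.

```shell
# Hypothetical helper: run a command with LC_CTYPE set for that command only.
# $1 is the locale name, the remaining arguments are the command line.
with_ctype() {
    (
        ctype=$1
        shift
        unset LC_ALL        # if set, LC_ALL would override LC_CTYPE
        LC_CTYPE=$ctype
        export LC_CTYPE
        exec "$@"
    )
}

# Example:
#   with_ctype en_US.UTF-8 swupdate -i myupdate.swu
```

Because the assignment happens in a subshell, the calling shell keeps its own locale settings.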

> and then I invoke the update, everything works great as with previous Yocto releases.
>
> I'm not familiar with the whole architecture of SWupdate, but does make sense expect that the archive_handler.c code should take care of locally setting a valid UTF8 locale before starting the decompression ?
> Alternatively, should the UTF8 locale be considered a system mandatory requirement to use SWupdate ?

The locale situation in general is a bit of a mess; I think swupdate
handles it about as well as possible at the moment.

Some background on the libarchive locale situation:
https://github.com/sbabic/swupdate/commit/95a2b9961119aac80aea1eeabbc1cd52b72d876a
https://github.com/mpv-player/mpv/commit/1e70e82baa9193f6f027338b0fab0f5078971fbe

>
> I've also tested the patch suggested here
> https://github.com/libarchive/libarchive/pull/626/commits/f18bdc1f9af1168a9338dad6d879c094efc900fc
> but this led to an apparently working condition that actually discards all files containing UTF8 chars.
>
> Any hints on this condition?
>
> Thanks
> Best Regards
> Pier
>
> --
> You received this message because you are subscribed to the Google Groups "swupdate" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to swupdate+u...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/swupdate/AM0PR08MB4372F9F5C13B12123C0CA392FFB39%40AM0PR08MB4372.eurprd08.prod.outlook.com.

Pierluigi Passaro

Oct 10, 2021, 7:11:41 AM
to swupdate
This is exactly the point...
According to the official documentation, a call like the one in archive_handler.c
    archive_locale = newlocale(LC_CTYPE_MASK, "", (locale_t)0);
should set the locale to an "implementation-defined native environment", which in my understanding is just the default one, much as if HAVE_LOCALE were disabled.
If I change the call to
    archive_locale = newlocale(LC_CTYPE_MASK, "en_US.utf8", (locale_t)0);
everything starts working again.
I don't expect this to be a valid/acceptable patch, so this leads me back to the original question.
Should a UTF-8 locale be considered a mandatory system requirement for using SWUpdate?

James Hilliard

Oct 10, 2021, 1:45:24 PM
to Pierluigi Passaro, swupdate
From my understanding, that should pick up LC_CTYPE=en_US.UTF-8 if it is set;
otherwise I think it ends up using the default "C" locale, which actually
does support UTF-8 in musl libc (since musl has a default "C.UTF-8" locale)
but not in glibc (although some distros patch glibc to also have "C.UTF-8").
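
Which character set the environment resolves to can be checked on the target with `locale charmap`, which follows the same LC_ALL / LC_CTYPE / LANG variables. A minimal sketch, with the hypothetical helper names `is_utf8_charmap` and `report_charmap`; on glibc the plain "C" locale typically reports ANSI_X3.4-1968 rather than UTF-8.

```shell
# Hypothetical helper: succeed if the given charmap name denotes UTF-8.
is_utf8_charmap() {
    case "$1" in
        UTF-8|utf-8|UTF8|utf8) return 0 ;;
        *) return 1 ;;
    esac
}

# Report which character map the current environment resolves to.
# `locale charmap` is the POSIX way to query it.
report_charmap() {
    charmap=$(locale charmap 2>/dev/null) || charmap=unknown
    if is_utf8_charmap "$charmap"; then
        echo "LC_CTYPE resolves to UTF-8"
    else
        echo "LC_CTYPE resolves to '$charmap', not UTF-8"
    fi
}
```

Running `report_charmap` in the same environment that starts swupdate (for example the service unit's environment) shows whether the empty-string locale name would end up on a UTF-8 character map.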

There is a patch available for review for upstream glibc which adds
"C.UTF-8" but it hasn't been merged yet:
https://sourceware.org/bugzilla/show_bug.cgi?id=17318#c26

> If I change the call to
> archive_locale = newlocale(LC_CTYPE_MASK, "en_US.utf8", (locale_t)0);
> everything start working again.

Does setting LC_CTYPE=en_US.utf8 in the env with the default
implementation not work?
I thought that was equivalent to doing this.

> I'm not expecting this can be a valid/acceptable patch, so this leads me to the original question.
> Should the UTF8 locale be considered a system mandatory requirement to use SWupdate ?

Pierluigi Passaro

Oct 10, 2021, 4:40:27 PM
to swupdate
With Yocto Hardknott, with no toolchain/locale customization, glibc is enabled.
The default locale settings are
    root@imx8mp-var-dart:~# locale
    LANG=C
    LC_CTYPE="C"
    LC_NUMERIC="C"
    LC_TIME="C"
    LC_COLLATE="C"
    LC_MONETARY="C"
    LC_MESSAGES="C"
    LC_PAPER="C"
    LC_NAME="C"
    LC_ADDRESS="C"
    LC_TELEPHONE="C"
    LC_MEASUREMENT="C"
    LC_IDENTIFICATION="C"
    LC_ALL=
and the following locale alternatives are available
    root@imx8mp-var-dart:~# locale -a
    C
    POSIX
    en_GB
    en_GB.utf8
    en_US
    en_US.utf8
 
> There is a patch available for review for upstream glibc which adds
> "C.UTF-8" but it hasn't been merged yet:
> https://sourceware.org/bugzilla/show_bug.cgi?id=17318#c26

This patch looks promising, but seems to have been on hold since April 2020 :(

>> If I change the call to
>> archive_locale = newlocale(LC_CTYPE_MASK, "en_US.utf8", (locale_t)0);
>> everything start working again.
>
> Does setting LC_CTYPE=en_US.utf8 in the env with the default
> implementation not work?
> I thought that was equivalent to doing this.

If I manually set LC_CTYPE=en_US.utf8 from the command line and then run swupdate (with no archive patch), it works fine.
I'm wondering, from an architectural perspective, which is the right way to proceed.
I don't expect every computer worldwide to have en_US.utf8 installed, so I don't think it makes sense to change the current archive_handler.c code.
Nor do I expect an embedded system with "C" as its default locale to fail to extract a tar.gz file: if I extract the tar.gz from the swu file and manually run
   tar xvf myupdate.tar.gz
it works with no problems.
I'm aware that tar does not depend on libarchive, but this leads me to think that the problem may lie only in libarchive.
For the time being, I can force the default locale to something that supports UTF-8, but this still sounds like a workaround, not a solution.
Any other suggestions?

> I'm not expecting this can be a valid/acceptable patch, so this leads me to the original question.
> Should the UTF8 locale be considered a system mandatory requirement to use SWupdate ?

James Hilliard

Oct 11, 2021, 1:07:55 AM
to Pierluigi Passaro, swupdate
On Sun, Oct 10, 2021 at 2:40 PM Pierluigi Passaro wrote:
> This patch looks promising, but seems to be on hold since April 2020 :(

I think various revisions of that patch have actually been around for years,
as downstream distros have been shipping it for a long time.

Actually I just noticed a newer version of it just got merged upstream
about a month ago:
https://sourceware.org/git/?p=glibc.git;a=commit;h=466f2be6c08070e9113ae2fdc7acd5d8828cba50

>
>> > If I change the call to
>> > archive_locale = newlocale(LC_CTYPE_MASK, "en_US.utf8", (locale_t)0);
>> > everything start working again.
>>
>> Does setting LC_CTYPE=en_US.utf8 in the env with the default
>> implementation not work?
>> I thought that was equivalent to doing this.
>
>
> If I manually set LC_CTYPE=en_US.utf8 from the command line and then run swupdated (with no archive patch), it works fine.
> I'm wondering, from an architectural perspective, which is the right way to proceed.

Maybe have a launch script search for available UTF-8 locales and set
it appropriately?
I just set it in my systemd service env myself.
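
Such a launch script could be sketched as below. This is only an illustration: `pick_utf8_locale` is a hypothetical helper, the preference for C.UTF-8 is an assumption, and it relies on `locale -a` being available on the target.

```shell
#!/bin/sh
# Hypothetical helper: read locale names (one per line, as printed by
# `locale -a`) on stdin and print one UTF-8 locale, preferring C.UTF-8.
# Returns non-zero if no UTF-8 locale is listed.
pick_utf8_locale() {
    awk '
        # Names ending in .utf8 or .UTF-8, optionally followed by an @modifier
        tolower($0) ~ /\.utf-?8(@.*)?$/ {
            if (first == "") first = $0
            if (c == "" && tolower($0) ~ /^c\.utf-?8$/) c = $0
        }
        END {
            if (c != "") {
                print c
            } else if (first != "") {
                print first
            } else {
                exit 1
            }
        }
    '
}

# A wrapper script could then do something like:
#   LC_CTYPE=$(locale -a | pick_utf8_locale) && export LC_CTYPE
#   exec swupdate "$@"
```

With the `locale -a` output posted earlier in this thread (C, POSIX, en_GB, en_GB.utf8, en_US, en_US.utf8), this would select en_GB.utf8, the first UTF-8 entry, since no C.UTF-8 is listed.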

> I'm not expecting that all computer worldwide should have the en_US.utf8 installed: I don't think make sense changing the current archive_handler.c code.
> Also, I'm not expecting that an embedded system with "C" as default locale should fail to extract a tar.gz file: if I extract the tar.gz from the swu file, and I manually run
> tar xvf myupdate.tar.gz
> it works with no problems.
> I'm aware that tar does not dependent on libarchive, but this leads me to think that maybe the problem is only in libarchive.

It does seem to be an issue somewhat linked to libarchive, at least
for tar handling, but I don't think there's a decent alternative to
libarchive for swupdate.

> For the time being, I can force the default locale to something supporting UTF8, but still this sounds a workaround, not a solution.

I guess wait for the glibc C.UTF-8 patch to make it into a release and use that?

Or try to overhaul upstream libarchive's locale handling spaghetti
code...but that doesn't exactly sound easy.

> Any other suggestion ?

Pierluigi Passaro

Oct 11, 2021, 3:57:27 AM
to swupdate
Hi James,
I really appreciate your support in analyzing the situation.
All the options are now clear; I just need to choose one of them.
Thanks a lot
Pier

Pierluigi Passaro

Oct 11, 2021, 4:05:19 AM
to swupdate
... and also thanks to Otavio for sharing his similar experience ;)
Regards

Stefano Babic

Oct 11, 2021, 9:36:41 AM
to Pierluigi Passaro, swupdate
> https://github.com/sbabic/swupdate/commit/95a2b9961119aac80aea1eeabbc1cd52b72d876a
> https://github.com/mpv-player/mpv/commit/1e70e82baa9193f6f027338b0fab0f5078971fbe

Yes, thanks Otavio / James for the deep analysis. What can be done in
SWUpdate is perhaps to extend the documentation to describe the issue and
how to work around it. The archive handler with libarchive is not described
in doc/source/handlers.rst, and adding it together with this topic would be
worthwhile, too.
Patches welcome ;-).

Best regards,
Stefano

--
=====================================================================
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: +49-8142-66989-53 Fax: +49-8142-66989-80 Email: sba...@denx.de
=====================================================================