[icu4c] Loading .dat file for data

Bastien Durel

Dec 23, 2024, 12:41:16 PM
to icu-s...@unicode.org
Hello,

I'm building an application that will be deployed on various Linux
distributions via a tgz archive, so I cannot rely on the system's ICU.
We have a policy of reducing the number of linked-in shared objects, so
I link against a static libicu. To minimize the binary size (because I
will have a lot of binaries), I am trying to keep the data itself out of
the binary, so I compiled icu4c with `--with-data-packaging=archive`.

I then link the apps against libicuuc.a & libicudata.a, and I added
this code inside the initialization routine:


    // Assumes <unicode/udata.h>, <filesystem>, <iostream> and <stdexcept>
    // are included by the surrounding file.
    UErrorCode status = U_ZERO_ERROR;
    std::filesystem::path icu_data_path = GetICUDataPath();
    auto package_name = icu_data_path.filename();
    // Open <directory>/<stem>.dat, e.g. .../76.1/icudt76l.dat
    auto* data = udata_open(
        icu_data_path.parent_path().c_str(),
        "dat",
        package_name.stem().c_str(),
        &status);
    if (U_FAILURE(status)) {
        std::cout << "icu_data_path: " << icu_data_path << std::endl;
        std::cout << "data: " << data << std::endl;
        std::cout << "*data: " << udata_getMemory(data) << std::endl;
        throw std::runtime_error(VE_OPEN_ICU_DATA(
            icu_data_path.string(), u_errorName(status)));
    }
    // This is the call that fails with U_INVALID_FORMAT_ERROR.
    udata_setAppData("icudt", udata_getMemory(data), &status);
    if (U_FAILURE(status)) {
        throw std::runtime_error(VE_LOAD_ICU_DATA(
            icu_data_path.string(), u_errorName(status)));
    }
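
For readers following along, here is a minimal sketch of what the path handling above produces. `IcuDataLocation` and `SplitIcuDataPath` are hypothetical names introduced only for illustration; they are not part of ICU or of the original code. Only the standard library is used.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical helper type: the two pieces derived from the .dat path.
struct IcuDataLocation {
    std::string directory;     // e.g. "/usr/local/share/icu/76.1"
    std::string package_name;  // e.g. "icudt76l"
};

// Split the path of an ICU archive into its directory and its package name
// (the file name without the ".dat" extension).
inline IcuDataLocation SplitIcuDataPath(const std::filesystem::path& dat_file) {
    assert(dat_file.has_filename());  // a bare directory has no package name
    return IcuDataLocation{
        dat_file.parent_path().string(),
        dat_file.stem().string(),
    };
}
```

Storing the two parts as `std::string` keeps the character data alive beyond a single expression, which matters if the pointers from `c_str()` are used in later statements rather than directly in the call.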


udata_open() returns a pointer as expected, but udata_setAppData()
fails, so I get this exception:

Failed to load ICU data from '/usr/local/share/icu/76.1/icudt76l.dat':
U_INVALID_FORMAT_ERROR

Do you know whether the archive is supposed to be loaded some other way,
or whether it is not intended to be opened like this at all?

Thanks,

NB: sorry for re-posting, but the Google group didn't look like it
was fully initialized, so I'm not sure the mail was really forwarded.

Regards,

--
Bastien Durel

Shane Carr

Dec 28, 2024, 1:47:50 AM
to Bastien Durel, icu-s...@unicode.org
I see 2 potential issues:

1. You should use udata_setCommonData not udata_setAppData
2. You should pass in the whole file, including the header, not just the part returned by udata_getMemory. Try passing the return value of udata_open to udata_setCommonData.
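
To illustrate point 2: an ICU data file starts with a small header (a 16-bit header size, the magic bytes 0xda 0x27, then a UDataInfo block whose dataFormat field is "CmnD" for a common-data archive). As far as I understand, udata_setCommonData() and udata_setAppData() validate that header, so a pointer that starts after it would plausibly be rejected with U_INVALID_FORMAT_ERROR. The sketch below is a rough stand-alone check, not ICU's own validation (which also looks at endianness, charset family and format version); `LooksLikeIcuCommonData` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Offsets follow the layout of ICU's data header: a 4-byte prefix
// (uint16 headerSize, magic1 = 0xda, magic2 = 0x27) followed by UDataInfo,
// whose 4-byte dataFormat field starts 8 bytes into that struct.
constexpr std::size_t kMagicOffset = 2;
constexpr std::size_t kDataFormatOffset = 12;
constexpr std::size_t kMinHeaderBytes = 16;

// Hypothetical helper (not an ICU API): true when `bytes` begins with an ICU
// data header whose format is "CmnD", the common-data (.dat) format.
// Precondition: `bytes` is non-null whenever `size` is non-zero.
inline bool LooksLikeIcuCommonData(const void* bytes, std::size_t size) {
    assert(bytes != nullptr || size == 0);
    if (size < kMinHeaderBytes) return false;
    const auto* p = static_cast<const std::uint8_t*>(bytes);
    if (p[kMagicOffset] != 0xda || p[kMagicOffset + 1] != 0x27) return false;
    return std::memcmp(p + kDataFormatOffset, "CmnD", 4) == 0;
}
```

Running such a check on the pointer about to be handed to ICU is a cheap way to tell "wrong pointer" apart from "wrong file" when the load fails.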

--
You received this message because you are subscribed to the Google Groups "icu-support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to icu-support...@unicode.org.
To view this discussion visit https://groups.google.com/a/unicode.org/d/msgid/icu-support/04d010fba1dc5fea95c6d62c6d0c90e3f3a60f7a.camel%40quetastream.com.

Steven R. Loomis

Dec 30, 2024, 9:58:01 AM
to Shane Carr, Bastien Durel, icu-s...@unicode.org
All of this needs to be done before any other ICU calls are made.

Depending on your exact scenario, you might be able to replace all of that code with:

    u_setDataDirectory(icu_data_path.c_str());

… and let ICU’s loading handle the rest.  Normally the ICU_DATA environment variable is used to set the path, so this line might not even be needed.

A recommended followup to the above would be to call u_init to validate the data loading:
     
    UErrorCode status = U_ZERO_ERROR;
    u_init(&status);
    if (U_FAILURE(status)) {
        throw std::runtime_error(VE_OPEN_ICU_DATA(
            icu_data_path.string(), u_errorName(status)));
    }


--
Steven R. Loomis
Code Hive Tx, LLC






bastie...@quetastream.com

Jan 2, 2025, 5:15:11 AM
to Steven R. Loomis, icu-s...@unicode.org
On Monday, December 30, 2024 at 08:57 -0600, Steven R. Loomis wrote:
> All of this needs to be done before any other ICU calls are made.
>
> Depending on your exact scenario, you might be able to replace all of
> that code with:
>
>     u_setDataDirectory(icu_data_path.c_str());
>
Hello,

From what I've seen, u_setDataDirectory() works great with individual
data files (*.res, *.cnv ...), but not with archives (*.dat).

Thanks for the u_init() suggestion; I will use it to validate loading
when I fall back to individual data files.

Regards,

--
Bastien

bastie...@quetastream.com

Jan 2, 2025, 5:35:46 AM
to Shane Carr, icu-s...@unicode.org
On Friday, December 27, 2024 at 22:48 -0800, Shane Carr wrote:
> I see 2 potential issues:
>
> 1. You should use udata_setCommonData not udata_setAppData
> 2. You should pass in the whole file, including the header, not just
> the part returned by udata_getMemory. Try passing the return value of
> udata_open to udata_setCommonData.

Hello,

I had tried udata_setAppData(), but it doesn't seem to work, either
with `return data;` or with `return udata_getMemory(data);`

But it works when I use it with mmap'ed memory.

udata_setCommonData() also worked with mmap'ed memory, but only if the
archive was also opened with udata_open(), which was rather strange.

Thanks for the help anyway; I now load the data with
udata_setAppData(mmaped_data).
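
For anyone landing on this thread later, the mmap approach could look roughly like the sketch below. It assumes a POSIX system; `MappedBytes` and `MapWholeFileReadOnly` are hypothetical names, not ICU APIs, and the ICU call itself appears only in a comment so that the block stays free of ICU headers. The mapping is deliberately never unmapped, on the assumption that ICU keeps using the memory it is given for the lifetime of the process.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <stdexcept>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical result type: start and length of the mapped file.
struct MappedBytes {
    const void* data;  // first byte of the file, header included
    std::size_t size;
};

// Map the whole file read-only and return a pointer to its first byte.
// The mapping is intentionally left in place until the process exits.
inline MappedBytes MapWholeFileReadOnly(const std::string& path) {
    const int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) throw std::runtime_error("cannot open " + path);
    struct stat st {};
    if (::fstat(fd, &st) != 0 || st.st_size <= 0) {
        ::close(fd);
        throw std::runtime_error("cannot stat, or empty file: " + path);
    }
    const auto size = static_cast<std::size_t>(st.st_size);
    void* addr = ::mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    ::close(fd);  // the mapping remains valid after the descriptor is closed
    if (addr == MAP_FAILED) throw std::runtime_error("cannot mmap " + path);
    assert(addr != nullptr);  // a successful mapping never starts at null here
    return MappedBytes{addr, size};
}

// Intended use, before any other ICU call (not compiled here):
//   MappedBytes icu = MapWholeFileReadOnly(icu_data_path.string());
//   udata_setCommonData(icu.data, &status);   // or udata_setAppData(...)
```

Because the pointer refers to the start of the file, the data header is included, which is what the replies above asked for.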

Regards,

--
Bastien