Is this the most idiomatic, reasonably performant way to normalize a std::string encoded with UTF8 using ICU4C?
prospero
Mar 28, 2025, 11:23:06 PM
to icu-s...@unicode.org
Just want to make sure I'm not missing anything obvious (besides StringPiece having non-explicit constructors for std::string), especially something that could cause an unnecessary degradation in performance:
to prospero, icu-s...@unicode.org
On Sat, Mar 29, 2025 at 4:23 AM 'prospero' via icu-support
<icu-s...@unicode.org> wrote:
> icu::UnicodeString unnormalized_string = […]
> icu::UnicodeString normalized_string;
There's no need to convert to and from icu::UnicodeString. You'll
waste fewer computing resources if you skip that and work directly
with UTF-8, by calling the normalizeUTF8() method instead of the
normalize() method:
to rou...@google.com, icu-s...@unicode.org
Easily 2-3x faster. Thanks for your help, Fredrik.
> Sent: Monday, March 31, 2025 at 3:53 PM
> From: "'Fredrik Roubert' via icu-support" <icu-s...@unicode.org>
> To: "prospero" <pros...@cyber-wizard.com>
> Cc: icu-s...@unicode.org
> Subject: Re: [icu-support] Is this the most idiomatic, reasonably performant way to normalize a std::string encoded with UTF8 using ICU4C?