Hello everyone,
I'm encountering an unexpected encoding issue with string literals in a Unicode build of a wxWidgets application, and I’d appreciate any insights.
Environment
wxWidgets: 3.3.1
OS: Windows 11
Source files: UTF-8 with BOM
Compiler: Visual Studio 2022 with /utf-8
wxWidgets build: Unicode (UTF-16 internally)
Problem
Using the standard _() macro on a UTF-8 string literal containing Italian accented characters results in a corrupted wxString.
It looks like the UTF-8 bytes are interpreted as single-byte characters.
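To illustrate the suspected mechanism, here is a minimal sketch in standard C++ with no wxWidgets involved. The function names `decode_single_byte` and `decode_utf8` are made up for this example, and the decoder assumes well-formed input. It decodes the same two bytes (the UTF-8 encoding of "è") once as single-byte characters and once as UTF-8. For the byte values involved here (0xA0 to 0xFF), Latin-1 and CP1252 agree, so plain widening is used to model the code page.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Interpret every byte as one character, as a single-byte code page would.
// Plain widening matches Latin-1, and CP1252 for bytes 0xA0-0xFF.
std::u32string decode_single_byte(const std::string& bytes) {
    std::u32string out;
    for (unsigned char b : bytes) {
        out.push_back(static_cast<char32_t>(b));
    }
    return out;
}

// Minimal UTF-8 decoder (illustrative only): assumes well-formed input and
// only checks that a multi-byte sequence is not cut short.
std::u32string decode_utf8(const std::string& bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;   // number of continuation bytes
        char32_t cp = lead;      // ASCII bytes map to themselves
        if (lead >= 0xF0)      { extra = 3; cp = lead & 0x07; }
        else if (lead >= 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if (lead >= 0xC0) { extra = 1; cp = lead & 0x1F; }
        assert(i + extra < bytes.size());
        for (std::size_t k = 1; k <= extra; ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

With the input `"\xC3\xA8"`, `decode_utf8` yields the single code point U+00E8 (è), while `decode_single_byte` yields U+00C3 followed by U+00A8, which is the "Ã¨" pattern seen in the debugger.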
Examples
| Code | Debug Output | Result |
|---|---|---|
| `wxString msg1 = _("Prova di log con accenti: è ò à ì ù");` | `L"Prova di log con accenti: è ò à ì ù"` | Corrupted |
| `wxString msg2 = _(wxS("Prova di log con accenti: è ò à ì ù"));` | `L"Prova di log con accenti: è ò à ì ù"` | Correct |
Clarification
I understand that wxS() fixes the issue because it turns the literal into a wide string (L"...").
What confuses me is that I expected _() alone to be sufficient, since the source file is UTF-8 and MSVC is instructed to treat all narrow literals as UTF-8 via /utf-8.
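The difference between the two literal forms can be shown without wxWidgets. In the sketch below, `WIDE_LITERAL` is a hypothetical stand-in for what wxS() does in a wchar_t-based build (pasting an `L` prefix onto the literal); it is not the wxWidgets definition. The helper function names are also made up for the example.

```cpp
#include <string>

// Hypothetical stand-in for wxS() in a wchar_t-based build: paste an L prefix
// onto the literal. Illustration only, not the wxWidgets definition.
#define WIDE_LITERAL(s) L##s

// Narrow literal: the two UTF-8 bytes of "è", written as explicit escapes so
// the result does not depend on the source or execution character set.
inline std::string narrow_e_grave() { return "\xC3\xA8"; }

// Wide literal: the compiler stores the code point U+00E8 directly, so no
// run-time decision about the byte encoding is needed.
inline std::wstring wide_e_grave() { return WIDE_LITERAL("\u00E8"); }
```

The narrow string holds two bytes that some later conversion still has to decode, which is where a wrong encoding assumption can creep in. The wide string already holds one unambiguous character.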
My question
Is it expected that _() still interprets narrow string literals according to the system code page (CP1252) instead of UTF-8, even when the compiler is set to UTF-8?
In other words: in a Unicode build, is using wxS() (or FromUTF8) the intended and required approach when dealing with UTF-8 source files?
Thanks a lot for any clarification.
Best regards,
Claudio Rossato
Hi,
Thanks for the explanation; it clarifies the behavior of _().
I understand now that the argument is always treated as an ASCII key. I still find it surprising that this means a program cannot be written in Italian first and then translated to English. It seems that with wxWidgets, multi-language programs must always start with English strings.
Claudio.
Hi,
Thank you for the clarification — now everything is clear.
Claudio.