International characters (again, detailed explanation this time) [3]


Jeff Mitchell

Sep 22, 2001, 10:12:19 PM
to wx-u...@lists.wxwindows.org

Yep, a busy night ;) Sorry for the third email in a row.

I *have* tried a few font options, btw:

text = (wxTextCtrl*) FindWindow ( ID_detail_title );
text -> SetValue ( n -> GetTitleTextAlways() );

wxFont titlefont ( 20, wxROMAN, wxNORMAL, wxNORMAL,
FALSE, "", wxFONTENCODING_ISO8859_1 );
text -> SetFont ( titlefont );

Setting 8859-1 didn't help (it turns out the same as the previous
screenshot, though at point size 20 because of the lines above). The wxWindows
manual also lists UTF8, but that's not defined in the version of wxWindows
I'm using now (I think 2.2.7). There is a UNICODE #decl in my version, but
it doesn't help (and I get an Unknown Encoding error at runtime that lets
me choose a new font to display in).

Perhaps I need to go to a more recent wxWindows (mine is the
stable release on the website as of a month or so ago) that has UTF8 or
something? Though I'm pretty sure 8859-1 should be correct, since it works
with IE (not that we trust IE or anything ;)

Help!

jeff

--
"It's murder out there. You can't even travel around in your own micro
circuits without permission from 'Master Control Program'. I mean,
sending *ME* down here to play games.... Who does he calculate he is?"
-- Peter Jurasik as Crom, _Tron_

Vaclav Slavik

Sep 23, 2001, 10:51:09 AM
to wx-u...@lists.wxwindows.org
Hi,

> wxFont titlefont ( 20, wxROMAN, wxNORMAL, wxNORMAL,
> FALSE, "", wxFONTENCODING_ISO8859_1 );
>

> Setting 8859-1 didn't help (it turns out same as previous

Of course it didn't, you're missing the point here. Expat always
returns strings encoded in UTF-8. *Always*. encoding=iso-8859-1
in your XML file tells expat how to interpret the *file*, it says
nothing about what expat will *give you*. (I'm not telling you the whole
truth here; it *is* possible to configure expat to output wchar_t
instead of UTF-8 encoded char* strings. Your build of expat is
obviously not configured so, so let's forget about it for now...)

So naturally, if you use an iso8859-1 font to display utf-8 text, it
will output garbage. wxWindows currently cannot directly display (or
use in controls) UTF-8 encoded text, not even in 2.3 where the UTF8
encoding is defined.
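
To make the "garbage" concrete, here is a minimal sketch in standard C++
(no wxWindows or expat calls). The helper name latin1_to_utf8 is
hypothetical. It treats every input byte as one ISO-8859-1 character, which
is what an iso8859-1 font effectively does, and re-encodes the result as
UTF-8 so it can be printed on a UTF-8 terminal.

```cpp
#include <string>

// Hypothetical helper, not part of wxWindows or expat.
// Treats each input byte as one ISO-8859-1 character and re-encodes it
// as UTF-8, i.e. it shows what a Latin-1 font would draw for those bytes.
std::string latin1_to_utf8(const std::string& latin1) {
    std::string out;
    for (char ch : latin1) {
        unsigned char b = static_cast<unsigned char>(ch);
        if (b < 0x80) {
            // ASCII is identical in both encodings.
            out += static_cast<char>(b);
        } else {
            // U+0080..U+00FF need two bytes in UTF-8.
            out += static_cast<char>(0xC0 | (b >> 6));
            out += static_cast<char>(0x80 | (b & 0x3F));
        }
    }
    return out;
}
```

For example, expat hands back "é" as the two bytes 0xC3 0xA9. Passing those
two bytes through latin1_to_utf8 yields the two characters "Ã©", which is
the kind of output a text control with an iso8859-1 font would show.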

You have two options: one, you can use the Unicode build of wxWindows
(wxUSE_UNICODE, only Win32) and convert expat's output to wxString
this way:

wxString string(expat_buf, wxConvUTF8, expat_buf_len);

Or you can use ANSI build (wxUSE_UNICODE=0,wxUSE_WCHAR_T=1) and do
two-step conversion:

// copy UTF-8 expat's output to wxString for convenience:
wxString utf8(expat_buf, expat_buf_len);
// convert utf8 to wchar_t representation and then back to iso:
wxCSConv cs("iso-8859-1");
wxString iso(utf8.wc_str(wxConvUTF8), cs);

Note that the last line is a *lossy* conversion because utf8 may
contain (encoded) any Unicode character, while iso only has space
for 256 of them! You can do this only if you're sure the XML file in
question only contains iso-8859-1 strings.
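
As an illustration of what "lossy" means here, the following is a minimal
sketch in standard C++ of the same two-step idea (decode UTF-8 to code
points, then narrow to iso-8859-1). It is not the wxWindows implementation;
utf8_to_latin1 and its lossy flag are hypothetical names, and replacing
unrepresentable characters with '?' is an assumption made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, not a wxWindows or expat API.
// Decodes UTF-8 and narrows each code point to ISO-8859-1. Code points
// above U+00FF and malformed sequences become '?' and set `lossy`.
// Overlong encodings are not rejected; this is a sketch, not a validator.
std::string utf8_to_latin1(const std::string& utf8, bool& lossy) {
    std::string out;
    lossy = false;
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out += '?'; lossy = true; ++i; continue; }  // invalid lead byte

        // Step 1: gather continuation bytes into one code point.
        bool ok = (i + extra < utf8.size());
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (c & 0x3F);
        }
        if (!ok) { out += '?'; lossy = true; ++i; continue; }
        i += extra + 1;

        // Step 2: narrow to Latin-1; anything above U+00FF cannot fit.
        if (cp <= 0xFF) out += static_cast<char>(static_cast<unsigned char>(cp));
        else { out += '?'; lossy = true; }
    }
    return out;
}
```

With this sketch, "café" converts cleanly, while a string containing the
euro sign (U+20AC) comes back with a '?' and lossy set to true, since that
character has no slot in iso-8859-1.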

Last but not least, this part of wxWindows (wx*Conv*,
strconv.{cpp,h}, wchar.{cpp,h}) is buggy as hell in 2.2 and even
2.3.1 has some bugs in it (CVS head aka 2.3.2 probably has some as
well but nobody has found them so far ;).

If you can't use 2.3 trunk for some reason (like that you rely on
wx's stability :), you may use libiconv directly, see
http://clisp.cons.org/~haible/packages-libiconv.html, it is available
as win32 DLL as well.
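
A minimal sketch of calling iconv directly, assuming a platform where
<iconv.h> declares the usual iconv_open/iconv/iconv_close trio (glibc or
GNU libiconv) and where the input argument is char**. The wrapper name
utf8_to_latin1_iconv is hypothetical, and throwing on failure is just one
possible error policy.

```cpp
#include <iconv.h>

#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical wrapper around iconv; not part of wxWindows.
// Converts a UTF-8 string to ISO-8859-1 and throws if the input is
// malformed or contains a character the target encoding cannot hold.
std::string utf8_to_latin1_iconv(const std::string& in) {
    iconv_t cd = iconv_open("ISO-8859-1", "UTF-8");  // (to, from)
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open failed");

    // Latin-1 output is never longer than the UTF-8 input.
    std::string out(in.size(), '\0');
    char* src = const_cast<char*>(in.data());
    std::size_t src_left = in.size();
    char* dst = &out[0];
    std::size_t dst_left = out.size();

    std::size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);
    if (rc == (std::size_t)-1)
        throw std::runtime_error("iconv conversion failed");

    out.resize(out.size() - dst_left);  // keep only the bytes written
    return out;
}
```

Depending on the platform, linking may need an extra -liconv; with glibc
the functions are part of the C library.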

> UTF8 or something? Though I'm pretty sure 8859-1 should be correct,
> since it works with IE (not that we trust IE or anything ;)

It works in expat, too, it's you who misinterpret the output :)

Regards,
Vaclav


Jeff Mitchell

Sep 23, 2001, 12:39:09 PM
to wx-u...@lists.wxwindows.org
On Sun, 23 Sep 2001, Vaclav Slavik wrote:

> Of course it didn't, you're missing the point here. Expat always
> returns strings encoded in UTF-8. *Always*. encoding=iso-8859-1

Ahhh, this is important :)

> You have two options: one, you can use the Unicode build of wxWindows
> (wxUSE_UNICODE, only Win32) and convert expat's output to wxString
> this way:
>
> wxString string(expat_buf, wxConvUTF8, expat_buf_len);

And I can use wxCSConv to turn the UTF8 back into 8859-1 to write
back out to XML files? (There are other tools processing the XML files,
too, using 8859-1 :/)

> Or you can use ANSI build (wxUSE_UNICODE=0,wxUSE_WCHAR_T=1) and do
> two-step conversion:
>
> // copy UTF-8 expat's output to wxString for convenience:
> wxString utf8(expat_buf, expat_buf_len);
> // convert utf8 to wchar_t representation and then back to iso:
> wxCSConv cs("iso-8859-1");
> wxString iso(utf8.wc_str(wxConvUTF8), cs);
>
> Note that the last line is a *lossy* conversion because utf8 may
> contain (encoded) any Unicode character, while iso only has space
> for 256 of them! You can do this only if you're sure the XML file in
> question only contains iso-8859-1 strings.

I'm pretty sure I'm covered but since I know little of these
things I may not be. The first solution sounds pretty easy.. ie:

Most of my libraries and code work with char*s; I can rebuild
wxWindows with the UNICODE define, and then the two-liner that sets up
the control can just massage the char* into a correctly encoded wxString
and feed that to the control, and when I fetch back the changes, convert
back to UTF8 or whatever.

> Last but not least, this part of wxWindows (wx*Conv*,
> strconv.{cpp,h}, wchar.{cpp,h}) is buggy as hell in 2.2 and even
> 2.3.1 has some bugs in it (CVS head aka 2.3.2 probably has some as
> well but nobody has found them so far ;).
>
> If you can't use 2.3 trunk for some reason (like that you rely on
> wx's stability :), you may use libiconv directly, see
> http://clisp.cons.org/~haible/packages-libiconv.html, it is available
> as win32 DLL as well.

Haven't seen a CLisp reference in a while; Haible is amazing. I
can't use the 2.3 series for this, sadly :/

> It works in expat, too, it's you who misinterpret the output :)

I work too much :)

Great, thanks for (as always) great help. I'll tool around.

Vaclav Slavik

Sep 23, 2001, 6:41:30 PM
to wx-u...@lists.wxwindows.org
Hi,

> > wxString string(expat_buf, wxConvUTF8, expat_buf_len);
>
> And I can use wxCSConv to turn the UTF8 back into 8859-1 to write
> back out to XML files?

Of course.

> Most of my libraries and code work with char*s; I can rebuild
> wxWindows with the UNICODE define, and then the two-liner that sets
> up the control can just massage the char* into wxString encoded

No. In a Unicode build, all wxWin functions work with wchar_t instead
of char. See the Unicode overview in the wxWindows manual.


VS
