Asking for advice: utf-8 vs. wchar_t under Android


Bakcsi Zsolt

May 25, 2012, 5:08:27 AM
to wx-...@googlegroups.com
Hello,

Recently, a lot of discussion was going on about what the internal
representation of wxString should be under Linux.
As far as I understand, the final decision is to use wchar_t (utf-32).
And, as far as I can see, wchar_t is used under wxAndroid too - well,
it is almost Linux.
Now, I think, with wxAndroid, it would be better to use utf-8. My arguments:
- usually there's far less memory,
- and far less CPU (for conversions),
- and I think, for other reasons too, people won't be able to just
drop huge codebases into wxAndroid as is, therefore, it's much more
realistic to ask them to review their code to address string indexing
problems.

BTW, AFAIK, under Android, Java uses utf-16, C/C++ uses utf-8, so, to
make things more complete, why not use utf-32 in wx? :)

On the other hand, seems like wchar_t is not really supported under
android NDK. So, if we use it to represent wxString, will wx use some
(any) wide version of character functions (like vswprintf, wcsrtombs,
etc., see android's wchar.h)?

So, what do you think?

Best Regards,
Zsolt
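
To make the "string indexing problems" mentioned above concrete, here is a minimal sketch, assuming well-formed UTF-8 input. The helper names utf8_code_points and utf8_byte_offset are hypothetical and are not wxWidgets APIs. It shows that in a UTF-8 buffer the byte index and the character index diverge as soon as a non-ASCII character appears, which is the kind of assumption existing code would need to be reviewed for.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a wxWidgets API): counts Unicode code points in a
// string assumed to hold well-formed UTF-8, by skipping continuation bytes
// (bytes of the form 10xxxxxx).
std::size_t utf8_code_points(const std::string& s)
{
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80)
            ++count;
    }
    return count;
}

// Hypothetical helper: byte offset at which the n-th code point (0-based)
// starts, or s.size() if the string has fewer than n + 1 code points.
std::size_t utf8_byte_offset(const std::string& s, std::size_t n)
{
    std::size_t seen = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80) {
            if (seen == n)
                return i;
            ++seen;
        }
    }
    return s.size();
}
```

For the string "h\xC3\xA9llo" (an "e" with acute accent encoded as two bytes), the buffer holds 6 bytes but 5 characters, and the third character starts at byte 3 rather than byte 2. Finding it requires a scan from the start, so indexing is no longer a constant-time operation.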

Vadim Zeitlin

May 25, 2012, 5:52:39 AM
to wx-...@googlegroups.com
On Fri, 25 May 2012 11:08:27 +0200 Bakcsi Zsolt wrote:

BZ> As far as I understand, the final decision is to use wchar_t (utf-32).

Yes, by default.

BZ> Now, I think, with wxAndroid, it would be better to use utf-8. My arguments:
BZ> - usually there's far less memory,
BZ> - and far less CPU (for conversions),
BZ> - and I think, for other reasons too, people won't be able to just
BZ> drop huge codebases into wxAndroid as is, therefore, it's much more
BZ> realistic to ask them to review their code to address string indexing
BZ> problems.

I'd agree with all of the above but only if Java used it...

BZ> BTW, AFAIK, under Android, Java uses utf-16, C/C++ uses utf-8, so, to
BZ> make things more complete, why not use utf-32 in wx? :)

Not ideal, of course. But it's Java's fault, not ours, UTF-16 is really
the worst possible choice :-(

BZ> On the other hand, seems like wchar_t is not really supported under
BZ> android NDK. So, if we use it to represent wxString, will wx use some
BZ> (any) wide version of character functions (like vswprintf, wcsrtombs,
BZ> etc., see android's wchar.h)?

We do have our own fallbacks for (ancient) systems not providing these
functions but it's clear that it would be better to have wide-char friendly
CRT. Does NDK support UTF-8 correctly at least?

BZ> So, what do you think?

If UTF-32 doesn't work at all then UTF-8 would be an easy choice. If not,
I'm really not sure but I think using UTF-8 by default might make more
sense. Both it and UTF-32 require conversion to Java UTF-16 anyhow but
UTF-8 typically takes less memory and may be used without conversion with
other libraries.

Regards,
VZ
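
The memory argument can be checked with a little arithmetic. The following is a minimal sketch, assuming valid code points up to U+10FFFF; the names utf8_bytes, utf16_bytes and encoded_sizes are hypothetical. It computes how many bytes the same text occupies in each of the three encodings discussed here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Bytes needed to encode one code point in UTF-8.
std::size_t utf8_bytes(char32_t cp)
{
    assert(cp <= 0x10FFFF);  // outside the Unicode range is not handled
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

// Bytes needed in UTF-16: one 16-bit unit, or a surrogate pair.
std::size_t utf16_bytes(char32_t cp)
{
    return cp < 0x10000 ? 2 : 4;
}

struct EncodedSizes {
    std::size_t utf8;
    std::size_t utf16;
    std::size_t utf32;
};

// Total storage (without terminator) of the text in each encoding.
EncodedSizes encoded_sizes(const std::u32string& text)
{
    EncodedSizes sizes{0, 0, 0};
    for (char32_t cp : text) {
        sizes.utf8 += utf8_bytes(cp);
        sizes.utf16 += utf16_bytes(cp);
        sizes.utf32 += 4;
    }
    return sizes;
}
```

For ASCII-only text such as U"wxWidgets" this gives 9, 18 and 36 bytes respectively. The "typically" matters, though: characters between U+0800 and U+FFFF (most CJK text, for example) take 3 bytes in UTF-8 but only 2 in UTF-16.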

Bakcsi Zsolt

May 25, 2012, 6:11:57 AM
to wx-...@googlegroups.com
Hello,

On Fri, May 25, 2012 at 11:52 AM, Vadim Zeitlin <va...@wxwidgets.org> wrote:
> BZ> On the other hand, seems like wchar_t is not really supported under
> BZ> android NDK. So, if we use it to represent wxString, will wx use some
> BZ> (any) wide version of character functions (like vswprintf, wcsrtombs,
> BZ> etc., see android's wchar.h)?
>
>  We do have our own fallbacks for (ancient) systems not providing these
> functions but it's clear that it would be better to have wide-char friendly
> CRT. Does NDK support UTF-8 correctly at least?

Well, I still need to google around to find out about utf-8
correctness in NDK...

BTW, Android's wchar.h says (even for API level 14):
/* IMPORTANT: Any code that relies on wide character support is essentially
* non-portable and/or broken. the only reason this header exist
* is because I'm really a nice guy. However, I'm not nice enough
* to provide you with a real implementation. instead wchar_t == char
* and all wc functions are stubs to their "normal" equivalent...
*/

Please let me ask for some help, where should I look, how can wx be
instructed to use only the fallback wide character functions?


>  If UTF-32 doesn't work at all then UTF-8 would be an easy choice. If not,
> I'm really not sure but I think using UTF-8 by default might make more
> sense.

I think I'd give utf-8 a try. To do this, is it ok to add these lines:
#define wxUSE_UNICODE_UTF8 1
#define wxUSE_UTF8_LOCALE_ONLY 1
to <wx>/include/android/setup.h?

Actually, I already did try this, and interestingly, seems like our
app crashes when creating the first wx-based class. Unfortunately, I
couldn't find out more, as I was unable to turn the stack trace into
something human-compatible. (When leaving the utf-8 settings as-is,
our app crashes randomly at some later point...)

Kind Regards,
Zsolt
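
Given the header comment quoted above, one quick way to see what a particular toolchain actually provides is a small probe. This is a minimal sketch with hypothetical function names, not something wxWidgets ships; it only checks two symptoms (the width of wchar_t and whether wcslen() counts wide characters) and cannot prove that the remaining wide-character functions work.

```cpp
#include <cstddef>
#include <cwchar>

// Hypothetical check: a one-byte wchar_t cannot hold UTF-16 or UTF-32
// code units, so wide strings would effectively be narrow strings.
constexpr bool wchar_is_wide()
{
    return sizeof(wchar_t) >= 2;
}

// Hypothetical runtime probe: if wcslen() were a stub forwarding to strlen()
// while wchar_t is wider than char, it would stop at the first zero byte
// inside the first character and report the wrong length.
bool wcslen_counts_characters()
{
    const wchar_t sample[] = L"wx";
    return std::wcslen(sample) == 2;
}
```

Calling both functions early in a test program and logging the results would show whether the "wchar_t == char" situation applies to the NDK in use.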

Vadim Zeitlin

May 25, 2012, 1:29:12 PM
to wx-...@googlegroups.com
On Fri, 25 May 2012 12:11:57 +0200 Bakcsi Zsolt wrote:

BZ> Please let me ask for some help, where should I look, how can wx be
BZ> instructed to use only the fallback wide character functions?

configure is supposed to detect their absence itself but if the functions
are present and just don't work (I do hope that the comment you quoted is
some kind of a sick joke, even though I strongly suspect that it isn't...)
then you're on your own and would need to undefine various HAVE_XXX
forcefully for Android.

BZ> I think I'd give utf-8 a try. To do this, is it ok to add these lines:
BZ> #define wxUSE_UNICODE_UTF8 1
BZ> #define wxUSE_UTF8_LOCALE_ONLY 1
BZ> to <wx>/include/android/setup.h?

Yes, normally this is all that you need.

BZ> Actually, I already did try this, and interestingly, seems like our
BZ> app crashes when creating the first wx-based class.

I'm afraid this could be because it uses some wide char function that
corrupts the stack because of "wchar_t == char" thing...

Regards,
VZ
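
As an illustration of "undefining HAVE_XXX forcefully", a fragment along these lines could go into an Android-specific configuration header. The three macro names are assumptions that follow the usual configure naming pattern; which macros are really defined, and which of them wx consults, has to be checked against the generated configuration.

```cpp
// Hypothetical addition to an Android-specific configuration header.
// The HAVE_* names below are assumptions following the HAVE_XXX pattern;
// check the generated configuration for the names actually defined.
#ifdef __ANDROID__
    #undef HAVE_WCSLEN      // pretend wcslen() is missing
    #undef HAVE_WCSRTOMBS   // pretend wcsrtombs() is missing
    #undef HAVE_VSWPRINTF   // pretend vswprintf() is missing
#endif
```

This only helps for functions that wx guards with such a macro and for which it has its own fallback.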

Catalin

May 25, 2012, 2:48:29 PM
to wx-...@googlegroups.com


From: Bakcsi Zsolt <>
Sent: Friday, 25 May 2012, 13:11

Well, I still need to google around to find out about utf-8
correctness in NDK...

Bakcsi Zsolt

May 26, 2012, 7:38:06 PM
to wx-...@googlegroups.com
>  configure is supposed to detect their absence itself but if the functions
> are present and just don't work (I do hope that the comment you quoted is
> some kind of a sick joke, even though I strongly suspect that it isn't...)
> then you're on your own and would need to undefine various HAVE_XXX
> forcefully for Android.

In wxcrtbase.h, I see conditional wide char function definitions like:

#ifndef wxCRT_StrcatW
WXDLLIMPEXP_BASE wchar_t *wxCRT_StrcatW(wchar_t *dest, const wchar_t *src);
#endif

These are the fallback functions, aren't they?
However, some lines earlier, there are definitions like this:

#define wxCRT_StrcatW wcscat

and - as far as I can see - these do not depend on any ifdefs.

So, how can I make wx use the fallback functions?

Thanks, Best Regards,
Zsolt

Vadim Zeitlin

May 27, 2012, 8:12:11 AM
to wx-...@googlegroups.com
On Sun, 27 May 2012 01:38:06 +0200 Bakcsi Zsolt wrote:

BZ> In wxcrtbase.h, I see conditional wide char function definitions like:
BZ>
BZ> #ifndef wxCRT_StrcatW
BZ> WXDLLIMPEXP_BASE wchar_t *wxCRT_StrcatW(wchar_t *dest, const wchar_t *src);
BZ> #endif
BZ>
BZ> These are the fallback functions, aren't they?

Yes, they are.

BZ> However, some lines earlier, there are definitions like this:
BZ>
BZ> #define wxCRT_StrcatW wcscat

They probably used to but we must have dropped support for the platforms
where wcscat() didn't exist.

BZ> So, how can I make wx use the fallback functions?

You'd have to put the entire block defining wxCRT_StrcatW() and friends
inside "#ifndef __ANDROID__" and also implement a lot of other functions
for which we don't have fallbacks. The trouble is that using UTF-8 doesn't
mean not using wchar_t, we still probably use a lot of wide char functions
even in this build...

Regards,
VZ
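
The pattern described above can be sketched in isolation. This is a minimal, self-contained illustration and not the actual contents of wxcrtbase.h: demo_StrcatW and demoCRT_StrcatW are hypothetical stand-ins for the fallback function and the wxCRT_StrcatW macro. The macro maps to the CRT function everywhere except on Android, where it maps to a hand-written fallback that does not rely on the platform's wide-character functions.

```cpp
#include <cassert>
#include <cwchar>

// Hypothetical fallback: appends src to dest using only plain wchar_t
// loops, without calling any wide-character CRT function.
wchar_t* demo_StrcatW(wchar_t* dest, const wchar_t* src)
{
    assert(dest != nullptr && src != nullptr);
    wchar_t* end = dest;
    while (*end != L'\0')                 // find the terminator of dest
        ++end;
    while ((*end++ = *src++) != L'\0') {  // copy src including its terminator
    }
    return dest;
}

// Hypothetical macro mapping: use the fallback on Android only.
#if defined(__ANDROID__)
    #define demoCRT_StrcatW demo_StrcatW
#else
    #define demoCRT_StrcatW std::wcscat
#endif
```

As noted above, doing this for real would mean guarding the whole block of such definitions and writing fallbacks for every function that lacks one, so the sketch shows the mechanism rather than the amount of work involved.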

Bakcsi Zsolt

May 30, 2012, 5:35:46 AM
to wx-...@googlegroups.com
On Sun, May 27, 2012 at 2:12 PM, Vadim Zeitlin <va...@wxwidgets.org> wrote:
> [...] The trouble is that using UTF-8 doesn't
> mean not using wchar_t, we still probably use a lot of wide char functions
> even in this build...

Things don't look very bright then...

Maybe it would be easier to add wchar_t support to the Android NDK somehow.
In one of Catalin's links (thank you) I found this:
http://www.crystax.net/en/android/ndk/7

This is a modified Android NDK with added wchar_t support (and more).
I don't know about its licensing yet (just contacted the author).
I'll let you know if I get any answer.

Kind Regards,
Zsolt

Vadim Zeitlin

May 30, 2012, 3:45:48 PM
to wx-...@googlegroups.com
On Wed, 30 May 2012 11:35:46 +0200 Bakcsi Zsolt wrote:

BZ> On Sun, May 27, 2012 at 2:12 PM, Vadim Zeitlin <va...@wxwidgets.org> wrote:
BZ> > [...] The trouble is that using UTF-8 doesn't
BZ> > mean not using wchar_t, we still probably use a lot of wide char functions
BZ> > even in this build...
BZ>
BZ> Things don't look very bright then...

For now we could build with wxUSE_UNICODE=0 but this clearly is not a good
long term solution.

BZ> Maybe it would be easier to add wchar_t support to the Android NDK somehow.
BZ> In one of Catalin's links (thank you) I found this:
BZ> http://www.crystax.net/en/android/ndk/7
BZ>
BZ> This is a modified Android NDK with added wchar_t support (and more).
BZ> I don't know about its licensing yet (just contacted the author).
BZ> I'll let you know if I get any answer.

I heard about this NDK before and I think it's under the same licence as
the official one as parts of it were indeed integrated into the Google
version, so I hope it should be OK. But it would be better to have a
confirmation from the author, of course.

Thanks,
VZ

Bakcsi Zsolt

May 30, 2012, 5:14:58 PM
to wx-...@googlegroups.com
On Wed, May 30, 2012 at 9:45 PM, Vadim Zeitlin <va...@wxwidgets.org> wrote:
> BZ> http://www.crystax.net/en/android/ndk/7
>
>  I heard about this NDK before and I think it's under the same licence as
> the official one as parts of it were indeed integrated into the Google
> version, so I hope it should be OK. But it would be better to have a
> confirmation from the author, of course.

I've got the answer. I would interpret it as
'yes, you can use it with an LGPL-ed lib even for commercial
closed-source apps, but the copyright notice and the disclaimer must
be shown somehow'.
But please, take a look too:

<><><><><><><><><><><><><><><><><><><><><><><><><><><>

Hi,

CrystaX NDK differs from Google's NDK (from legal point of view) only in
additional libcrystax linked to final library. Other than that, the same
restrictions applied to both Google's and CrystaX NDK. To view license
of libcrystax, please read <ndk-root>/sources/crystax/LICENSE. I'll copy
its content here:

==========>cut here<==========
This library contains code from libc library of FreeBSD project which by-turn contains
code from other projects. To see specific authors and/or licenses, look into appropriate
source file. Here is license for those parts which are not derived from any other projects
but written by Dmitry Moskalchuk.

Copyright (c) 2011-2012 Dmitry Moskalchuk <d...@crystax.net>.
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are
permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of
conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list
of conditions and the following disclaimer in the documentation and/or other materials
provided with the distribution.

THIS SOFTWARE IS PROVIDED BY Dmitry Moskalchuk ''AS IS'' AND ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL Dmitry Moskalchuk OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

The views and conclusions contained in the software and documentation are those of the
authors and should not be interpreted as representing official policies, either expressed
or implied, of Dmitry Moskalchuk.
==========>cut here<==========

Hope that helps.

--
Dmitry Moskalchuk

On 05/30/2012 01:23 PM, Bakcsi Zsolt wrote:
> Hi CrystaX,
>
> I found the CrystaX Android NDK while googling around about Android
> wchar_t support.
> I'm currently trying to port an LGPL-ed library (wxWidgets), which
> heavily depends on wchar_t.
> I couldn't find licence info on http://www.crystax.net.
> Please let me know whether it is OK to use the CrystaX Android NDK (in
> its binary form) to build an LGPL-ed library, and then, to build
> applications using that library? The wxWidgets LGPL allows using it even
> for closed-source commercial applications - would the utilization of
> CrystaX NDK still allow this?
>
> Thank you in advance!
>
> Best Regards,
> Zsolt

Bakcsi Zsolt

May 31, 2012, 6:02:17 AM
to wx-...@googlegroups.com
> On Wed, 30 May 2012 11:35:46 +0200 Bakcsi Zsolt wrote:
> BZ> http://www.crystax.net/en/android/ndk/7

The good news is, I can compile unit tests with CrystaX NDK (no
unresolved externals) - but to really make it work, test.cpp needs to
be rewritten so it can be invoked by an Android activity, and send
its output to the Android debug log. I hope I'll have time to try it
these days. (Also, our app behaves definitely better with CrystaX
NDK.)

The bad news is, our app crashes with CrystaX NDK too, when wx is
compiled in utf-8 mode. The debug output is really sparse, it just
says the program exited with signal 5 (Trace/breakpoint trap). Anyway,
hopefully we'll know more when unit tests work.

BTW, should I upload an updated patch to the current trunk containing
wxAndroid unit test improvements too (as well as content from the
previous patch), or should I wait until good things from the previous
patch are merged into the trunk?

Best Regards,
Zsolt
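
Sending test output to the Android debug log, as described above, could look roughly like the following. This is a minimal sketch rather than the actual test.cpp change: format_result, report_result and the "wxTest" log tag are hypothetical names. It uses __android_log_print() from <android/log.h> when built for Android (which requires linking against the NDK's log library) and falls back to stderr elsewhere so the same code can be tried on a desktop.

```cpp
#include <cstdio>
#include <string>

#ifdef __ANDROID__
#include <android/log.h>
#endif

// Hypothetical helper: builds one line of test output.
std::string format_result(const std::string& test_name, bool passed)
{
    return std::string(passed ? "[PASS] " : "[FAIL] ") + test_name;
}

// Hypothetical helper: sends the line to logcat on Android, stderr otherwise.
void report_result(const std::string& test_name, bool passed)
{
    const std::string line = format_result(test_name, passed);
#ifdef __ANDROID__
    __android_log_print(passed ? ANDROID_LOG_INFO : ANDROID_LOG_ERROR,
                        "wxTest", "%s", line.c_str());
#else
    std::fprintf(stderr, "%s\n", line.c_str());
#endif
}
```

On a device or emulator the lines can then be filtered by tag in the log viewer. Note that the message is passed through "%s" rather than used as the format string, so test names containing "%" are safe.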

Vadim Zeitlin

May 31, 2012, 8:31:25 AM
to wx-...@googlegroups.com
On Thu, 31 May 2012 12:02:17 +0200 Bakcsi Zsolt wrote:

BZ> > On Wed, 30 May 2012 11:35:46 +0200 Bakcsi Zsolt wrote:
BZ> > BZ> http://www.crystax.net/en/android/ndk/7
BZ>
BZ> The good news is, I can compile unit tests with CrystaX NDK

Great!

BZ> The bad news is, our app crashes with CrystaX NDK too, when wx is
BZ> compiled in utf-8 mode. The debug output is really sparse, it just
BZ> says the program exited with signal 5 (Trace/breakpoint trap). Anyway,
BZ> hopefully we'll know more when unit tests work.

Isn't there gdb included in NDK? I.e. can you run the program under it?

BZ> BTW, should I upload an updated patch to the current trunk containing
BZ> wxAndroid unit test improvements too (as well as content from the
BZ> previous patch), or should I wait until good things from the previous
BZ> patch are merged into the trunk?

Feel free to update it, it's not a problem for me to split it in several
parts when committing later.

Thanks,
VZ

Vadim Zeitlin

May 31, 2012, 9:00:22 AM
to wx-...@googlegroups.com
On Wed, 30 May 2012 23:14:58 +0200 Bakcsi Zsolt wrote:

BZ> I've got the answer. I would interpret it as
BZ> 'yes, you can use it with an LGPL-ed lib even for commercial
BZ> closed-source apps, but the copyright notice and the disclaimer must
BZ> be shown somehow'.

This is reasonable. We wouldn't be able to include the NDK in wxWidgets
distribution itself with such licence but then I don't think we'd want to
do this anyhow as it's big and relatively rapidly changing. So it would be
up to wxAndroid users to do it and this licence does allow all normal use.

Regards,
VZ

Kobus

Jun 1, 2012, 4:15:07 AM
to wx-dev
Sorry for chipping in here in the middle of the discussion, just
subscribed to the dev list.

FWIW I have a stable working build of the base classes using the
standard ndk 7, but it is with
#define wxUSE_UNICODE 0

This causes some minor compile problems on the trunk, but once
compiled it works without problems in our application.

Using UTF-8, i.e.
#define wxUSE_UNICODE_UTF8 1
#define wxUSE_UTF8_LOCALE_ONLY 1

does not work because the wchar functions from the broken wchar.h are
still used, especially when using printf string formatting.
I also had to tweak config_android.h.

If you want, I could just post the setup and config_android.h
files here.

> The bad news is, our app crashes with CrystaX NDK too, when wx is
> compiled in utf-8 mode. The debug output is really sparse, it just
> says the program exited with signal 5 (Trace/breakpoint trap). Anyway,
> hopefully we'll know more when unit tests work.

Setting up native debugging is a pain on some devices (took me a few
days!), but really needed for ndk dev...

Best
Kobus
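
To illustrate the printf point above: formatting that stays entirely on the narrow-character side never touches vswprintf() or the other functions from the broken wchar.h. The following is a minimal sketch of such a narrow-only formatter built on vsnprintf(); format_narrow is a hypothetical helper and says nothing about how wx routes its own formatting calls internally.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: printf-style formatting into a std::string using only
// narrow-character CRT functions. Bytes passed via %s are copied unchanged,
// so UTF-8 text passes through as-is.
std::string format_narrow(const char* fmt, ...)
{
    assert(fmt != nullptr);

    va_list args;
    va_start(args, fmt);
    va_list args_copy;
    va_copy(args_copy, args);  // the list is consumed by the first call

    // First pass: ask how many characters the result needs.
    const int needed = std::vsnprintf(nullptr, 0, fmt, args);
    va_end(args);
    if (needed < 0) {
        va_end(args_copy);
        return std::string();
    }

    // Second pass: format into a buffer of exactly the right size.
    std::vector<char> buffer(static_cast<std::size_t>(needed) + 1);
    std::vsnprintf(buffer.data(), buffer.size(), fmt, args_copy);
    va_end(args_copy);
    return std::string(buffer.data(), static_cast<std::size_t>(needed));
}
```

For example, format_narrow("%d-%s", 5, "a") yields "5-a". Whether a given wx build can be steered onto such a path everywhere is exactly the open question in this thread.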

Vadim Zeitlin

Jun 1, 2012, 7:07:07 AM
to wx-...@googlegroups.com
On Fri, 1 Jun 2012 01:15:07 -0700 (PDT) Kobus wrote:

K> If you want, I could just post the setup and config_android.h
K> files here.

Please post them to Trac as diffs so that they can still be applied even
if the files change in the meantime. TIA!

K> > The bad news is, our app crashes with CrystaX NDK too, when wx is
K> > compiled in utf-8 mode. The debug output is really sparse, it just
K> > says the program exited with signal 5 (Trace/breakpoint trap). Anyway,
K> > hopefully we'll know more when unit tests work.
K>
K> Setting up native debugging is a pain on some devices (took me a few
K> days!), but really needed for ndk dev...

Would you have any pointers to (good) instructions about how to do it by
chance?

Thanks,
VZ

Bakcsi Zsolt

Jun 1, 2012, 7:17:40 AM
to wx-...@googlegroups.com
> K> Setting up native debugging is a pain on some devices (took me a few
> K> days!), but really needed for ndk dev...
>
>  Would you have any pointers to (good) instructions about how to do it by
> chance?

Yes, +1, please do!

For me, it's still a pain even for the emulator...

Zsolt