Convert const char* to wchar_t*. Trouble with mbstowcs


progman

May 29, 2011, 2:03:04 AM
to android-ndk
I cannot convert an ASCII string to Unicode correctly.
This is what I do:
wchar_t mImageName[ 1024 ] = { 0 };
const char name[] = "test";

mbstowcs( mImageName, name, length );

LOGI("NDK pushTexture wcslen=%d\n", wcslen( mImageName ) ); // prints 1

I think that mbstowcs converts to UTF-8, but wcslen works with UTF-32, IMHO.
Maybe I need to call mbsinit, but I do not know how to set the correct
parameters.

Pawel Veselov

May 29, 2011, 3:35:29 AM
to andro...@googlegroups.com, progman
On 05/28/11 23:03, progman wrote:
> I can not convert ASCII string to Unicode correctly.
> I to do so:
> wchar_t mImageName[ 1024 ] = { 0 };
> const char name[] = "test";
>
> mbstowcs( mImageName, name, length );
>
> LOGI("NDK pushTexture  wcslen=%d\n",  wcslen( mImageName ) ); - write 1
>
> I think that mbstowcs converts to UTF-8. But wcslen works with UTF-32
> imho.

I'm not sure you know what you are doing judging by this...

There are multibyte strings (mb) and wide character strings (wc). Wide characters are encoding-less: you are not supposed to know what encoding they are in; only libc and/or the compiler are supposed to know that. If you are writing code for a specific compiler and a specific platform, you can have internal logic that assumes the actual encoding used for wide characters (I don't know what it is in the Android NDK, but it's probably something like UCS-4).

A multibyte string can be UTF-8, UTF-32, or whatever else you want it to be, but the mbtoXX functions (and the other mbXXXX functions) take the encoding of the strings being manipulated from LC_CTYPE (whereas wide character strings are portable and don't depend on any encoding).

It's really best to use iconv to convert between encodings when you know what those encodings are.
I guess you could use mbtowc with one LC_CTYPE and then wctomb with another LC_CTYPE (switching between them with setlocale) to convert strings between different encodings. It seems like a hassle, but I don't know of a better way.
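The iconv route mentioned above can be sketched roughly as follows. This is a minimal sketch, assuming an iconv implementation is available on the target (the thread does not confirm that for the Android NDK) and that it knows the "UTF-8", "UTF-32LE" and "UTF-32BE" encoding names. The helper names utf8_to_utf32 and host_utf32 are hypothetical.

```c
#include <iconv.h>
#include <stddef.h>
#include <stdint.h>

/* Picks the UTF-32 variant that matches the host byte order, so the
 * converted values can be read directly as uint32_t code points. */
static const char *host_utf32(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe ? "UTF-32LE" : "UTF-32BE";
}

/* Hypothetical helper: converts src_len bytes of UTF-8 into UTF-32 code
 * points. Returns the number of code points written, or (size_t)-1 on
 * failure (unsupported conversion, invalid input, or dst too small). */
size_t utf8_to_utf32(const char *src, size_t src_len,
                     uint32_t *dst, size_t dst_cap)
{
    iconv_t cd = iconv_open(host_utf32(), "UTF-8");   /* (to, from) */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *in = (char *)src;      /* iconv takes char **, input is only read */
    size_t in_left = src_len;
    char *out = (char *)dst;
    size_t out_left = dst_cap * sizeof(uint32_t);

    size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;

    /* out_left shrinks by 4 bytes for every code point produced. */
    return dst_cap - out_left / sizeof(uint32_t);
}
```

Unlike the mbXXXX functions, this does not depend on LC_CTYPE: both encodings are named explicitly in the iconv_open call.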

> May be need to call mbsinit. But I do not know how to set correct
> params.

All right: a conversion state is *only* needed if you are using mbsrtowcs() and wcsrtombs(), which are the restartable versions of mbstowcs() and wcstombs() and allow processing strings that cannot be processed in one shot. Because the conversion is a state machine, libc needs to keep state between calls, and it asks the developer to hold that state (an mbstate_t object) instead. mbsinit() does not set that state up; it only reports whether an mbstate_t is in its initial state, and a zero-filled mbstate_t is the initial state. In other words, most likely you don't need this.
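For illustration, here is roughly what "not in one shot" looks like with mbsrtowcs(). This is only a sketch: the helper name count_wide_chunked and the 4-element chunk size are made up for the example, and it assumes a libc whose mbsrtowcs() follows the C standard. The caller owns the mbstate_t and zero-fills it to get the initial state.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: counts the wide characters in src by converting it
 * through a small fixed buffer, one chunk at a time.
 * Returns the total count, or (size_t)-1 on an invalid sequence. */
size_t count_wide_chunked(const char *src)
{
    wchar_t chunk[4];
    mbstate_t state;
    size_t total = 0;

    memset(&state, 0, sizeof state);   /* zero-filled == initial state */

    while (src != NULL) {
        /* mbsrtowcs advances src past what it consumed and sets it to
         * NULL once the terminating null character has been converted. */
        size_t n = mbsrtowcs(chunk, &src, 4, &state);
        if (n == (size_t)-1)
            return (size_t)-1;
        total += n;                    /* chunk[0..n-1] could be used here */
    }
    return total;
}
```

With mbstowcs() none of this bookkeeping is needed, which is why the plain function is enough for a short string such as "test".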


http://en.wikipedia.org/wiki/Wide_character (http://ru.wikipedia.org/wiki/%D0%A8%D0%B8%D1%80%D0%BE%D0%BA%D0%B8%D0%B9_%D1%81%D0%B8%D0%BC%D0%B2%D0%BE%D0%BB)
http://www.linux.com/archive/feature/51836
http://www.freebsd.org/cgi/man.cgi?query=multibyte&apropos=0&sektion=0&format=html


Bob Holmboe

May 29, 2011, 4:09:59 AM
to andro...@googlegroups.com
Have you tried printing the first element in hex instead of printing the length in decimal, e.g. ("Hexadecimal: %x\n", mImageName[0]);? You should be able to see what is going on.


--
You received this message because you are subscribed to the Google Groups "android-ndk" group.
To post to this group, send email to andro...@googlegroups.com.
To unsubscribe from this group, send email to android-ndk...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/android-ndk?hl=en.

Pawel Veselov

May 29, 2011, 4:34:44 AM
to andro...@googlegroups.com, Bob Holmboe

Also, what is "length" in the mbstowcs call? It should be 1024, the capacity of the destination array in wide characters.
The return value of mbstowcs() should also be examined.
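Both points can be folded into one small wrapper. The sketch below assumes a libc with a working mbstowcs() and an LC_CTYPE locale that can decode the input (for plain ASCII the default "C" locale is enough on a conforming libc). The name to_wide is hypothetical.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: converts src into dst, which can hold dst_cap wide
 * characters including the terminator. Output is truncated if src is longer.
 * Returns the number of wide characters written (excluding the terminator),
 * or (size_t)-1 if src is invalid in the current LC_CTYPE encoding. */
size_t to_wide(wchar_t *dst, size_t dst_cap, const char *src)
{
    if (dst_cap == 0)
        return (size_t)-1;

    /* The third argument is the destination capacity in wide characters,
     * not strlen(src). One slot is reserved for the terminator. */
    size_t n = mbstowcs(dst, src, dst_cap - 1);
    if (n == (size_t)-1)
        return (size_t)-1;

    dst[n] = L'\0';   /* mbstowcs does not terminate when it hits the limit */
    return n;
}
```

For the code in this thread the call would be to_wide( mImageName, 1024, name ), and a result of (size_t)-1 or a count other than 4 means the conversion did not do what was expected.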

progman

May 29, 2011, 6:43:37 AM
to android-ndk
For example:
const char name[] = "test";
wchar_t mImageName[ 1024 ] = { 0 };
mbstowcs( mImageName, name, strlen( name ) );

LOGI("NDK pushTexture wcslen=%X\n", mImageName[0] ); // prints 74 65 73 74
LOGI("NDK pushTexture wcslen=%d\n", wcslen( mImageName ) ); // prints 1

mbstowcs just does a plain memcpy from name to mImageName :(
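A per-element dump makes this kind of diagnosis easier to read than printing only the first element. The following is a sketch; dump_wide_hex is a hypothetical helper and the output buffer size is up to the caller.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: writes each element of a wide string as hex,
 * separated by spaces, into out. Returns the number of elements dumped.
 * A correct conversion of "test" gives one small value per element
 * (74 65 73 74); a raw byte copy packs several bytes into the first one. */
size_t dump_wide_hex(const wchar_t *ws, char *out, size_t out_cap)
{
    size_t i, used = 0;

    if (out_cap > 0)
        out[0] = '\0';

    for (i = 0; ws[i] != L'\0'; i++) {
        int n = snprintf(out + used, out_cap - used, "%s%lX",
                         i ? " " : "", (unsigned long)ws[i]);
        if (n < 0 || (size_t)n >= out_cap - used)
            break;   /* stop when the output buffer is full */
        used += (size_t)n;
    }
    return i;
}
```

The resulting text can then be logged with a plain %s. Note also that wcslen() returns size_t, so its result is better printed with %zu (or cast to int) than with %d.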

On May 29, 15:09, Bob Holmboe <bobh9...@gmail.com> wrote:
> Have you tried to print out the hex format instead of the decimal
> ("Hexadecimal: %x\n",  wcslen( mImageName[0]) ); instead of the decimal ?
> You should be able to see what is going on.

progman

May 29, 2011, 6:45:24 AM
to android-ndk
length is equal to strlen( name ).
mbstowcs just dumbly works like memcpy,
i.e. there is no conversion to Unicode at all.

alan

May 30, 2011, 3:19:12 AM
to andro...@googlegroups.com
Wide character strings are not supported in the NDK.

wasabee18

May 29, 2011, 11:32:58 PM
to android-ndk
Hi,

I do not think the function is supported. From STANDALONE-TOOLCHAIN.html
in the NDK:

5.2/ wchar_t support:
- If you target platform android-9 or higher, the size of wchar_t is
  4 bytes, and most wide-char functions are available in the C library
  (with the exception of multi-byte encoding/decoding functions and
  wsprintf/wsscanf).

- If you target any prior API level, the size of wchar_t will be 1 byte
  and none of the wide-char functions will work anyway.

Hope that helps.
Cheers.
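If the multi-byte decoding functions really are unavailable, a pure-ASCII string such as "test" can still be widened by hand, without touching libc's conversion code. This is only a sketch for the ASCII case described in the question; ascii_to_wide is a hypothetical helper, and it does not handle UTF-8 or any other multibyte encoding.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: widens a pure-ASCII string without any libc
 * multibyte functions. Returns the number of characters written (excluding
 * the terminator), or (size_t)-1 if src has a non-ASCII byte or does not
 * fit. On failure dst may be partially written and unterminated. */
size_t ascii_to_wide(wchar_t *dst, size_t dst_cap, const char *src)
{
    size_t i;

    for (i = 0; src[i] != '\0'; i++) {
        unsigned char c = (unsigned char)src[i];
        /* Need room for this character and for the terminator after it. */
        if (c > 0x7F || i + 1 >= dst_cap)
            return (size_t)-1;
        dst[i] = (wchar_t)c;
    }
    if (dst_cap == 0)
        return (size_t)-1;

    dst[i] = L'\0';
    return i;
}
```

Logging sizeof(wchar_t) once at startup shows which of the two cases from the quoted documentation applies to a given build.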