16-bit strings between C and JS?

Mark Hahn

Mar 31, 2014, 6:46:21 PM
to emscripte...@googlegroups.com
My C app uses all wchar_t (16-bit chars).  How do I send strings of these back and forth between C and JS?  I am currently looping through char by char but this seems inefficient since JS is already using 16-bit strings.

Any ideas?

Mark Hahn

Mar 31, 2014, 6:47:59 PM
to emscripte...@googlegroups.com
BTW, I tried `allocate(intArrayFromString(text), 'i16', ALLOC_STACK)` with no luck.  I ended up with only every other character.

Chad Austin

Mar 31, 2014, 6:51:01 PM
to emscripte...@googlegroups.com
C++11 has four character types:

char: 1 byte
wchar_t: unspecified character type equal to or larger than char (in practice, 16-bit on some platforms, 32-bit on others)
char16_t: 16-bit quantity representing a UTF-16 code unit
char32_t: 32-bit quantity representing a Unicode code point
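
Since the width of wchar_t depends on the target, it can help to check it at compile time rather than assume it. A minimal sketch, where `bits_of` and `wchar_is_16_bit` are hypothetical helper names invented for this example:

```cpp
#include <climits>
#include <cstddef>

// Width in bits of a character type on the current target.
// (Hypothetical helper, not part of any library.)
template <typename CharT>
constexpr std::size_t bits_of() {
    return sizeof(CharT) * CHAR_BIT;
}

// These hold on every conforming C++11 implementation.
static_assert(bits_of<char>() == CHAR_BIT, "char is exactly one byte");
static_assert(bits_of<char16_t>() >= 16, "char16_t holds a UTF-16 code unit");
static_assert(bits_of<char32_t>() >= 32, "char32_t holds a full code point");
static_assert(bits_of<wchar_t>() >= bits_of<char>(), "wchar_t is at least as wide as char");

// True only when wchar_t has the same width as a UTF-16 code unit,
// which is the case the original question assumes.
constexpr bool wchar_is_16_bit() {
    return bits_of<wchar_t>() == 16;
}
```

A `static_assert(wchar_is_16_bit(), "...")` in the application would then fail the build on targets where wchar_t is 32 bits wide.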

JavaScript strings, unfortunately, happen to be defined as arrays of UTF-16 code units.

Thus, if your application treats wchar_t as if it were char16_t, which it sounds like it does, then you can map directly from std::wstring or wchar_t* to JavaScript strings.

embind, as policy, assumes that wchar_t is approximately equal to char16_t.
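
If that assumption does not hold on a given target (wchar_t is 32 bits wide there), the string has to be re-encoded before it matches what JavaScript expects. A rough sketch of such a conversion follows; `wide_to_utf16` is a hypothetical helper written for this example, not something embind or Emscripten provides, and it does not validate its input:

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: converts a std::wstring to UTF-16 code units.
// With a 16-bit wchar_t the units are copied through unchanged; with a
// 32-bit wchar_t, code points above U+FFFF become surrogate pairs.
// Invalid input (values above U+10FFFF, lone surrogates) is not checked.
std::u16string wide_to_utf16(const std::wstring& in) {
    std::u16string out;
    out.reserve(in.size());
    for (wchar_t wc : in) {
        const std::uint32_t cp = static_cast<std::uint32_t>(wc);
        if (sizeof(wchar_t) == 2 || cp < 0x10000u) {
            // Already a single UTF-16 code unit.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Split into a high and a low surrogate.
            const std::uint32_t v = cp - 0x10000u;
            out.push_back(static_cast<char16_t>(0xD800u + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00u + (v & 0x3FFu)));
        }
    }
    return out;
}
```

When wchar_t really is 16 bits wide, this reduces to a plain copy, which is the direct mapping described above.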

I don't believe there is a more efficient mechanism than looping from character to character in JavaScript.  I think someone once proposed a set of proper ArrayBuffer -> String text decoders, but I don't know if that proposal gained traction.


--
You received this message because you are subscribed to the Google Groups "emscripten-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to emscripten-disc...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Chad Austin
Technical Director, IMVU

Mark Hahn

Mar 31, 2014, 7:07:23 PM
to emscripte...@googlegroups.com
Thanks.  I guess I'm stuck with looping for now.  At least it is looping in JS and not in C.



Jukka Jylänki

Apr 1, 2014, 12:05:10 AM
to emscripte...@googlegroups.com
If you compile with the Clang option -fshort-wchar, then wchar_t will get treated as 16-bit instead of the default unix-world 32-bit. Whichever you are building with, check out the functions UTF16ToString()+stringToUTF16() and UTF32ToString()+stringToUTF32(). Here's a test that demonstrates their use: https://github.com/kripken/emscripten/blob/master/tests/utf32.cpp
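
For illustration, here is roughly what the C++ side of such an exchange might look like. This is only a sketch: the function names `demo_greeting` and `demo_length` and the `DEMO_EXPORT` macro are made up for this example, and the JavaScript calls mentioned in the comments assume the functions are exported and reachable as `Module._demo_greeting` and `Module._demo_length`, which depends on your build settings.

```cpp
#include <cstddef>

#ifdef __EMSCRIPTEN__
#include <emscripten.h>
#define DEMO_EXPORT EMSCRIPTEN_KEEPALIVE  // keep the symbol alive for JS
#else
#define DEMO_EXPORT                       // plain native build: no-op
#endif

extern "C" {

// Returns a pointer to a null-terminated UTF-16 buffer with static storage.
// Assumed JS side: UTF16ToString(Module._demo_greeting())
DEMO_EXPORT const char16_t* demo_greeting() {
    static const char16_t text[] = u"h\u00e9llo";
    return text;
}

// Counts UTF-16 code units up to the terminator. The buffer could be one
// that JS filled beforehand, for example with stringToUTF16() (assumption).
DEMO_EXPORT std::size_t demo_length(const char16_t* s) {
    std::size_t n = 0;
    while (s[n] != 0) {
        ++n;
    }
    return n;
}

}  // extern "C"
```

Using char16_t here sidesteps the question of how wide wchar_t is; with -fshort-wchar the same layout would apply to wchar_t buffers.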

Chad Austin

Apr 1, 2014, 2:11:25 AM
to emscripte...@googlegroups.com
Nice.

Note that, in C++11, you can use std::u16string (16-bit UTF-16 code units) and std::u32string (32-bit UTF-32 code points) to be more explicit than std::wstring.
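
A small sketch of the difference, using a character outside the Basic Multilingual Plane (the function names are just for this example):

```cpp
#include <string>

// U+1F600 needs two UTF-16 code units (a surrogate pair: 0xD83D, 0xDE00)
// but only one UTF-32 code unit.
inline std::u16string emoji_utf16() { return u"\U0001F600"; }
inline std::u32string emoji_utf32() { return U"\U0001F600"; }
```

Here `emoji_utf16().size()` is 2 while `emoji_utf32().size()` is 1, so the element type makes explicit which kind of unit a length or index refers to. A JavaScript string holding the same character also reports a length of 2.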

