std::string -> JavaScript string does not seem to work correctly with non-ASCII code points


Sören König

Mar 31, 2016, 11:29:00 AM
to emscripten-discuss
When using embind to call the following function test, the resulting JavaScript string is broken (only the non-ASCII code points are affected):
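#include <string>
#include <emscripten/bind.h>

using namespace emscripten;

// Returns a UTF-8 encoded narrow string.
std::string test()
{
    return u8"test ö ä ü ende";
}

// Returns the same text as a wide string.
std::wstring wtest()
{
    return L"test ö ä ü ende";
}

EMSCRIPTEN_BINDINGS(my_module) {
    function("test", &test);
    function("wtest", &wtest);
}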

The wstring version works when wchar_t is 4 bytes, but not when it is 2 bytes (set via the compiler flag -fshort-wchar).


I did a similar experiment using cwrap/EXPORTED_FUNCTIONS to call

const char* test()
{
    return u8"test ö ä ü ende";
}

which works fine.

Is this a bug?

Brion Vibber

Mar 31, 2016, 6:25:47 PM
to emscripten Mailing List
If I'm reading the code correctly, embind's marshaling code currently just assumes Latin-1 encoding for std::string, which is incorrect in your case since you're using UTF-8 strings.

(You can find this in embind.js in the emscripten source tree; search for '_embind_register_std_string' and see the 'fromWireType' method that gets defined there.)

In a pinch, you can probably swap that code out, though I'm not sure there's an easy way to do it at runtime. Ideally this would be locale-sensitive or something, but I don't know offhand the encoding-safety properties of std::string...
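
To illustrate why a Latin-1 assumption breaks UTF-8 input, here is a minimal sketch in plain C++. It is not the embind code: decode_as_latin1 and decode_as_utf8 are hypothetical helper names, and the UTF-8 decoder assumes well-formed input and does no validation. Treating each byte as its own code point splits every multi-byte UTF-8 sequence into several wrong characters.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: Latin-1 style decoding, one byte = one code point.
// This mirrors the behaviour described above; it is not copied from embind.js.
std::vector<std::uint32_t> decode_as_latin1(const std::string& bytes) {
    std::vector<std::uint32_t> out;
    for (unsigned char b : bytes) {
        out.push_back(b);
    }
    return out;
}

// Hypothetical helper: minimal UTF-8 decoder.
// Assumes well-formed input; no validation of continuation bytes or overlongs.
std::vector<std::uint32_t> decode_as_utf8(const std::string& bytes) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        std::uint32_t cp = 0;
        int extra = 0;  // number of continuation bytes that follow the lead byte
        if (b < 0x80) {
            cp = b;
        } else if ((b & 0xE0) == 0xC0) {
            cp = b & 0x1F;
            extra = 1;
        } else if ((b & 0xF0) == 0xE0) {
            cp = b & 0x0F;
            extra = 2;
        } else {
            cp = b & 0x07;
            extra = 3;
        }
        ++i;
        // Fold in the low six bits of each continuation byte.
        for (int k = 0; k < extra && i < bytes.size(); ++k, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
        }
        out.push_back(cp);
    }
    return out;
}
```

For the two bytes that encode "ö" in UTF-8 (0xC3 0xB6), decode_as_latin1 yields two code points, U+00C3 and U+00B6 (displayed as "Ã¶"), while decode_as_utf8 yields the single code point U+00F6.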

-- brion




Charles Vaughn

Apr 1, 2016, 12:55:08 PM
to emscripten-discuss
Might be worth looking into leveraging TextEncoder/TextDecoder for browsers that support it

Brion Vibber

Apr 3, 2016, 7:23:25 AM
to emscripten Mailing List
On Fri, Apr 1, 2016 at 7:55 PM, Charles Vaughn <cva...@gmail.com> wrote:
Might be worth looking into leveraging TextEncoder/TextDecoder for browsers that support it

I've filed issues in the tracker so we don't lose track:

https://github.com/kripken/emscripten/issues/4221 - lack of support for non-latin1
https://github.com/kripken/emscripten/issues/4222 - support TextEncoder/TextDecoder

-- brion