If I'm reading the code correctly, embind's marshaling code currently assumes Latin-1 encoding for std::string and turns each byte into one JavaScript character. That's wrong in your case, since your strings are UTF-8.
(You can find this in embind.js in the emscripten source tree; search for '_embind_register_std_string' and see the 'fromWireType' method that gets defined there.)
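To make the mismatch concrete, here's a rough sketch of what a Latin-1 assumption does to UTF-8 bytes. This is not the code from embind.js; decodeLatin1 is a made-up helper that just mimics the byte-per-character behavior I described.

```typescript
// Hypothetical helper, not embind's actual code: decode a byte sequence
// under the Latin-1 assumption, where every byte is one code point.
function decodeLatin1(bytes: number[]): string {
  return bytes.map((b) => String.fromCharCode(b)).join("");
}

// "é" (U+00E9) encoded as UTF-8 is the two bytes 0xC3 0xA9.
const utf8Bytes = [0xc3, 0xa9];

// Decoding those bytes as Latin-1 yields two characters (U+00C3, U+00A9)
// instead of the single character U+00E9 the C++ side meant.
const decoded = decodeLatin1(utf8Bytes);
```

Anything in plain ASCII survives this unchanged, which is why the problem only shows up once non-ASCII text goes through.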
In a pinch, you can probably swap that code out, though I'm not sure there's an easy way to do it at runtime. Ideally the conversion would be locale-sensitive or something like that, but I don't know offhand what encoding guarantees (if any) std::string makes.
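If patching embind.js isn't practical, one possible stopgap is to repair the string on the JavaScript side after the call returns. The sketch below assumes the returned string really does hold one byte per character, as I described above; reinterpretAsUtf8 is a hypothetical name, not an emscripten API, and it only covers the C++-to-JS direction.

```typescript
// Hypothetical helper: treat each character of `s` as a raw byte and
// re-decode the whole sequence as UTF-8. Assumes every char code is 0..255.
function reinterpretAsUtf8(s: string): string {
  let escaped = "";
  for (let i = 0; i < s.length; i++) {
    const b = s.charCodeAt(i);
    if (b > 0xff) {
      throw new RangeError("not a byte string: char code above 0xFF at index " + i);
    }
    // Percent-encode the byte, e.g. 0xE6 becomes "%e6".
    escaped += "%" + (b < 16 ? "0" : "") + b.toString(16);
  }
  // decodeURIComponent reassembles percent-encoded UTF-8 sequences and
  // throws URIError if the bytes are not valid UTF-8.
  return decodeURIComponent(escaped);
}

// The bytes E6 97 A5 are the UTF-8 encoding of U+65E5.
const garbled = "\u00e6\u0097\u00a5";
const fixed = reinterpretAsUtf8(garbled);
```

You'd wrap each string-returning binding with something like this until the marshaling code itself handles UTF-8.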
-- brion