Possible regression in 1.39.0: "Invalid UTF-8 leading byte encountered when deserializing a UTF-8 string on the asm.js/wasm heap to a JS string!"

ilammy

Oct 29, 2019, 10:47:26 AM
to emscripten-discuss
Hi,

The subject line is confusing, but that's only the first visible symptom
that got my attention. I guess that's what it will look like for users
who encounter the same issue.

Please tell me whether this should be reported to GitHub issues and how
I can help with debugging it.



We develop a cryptographic library in C which uses BoringSSL and compile
that to WebAssembly with Emscripten. Recently our CI builds started
failing [1] with very weird output:

    Running themis-core basic tests.
    node build-wasm/tests/soter_test.js
    [ ... ]
    Invalid UTF-8 leading byte 0x-7f encountered when deserializing a UTF-8 string on the asm.js/wasm heap to a JS string!
    QŰCRF񹢑򡺗NRꄍсvi풨jn񚞝񉝞»vﵹc񀀀
    Invalid UTF-8 leading byte 0x-7e encountered when deserializing a UTF-8 string on the asm.js/wasm heap to a JS string!
    [ ... ]
    Invalid UTF-8 leading byte 0x-4b encountered when deserializing a UTF-8 string on the asm.js/wasm heap to a JS string!
    𰣙+[혋忪$􁖲O񅱌򷕰驖s򛢀҈!񺊧s��\=HO>��on󶬅򟕒񜾠j𴴩{��ϕ��򷷹񴜋MG��窥𭾦#FLRЖ_QS}��s㏸񈫲U󬌞󿏝
                                                                           虧����ؤqۧꯔJ񬗡
      dz럄୰Cо3;̦��6 PQŰCRF񹢑򡺗NRꄍсvi풨jn񚞝񉝞»vﵹc񀀀
    [ ... ]
    𰣙+[혋忪$􁖲O񅱌򷕰驖s򛢀҈!񺊧s��\=HO>��on󶬅򟕒񜾠j𴴩{��ϕ��򷷹񴜋MG��窥𭾦#FLRЖ_QS}��CRF񹢑򡺗NRꄍсvi풨jn񚞝񉝞»vﵹc񀀀
    펆섦��汳V��ݐ۱ز|��寂G����V��
                         ~T��o𿍄Z鎎j������ӓ𢸡\񨲟󀴶I��Cr��d񀀀
    Bad file descriptor
    undefined
    undefined
    exception thrown: abort(undefined) at Error

After ruling out the possibility that we have woken up the Elder Ones,
it seems to be a regression introduced in Emscripten 1.39.0, because
compiling the code with the previous version (1.38.48) works fine [2].


You can try reproducing the issue by first installing the build
prerequisites for BoringSSL: Golang, Perl, CMake. Then please install
and activate Emscripten 1.39.0 SDK. After that clone the code and try
building it:

    cd themis
    git submodule init && git submodule update
    emmake make wasmthemis test

I'm working on a minimal example, but that's currently the easiest way
to trigger that behavior. I'm mostly certain that Themis source code is
not the issue, but BoringSSL is kinda huge so this might take a while.

I'm observing this issue both on a Linux box running Debian and on a
macOS machine. I don't have a Windows VM at the moment so no idea about that.



I've tried debugging the issue by inserting tracing printf() calls, but
to my surprise they either did not print anything, or output garbage like
in the build logs, or started printing strange errors about stdout not
being flushed and suggestions to EXIT...

After some experimentation I have found that -fPIC flag seems to be
influencing the behavior. Compiling a simple "Hello, world!" program
with -fPIC does not print anything:

    $ emcc hello.c -fPIC
    $ node ./a.out.js 
    $ 

while it's okay without the flag:

    $ emcc hello.c
    $ node ./a.out.js 
    Hello, world!

Needless to say, both are fine with Emscripten 1.38.48.

Disabling position-independent code for Themis and BoringSSL finally
made the tests output something in addition to Zalgo woes, but
ultimately they fail due to the same issues. Trace logging suggests that
BoringSSL code sometimes executes fine, but does not behave as it
should. For example, the tests that verify hash function behavior
indicate that SHA-256 computations give incorrect results, and later
tests calling encryption functions just crash the process.

Alon Zakai

Oct 29, 2019, 3:48:48 PM
to emscripte...@googlegroups.com
I believe the -fPIC issue is
https://github.com/emscripten-core/emscripten/issues/9013

There is a fix for that, which you can test by installing "tot-upstream" (tip of tree build) instead of "latest".

The other issues are harder to guess at. A reduced testcase would be great, but yeah, sounds like that's not easy to get.

Some things that might help:

 * Check if tot-upstream works.
 * Check if tot-fastcomp works (latest uses upstream now, as of 1.39.0, so that's a likely suspect).
 * Check if building with sanitizers finds anything, https://emscripten.org/docs/debugging/Sanitizers.html
 * Is this with WASM=0, or the default wasm output?

- Alon



ilammy

Oct 31, 2019, 8:46:58 AM
to emscripten-discuss
> I believe the -fPIC issue is
> https://github.com/emscripten-core/emscripten/issues/9013
> There is a fix for that, which you can test by installing "tot-upstream"
> (tip of tree build) instead of "latest".

Thanks. Printing works fine with tip-of-tree builds, so that seems
to be it.

> * Check if tot-fastcomp works (latest uses upstream now, as of 1.39.0,
> so that's a likely suspect).

Thanks for this suggestion. It turns out that fastcomp works, so the
issue seems to affect only the LLVM backend.

> * Is this with WASM=0, or the default wasm output?

I'm seeing this behavior only with Wasm builds.

I'm currently trying out the sanitizers, though they don't show anything
(so I suspect that I did not actually enable them).
