What's the most efficient way to serialise a complex object to pass to a worker thread?


Marcos Scriven

Mar 25, 2013, 7:21:02 PM
to emscripte...@googlegroups.com
I'm trying to pass a large and complex object to a worker thread with Emscripten.

The objects I have in mind are 200 KB to 2 MB, with quite a convoluted tree structure. The code in question has a parser that serialises them to a string (and back) using the << and >> stream operators.

However, that's taking 0.5 to 4 seconds, even in asm.js.

Is there any shortcut here?

Marcos

Alon Zakai

Mar 25, 2013, 7:43:17 PM
If you generate the object in one big buffer (all the nodes in the
tree structure allocated by a bump allocator in the buffer), then you
could just copy that range. Otherwise, you need a
serializer/deserializer as you say. I'm surprised it takes so
long, though; perhaps the serializer is hitting a bad case in our
codegen? For example, if it has a big interpreter-type loop (a switch
with many cases), we don't optimize that well yet.

- azakai
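A minimal sketch of the bump-allocator idea in plain C++ (not Emscripten-specific; `BumpArena`, `Node`, `make_node`, `sum_tree` and `copy_and_sum_demo` are hypothetical names). It assumes nodes refer to each other by byte offsets into the buffer rather than by raw pointers: raw pointers would only stay valid if the range were copied to the same address on the receiving side, whereas offsets survive a copy placed anywhere. How the bytes actually reach the worker is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Sentinel meaning "no link"; offset 0 is a valid node position.
const std::uint32_t kNone = 0xFFFFFFFFu;

// Tree node whose links are byte offsets into the arena rather than raw
// pointers, so the tree is still valid after a byte-for-byte copy.
struct Node {
    std::int32_t value;
    std::uint32_t first_child;
    std::uint32_t next_sibling;
};

// Bump allocator: each allocation is carved sequentially out of one buffer,
// and nothing is freed individually.
class BumpArena {
public:
    explicit BumpArena(std::size_t capacity) : buf_(capacity), used_(0) {}

    // Returns the offset of a new block of `size` bytes aligned to `align`.
    std::size_t allocate(std::size_t size, std::size_t align) {
        std::size_t start = (used_ + align - 1) / align * align;
        if (start + size > buf_.size()) throw std::bad_alloc();
        used_ = start + size;
        return start;
    }

    unsigned char* base() { return buf_.data(); }
    std::size_t used() const { return used_; }

private:
    std::vector<unsigned char> buf_;
    std::size_t used_;
};

inline Node* node_at(unsigned char* base, std::uint32_t off) {
    return reinterpret_cast<Node*>(base + off);
}

// Placement-constructs a childless node in the arena and returns its offset.
std::uint32_t make_node(BumpArena& arena, std::int32_t value) {
    std::size_t off = arena.allocate(sizeof(Node), alignof(Node));
    new (arena.base() + off) Node{value, kNone, kNone};
    return static_cast<std::uint32_t>(off);
}

// Sums a node, all of its descendants, and its following siblings.
std::int64_t sum_tree(unsigned char* base, std::uint32_t off) {
    std::int64_t total = 0;
    while (off != kNone) {
        const Node* n = node_at(base, off);
        total += n->value + sum_tree(base, n->first_child);
        off = n->next_sibling;
    }
    return total;
}

// Builds root(1) -> {2, 3}, copies only the used byte range (the part that
// would be handed to a worker), and sums the tree through the copy.
std::int64_t copy_and_sum_demo() {
    BumpArena arena(1 << 16);
    std::uint32_t root = make_node(arena, 1);
    std::uint32_t a = make_node(arena, 2);
    std::uint32_t b = make_node(arena, 3);
    node_at(arena.base(), root)->first_child = a;
    node_at(arena.base(), a)->next_sibling = b;

    std::vector<unsigned char> copy(arena.base(), arena.base() + arena.used());
    return sum_tree(copy.data(), root);
}
```

Calling `copy_and_sum_demo()` builds a three-node tree, copies only the `used()` bytes, and walks the copy. The trade-off of a bump allocator is that individual nodes cannot be freed; the whole arena is discarded at once.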

Marcos Scriven

Mar 26, 2013, 7:02:28 AM
Thanks for the suggestion. I hadn't heard of 'bump allocators', but if I can copy a chunk of memory like that, it's going to be so much faster.

Incidentally, I'm using Google protobuf (also compiled to JavaScript, obviously) for the other, simpler objects (which still requires manual mapping from objects to protobuf messages), and that's lightning fast.

Marcos Scriven

Mar 26, 2013, 8:48:52 PM
I've been looking at Boost Interprocess, which fortunately is header-only. Among other things, it allows you to specify a buffer in which to allocate objects (rather than shared memory, files, etc.).

After battling to find them, I defined these macros (which I think end up unset because of the undefines emcc passes to clang):

  #define BOOST_INTERPROCESS_POSIX_TIMEOUTS
  #define BOOST_DATE_TIME_HAS_HIGH_PRECISION_CLOCK 
  #define BOOST_HAS_GETTIMEOFDAY

That got rid of the "microsec_clock undeclared" errors, and I now have a simple test case compiling with Emscripten. However, at runtime in Node.js it fails with:

  wcslen not implemented. 

Would it be safe to just delegate to strlen for now, on the basis that I'm not using any Unicode characters?

It seems to be a fairly straightforward function, so I'm just wondering why it's not there. I could have a bash at it, but I'm not entirely sure how to write asm.js-compliant code.

Marcos
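On the strlen question, a short sketch (`my_wcslen` and `strlen_on_wide` are hypothetical helpers, not Emscripten's library code). It assumes `wchar_t` is wider than one byte (4 bytes on Emscripten, as on typical Linux targets). `strlen` counts bytes, so an ASCII character stored as a `wchar_t` carries zero padding bytes that `strlen` mistakes for the terminator; delegating `wcslen` to `strlen` would give the wrong length even for plain ASCII, whereas a loop over `wchar_t` units does not have that problem.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Minimal wcslen equivalent: counts wchar_t units up to the terminating L'\0'.
std::size_t my_wcslen(const wchar_t* s) {
    const wchar_t* p = s;
    while (*p != L'\0') ++p;
    return static_cast<std::size_t>(p - s);
}

// Shows why strlen is not a substitute: it scans bytes, and the zero bytes
// inside a multi-byte wchar_t look like a terminator.
std::size_t strlen_on_wide(const wchar_t* s) {
    return std::strlen(reinterpret_cast<const char*>(s));
}
```

On a little-endian target `strlen_on_wide(L"abc")` stops after the first byte, while `my_wcslen(L"abc")` agrees with `std::wcslen`.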

Marcos Scriven

Mar 27, 2013, 5:10:22 AM
I've been adding any macros that look like they might remove the dependency on wchar:

#define BOOST_NO_CWCHAR
#define BOOST_NO_CWCTYPE
#define BOOST_NO_CTYPE_FUNCTIONS
#define BOOST_NO_INTRINSIC_WCHAR_T
#define BOOST_NO_STD_WSTREAMBUF
#define BOOST_NO_STD_WSTRING
#define BOOST_NO_CHAR32_T

#define BOOST_NO_CHAR16_T
#define BOOST_NO_UNICODE_LITERALS

None of them stop wcslen from being called...

Marcos Scriven

Mar 27, 2013, 5:53:26 AM
I moved back over to incoming, to see if picking up the latest libcxx stuff would help, but compilation fails with:

emcc: considering including libcxx: we need set(['_ZNSt3__113basic_ostreamIcNS_11char_traitsIcEEE6sentryD1Ev', '_ZNSt3__18ios_base5clearEj', '_ZNKSt3__18ios_base6getlocEv', '_ZNSt3__15ctypeIcE2idE', '_ZNSt3__113basic_ostreamIcNS_11char_traitsIcEEE6sentryC1ERS3_', '_ZNSt3__16localeD1Ev', '_ZNSt3__18ios_base33__set_badbit_and_consider_rethrowEv', '_ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE6__initEjc', '_ZNKSt3__16locale9use_facetERNS0_2idE', '_ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEED1Ev', '_ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE6assignEPKc', '_ZNSt3__14coutE', '_ZNSt3__113basic_ostreamIcNS_11char_traitsIcEEElsEi']) and have set([])
emcc: including libcxx
emcc: building libcxx for cache
Traceback (most recent call last):
  File "/home/marcosscriven/sources/emscripten/emcc", line 1234, in <module>
    libfile = shared.Cache.get(name, create)
  File "/home/marcosscriven/sources/emscripten/tools/cache.py", line 37, in get
    shutil.copyfile(creator(), cachename)
  File "/home/marcosscriven/sources/emscripten/emcc", line 1176, in create_libcxx
    shared.Building.link(os, in_temp('libcxx.bc'))
  File "/home/marcosscriven/sources/emscripten/tools/shared.py", line 740, in link
    if Building.is_bitcode(f):
  File "/home/marcosscriven/sources/emscripten/tools/shared.py", line 1117, in is_bitcode
    b = open(filename, 'r').read(4)
IOError: [Errno 2] No such file or directory: '/tmp/tmpB2IgJi/stdexcept.cpp.o'

Bruce Mitchener

Mar 27, 2013, 6:31:41 AM
I'd like to hear more detail about that libcxx build failure ... more complete logs or more detail would be great.

Do you have a libcxx.bc in your ~/.emscripten_cache/ ? If you do, try removing it.

If you run with EMCC_DEBUG=1, you should see all of the warnings and errors from clang while building libcxx, and perhaps that'll give a useful clue or hint as well.

 - Bruce



Marcos Scriven

Mar 27, 2013, 6:47:13 AM
Hi Bruce

I usually blast the cache with --clear-cache whenever I update the branch. But I manually cleared it just now to be sure, and ran with:

EMCC_DEBUG=1 em++ main.cpp  -v -I ~/sources/includes/ -g

Here, ~/sources/includes is only there to pick up my Boost header files. There are no symlinks in there to system headers.

With the debug output enabled (sorry, I should have done that before), it's now obvious why the error is occurring:

emcc running: /usr/local/bin/clang++ -std=c++11 -m32 -U__i386__ -U__x86_64__ -U__i386 -U__x86_64 -Ui386 -Ux86_64 -U__SSE__ -U__SSE2__ -U__MMX__ -UX87_DOUBLE_ROUNDING -UHAVE_GCC_ASM_FOR_X87 -DEMSCRIPTEN -U__STRICT_ANSI__ -U__CYGWIN__ -D__STDC__ -Xclang -triple=i386-pc-linux-gnu -D__IEEE_LITTLE_ENDIAN -fno-math-errno -fno-ms-compatibility -nostdinc -Xclang -nobuiltininc -Xclang -nostdsysteminc -Xclang -isystem/home/marcosscriven/sources/emscripten/system/local/include -Xclang -isystem/home/marcosscriven/sources/emscripten/system/include/libcxx -Xclang -isystem/home/marcosscriven/sources/emscripten/system/include -Xclang -isystem/home/marcosscriven/sources/emscripten/system/include/emscripten -Xclang -isystem/home/marcosscriven/sources/emscripten/system/include/bsd -Xclang -isystem/home/marcosscriven/sources/emscripten/system/include/libc -Xclang -isystem/home/marcosscriven/sources/emscripten/system/include/gfx -Xclang -isystem/home/marcosscriven/sources/emscripten/system/include/net -Xclang -isystem/home/marcosscriven/sources/emscripten/system/include/SDL -U__APPLE__ -U__linux__ -emit-llvm -c /home/marcosscriven/sources/emscripten/system/lib/libcxx/exception.cpp -o /tmp/tmpNozl7i/exception_0.o
In file included from /home/marcosscriven/sources/emscripten/system/lib/libcxx/exception.cpp:31:
In file included from /usr/lib/gcc/i686-linux-gnu/4.7/../../../../include/c++/4.7/cxxabi.h:49:
In file included from /usr/lib/gcc/i686-linux-gnu/4.7/../../../../include/c++/4.7/i686-linux-gnu/bits/c++config.h:414:
/usr/lib/gcc/i686-linux-gnu/4.7/../../../../include/c++/4.7/i686-linux-gnu/bits/os_defines.h:45:19: error: token is not a valid binary operator in a preprocessor
      subexpression
#if __GLIBC_PREREQ(2,15) && defined(_GNU_SOURCE)

However, I'm not entirely sure why it's picking up the system includes.

Marcos

Marcos Scriven

Mar 27, 2013, 6:49:22 AM
There's one other error too:

emcc invocation:  /home/marcosscriven/sources/emscripten/emcc /home/marcosscriven/sources/emscripten/system/lib/libcxx/stdexcept.cpp -o /tmp/tmpJrL2pM/stdexcept.cpp.o -std=c++11 
(Emscripten: Running sanity checks)
emcc: compiling to bitcode
emcc: compiling source file:  /home/marcosscriven/sources/emscripten/system/lib/libcxx/stdexcept.cpp
emcc running: /usr/local/bin/clang++ -std=c++11 -m32 -U__i386__ -U__x86_64__ -U__i386 -U__x86_64 -Ui386 -Ux86_64 -U__SSE__ -U__SSE2__ -U__MMX__ -UX87_DOUBLE_ROUNDING -UHAVE_GCC_ASM_FOR_X87 -DEMSCRIPTEN -U__STRICT_ANSI__ -U__CYGWIN__ -D__STDC__ -Xclang -triple=i386-pc-linux-gnu -D__IEEE_LITTLE_ENDIAN -fno-math-errno -fno-ms-compatibility -nostdinc -Xclang -nobuiltininc -Xclang -nostdsysteminc -Xclang -isystem/home/marcosscriven/sources/emscripten/system/local/include -Xclang -isystem/home/marcosscriven/sources/emscripten/system/include/libcxx -Xclang -isystem/home/marcosscriven/sources/emscripten/system/include -Xclang -isystem/home/marcosscriven/sources/emscripten/system/include/emscripten -Xclang -isystem/home/marcosscriven/sources/emscripten/system/include/bsd -Xclang -isystem/home/marcosscriven/sources/emscripten/system/include/libc -Xclang -isystem/home/marcosscriven/sources/emscripten/system/include/gfx -Xclang -isystem/home/marcosscriven/sources/emscripten/system/include/net -Xclang -isystem/home/marcosscriven/sources/emscripten/system/include/SDL -U__APPLE__ -U__linux__ -emit-llvm -c /home/marcosscriven/sources/emscripten/system/lib/libcxx/stdexcept.cpp -o /tmp/tmpvCw_SJ/stdexcept_0.o
In file included from /home/marcosscriven/sources/emscripten/system/lib/libcxx/stdexcept.cpp:26:
In file included from /usr/lib/gcc/i686-linux-gnu/4.7/../../../../include/c++/4.7/cxxabi.h:49:
In file included from /usr/lib/gcc/i686-linux-gnu/4.7/../../../../include/c++/4.7/i686-linux-gnu/bits/c++config.h:414:
/usr/lib/gcc/i686-linux-gnu/4.7/../../../../include/c++/4.7/i686-linux-gnu/bits/os_defines.h:45:19: error: token is not a valid binary operator in a preprocessor
      subexpression
#if __GLIBC_PREREQ(2,15) && defined(_GNU_SOURCE)
    ~~~~~~~~~~~~~~^
1 error generated.
emcc: compiler frontend failed to generate LLVM bitcode, halting

Bruce Mitchener

Mar 27, 2013, 6:52:07 AM
Well, the key question for you to work out is why these files are getting picked up:


In file included from /usr/lib/gcc/i686-linux-gnu/4.7/../../../../include/c++/4.7/cxxabi.h:49:
In file included from /usr/lib/gcc/i686-linux-gnu/4.7/../../../../include/c++/4.7/i686-linux-gnu/bits/c++config.h:414:
/usr/lib/gcc/i686-linux-gnu/4.7/../../../../include/c++/4.7/i686-linux-gnu/bits/os_defines.h:45:19: error: token is not a valid binary operator in a preprocessor
      subexpression
#if __GLIBC_PREREQ(2,15) && defined(_GNU_SOURCE)


 - Bruce




Marcos Scriven

Mar 27, 2013, 7:06:57 AM
In both cases it's down to this:

#elif defined(LIBCXXRT) || __has_include(<cxxabi.h>)
  #include <cxxabi.h>

I tried adding -ULIBCXXRT, but it still picks up cxxabi.h. Am I correct in thinking it's therefore the clang preprocessor that's returning true for __has_include(<cxxabi.h>)?

I thought EMCC told clang not to include its own headers?
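Why `-ULIBCXXRT` cannot help can be seen with a small sketch that copies the shape of the libcxx condition (`DEMO_USES_CXXABI`, `uses_cxxabi` and `libcxxrt_defined` are hypothetical names). Undefining `LIBCXXRT` only makes the left operand of `||` false; `__has_include(<cxxabi.h>)` is still evaluated against whatever include directories the compiler is searching, so if a system C++ directory is on that path, the whole condition is true. The outer `defined(__has_include)` guard covers compilers that lack the feature.

```cpp
#include <cassert>

// Same shape as the libcxx condition: LIBCXXRT is only one operand of ||.
#if defined(__has_include)
#  if defined(LIBCXXRT) || __has_include(<cxxabi.h>)
#    define DEMO_USES_CXXABI 1
#  endif
#endif
#ifndef DEMO_USES_CXXABI
#  define DEMO_USES_CXXABI 0
#endif

// True if this translation unit would take the <cxxabi.h> branch.
bool uses_cxxabi() { return DEMO_USES_CXXABI != 0; }

// True only if LIBCXXRT was defined on the command line (e.g. -DLIBCXXRT).
bool libcxxrt_defined() {
#ifdef LIBCXXRT
    return true;
#else
    return false;
#endif
}
```

Compiled with system C++ directories on the search path, `uses_cxxabi()` will typically report true even though `LIBCXXRT` is undefined; only removing those directories from the search path changes the answer.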

Bruce Mitchener

Mar 27, 2013, 7:13:23 AM
-nostdinc should do that ...

In my builds on OS X, that doesn't trigger because I removed libcxxabi's include paths from the search paths passed to clang as part of that set of commits. You'll notice that that wasn't in the -isystem flags being passed.

Something is telling your clang to look at the system-wide stuff ... and that's terribly wrong.

 - Bruce




Marcos Scriven

Mar 27, 2013, 7:49:51 AM
Found the issue.

I had to manually add -nostdinc++.

Bruce Mitchener

Mar 27, 2013, 7:52:54 AM
I will submit that as a pull request shortly after I do some testing ... I have another pull request or two that I'm working on anyway. :)

 - Bruce




Marcos Scriven

Mar 27, 2013, 8:05:37 AM
tools/shared.py claims it's unnecessary:

  # Note that -nostdinc++ is not needed, since -nostdinc implies that!

Maybe it's a platform-specific thing?

Marcos Scriven

Mar 27, 2013, 8:11:59 AM
Anyway, back to the original issue: now that I can compile with incoming (using the added option), I unfortunately still get:

/home/marcosscriven/sources/boostipc/boosttest/a.out.js:144806
      throw e;
            ^
wcslen not implemented

The calls to it are triggered by:

std::size_t namelen  = std::char_traits<CharT>::length(name);

Despite trying all the Boost macros to turn off wchar and wstring, this still happens. Any guidance on how I might resolve it would be much appreciated.

Marcos 
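A sketch of why a line like that ends up in `wcslen` (the `name_length` helper is hypothetical; it only mirrors the shape of the quoted Boost line). The character type is a template parameter, so the traits class is chosen at compile time: with `CharT = char` the call goes to `std::char_traits<char>::length`, and with `CharT = wchar_t` to `std::char_traits<wchar_t>::length`, which standard libraries such as libc++ typically implement in terms of `wcslen` (check the `<string>` header of the library you build against). Macros that disable other wide-character features do not change the `CharT` chosen at a particular call site.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper with the same shape as the Boost line: the traits class,
// and therefore the length function, is selected by CharT at compile time.
template <class CharT>
std::size_t name_length(const CharT* name) {
    return std::char_traits<CharT>::length(name);
}
```

`name_length("values")` uses the narrow traits; `name_length(L"values")` instantiates the wide ones.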

Marcos Scriven

Mar 27, 2013, 8:51:28 AM
It's working now. It was my fault for copying and pasting a Boost Interprocess example; I needed:

    managed_external_buffer objects_in_static_memory2
       (open_only, &static_buffer2, localSize);

Instead of:

    wmanaged_external_buffer objects_in_static_memory2
       (open_only, &static_buffer2, localSize);

And now, of course, I don't need wcslen in the JavaScript library.

Marcos

Jonathan Berling

Sep 6, 2013, 6:19:50 PM
Sorry to sidetrack this discussion, but were you able to compile protobuf using Emscripten? Or are you coding the JavaScript serialization of protobuf messages by hand?

If you were able to compile protobuf using Emscripten, do you have any pointers?

Thanks!

Dave Nicponski

Jan 29, 2014, 2:51:13 AM
Thread necromancy: ARISE!

Sorry, I was winding through old threads and came across this one. I did the whole protobuffer library --> JavaScript process just yesterday, and I didn't have any major issues.
One thing I _did_ need to do, though, was make minor modifications to the atomic_* portions of the codebase (basically, implementing the most basic possible primitives, since JS is single-threaded) and make sure that file was linked.

Was there a specific issue you were having?

         -dave-

(Fair disclosure: I was actually emscripting a superset of the protobuffer library called protorpc, easily found via Google. However, for the JS portion, I excluded the RPC additions and only emcc-compiled the protobuffer stuff itself. Not out-of-the-box, but still no major problems.)
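A sketch of what "the most basic possible primitives" can look like. The namespace and functions below are hypothetical: the names imitate the style of protobuf's older atomicops interface, but the thread does not show which functions were actually changed, so check them against the header being ported. Each operation is a plain read and write, which is only correct while a single thread touches the data, as in the single-threaded JS output discussed here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical single-threaded stand-ins, named in the style of protobuf's
// atomicops functions. Correct only if exactly one thread uses the data.
namespace single_threaded_atomicops {

typedef std::int32_t Atomic32;

// Stores new_value if *ptr equals old_value; returns the previous value.
inline Atomic32 NoBarrier_CompareAndSwap(volatile Atomic32* ptr,
                                         Atomic32 old_value,
                                         Atomic32 new_value) {
    Atomic32 prev = *ptr;
    if (prev == old_value) *ptr = new_value;
    return prev;
}

// Stores new_value and returns the previous value.
inline Atomic32 NoBarrier_AtomicExchange(volatile Atomic32* ptr,
                                         Atomic32 new_value) {
    Atomic32 prev = *ptr;
    *ptr = new_value;
    return prev;
}

// Adds increment and returns the new value.
inline Atomic32 NoBarrier_AtomicIncrement(volatile Atomic32* ptr,
                                          Atomic32 increment) {
    Atomic32 next = *ptr + increment;
    *ptr = next;
    return next;
}

// With a single thread there is nothing to order, so the barrier is a no-op
// and acquire/release accesses are plain reads and writes.
inline void MemoryBarrier() {}
inline Atomic32 Acquire_Load(volatile const Atomic32* ptr) { return *ptr; }
inline void Release_Store(volatile Atomic32* ptr, Atomic32 value) { *ptr = value; }

}  // namespace single_threaded_atomicops
```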

Jonathan Berling

Jan 29, 2014, 12:08:42 PM
This is an old thread!

At the time we were just looking at different options. I think we
ended up going with a protobuf <-> JSON converter on our server. It's
nice to know that protobuf / emscripten isn't too much trouble.