Extremely large (28M) .mem file

Dan Vanderkam

Nov 9, 2015, 11:14:14 AM
to emscripten-discuss
I've had good luck getting dds-bridge to build using emscripten. I can run it in the browser just great.

My main issue is that emcc generates a 28MB .mem file which it has to load via XHR to initialize. Based on how well this file gzips (28MB→30k), I suspect that it consists almost entirely of zeros. How can I debug why such a large file is being generated? The equivalent executable built with g++ is only ~86k.

Here are steps to repro:

$ git clone https://github.com/danvk/dds.git
$ cd dds/src
$ emcc --closure 1 -s TOTAL_MEMORY=$((256 * 1024 * 1024)) -O2 dds.cpp ABsearch.cpp ABstats.cpp CalcTables.cpp DealerPar.cpp Init.cpp LaterTricks.cpp Moves.cpp Par.cpp PlayAnalyser.cpp PBN.cpp QuickTricks.cpp Scheduler.cpp SolveBoard.cpp SolverIF.cpp Stats.cpp Timer.cpp TransTable.cpp next-plays.cc -o out.html -s EXPORTED_FUNCTIONS="['_solve']"
$ ls -lh out*
  101K out.html
   28M out.html.mem
  320K out.js
$ gzip -c out.html.mem | wc -c
   31098
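The gzip ratio is indirect evidence that the file is mostly zeros. A minimal sketch of a hypothetical helper (not part of emcc or dds) measures that directly: it counts the zero bytes in out.html.mem and finds the longest run of consecutive zeros. It assumes the .mem file is a raw image of the program's initialized static memory.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Summary of how much of a memory-initializer image is zero bytes.
struct ZeroStats {
  std::size_t total = 0;       // bytes in the image
  std::size_t zeros = 0;       // bytes equal to 0
  std::size_t longestRun = 0;  // longest run of consecutive zero bytes
  std::size_t runStart = 0;    // offset where that longest run begins
};

// Hypothetical helper: scans an in-memory byte image.
ZeroStats scanZeros(const std::vector<unsigned char>& bytes) {
  ZeroStats s;
  s.total = bytes.size();
  std::size_t run = 0;
  for (std::size_t i = 0; i < bytes.size(); ++i) {
    if (bytes[i] == 0) {
      ++s.zeros;
      ++run;
      if (run > s.longestRun) {
        s.longestRun = run;
        s.runStart = i + 1 - run;
      }
    } else {
      run = 0;
    }
  }
  return s;
}

// Reads a whole file (e.g. "out.html.mem") into memory.
// Returns an empty vector if the file cannot be opened.
std::vector<unsigned char> readFile(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                    std::istreambuf_iterator<char>());
}
```

Calling scanZeros(readFile("out.html.mem")) would report the zero fraction; one very long run points to a single large zero-initialized global rather than many small ones. Mapping the run's offset back to a symbol would need the static memory layout, which this sketch does not attempt.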

Thanks for such an amazing tool!

  - Dan

Jukka Jylänki

Nov 9, 2015, 11:23:57 AM
to emscripte...@googlegroups.com
If I remember correctly, llvm-nm should show the data sizes as well. Try building with "-o out.bc" and running llvm-nm on out.bc.

--
You received this message because you are subscribed to the Google Groups "emscripten-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to emscripten-disc...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Dan Vanderkam

Nov 9, 2015, 11:51:57 AM
to emscripte...@googlegroups.com
Do I need to use a special version of llvm-nm?

$ emcc --closure 1 -s TOTAL_MEMORY=$((256 * 1024 * 1024)) -O2 dds.cpp ABsearch.cpp ABstats.cpp CalcTables.cpp DealerPar.cpp Init.cpp LaterTricks.cpp Moves.cpp Par.cpp PlayAnalyser.cpp PBN.cpp QuickTricks.cpp Scheduler.cpp SolveBoard.cpp SolverIF.cpp Stats.cpp Timer.cpp TransTable.cpp next-plays.cc -o out.bc -s EXPORTED_FUNCTIONS="['_solve']"
$ brew install llvm
$ /usr/local/opt/llvm/bin/llvm-nm out.bc
error: Invalid record


Alon Zakai

Nov 9, 2015, 1:21:50 PM
to emscripten-discuss
Yes, you need to use the same llvm-nm that came with the LLVM that built the bitcode. It should be in the same directory as clang, clang++, etc. from the Emscripten fastcomp install.

Dan Vanderkam

Nov 9, 2015, 1:32:06 PM
to emscripten-discuss
Now we're getting somewhere:

$ /usr/local/Cellar/emscripten/1.34.12/libexec/llvm/bin/llvm-nm --print-size out.bc
         D AB_ptr_list
         D AB_ptr_trace_list
         T AnalyseAllPlaysBin
         T AnalyseAllPlaysPBN
         T AnalysePlayBin
...

Unfortunately, I don't get sizes or addresses, just the types of symbols. Is there an emcc or llvm-nm flag I need to set to include addresses?

Alon Zakai

Nov 9, 2015, 1:44:55 PM
to emscripten-discuss
Hmm, the help says there is a -print-size option, but it prints all zeros for me; not sure why.

In any case, you can run llvm-dis on the bitcode and look at the globals. Probably there are some very large zero-initialized globals (maybe something like @a = global [100000000 x i8] zeroinitializer). If that's the issue here, we could probably optimize those better in the backend.
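Looking through the llvm-dis output for large zero-initialized globals can be mechanized. A minimal sketch, assuming each global definition sits on one line of the .ll file (the usual layout for simple globals): it keeps lines of the form @name = ... global <type> zeroinitializer and splits out the name and type. The function and struct names are hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One zero-initialized global found in textual LLVM IR.
struct ZeroInitGlobal {
  std::string name;  // e.g. "@relRank"
  std::string type;  // e.g. "[8192 x [15 x i8]]"
};

// Scans llvm-dis output for "@name = [linkage] global <type> zeroinitializer".
// Definitions spanning several lines and "constant" globals are skipped.
std::vector<ZeroInitGlobal> findZeroInitGlobals(std::istream& in) {
  std::vector<ZeroInitGlobal> found;
  std::string line;
  while (std::getline(in, line)) {
    if (line.empty() || line[0] != '@') continue;
    const std::string::size_type eq = line.find(" = ");
    const std::string::size_type g = line.find(" global ");
    const std::string::size_type z = line.find(" zeroinitializer");
    if (eq == std::string::npos || g == std::string::npos ||
        z == std::string::npos || !(eq < g && g < z)) {
      continue;
    }
    const std::string::size_type typeStart = g + 8;  // skip " global "
    found.push_back({line.substr(0, eq), line.substr(typeStart, z - typeStart)});
  }
  return found;
}

// Convenience overload for IR already held in a string.
std::vector<ZeroInitGlobal> findZeroInitGlobals(const std::string& text) {
  std::istringstream in(text);
  return findZeroInitGlobals(in);
}
```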

Jukka Jylänki

Nov 9, 2015, 2:57:24 PM
to emscripte...@googlegroups.com
Yes, try the version of llvm-nm that is part of the Emscripten fork of LLVM. (emsdk/clang/fastcomp/build_incoming_64/bin/llvm-nm).

Jukka Jylänki

Nov 9, 2015, 2:58:39 PM
to emscripte...@googlegroups.com
Oh sorry, ignore the above; Gmail was not showing the rest of the conversation. I see now that you got further in diagnosing.

Dan Vanderkam

Nov 9, 2015, 3:26:24 PM
to emscripte...@googlegroups.com
Using llvm-dis, I can see some of the zero-initialized global symbols:

$ /usr/local/Cellar/emscripten/1.33.0/libexec/llvm/bin/llvm-dis out.bc
$ grep 'zeroinitializer' out.ll
@relRank = global [8192 x [15 x i8]] zeroinitializer, align 1  // ~122k
@winRanks = global [8192 x [14 x i16]] zeroinitializer, align 2  // ~229k
@groupData = global [8192 x %struct.moveGroupType] zeroinitializer, align 4  // ~950k
@maskBytes = global [8192 x [4 x [4 x i32]]] zeroinitializer, align 4 // ~524k

...
%struct.moveGroupType = type { i32, [7 x i32], [7 x i32], [7 x i32], [7 x i32] }
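As a cross-check on the sizes annotated above, the same types can be mirrored in C++ and measured with sizeof. The struct's field names and the kEntries constant are made up for illustration; only the layout (one i32 followed by four [7 x i32] arrays) comes from the IR, and int32_t is assumed to match i32.

```cpp
#include <cstddef>
#include <cstdint>

// Mirror of %struct.moveGroupType = type { i32, [7 x i32], [7 x i32], [7 x i32], [7 x i32] }.
// Field names are hypothetical.
struct moveGroupType {
  int32_t f0;
  int32_t f1[7];
  int32_t f2[7];
  int32_t f3[7];
  int32_t f4[7];
};

// 1 + 4 * 7 = 29 four-byte fields; no padding is needed.
static_assert(sizeof(moveGroupType) == 29 * 4, "expected 116 bytes");

// Byte sizes of the four globals, computed from their element types.
constexpr std::size_t kEntries = 8192;
constexpr std::size_t relRankBytes = kEntries * 15 * sizeof(uint8_t);
constexpr std::size_t winRanksBytes = kEntries * 14 * sizeof(uint16_t);
constexpr std::size_t groupDataBytes = kEntries * sizeof(moveGroupType);
constexpr std::size_t maskBytesBytes = kEntries * 4 * 4 * sizeof(int32_t);
constexpr std::size_t totalBytes =
    relRankBytes + winRanksBytes + groupDataBytes + maskBytesBytes;
```

The total comes to 1,826,816 bytes, consistent with the ~2MB estimate and far short of 28MB.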

Those were the largest symbols by far, but combined they only account for ~2MB of zeros. It would be nice if llvm-dis (or llvm-nm) could just print out the total size of each of these!
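llvm-dis doesn't print sizes, but for integer arrays the size follows mechanically from the type string. A minimal sketch of a hypothetical helper, irArrayBytes, that computes it: it handles only nested arrays of i8/i16/i32/i64 and throws on anything else, such as %struct.moveGroupType, whose size would need the struct definition and alignment rules.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

namespace {

// Position in an IR type string such as "[8192 x [15 x i8]]".
struct Cursor {
  explicit Cursor(const std::string& t) : text(t) {}
  const std::string& text;
  std::size_t pos = 0;

  void skipSpaces() {
    while (pos < text.size() && text[pos] == ' ') ++pos;
  }
  char peek() {
    skipSpaces();
    return pos < text.size() ? text[pos] : '\0';
  }
  void expect(char c) {
    if (peek() != c) {
      throw std::invalid_argument(std::string("expected '") + c + "' in " + text);
    }
    ++pos;
  }
  std::uint64_t number() {
    skipSpaces();
    const std::size_t start = pos;
    while (pos < text.size() && std::isdigit(static_cast<unsigned char>(text[pos]))) ++pos;
    if (pos == start) throw std::invalid_argument("expected a number in " + text);
    return std::stoull(text.substr(start, pos - start));
  }
};

// Recursive descent: a type is either "[N x T]" or "iK".
std::uint64_t typeBytes(Cursor& c) {
  if (c.peek() == '[') {
    c.expect('[');
    const std::uint64_t count = c.number();
    c.expect('x');
    const std::uint64_t elem = typeBytes(c);
    c.expect(']');
    return count * elem;
  }
  if (c.peek() == 'i') {
    c.expect('i');
    // Assumes widths of 8, 16, 32, or 64 bits, where size is bits / 8.
    return (c.number() + 7) / 8;
  }
  throw std::invalid_argument("unsupported type in " + c.text);
}

}  // namespace

// Byte size of an integer-array IR type as printed by llvm-dis.
std::uint64_t irArrayBytes(const std::string& type) {
  Cursor c(type);
  const std::uint64_t bytes = typeBytes(c);
  if (c.peek() != '\0') throw std::invalid_argument("trailing text in " + type);
  return bytes;
}
```

For example, irArrayBytes("[8192 x [14 x i16]]") gives 229376, matching the ~229k annotated for @winRanks.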

Dan Vanderkam

Nov 9, 2015, 3:36:09 PM
to emscripte...@googlegroups.com
Aha! This was the culprit:

@localVar = global [16 x %struct.localVarType] zeroinitializer, align 8

sizeof(localVarType) = 1688368 bytes (~1.6MB), so the whole array is 16 × 1688368 = 27,013,888 bytes (~27MB), the lion's share of the 28MB .mem file. The 16 is a max_threads constant, so if I change that to 1 (I assume asm.js code is single-threaded?) and allocate the local var dynamically, I should get a much lighter .mem file.
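The planned fix might look roughly like this. It is a sketch under assumptions, not dds code: localVarType here is a stand-in with the reported size, and kMaxThreads, initLocalVars, and the std::unique_ptr holder are hypothetical. The idea is that only a pointer stays in static memory while the ~1.6MB block comes from the heap at runtime.

```cpp
#include <cstddef>
#include <memory>

// Stand-in for the real localVarType: only its size (1688368 bytes,
// as reported above) matters for this sketch.
struct localVarType {
  unsigned char payload[1688368];
};

// Hypothetical constant: a single-threaded build needs one slot.
constexpr std::size_t kMaxThreads = 1;

// Before (sketch):  localVarType localVar[16];
// As a zero-initialized global, all 16 * 1688368 bytes are part of the
// static memory image that emcc writes to the .mem file.
//
// After: static memory holds only a pointer; the array is heap-allocated.
std::unique_ptr<localVarType[]> localVar;

void initLocalVars() {
  // make_unique<T[]>(n) (C++14) value-initializes the array, so it starts
  // out zeroed, just like the old global did.
  localVar = std::make_unique<localVarType[]>(kMaxThreads);
}
```

The zero-filling now happens at runtime instead of being shipped in the .mem file, so initLocalVars() would need to run before any solver code touches localVar.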

Thanks for the help,
  - Dan

Dan Vanderkam

Nov 10, 2015, 10:28:02 AM
to emscripte...@googlegroups.com
I filed an issue against emscripten to be more efficient with zero-initialized values in the .mem file: