Compiling with MAIN_MODULE=1 pulling in more than expected.


Paul Austin

Aug 18, 2015, 5:38:38 PM
to emscripten-discuss

Hi, 


I’m trying to use shared libraries, following the instructions on MAIN_MODULE=1 and SIDE_MODULE=1. The basics seem to be working, but the main module got much larger than expected: it jumped from 800k to 2.7 meg. So I’m trying to determine whether I am missing one of the options or using them incorrectly.


I reduced it to a simple example and tried to make a small *.js output file, following the tips in the maximally minimal discussion thread:


#include <stdio.h>

int main()
{
    puts("hello world\n");
}


Building that program with no optimization yields about 360k. If the standard set of options for smaller builds is used, that can shrink down to about 56k, roughly the size mentioned in the other thread. This is the command line used:


emcc linktest.c -Os --memory-init-file 0 -s NO_FILESYSTEM=1 -s NO_BROWSER=1 -o linktest.js


When adding -s MAIN_MODULE=1, the size bumped to 890k (1.7 meg if optimization is omitted). The side modules I want to build will have minimal need for standard libraries beyond what main uses, so I’d still like to strip out most of the standard library. Looking at the linktest.js file, I can see that the larger file has SDL and many parts (all??) of the standard C and C++ libraries.


I was able to get it a bit smaller (790k) by setting the following environment variables:


EMCC_FORCE_STDLIBS=libc
EMCC_ONLY_FORCED_STDLIBS=1

As a check I also tried:

EMCC_FORCE_STDLIBS=1

That bumped it to 3.7 meg. That pulled in the entire standard set (as expected), so the flags are having an effect.


So, is there a way to strip out unused symbols in the main module but still allow some dlopen() functionality?

Thank you,

Paul

Alon Zakai

Aug 18, 2015, 5:51:31 PM
to emscripten-discuss
Yes, that's all correct. The issue is, as you said, that we pull in standard libraries and cannot do dead code elimination on them, because you might dlopen a library that calls printf, and if you didn't have printf in the main file, that wouldn't work. So even a small program generally will have all of libc included.

We could add an option that says "do dead code elimination normally, any dynamically-linked code will not expect anything to be linked to it." But I worry that wouldn't be very useful. The dlopened or dynamically-linked library would only be able to call code that exists in itself, nothing outside. The only way for it to call anything outside of it would be if you passed it a function pointer. Otherwise, it would be entirely cut off from the world, a library that the main file can call into, but that's it.

Would that still be useful for you? Curious about the use cases here.
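
A minimal sketch of the function-pointer route described above, under stated assumptions: the names host_api, side_make_ramp and host_api_default are hypothetical and are not part of Emscripten. The "library" function reaches outside code only through a table of pointers handed to it by the main program, so it needs no linked symbols of its own.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical table of services the main program hands to a library.
   Nothing here is an Emscripten API; the names are illustrative only. */
typedef struct host_api {
    void *(*alloc)(size_t size);
    void  (*release)(void *ptr);
} host_api;

/* Stands in for a function that would live in the dynamically loaded
   library. It calls outside code only through the pointers in `api`. */
double *side_make_ramp(const host_api *api, size_t n)
{
    double *values = api->alloc(n * sizeof *values);
    if (values == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; ++i) {
        values[i] = (double)i;   /* fill with 0, 1, 2, ... */
    }
    return values;
}

/* Main-program side: fill the table with the program's own malloc/free. */
host_api host_api_default(void)
{
    host_api api = { malloc, free };
    return api;
}
```

The caller builds the table once with host_api_default(), passes its address into the library function, and frees the result through api.release.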


--
You received this message because you are subscribed to the Google Groups "emscripten-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to emscripten-disc...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Paul Austin

Aug 18, 2015, 6:09:49 PM
to emscripte...@googlegroups.com
The use case is a core runtime that has several math packages. The external packages don't use file I/O, or much in the way of strings either. The main dependencies are some standard math functions, malloc and free.

We could add an option that says "do dead code elimination normally, any dynamically-linked code will not expect anything to be linked to it." But I worry that wouldn't be very useful. The dlopened or dynamically-linked library would only be able to call code that exists in itself, nothing outside. The only way for it to call anything outside 

Does that mean the dynamically loaded code could not link to symbols that were not stripped either? In other words, does everything need to go through function pointers? This is more of an internal build process, so some limitations are OK; for mobile apps the smaller size would be nice. I'll be glad to explain more if it helps. Emscripten has been working very well for us.

I saw the other post on dynamic linking optimizations. That's great.

- Paul 


Paul Austin

Aug 18, 2015, 7:42:36 PM
to emscripten-discuss
Looking at it some more, there is a simpler question. I can see that a good chunk of the js file is SDL code that is not there if MAIN_MODULE=1 is omitted.

The following environment variables are set.

EMCC_FORCE_STDLIBS=libc
EMCC_ONLY_FORCED_STDLIBS=1

Should SDL still be there? I suspect I'm just missing a setting.

Alon Zakai

Aug 18, 2015, 8:28:36 PM
to emscripten-discuss
Hmm, actually, what happens to be present in the main file is still accessible to the libraries. So if you carefully make sure that those methods (malloc, free, math functions) are also used in the main file, the library would be able to use them. So even if we do normal dead code elimination in the main file, libraries can use system library code.

But, it would be the user's responsibility to make sure that what the libraries need is in the main file, either by using them (and making sure the optimizer doesn't inline them out), or EXPORTED_FUNCTIONS or some other method. This sounds a little bug-prone, but I suppose it could be done.
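
One way the "use them in the main file" idea might look, as a sketch only. The function name keep_runtime_alive is hypothetical, and whether a reference like this is enough for a library to resolve the symbols is an assumption that may depend on the Emscripten version. The calls go through volatile function pointers so the optimizer cannot inline or fold away the malloc/free pair.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, meant to be called once from main().
   The pointers are volatile, so the compiler must load them and make
   real indirect calls; the addresses of malloc and free are taken,
   which keeps both functions referenced by the main file.
   Math functions a library needs could be referenced the same way. */
int keep_runtime_alive(void)
{
    void *(*volatile alloc_fn)(size_t) = malloc;
    void (*volatile release_fn)(void *) = free;

    void *block = alloc_fn(16);
    if (block == NULL) {
        return 0;   /* allocation failed */
    }
    release_fn(block);
    return 1;       /* both functions were actually called */
}
```

Listing the needed symbols in EXPORTED_FUNCTIONS, the other method mentioned above, avoids relying on the optimizer's behavior at all.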

If that sounds useful I can add an option for it.


Alon Zakai

Aug 18, 2015, 8:30:31 PM
to emscripten-discuss
I suspect that is SDL JS code from src/library_sdl.js, and not compiled SDL2 code? We do include all the JS libraries by default when dynamic linking is present. In the proposed option to allow dead code elimination in the main file, we could also disable that when the option is enabled. (And, if a library needs something from a JS system library, it would be the user's responsibility to make sure it was linked in.)

Alon Zakai

Aug 18, 2015, 11:41:23 PM
to emscripten-discuss
Thinking about this some more, this does seem very useful. I added it as an option on incoming now: building with -s MAIN_MODULE=2 (instead of 1) creates a main module that has normal dead code elimination turned on. EXPORTED_FUNCTIONS can of course be used to keep things alive that the child needs. There is a test (other.test_minimal_dynamic) showing this in action.

Paul Austin

Aug 19, 2015, 5:16:14 PM
to emscripten-discuss

Great, I did the emsdk update/install/activate for clang-incoming-64bit and emscripten-incoming-64bit, and "-s MAIN_MODULE=2" works well. The app shrunk from 2.7 meg to 1.1 meg.
Looking at the js I can see that C++ mangled names are left intact, and I can see why. For my project it was easy to configure most of the functions to have extern "C" linkage to disable mangled names for those symbols. That shaved another 50k off. It wouldn't be hard to collapse those further; I may try that later, but this works very well.
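
For readers unfamiliar with the extern "C" step, here is a sketch of the usual header idiom. The function vec_sum is a hypothetical stand-in, not code from the project. The guard lets the same declaration compile as C or C++, and under a C++ compiler the symbol is emitted with its plain, unmangled name.

```c
#include <stddef.h>

#ifdef __cplusplus
extern "C" {            /* C linkage: no C++ name mangling inside */
#endif

/* Hypothetical math-package entry point, used only for illustration. */
double vec_sum(const double *values, size_t count);

#ifdef __cplusplus
}
#endif

/* Adds up `count` doubles starting at `values`. */
double vec_sum(const double *values, size_t count)
{
    double total = 0.0;
    for (size_t i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}
```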

Thank You.


Alon Zakai

Aug 19, 2015, 5:55:28 PM
to emscripten-discuss
Great!

