Migrating Emscripten to use npm mechanism to ship Closure and html-minifier


Jukka Jylänki

Dec 15, 2019, 9:31:15 AM
to emscripte...@googlegroups.com
Hello all,

the PRs

https://github.com/emscripten-core/emsdk/pull/404,
https://github.com/emscripten-core/emscripten/pull/9989, and
https://github.com/emscripten-core/emscripten/pull/9990

propose migrating Closure compiler and html-minifier to reside outside
the Emscripten tree, and be installed via npm.

The benefits of this change are:
1. Future updates to closure and html-minifier will not bloat the
size of the Emscripten git repository (each update would otherwise
increase the size of the git repository by ~10%).
2. Emscripten developers do not need to maintain a CDN and code for
distributing closure/html-minifier, easing the CDN and testing burden
on emsdk.
3. Tracking which version of closure and html-minifier we are on
becomes explicit and idiomatic to people familiar with the npm
community (it can be found in package.json) rather than requiring a
look through git logs.
4. Updating versions becomes easier, as migrating to a new version
only requires editing package.json.

The drawbacks of this change are:
5. Developers who follow the "I git cloned all repositories myself"
approach and do not use emsdk need to run "npm install" once in the
Emscripten root directory if they want to use closure or
html-minifier.
6. If npm goes down, it will disrupt emsdk installation.

Can people think of other benefits/drawbacks that should influence
this design change?

Cheers,
Jukka

Gabriel Cuvillier

Dec 15, 2019, 10:37:34 AM
to emscripte...@googlegroups.com

Hello,

On CMake-based projects where I want to keep the web app code (= several vanilla JS files) separate from the various emscripten modules that are used (= several emscripten-generated JS/wasm), I conveniently use Closure as provided by Emscripten directly from my CMakeLists.txt to bundle/optimize the web app code.

=> CMakeLists.txt looks like:

add_executable(<my_emscripten_module_1> file_1.cpp file_2.cpp) # => my_emscripten_module_1.js/wasm
add_executable(<my_emscripten_module_2> file_3.cpp file_4.cpp) # => my_emscripten_module_2.js/wasm
# --closure 1 is an emcc link option, so it is set as a link flag rather than listed among the sources
set_target_properties(<my_emscripten_module_1> <my_emscripten_module_2> PROPERTIES LINK_FLAGS "--closure 1")

add_custom_target(<my_webapp_using_the_emscripten_modules>
  COMMAND "java" "-jar" "$ENV{EMSDK}/upstream/emscripten/third_party/closure-compiler/compiler.jar"
  "--js" "<file_1>.js"
  "--js" "<file_2>.js"
  "--js_output_file" "<my_webapp_using_the_emscripten_modules>.js"
  "--language_in=ECMASCRIPT_2017"
  "--language_out=ECMASCRIPT_2017"
  "--compilation_level" "<BUNDLE|SIMPLE|ADVANCED>"
  DEPENDS <my_emscripten_module_1> <my_emscripten_module_2>) # => my_webapp_using_the_emscripten_modules.js

Then, to build the app, I just have to do:

make <my_webapp_using_the_emscripten_modules>

And that's pretty much it. The 3 bundles are created: the web app code (closure optimized/bundled vanilla JS), the emscripten module 1 (closure optimized generated JS+wasm), and emscripten module 2 (closure optimized generated JS+wasm).

I find this very convenient, as it lets me do everything from CMake without having to use another module bundler (Webpack and co.) and the full node/npm ecosystem that I simply don't need. I admit it feels like a bit of an old-school, heterodox way of doing things on the Web (using Make!?), but I find it much more in the spirit of the C++ way of building projects. After all, we are using Emscripten for a reason, aren't we?

But I suppose that scenario would also be feasible if Closure were provided through npm rather than as a .jar directly in the Emscripten distribution (I would have to call some npm command for this, right?). I can live with a scary 'node_modules' folder ;)


Cheers,

gabriel

Jukka Jylänki

Dec 15, 2019, 1:35:52 PM
to emscripte...@googlegroups.com
In that CMake line I can see you are using Emsdk. For you this change
would mean that instead of

COMMAND "java" "-jar"
"$ENV{EMSDK}/upstream/emscripten/third_party/closure-compiler/compiler.jar"

you would issue

COMMAND "$ENV{EMSCRIPTEN}/node_modules/.bin/google-closure-compiler"

I.e. only the location of the closure-compiler file will change. Emsdk
takes care of the npm install step, so installing an SDK with emsdk
will still come with Closure preinstalled and available.

However the above path change is in some sense unrelated to this
in-tree -> npm change, but instead a result of Closure getting updated
to a newer version (which we want to do no matter what). In the newer
version of Closure, java is optional. (google-closure-compiler
executable will use it if available)
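
For build scripts that have to work both before and after this change, the path switch described above can be sketched as a small lookup. This is only an illustration: the helper name `closure_command` is hypothetical, it assumes the Emscripten root is the directory that `$ENV{EMSDK}/upstream/emscripten` (or `$ENV{EMSCRIPTEN}`) points to, and it does not handle the `.cmd` wrappers that npm typically creates on Windows.

```python
from pathlib import Path


def closure_command(emscripten_root):
    """Return an argv prefix for invoking Closure, or None if it is not found.

    Hypothetical helper: prefers the npm-installed wrapper and falls back
    to the old in-tree jar, using the two paths quoted in the thread.
    """
    root = Path(emscripten_root)

    # New layout: wrapper placed in node_modules/.bin by `npm install`.
    npm_bin = root / "node_modules" / ".bin" / "google-closure-compiler"
    if npm_bin.exists():
        return [str(npm_bin)]

    # Old layout: a jar shipped in the Emscripten tree, which needs Java.
    jar = root / "third_party" / "closure-compiler" / "compiler.jar"
    if jar.exists():
        return ["java", "-jar", str(jar)]

    return None
```

The returned list can be extended with the usual `--js`, `--js_output_file` and `--compilation_level` arguments before being passed to `subprocess.run`.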

On Sun, Dec 15, 2019 at 17:37, Gabriel Cuvillier
(gabriel....@gmail.com) wrote:
> --
> You received this message because you are subscribed to the Google Groups "emscripten-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to emscripten-disc...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/emscripten-discuss/3e9f7095-dee6-711e-1623-ae715c81c5ce%40gmail.com.

Gabriel Cuvillier

Dec 16, 2019, 4:24:48 AM
to emscripte...@googlegroups.com
Ok, sounds good. I'll test this as soon as it lands.

Thanks for the tip!

Alon Zakai

Dec 17, 2019, 1:33:39 PM
to emscripte...@googlegroups.com
Some possible concerns are:

Is npm supported everywhere that the emsdk currently is? I assume it's supported in even more places, but it would be good to check.

How hard will supporting npm issues be for the devs here? That is, people will file issues on emscripten/emsdk that are due to npm not being set up right on their machine, or using the wrong version, or failing due to some npm-specific issue, etc. For myself personally, I don't use npm daily, so I am not already super-familiar with it, and those error messages and workarounds may not be obvious. Do other people who can respond to github issues have experience with npm?

- Alon



Gabriel Cuvillier

Dec 18, 2019, 2:27:48 AM
to emscripte...@googlegroups.com
My two cents on this, from the point of view of a daily Emscripten user on various projects:

If this change does not impact end-users of Emscripten who just want to do C++ development targeting WebAssembly, then this is perfectly fine. This change would just be an EMSDK "implementation detail", and I agree that from this point of view, removing the Java dependency is interesting.

However, if the users are expected to use Node/npm at some point, then please, please, don't do this :) Not everyone is willing or able to use the Node/npm ecosystem on the projects they are working on.


An alternative suggestion: instead of having to rely on a 3rd party distribution system, why not "precompile/bundle" closure (as well as html-minifier, or any other 3rd party dep), and use a more classic CDN approach to deliver these big files? This would remove the need for a package management system (npm, or even pip if you have the same idea for Python tools) running on users' machines... and potentially breaking for various reasons (network failures, misconfiguration of PATH, weird interaction with a system-wide installation of Node/npm, etc.)


Cheers,

Gabriel

Shachar Langbeheim

Dec 18, 2019, 4:07:40 AM
to emscripten-discuss
Gabriel, essentially you propose to use Git LFS instead? That sounds reasonable.

Gabriel Cuvillier

Dec 18, 2019, 9:44:59 AM
to emscripte...@googlegroups.com

Well... Git LFS is one solution, but maybe not the easiest one to handle for the devs.

But a simple file server with a nice directory structure containing each precompiled/packaged 3rd party dependency is another alternative (the folders and files just have to be tagged/named correctly, so that Emsdk can download the correct versions according to the Emscripten release being pulled). Put differently, the simplest thing that could possibly work.

By doing so, there is some "reinvent the wheel" syndrome, as npm/pip/etc. already address this kind of need (managing packages/dependencies/delivery/etc.). The issue is that they are heavily tied to their own respective languages/platforms (be it Node, Python, etc.). I wonder if there exists some kind of package management system that is programming language-neutral (and also OS-neutral), and only focuses on installing a bunch of files in a particular location according to some spec in a git branch. Basically, this is the need here.

But of course, I am probably not the right person to take any decision on this :)

Jukka Jylänki

Dec 18, 2019, 3:04:43 PM
to emscripte...@googlegroups.com
On Wed, Dec 18, 2019 at 9:27, Gabriel Cuvillier
(gabriel....@gmail.com) wrote:
>
> My two cents on this, from the point of view of a daily Emscripten user on various projects :
>
> If this change does not impact end-users of Emscripten, that just want to do C++ development targeting WebAssembly, then this is perfectly fine. This change would just be an EMSDK "implementation detail", and I agree that from this point of view, removing the Java dependency is interesting.

For emsdk users this is indeed just an implementation detail, as emsdk
install still sets up the SDK functionally like it used to.

> An alternative suggestion: instead of having to rely on 3rd party distribution system, why not "precompile/bundle" closure (as well as minify-html, or any other 3rd party dep), and use a more classic CDN-approach to deliver these big files.

The reason for this is point #1 from the first post, which was discussed
in https://github.com/emscripten-core/emsdk/pull/404#issuecomment-564720935:
each update of closure would increase the size of the git
repository by ~30MB, or +13% of the overall download size. And since git
tracks all history, every old version gets downloaded whenever one git
clones. After a couple of updates of Closure, all the Closures in the
tree history would make up a larger portion of the overall Emscripten
git repository than the Emscripten bits themselves.
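
To put rough numbers on that growth, the two figures above can be taken at face value in a back-of-the-envelope sketch. The function name `repo_growth` and the choice of five updates are hypothetical, and the calculation assumes every update costs the same ~30MB and that nothing in history compresses away; it says nothing about how much of today's repository is already Closure.

```python
def repo_growth(update_mb=30.0, update_fraction=0.13, updates=5):
    """Project download size if every Closure update stays in git history.

    update_mb and update_fraction are the ~30MB / +13% figures quoted in
    the thread; `updates` is an arbitrary illustrative count.
    """
    # "30MB is +13%" implies the size of the download before the update.
    baseline_mb = update_mb / update_fraction
    added_mb = update_mb * updates
    total_mb = baseline_mb + added_mb
    # Share of the final download made up by the newly added Closure copies.
    added_share = added_mb / total_mb
    return baseline_mb, total_mb, added_share


if __name__ == "__main__":
    baseline, total, share = repo_growth()
    print(f"baseline ~{baseline:.0f} MB, after updates ~{total:.0f} MB, "
          f"new Closure copies ~{share:.0%} of the download")
```

Under these assumptions the quoted figures imply a download of roughly 230MB before the update, and each further update adds the same fixed cost on top of it for every future clone.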

The Git LFS route would work for this only if Closure, html-minifier
and other tools provided a single-file amalgamation of the whole
toolchain they ship, but unfortunately this is not the case.
Closure and html-minifier are split across hundreds of small files, so
Git LFS cannot be applied.

> This would remove the need for a package management system (npm, or even PIP if you have the same idea for Python tools) running on users machine... and potentially breaking for various reasons (network failures, misconfiguration of PATH, weird interaction with system-wide installation of Node/npm, etc..)

Only if Git LFS were feasible for avoiding repository size bloat.
Otherwise it would mean developing and maintaining support in Emsdk
and its CDN backend to serve Closure, html-minifier etc. (which using
npm helps avoid, see points #2 and #4).

Emsdk does provide a fixed version of Node.js for all platforms, so
system-wide installations of Node/npm, and also PATH configuration,
should not matter here. Network failures are a possibility, but that
may be a bit of a FUD point - the robustness and reliability of npm
infrastructure is likely better maintained compared to the robustness
of emsdk infra. (there are probably 100x more active users of npm
ecosystem compared to emsdk ecosystem)

Shachar Langbeheim

Dec 19, 2019, 2:19:29 AM
to emscripten-discuss
You can use Git LFS and commit zipped directories, and then during the emsdk installation process unzip them.
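
A minimal sketch of the unzip step such a scheme would need at install time, assuming one zip archive per tool. The function name `unpack_tool` and the archive layout are hypothetical; nothing like this exists in emsdk as far as this thread shows.

```python
import zipfile
from pathlib import Path


def unpack_tool(archive_path, install_dir):
    """Extract a zipped tool directory into install_dir.

    Returns the list of member names contained in the archive.
    """
    install_dir = Path(install_dir)
    install_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(install_dir)
        return archive.namelist()
```

One caveat with this approach: Python's `zipfile` does not restore Unix permission bits on extraction, so any executables inside the archive would need their mode set again afterwards.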


Jukka Jylänki

Dec 19, 2019, 2:54:56 AM
to emscripte...@googlegroups.com
That's right, such a scheme would definitely work. The benefit of that
scheme would be to avoid downloading from npm, which would keep the
repository more self-contained.

Downsides that come to mind:
- It will require asking users to install git-lfs separately.
- It would require developing a bundled installation script in
Emscripten that non-emsdk users are asked to run to uncompress.
(This is comparable to an npm install, so it does not avoid that step.)
- For npm developers, running npm install is already familiar and
idiomatic, but a custom script is, well, custom.
- There may be a need to ship platform-specific versions of some npm
tools (closure-compiler-win.zip, closure-compiler-macos.zip,
closure-compiler-linux.zip, closure-compiler-arm.zip), or create fat
zips that have all platforms. For closure specifically there is a JS
version, so we could use just that instead.
- More maintenance work compared to the npm approach (e.g. on
packaging and versioning)
- Migrating between versions becomes harder for users. With npm if
user wants to try to update to latest closure that the Emscripten tree
does not currently have, they can edit a single line in package.json
and reissue npm install. With a custom zip users would have to figure
out the directory structure and the install/update mechanism that the
custom scheme uses to replace the tool with a new installation.
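
The single-line package.json edit mentioned in the last point can be sketched as follows. The helper name `set_dependency_version` is hypothetical, and the sketch assumes the tool is declared under either `dependencies` or `devDependencies`; the version strings in any usage are placeholders, not real Closure releases.

```python
import json
from pathlib import Path


def set_dependency_version(package_json_path, name, version_spec):
    """Rewrite one dependency's version spec in package.json.

    Returns the previous spec, or raises KeyError if the package is not
    declared in either dependency section.
    """
    path = Path(package_json_path)
    manifest = json.loads(path.read_text(encoding="utf-8"))
    for section in ("dependencies", "devDependencies"):
        deps = manifest.get(section, {})
        if name in deps:
            old_spec = deps[name]
            deps[name] = version_spec
            # Key order is preserved; indentation is normalized to 2 spaces.
            path.write_text(json.dumps(manifest, indent=2) + "\n", encoding="utf-8")
            return old_spec
    raise KeyError(f"{name} is not declared in {path}")
```

After the edit, running `npm install` again in the same directory picks up the new version.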

That said, there is no reason why such a "zip in the repository"
approach (or migrating distribution to emsdk) could not also be adopted
later. I think avoiding npm up front for fear of it possibly being
unreliable is probably not sensible. Since npm is the least work to
use and maintain, I would recommend seeing that path out until it
fails, and then migrating to some other heavier method if npm is not
feasible.

Gabriel Cuvillier

Dec 19, 2019, 4:04:47 AM
to emscripte...@googlegroups.com
On 19/12/2019 at 08:54, Jukka Jylänki wrote:

> It will require asking users to install git-lfs separately

This.

Indeed, that's a good argument against git-lfs: one user-visible
additional requirement.

Shachar Langbeheim

Dec 19, 2019, 10:42:45 AM
to emscripten-discuss
True, but the proposed change is to require the installation of node.js - so there's a user-visible requirement there, too.


Alon Zakai

Dec 19, 2019, 12:25:30 PM
to emscripte...@googlegroups.com
The emsdk already installs node.js for users (and node.js is necessary to run emcc, for the JS compiler portion), so I don't think that is new. What is being proposed is to also have the emsdk run "npm install" automatically for users, that is, to use npm. That might be visible in some ways - npm output on the commandline, perhaps other things?

Sam Clegg

Jan 10, 2020, 6:09:17 PM
to emscripte...@googlegroups.com
FYI, the emsdk change landed such that `npm install` is now run as a post install step.  Please report any issues with this.

The plan is to let this bake for a week or more before we actually start to rely on the feature in emscripten itself.

This also means that non-emsdk users (those who git clone emscripten directly) will need to start running `npm install` themselves after a git update.
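
For those working from a git clone, one way to notice that `npm install` still needs to be run is to compare package.json against what is present under node_modules. This is a rough sketch only: the helper name `missing_packages` is hypothetical, and it checks for presence alone, so it will not notice an installed package whose version is out of date.

```python
import json
from pathlib import Path


def missing_packages(emscripten_root):
    """List declared dependencies that are not present under node_modules."""
    root = Path(emscripten_root)
    manifest = json.loads((root / "package.json").read_text(encoding="utf-8"))
    declared = {}
    for section in ("dependencies", "devDependencies"):
        declared.update(manifest.get(section, {}))
    # npm installs each package into node_modules/<name>/ with its own package.json.
    return sorted(
        name for name in declared
        if not (root / "node_modules" / name / "package.json").exists()
    )
```

An empty list means every declared package has a directory in place; anything else suggests running `npm install` in the Emscripten root directory.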

