Experiments in minifying generated Elm code


Erik Simmler

Mar 29, 2017, 10:11:13 PM
to elm-dev
I've been experimenting on some compiled Elm code in an attempt to optimize final file size as much as practical (https://github.com/tgecho/babel-plugin-elm-pre-minify)

The short version is that I was able to improve on vanilla uglify or closure results by roughly 15% to over 40% (gzipped). I did this by using a Babel plugin to surgically restructure some of the code based on assumptions I was able to make about how Elm's output is structured (pure functions, etc...). For example:

## elm-todomvc
file                  raw     gzip   zopfli
original.js           228750  43995  41480
vanilla-uglify.js     80394   23163  22276
vanilla-closure.js    71191   21981  21186
optimized-closure.js  60971   18574  17919
optimized-uglify.js   39882   12773  12319

There are two main techniques I used:

IIFE Unwrapping
Minifiers won't eliminate any expressions with potential side effects. Even accessing properties isn't safe due to getters.

var couldBeUnwrapped = function() {
  return {exposedFunction: function() {}};
}();
var referenceToWrappedFunction = couldBeUnwrapped.exposedFunction;

The plugin flattens this out so that minifiers can eliminate any unused functions/variables:

var couldBeUnwrapped = {},
    couldBeUnwrapped$iife_public$exposedFunction = function() {};
var referenceToWrappedFunction = couldBeUnwrapped$iife_public$exposedFunction;
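
To illustrate why a bare property read blocks elimination, here is a minimal sketch. The object and counter names are hypothetical and do not come from Elm's output; the point is only that reading a property can run arbitrary code through a getter, so a minifier cannot assume an expression like couldBeUnwrapped.exposedFunction is free of side effects.

```javascript
// Hypothetical example: names are illustrative, not from Elm's output.
var getterCalls = 0;

var looksInnocent = {
  // A getter runs code on every read of `looksInnocent.exposedFunction`.
  get exposedFunction() {
    getterCalls += 1; // observable side effect
    return function () {};
  }
};

// This variable is never used again, but removing the statement would
// also remove the getter call and change what the program does.
var unusedReference = looksInnocent.exposedFunction;

console.log(getterCalls); // 1
```

Once the value is a plain top-level variable instead of a property, no getter can be involved, which is what makes the flattened form removable.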


Pure function annotation
Elm uses the F2..9 functions as internal helpers, but minifiers see them as a potential source of side effects, and so won't eliminate any expressions they are used in. Uglify recently added support for annotating a function call as pure.

var potentiallyUnusedVar = /* #__PURE__ */F2(someFunc);
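
For readers unfamiliar with the helpers, the following is a simplified sketch of the F2/A2 pattern. The bodies are an assumption modeled on how such currying helpers are commonly written, not a verbatim copy of Elm's runtime, and `add` is a hypothetical function. It shows that calling F2 only builds a wrapper and touches nothing outside it, which is the property the annotation asserts on the minifier's behalf.

```javascript
// Simplified sketch of a two-argument currying helper (assumed shape).
function F2(fun) {
  function wrapper(a) {
    return function (b) { return fun(a, b); };
  }
  wrapper.arity = 2;  // lets A2 take the fast path below
  wrapper.func = fun;
  return wrapper;
}

// Applies a wrapped function to two arguments at once.
function A2(fun, a, b) {
  return fun.arity === 2 ? fun.func(a, b) : fun(a)(b);
}

// Hypothetical function used only for this illustration.
function add(x, y) { return x + y; }

// If `curriedAdd` were unused, the annotation is what would allow a
// minifier to drop this whole statement.
var curriedAdd = /* #__PURE__ */F2(add);

console.log(A2(curriedAdd, 1, 2)); // 3
console.log(curriedAdd(1)(2));     // 3
```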


Feedback welcome! I'd be especially curious if anyone thinks this specific tool is actually a good idea! :)  I'm not sure I feel comfortable with it as a long term approach, but I hope some of what I've learned might prove useful in future work on dead code elimination and other size related optimizations. More info (including results from a few other projects) is in the repo: https://github.com/tgecho/babel-plugin-elm-pre-minify

Evan Czaplicki

Mar 29, 2017, 10:28:27 PM
to elm-dev
I heard about your project and results through Richard recently. Very interesting and exciting results!


General Note

I can make code gen different. 0.19 actually requires some changes for bundling things, so this is an excellent time to discuss alternative strategies.

To help, can you share generated JS snippets that demonstrate the changes you are making? For example, what would the JS for this code look like with your project? (I don't know what snippet is an ideal example. I'd just scan through the JS you generate and find a snippet that seems instructive.)


Questions

This blog post outlines a bunch of flags that can help you get better compression. Interesting ones include pure_getters and pure_funcs. Do people use the pure_getters one? Does that help with the IIFE situation?

Also, is there documentation on the /* #__PURE__ */ annotation anywhere? I couldn't find it anywhere!

Could the pure_funcs flag take the place of those PURE comments?

--
You received this message because you are subscribed to the Google Groups "elm-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elm-dev+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elm-dev/6c464bf3-c279-4626-955b-e745f0f468c6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Evan Czaplicki

Mar 29, 2017, 10:57:55 PM
to elm-dev
It seems like you are getting pretty good benefits from extracting things from the IIFEs used in Native code. They could just not have IIFEs in many cases. The reason they have them is mostly so that I could be more relaxed about variable names. What do you think of this strategy?


Simple Strategy

The compiler could be changed such that native code switches like this:

// OLD
var _elm_lang$core$Native_String = function() { ... }();

// NEW
var _elm_lang$core$Native_String$reverse = ...;
var _elm_lang$core$Native_String$foldl = ...;
...

Would that give the kind of benefits you are seeing? I was already thinking of doing this in 0.19, but there are some problems with the idea I'm still sorting out. 


Question

Say the code instead looked like this:

var root = window; // = {} in bundles
root._elm_lang$core$Native_String$reverse = ...;
root._elm_lang$core$Native_String$foldl = ...;
...

Would that get optimized the same way? This style could work better with bundling, but I sense there's a trick that could make it unnecessary.



Note to other posters: this thread is about OP, not a Q&A with me about 0.19. Remember!

Maxime Dantec

Mar 30, 2017, 3:35:43 AM
to elm-dev
The Uglify source code seems to imply that the __PURE__ annotation is the only one it recognizes: https://github.com/mishoo/UglifyJS2/blob/7bea38a05dbe357434001fe59dbe06bb659a585f/lib/compress.js#L1576

The code after that block handles side-effect detection, and the code before it decides what to eliminate. I'm still reading.

Erik Simmler

Mar 30, 2017, 9:19:41 AM
to elm-dev
The only change my code makes to that Dict.size/sizeHelp code is to add the /*#__PURE__*/ annotation. Without it, Uglify would keep sizeHelp even if size wasn't used, since it didn't understand that F2 doesn't have side effects.

var _elm_lang$core$Dict$sizeHelp = /*#__PURE__*/F2(function (n, dict) { ... truncated ... });
var _elm_lang$core$Dict$size = function (dict) {
  return A2(_elm_lang$core$Dict$sizeHelp, 0, dict);
};

The __PURE__ annotation is pretty new and not particularly documented. Here's the pull request where they added it along with links to some of the Uglify/Typescript issues that prompted the change: https://github.com/mishoo/UglifyJS2/pull/1448

Here's an obnoxiously long gist with full samples from elm-todomvc: https://gist.github.com/tgecho/cb6f1364874df7bd1ae45d811719566b
uglify-beautified is my plugin + uglify without the mangle option so you can see what it actually eliminated.

pure_getters and pure_funcs

These options are exactly what I experimented with initially. You can feed a bunch of names into pure_funcs (specifically F2...9) to achieve some of the savings. pure_getters helps a bit as well, but only to eliminate the expressions that refer to the object created by the IIFE. The IIFE itself (and any values referenced by the object it returns) will be retained.

The other problem with these options is that they get awkward fast, as you need to maintain and feed a whitelist into Uglify. I actually do that in the tool right now as a pragmatic stopgap. When I run it on a new example project, I look for the "Side effects found in unused ..." warnings from Uglify and whitelist the function if appropriate. This is clearly not scalable. Also, these options apply to the entire input file, so you could potentially break any accompanying JavaScript/libraries if they're part of the same payload being minified.
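
As a rough sketch of what such a whitelist amounts to, the snippet below builds a compress options object with the F2..F9 names. `pure_funcs` and `pure_getters` are the Uglify option names discussed above; the variable names are hypothetical, and how the object is handed to Uglify depends on the version and API in use, so that step is left out.

```javascript
// Build the list "F2" through "F9" rather than typing it by hand.
var pureFuncs = [];
for (var n = 2; n <= 9; n++) {
  pureFuncs.push('F' + n);
}

// Options for Uglify's compressor. Both settings apply to the whole
// input file, including any non-Elm code bundled alongside it.
var compressOptions = {
  pure_funcs: pureFuncs,
  pure_getters: true
};

console.log(JSON.stringify(compressOptions));
```

Every additional function deemed safe has to be appended to this list by hand, which is the maintenance burden described above.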


Native code

Much of the savings does come from helping minifiers eliminate unused native code. The JS compiled from actual Elm is already fairly flat, though the F2..9 helpers are still a stumbling block.

Here's a representative partial example from Native.String:

var _elm_lang$core$Native_String = function() {
  function isEmpty(str)
  {
    return str.length === 0;
  }
  function cons(chr, str)
  {
    return chr + str;
  }
  return {
    isEmpty: isEmpty,
    cons: F2(cons),
  };
}();

var _elm_lang$core$String$cons = _elm_lang$core$Native_String.cons;
var _elm_lang$core$String$isEmpty = _elm_lang$core$Native_String.isEmpty;

After
function _elm_lang$core$Native_String$iife_private$cons(chr, str) {
  return chr + str;
}
function _elm_lang$core$Native_String$iife_private$isEmpty(str) {
  return str.length === 0;
}
var _elm_lang$core$Native_String = {},
    _elm_lang$core$Native_String$iife_public$isEmpty = _elm_lang$core$Native_String$iife_private$isEmpty,
    _elm_lang$core$Native_String$iife_public$cons = /*#__PURE__*/F2(_elm_lang$core$Native_String$iife_private$cons);
var _elm_lang$core$String$cons = _elm_lang$core$Native_String$iife_public$cons;
var _elm_lang$core$String$isEmpty = _elm_lang$core$Native_String$iife_public$isEmpty;

Your simple strategy should achieve essentially the same thing. Unfortunately, what is ideal for the minifier is much less ergonomic for the writer :)

I believe the `root._elm_lang$core$Native_String$reverse` version would not be effective. All of the values referenced by the root/bundle object will be kept alive by their membership, and any references to them from elsewhere will keep the `root.foo` expressions alive, since a property access can invoke a getter function in JS. I'm not aware of any minifiers that do the flow analysis required to successfully handle this sort of case.
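
A small sketch of why both the assignments to `root` and the reads from it are opaque to a minifier. The `root` object here is a hypothetical stand-in with an accessor property defined on it; a real `window` is a host object the minifier knows even less about.

```javascript
// Hypothetical stand-in for `window` or a bundle object.
var events = [];
var stored;
var root = {};

// An accessor property: writes and reads both run code.
Object.defineProperty(root, 'reverse', {
  set: function (value) { events.push('set'); stored = value; },
  get: function () { events.push('get'); return stored; }
});

// Looks like a plain assignment, but triggers the setter.
root.reverse = function (str) {
  return str.split('').reverse().join('');
};

// Looks like a plain read, but triggers the getter.
var alias = root.reverse;

console.log(events.join(',')); // set,get
console.log(alias('abc'));     // cba
```

Because neither statement can be proven inert without knowing what `root` is, a minifier has to keep both, along with everything they reference.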

Sorry this is a bit scattered since I need to run, but I'd be happy to post more complete examples this evening. There is a Makefile in the examples dir to generate all of the examples and stats yourself, but I can also pull out any sort of excerpts you're interested in.

Evan Czaplicki

Mar 30, 2017, 3:08:05 PM
to elm-dev
Thanks for answering all my questions Erik!

The code snippets also helped me understand that flag better. It seems odd that you can't just annotate the function definition, rather than every single call site. ¯\_(ツ)_/¯

* * *

At the end of my last mail I said "This style could work better with bundling, but I sense there's a trick that could make it unnecessary."

I figured out the trick :D So I think it'll be possible to just get rid of the IIFEs in native code. I'll report back in a status report if it works out!


Erik Simmler

Mar 30, 2017, 8:40:17 PM
to elm-dev
My pleasure!

Yes, the pure flag is definitely a little unergonomic from the code writer/generator side. :/

For what it's worth, elm-todomvc gzips down pretty well if you only mark the F2..9 functions, so that would be an easy win if there isn't a nice way to annotate more broadly.

elm-todomvc gzipped
23163 original (vanilla uglify)
13262 only F2..9 marked as pure
12773 full whitelist marked as pure

The biggest project I've run it on is Kite, and the proportionate difference is very similar.

kite
88124 original
76763 F2..9
74854 full whitelist
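
To make the comparison easier to read, the gzipped sizes above can be turned into percentage savings. This is just arithmetic on the numbers quoted in this message; the function and key names are hypothetical.

```javascript
// Gzipped sizes quoted in this message.
var results = {
  'elm-todomvc': { original: 23163, f2to9: 13262, fullWhitelist: 12773 },
  'kite':        { original: 88124, f2to9: 76763, fullWhitelist: 74854 }
};

// Percentage reduction from `before` to `after`, to one decimal place.
function percentSaved(before, after) {
  return ((before - after) / before * 100).toFixed(1);
}

Object.keys(results).forEach(function (name) {
  var r = results[name];
  console.log(name,
    'F2..9:', percentSaved(r.original, r.f2to9) + '%',
    'full whitelist:', percentSaved(r.original, r.fullWhitelist) + '%');
});
```

For elm-todomvc this works out to 42.7% with only F2..9 marked and 44.9% with the full whitelist; for kite, 12.9% and 15.1%. In both cases the extra whitelist entries add only about two percentage points on top of F2..9.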

I want to do a bit more study of how well third-party, Elm-only libraries minify down when you only use a few functions.

Anyway... just some data to chew on. Let me know if there's anything in particular I can get for you!

Zachary Kessin

Apr 2, 2017, 1:36:21 AM
to elm...@googlegroups.com
It would be nice if at some point in the future there was a command "elm make-mini" which would do this automatically if the correct packages were installed. 

Zach




--
Zach Kessin
Skype: zachkessin

Evan Czaplicki

Apr 3, 2017, 6:48:17 PM
to elm-dev
Erik, I am testing out different code gen theories. Right now I have all the JS code for core and virtual-dom taken out of IIFEs so that they can be eliminated more easily by uglify and closure compiler. From there, I am using the following command:

uglifyjs code.js -mc 'pure_funcs="F2,F3,F4,F5,F6,F7,F8,F9"' > mini.js

This seems to work well, stripping out a bunch of extra stuff. From there, I have two questions for you:
  • I don't understand the full list of names you linked. For example, what is the benefit of having _elm_lang$core$Dict$RBEmpty_elm_builtin on that list?
  • Once things are out of the IIFE, is there a significant difference between using the pure_funcs flag and actually rewriting the JS?



Erik Simmler

Apr 3, 2017, 8:33:32 PM
to elm-dev
Once I got the basics in place, my workflow was to run samples through uglify and note all of the "WARN: Side effects in initialization of unused variable ..." it emitted. In all but a few very specific cases, this message is because a pure function is called to create the value. Whitelisting that function allows the entire statement (and dependencies such as the function) to be pruned. _elm_lang$core$Dict$RBEmpty_elm_builtin is a particularly poor example, as it only saves one line. :)

_elm_lang$html$Html$node is a somewhat better example. As I said earlier, the savings from all of this extra whitelisting seem to be small compared to at least marking the F2..9 functions. I've mostly just been pushing to see how far this can go without more invasive (and risky) rewriting.

As far as I can tell, the only difference between the pure_funcs flag and the annotation is that pure_funcs is global to the entire input file, which means it would also apply to any non-Elm JavaScript you've bundled together. I think it's extremely improbable that this would be a problem, but it is worth noting.
