On Fri, Apr 23, 2010 at 5:10 PM, Nick Santos <
nicks...@google.com> wrote:
> On Fri, Apr 23, 2010 at 8:03 PM, Garrett Smith <
dhtmlk...@gmail.com> wrote:
>> +1
>>
>> Problems with GoogCC include:
>> * doesn't build a scope tree (so can't make a full optimization)
>
> i don't know what this means. can you elaborate?
>
In order to explain what I mean by scope tree, it is first necessary
to understand scope chain. Scope chain and identifier resolution
require an understanding of execution context and Variable Object (VO).
ECMA-262 editions 3 and 5 are the normative references for these.
I suppose I can give a brief explanation...
When a function is created, it gets an internal [[Scope]] property
that holds the scope chain of the execution context in which it was
created.
When a function is called, a new Variable Object (VO) is created and
placed at the front of a scope chain whose remainder is that
function's [[Scope]]. When an identifier is to be resolved in that
call, it is looked up on the VO first. If it is not found there, the
rest of the scope chain is searched, one object at a time.
A diagram of the scope chain follows the example code:
Code:
function a() {
  var v = "property of a VO";
  return b;
  function b() {
    return c;
    function c(){ alert( v ); }
  }
}
a()()();
Diagram:
global
|
a[scope]
|
b[scope]
|
c[scope]
The first call is a call to `a`, "a()". Identifier `a` resolves on the
global object. When function `a` is called, an execution context is
entered and a VO is created. Function `b` and variable `v` are added
as properties of the VO, and `b` gets the scope chain of this call to
`a` as its scope. Next, the two statements are evaluated:
1) v = "property of a VO"
2) return b;
The first statement resolves identifier `v` on the VO and then assigns
it a value. The second statement resolves identifier `b` on the VO and
returns that to the calling context, which is global code.
That returned value is used in another CallExpression, i.e. the second
CallExpression in "a()()". The function that is now being called is
the function returned from `a`, which is `b`. A VO is created and
function `c` is added to it. `c` gets a scope chain that contains the
VO from this call to `b`, the VO from the prior call to `a`, and the
global object. The function `c` is returned to the calling context,
which is global code.
The calling context uses that returned value in another
CallExpression, i.e. the third CallExpression in "a()()()". That
function is called. The VO that is created is empty (except for an
unobservable `arguments` object, if that did not get optimized away).
To call `alert`, identifier `v` must be resolved. It is not found in
the VO of the current execution context, and so the next object in the
scope chain is searched. That is the first object in the scope of the
function identified by `c`: the VO created when `b` was called. That
VO has a `c` property but not a `v` property. The next object in the
scope chain is the VO created when `a` was called. That has two
properties: `b` and `v`. Identifier `v` is resolved there and the
Reference to `v` is returned to `c`. (The base object for `v` is that
activation object (VO) and the value is "property of a VO".)
The execution of `c` completes normally, returning undefined to its
caller. All three calls were made from global code, so that undefined
is the value of the whole expression "a()()()", and the stack of
execution contexts is back down to the global context.
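The lookup described above can be imitated with plain objects. This is
only a rough sketch for illustration: `makeVO` and `resolve` are names I
made up, and a real engine does not expose its variable objects like
this.

```javascript
// Hypothetical model: each variable object (VO) has a label, its own
// bindings, and a link to the next object in the scope chain.
function makeVO(label, bindings, parent) {
  return { label: label, bindings: bindings, parent: parent || null };
}

// Walk the chain from the innermost VO outward and record every VO
// that had to be searched before the identifier was found.
function resolve(name, vo) {
  var searched = [];
  for (var cur = vo; cur !== null; cur = cur.parent) {
    searched.push(cur.label);
    if (Object.prototype.hasOwnProperty.call(cur.bindings, name)) {
      return { value: cur.bindings[name], searched: searched };
    }
  }
  throw new ReferenceError(name + " is not defined");
}

// The chain that exists while `c` runs in a()()().
var globalVO = makeVO("global", { a: "function a" });
var aVO = makeVO("a", { v: "property of a VO", b: "function b" }, globalVO);
var bVO = makeVO("b", { c: "function c" }, aVO);
var cVO = makeVO("c", {}, bVO);

var result = resolve("v", cVO);
console.log(result.searched.join(" -> ")); // c -> b -> a
console.log(result.value);                 // property of a VO
```

The search stops at the VO for `a`, so the global object is never
consulted for `v`.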
Now that scope chain has been discussed in more detail, a scope tree
should not be a hard concept to grasp.
At the root of every scope chain is the global object. Every scope
chain shares the root. Some scope chains may share scope. In the
example below, when `a` is called, `b` and `c` have the same scope,
passed in from the execution context of the call to `a`.
Code:
function a() {
  var q=1, w=2;
  return{ b:b, c:c };
  function b(){ alert(q); }
  function c(){ alert(w); }
}
var anb = a();
Diagram:
       global
          |
      a[scope]
     |         |
b[scope]    c[scope]
All identifiers are used, and so none can be removed. However, it is
not difficult to imagine a case where a function could be removed.
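That `b` and `c` really do share one scope per call to `a` can be
observed directly. The following is a small variation of the example
above, changed by me so that it runs without `alert`; the names `first`
and `second` are mine.

```javascript
function a() {
  var q = 1;
  return { b: b, c: c };
  function b() { q++; }       // writes q on the VO of this call to a
  function c() { return q; }  // reads the same q through the same scope
}

var first = a();   // one call to a, one VO shared by its b and c
var second = a();  // a separate call, so a separate VO

first.b();
first.b();
console.log(first.c());  // 3
console.log(second.c()); // 1
```

Each call to `a` produces its own branch of the tree, and the two
functions created in that call hang off the same node.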
The example below was borrowed (with changes) from:
http://code.google.com/p/closure-compiler/issues/detail?id=36#c16
function x(){
  var ChocolateFudgeSundae = 3,
      crap = 7;
  return a;
  function a(){
    return b();
  }
  function b(){
    return ChocolateFudgeSundae;
  }
  // function c() { crap++; }
}
      x[[scope]]
     |          |
a[[scope]]  b[[scope]]
By looking at the identifiers used in the bodies of `a` and `b`, some
optimizations can be found.
We know that if x declares an identifier, and that identifier is not
used in x or in any of the functions within x (or their nested
functions), then that identifier can be removed.
Identifier `crap` is not used in the body of `x` (beyond its own
declaration), the body of `a`, nor the body of `b`. A warning should
be sufficient for that; that way it is removed by the developer, in
full awareness, cleaning up the original source code (and possibly
taking a second look at why that identifier was there).
e.g.
| WARN: Dead code: identifier `crap` is declared and assigned a value,
| but not used.
With Simple Optimizations we can see that `crap` was renamed to `d`:
function x(){function a(){return b()}function b(){return c}var c=3,d=7;return a};
`ChocolateFudgeSundae` declared in the body of `x` got renamed to `c`
and all references to it (only one, in `b`) were updated. That makes
sense. If that is possible, then why was `d` not removed?
If we remove the reference to identifier `ChocolateFudgeSundae` in
`b`, it can also be determined that the number of references to that
identifier is 0, and so if no calls to eval exist in `x` or any of its
nested functions, recursively, then the identifier can be removed. If
it is possible to lexically analyze FunctionBody for identifiers and
eval (and it is), then that can be done in a BFS-type search of all
functions within.
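A rough sketch of that search follows. It does not parse anything: it
assumes some earlier step has already produced, for each function, the
identifiers it declares, the identifiers its body uses, and whether it
calls eval. The node shape and the names `scopeNode` and `findUnused`
are hypothetical, not anything from GoogCC.

```javascript
// Hypothetical scope-tree node for one function.
function scopeNode(name, declared, used, children, usesEval) {
  return { name: name, declared: declared, used: used,
           children: children || [], usesEval: !!usesEval };
}

// Breadth-first search of `root` and every nested function. Returns
// the identifiers declared in `root` that are never used, or an empty
// list if eval appears anywhere (then nothing can be proven unused).
function findUnused(root) {
  var referenced = Object.create(null);
  var queue = [{ node: root, shadowed: Object.create(null) }];
  while (queue.length > 0) {
    var item = queue.shift();
    if (item.node.usesEval) {
      return []; // eval could reference any identifier by name
    }
    item.node.used.forEach(function (id) {
      // A use only counts if no function in between redeclares the name.
      if (!item.shadowed[id]) { referenced[id] = true; }
    });
    item.node.children.forEach(function (child) {
      var shadowed = Object.create(item.shadowed);
      child.declared.forEach(function (id) { shadowed[id] = true; });
      queue.push({ node: child, shadowed: shadowed });
    });
  }
  return root.declared.filter(function (id) { return !referenced[id]; });
}

// The example from above: only `crap` has no references.
var x = scopeNode("x", ["ChocolateFudgeSundae", "crap", "a", "b"], ["a"], [
  scopeNode("a", [], ["b"]),
  scopeNode("b", [], ["ChocolateFudgeSundae"])
]);
console.log(findUnused(x)); // [ 'crap' ]
```

With the reference in `b` removed, the same call would also report
`ChocolateFudgeSundae`.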
The example, updated:
function x(){
  var ChocolateFudgeSundae = 3,
      crap = 7;
  return a;
  function a(){
    return b();
  }
  function b(){
    return;
  }
}
Output:
function x(){function a(){return b()}function b(){}var c=3;c=7;return a};
This is a totally pointless declaration and assignment of `c`,
followed by another pointless assignment to `c`. If `crap` can be
removed completely, then so can the assignment expression.
Advanced optimizations results in:
| Your code compiled to down to 0 bytes. Perhaps
| you should export some functions? Learn more
This does not safely remove unused identifiers. Instead, it removes
used identifiers. It is no longer possible to call the function
returned from x.
x()(); // TypeError
Instead, the scope tree should be built internally, so that function
removal can be determined. If it can be determined that `a` returns
the result of calling `b` and that `b` has the same scope as `a`, then
the FunctionBody for `b` can be added inline to where it is used in
`a`, so long as none of the identifiers used in `b` are the same as
those in `a`.
Another example, hinted at prior:
function x(){
  var ChocolateFudgeSundae = 3,
      crap = 7;
  return a;
  function a(){
    return b();
  }
  function b(){
    return ChocolateFudgeSundae;
  }
  function c() { crap++; }
}
               global
                  |
             x[[scope]]
                  |
       +----------+-----------+
       |          |           |
  a[[scope]]  b[[scope]]  c[[scope]]
c is an unused identifier. Identifier `crap` appears in a dead node of
the scope tree. It can be removed, along with c.
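One way to find such dead nodes is to mark everything reachable from
the body of `x` and report the rest. The sketch below is a deliberately
flat, one-level model that I made up for this example; it assumes no
eval, no shadowing, and that any reference to a function keeps it
alive. `prune` is a hypothetical name.

```javascript
// Hypothetical model of `x`: its variables, the identifiers its own
// body uses, and each nested function with the identifiers it uses.
var x = {
  vars: ["ChocolateFudgeSundae", "crap"],
  used: ["a"],                      // `return a;`
  functions: {
    a: ["b"],                       // `return b();`
    b: ["ChocolateFudgeSundae"],    // `return ChocolateFudgeSundae;`
    c: ["crap"]                     // `crap++;`
  }
};

// Mark every identifier reachable from the body of the scope. What is
// left unmarked is either a dead function or an unused variable.
function prune(scope) {
  var live = Object.create(null);
  var work = scope.used.slice();
  while (work.length > 0) {
    var id = work.pop();
    if (live[id]) { continue; }
    live[id] = true;
    if (Object.prototype.hasOwnProperty.call(scope.functions, id)) {
      // A live function keeps alive whatever its body uses.
      work = work.concat(scope.functions[id]);
    }
  }
  var declared = scope.vars.concat(Object.keys(scope.functions));
  return declared.filter(function (id) { return !live[id]; });
}

console.log(prune(x)); // [ 'crap', 'c' ]
```

`crap` is reported even though `c` mentions it, because `c` itself is
never reached.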
Another optimization that can be done with a scope tree is function
call inlining and identifier inlining.
Take the code below, for example. The first function could be
optimized to an intermediate result:-
function x(){
  var y = 3;
  return a();
  function a(){
    return y; // result of calling `b` inlined.
  }
}
Identifier `y` can be inlined, resulting in:-
function x(){
  return a();
  function a(){
    return 3;
  }
}
- and since `a` does not access any identifiers in the scope chain,
and since its scope is known, its body can be inlined where it is
called within that shared scope.
function x(){
  return 3;
}
Such optimizations are possible iff a scope tree is built.
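As a sanity check, the three stages can be put side by side and called.
The function names below are mine, chosen only so that all three
versions can coexist in one file.

```javascript
// Stage 1: intermediate result, `a` reads `y` from the scope of `x`.
function xIntermediate() {
  var y = 3;
  return a();
  function a() {
    return y;
  }
}

// Stage 2: identifier `y` inlined, `a` no longer touches the scope chain.
function xIdentifierInlined() {
  return a();
  function a() {
    return 3;
  }
}

// Stage 3: the call to `a` inlined as well.
function xFullyInlined() {
  return 3;
}

var results = [xIntermediate(), xIdentifierInlined(), xFullyInlined()];
console.log(results.join(", ")); // 3, 3, 3
```

The observable result is the same at every stage, which is what makes
each step a safe rewrite for this particular input.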
That's a lot of explanation and a lot for you to read. I hope it made sense.
>> * confuses FunctionExpression and FunctionDeclaration (now that's alarming)
>
> this was a small bug, and was fixed a long time ago.
>
Good to know.
>> * removes conditional comments (another topic, but still a problem).
>
> i don't know of any JS compressor that preserves conditional comments
> correctly in all situations. If you can name one, then I'd love to see
> it.
>
I have used JScript conditional comments in only one place, and only
to identify older versions of IE in order to do an event handler purge
onunload (so it doesn't break bfcache in good browsers). Checking two
ScriptEngine properties could work, e.g.
var scriptEngineVersion = +(ScriptEngineMajorVersion() + "."
    + ScriptEngineMinorVersion());
- and it's not awful, but I did not want to have to change my code to
make closure compiler happy.
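For what it's worth, a guarded form of that check might look like the
sketch below. The wrapper `getScriptEngineVersion` is a name I made up;
the two ScriptEngine functions are JScript globals, and the sketch
assumes they are simply undeclared everywhere else.

```javascript
// Returns the JScript version as a number, or null when the
// ScriptEngine functions are not available (any non-JScript engine).
function getScriptEngineVersion() {
  if (typeof ScriptEngineMajorVersion === "undefined" ||
      typeof ScriptEngineMinorVersion === "undefined") {
    return null;
  }
  return +(ScriptEngineMajorVersion() + "." + ScriptEngineMinorVersion());
}

var scriptEngineVersion = getScriptEngineVersion();
// null outside JScript, so the IE-only purge can be skipped there.
var needsUnloadPurge = scriptEngineVersion !== null;
```

The `typeof` test avoids a ReferenceError in engines that do not define
those functions, and it survives minification because it is ordinary
code, not a comment.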
[...]