Global "this" wart. How to proceed?

Mark Miller

May 17, 2008, 8:48:15 PM
to Google Caja Discuss
As the spate of recent bug reports makes clear, we are not remotely
close to correctly supporting "this" at outer scope as a means to
address the per-plugin IMPORTS___ object. Since this wart isn't yet
working, it's not clear that we need to support it even temporarily,
as it isn't yet part of our own "legacy". Should we fix these recent
"this" bugs by making this wart work, or by excising it? (Btw, I
volunteer to do the excising.)

Btw, I will miss the Monday morning meeting, but I'll be in Monday afternoon.

--
Text by me above is hereby placed in the public domain

Cheers,
--MarkM

ihab...@gmail.com

May 17, 2008, 8:55:00 PM
to google-ca...@googlegroups.com
On Sat, May 17, 2008 at 5:48 PM, Mark Miller <eri...@gmail.com> wrote:
> As the spate of recent bug reports makes clear, we are not remotely
> close to correctly supporting "this" at outer scope as a means to
> address the per-plugin IMPORTS___ object. Since this wart isn't yet
> working, it's not clear that we need to support it even temporarily,
> as it isn't yet part of our own "legacy".

By the "wart", I take it you mean the fact (and all its downstream implications) that references to global variables, even those which are declared using a "var", are assigned to IMPORTS___?

> Should we fix these recent "this" bugs by making this wart work, or by
> excising it? (Btw, I volunteer to do the excising.)

Curved Metz in my right hand, DeBakeys in my left, suction please.

Ihab

--
Ihab A.B. Awad, Palo Alto, CA

Mark Miller

May 17, 2008, 9:00:29 PM
to google-ca...@googlegroups.com
On Sat, May 17, 2008 at 5:55 PM, <ihab...@gmail.com> wrote:
> By the "wart", I take it you mean the fact (and all its downstream
> implications) that references to global variables, even those which are
> declared using a "var", are assigned to IMPORTS___?

No. I like these proposed cleanups as well, for which I'll be making
some further suggestions. But here, I was referring only to the
specific issue that we currently allow "this" used at global scope to
reify the IMPORTS___ object itself. Instead, we should statically
reject all such uses of "this" with an error message that explains how
they should fix their code.


> Curved Metz in my right hand, DeBakeys in my left, suction please.

;)

ihab...@gmail.com

May 17, 2008, 9:02:58 PM
to google-ca...@googlegroups.com
On Sat, May 17, 2008 at 6:00 PM, Mark Miller <eri...@gmail.com> wrote:
> I was referring only to the specific issue that we currently allow "this" used at global scope to reify the IMPORTS___ object itself.

Ah. Yes, that's clearly a malignant growth and we should take care of it while it's still operable.

> Instead, we should statically reject all such uses of "this" with an error message that explains how they should fix their code.

No complaint from me.

Ihab

Mike Stay

May 17, 2008, 9:04:24 PM
to google-ca...@googlegroups.com

Mark Miller

May 17, 2008, 9:50:57 PM
to google-ca...@googlegroups.com
On Sat, May 17, 2008 at 6:00 PM, Mark Miller <eri...@gmail.com> wrote:
> On Sat, May 17, 2008 at 5:55 PM, <ihab...@gmail.com> wrote:
>> By the "wart", I take it you mean the fact (and all its downstream
>> implications) that references to global variables, even those which are
>> declared using a "var", are assigned to IMPORTS___?
>
> No. I like these proposed cleanups as well, for which I'll be making
> some further suggestions. [...]

I just reread David-Sarah's message of 5/16, "Re: [Caja] Confinement
in Caja at finer than module granularity (was: Change from
___OUTERS___ to IMPORTS___)", which I'm distressed to be unable to
find in the google-caja-discuss archive. In any case, the further
suggestions I have in mind agree with his message on all points. I
include his message in full below.

One further change we should make, that would both provide greater
safety, and simplify the cajoling rules. If 'foo' is used freely in
the module, rather than rewriting 'foo' to 'IMPORTS___.foo', we should
generate a prelude at the beginning of the module function like

  ___.loadModule({
    moduleFunction: function(___, IMPORTS___) {
      var foo = IMPORTS___.foo;
      // ... and likewise for all other free variables ...

      // translated module body
    },
    importNames: {foo: {}, ...}
  });

Within the translated body of the module function, 'foo' can just
translate to 'foo', as we do for other variables. We should then be
able to remove the expandReferenceToOuters logic from the translator.
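
A minimal sketch of the prelude idea described above, not the cajoler's actual implementation. It assumes the list of free variable names has already been computed elsewhere and that the names are valid identifiers. `makePrelude` and `runModule` are hypothetical helper names, and a plain property read stands in for whatever accessor the Caja runtime would really use.

```javascript
// Hypothetical helper: build the prelude text for a list of free variable names.
function makePrelude(freeNames) {
  return freeNames
    .map(function (name) { return 'var ' + name + ' = IMPORTS___.' + name + ';'; })
    .join('\n');
}

// Hypothetical helper: prepend the prelude to a (translated) module body,
// compile the result as a module function, and call it with the imports.
function runModule(freeNames, bodySource, imports) {
  var source = makePrelude(freeNames) + '\n' + bodySource;
  var moduleFunction = new Function('IMPORTS___', source);
  return moduleFunction(imports);
}

// Inside the body, 'foo' and 'bar' are ordinary local variables.
var result = runModule(['foo', 'bar'], 'return foo + bar;', { foo: 40, bar: 2 });
console.log(result); // 42
```

Because each import becomes a local `var`, the body needs no per-reference rewriting; an assignment to `foo` inside the body changes only the local binding, not the imports object.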


---------- Forwarded message ----------
From: David-Sarah Hopwood <david....@industrial-designers.co.uk>
Date: Fri, May 16, 2008 at 9:41 AM
Subject: [Caja] Confinement in Caja at finer than module granularity
(was: Change from ___OUTERS___ to IMPORTS___)
To: google-ca...@googlegroups.com

ihab...@gmail.com wrote:
> On Wed, May 14, 2008 at 12:27 PM, David-Sarah Hopwood wrote:
>> ihab.a...@gmail.com wrote:
>>> On Fri, May 9, 2008 at 9:17 PM, David-Sarah Hopwood wrote:
>>
>>>> IMPORTS___ is mutable, correct? In which case, it is acting as a global
>>>> scope for the module rather than just as a way of obtaining imports. So
>>>> the name is slightly misleading IMHO.
>>>
>>> It's not directly mutable by the module; it's where the module code's free
>>> variables are rewritten to come from. We are going to change the rewriter so
>>> 'var' declarations at the outer scope do *not* refer to IMPORTS___. This is
>>> a change we are doing slowly, piece by piece.
>>
>> OK. This seems like quite a significant spec change. Should we assume
>> that rewrite rules (34), (35), (36) and (41) [from page 19 of
>> caja-spec-2008-01-15.pdf] are no longer applicable?
>
> (34) - If "glob" is a free variable of the block of Caja code,

(Minor nit: "free identifier" is more precise here.)

> it will still be rewritten to point to IMPORTS___. Otherwise, it will not.

That means that IMPORTS___ is still necessarily mutable by the module.
For example:

<trunk/src/java/com/google/caja/demos/lolcat-search/slides.html>
# <code> location = '//evil.com/'; </code>
# ==>
# <code> IMPORTS___.location = '//evil.com/'; </code>

Why does this need to be allowed? That is, why is it not statically
rejected rather than rewritten?

If it were statically rejected, then that would remove the most
important language-level obstacle preventing confinement at the
granularity of sets of objects. At the moment, Caja can only support
confinement at module granularity, because there is a mutable scope
that is shared across all code in a module. For example, you cannot
write a system equivalent (in its security and reviewability
properties) to the sash powerbox demo:
<http://www.skyhunter.com/marcs/emily.pdf>
in Caja without using several modules.

(Of course, this only helps if the provider of IMPORTS___ ensures
that it is deeply immutable. The issue here is whether this aspect of
the language prevents or allows fine-grained confinement, not whether
confinement is guaranteed. There are also library-level changes that
would be needed in order to allow a module to specify a powerbox that
can obtain authority that should not be granted directly to the rest
of the module -- but these would not affect the language or the
rewriting rules.)

Note that if a module requires a module-scoped variable to be mutable,
it can easily define it as a var. It can do that even if the same
identifier is also defined in IMPORTS___ and the original value of
that variable is needed:

  var originalFoo = Foo;  // 'originalFoo' is binding, 'Foo' is free
  (function() {
    var Foo;              // 'Foo' is binding
    ...Foo...             // 'Foo' is bound
    ...originalFoo...     // 'originalFoo' is bound
  })();

So this change would not significantly affect expressiveness; it would
just require mutability of module-scoped variables to be made explicit.

--
David-Sarah Hopwood

Mark Miller

May 17, 2008, 11:45:33 PM
to google-ca...@googlegroups.com
On Sat, May 17, 2008 at 6:50 PM, Mark Miller <eri...@gmail.com> wrote:
> One further change we should make, that would both provide greater
> safety, and simplify the cajoling rules. If 'foo' is used freely in
> the module, rather than rewriting 'foo' to 'IMPORTS___.foo', we should
> generate a prelude at the beginning of the module function like
>
> ___.loadModule({
> moduleFunction: function(___, IMPORTS___) {
> var foo = IMPORTS___.foo;

A fly in the ointment. That should be

  var foo = ___.readPub(IMPORTS___, 'foo');

An unfortunate consequence of this simpler rewrite is that an
unresolved free variable will become an actual variable with value
undefined. This differs from JavaScript's behavior of throwing. In
other words, we'd lose the distinction currently supported by the
optional third argument to readPub. We can, of course, regain the
distinction with extra bookkeeping. But then it's not clear that this
would remain enough of a simplification to be worth it.
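
The behavioral difference described above can be seen in plain JavaScript. This is a sketch only: the identifier `someNameNobodyDefined___` is made up for the illustration, and a plain property read stands in for `___.readPub`, whose exact behavior is not shown here.

```javascript
// Reading a free identifier that has no binding anywhere throws.
// (Compiled with the Function constructor so the lookup goes to the global scope.)
function readFreeIdentifier() {
  return new Function('return someNameNobodyDefined___;')();
}

// With a prelude, the same name is always declared, so a missing import
// silently becomes undefined instead of throwing.
function readViaPrelude(imports) {
  var someNameNobodyDefined___ = imports.someNameNobodyDefined___;
  return someNameNobodyDefined___;
}

var threw = false;
try {
  readFreeIdentifier();
} catch (e) {
  threw = e instanceof ReferenceError;
}

console.log(threw);              // true
console.log(readViaPrelude({})); // undefined
```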

David-Sarah Hopwood

May 18, 2008, 2:14:34 AM
to google-ca...@googlegroups.com
Mark Miller wrote:
> On Sat, May 17, 2008 at 6:00 PM, Mark Miller <eri...@gmail.com> wrote:
>> On Sat, May 17, 2008 at 5:55 PM, <ihab...@gmail.com> wrote:
>>> By the "wart", I take it you mean the fact (and all its downstream
>>> implications) that references to global variables, even those which are
>>> declared using a "var", are assigned to IMPORTS___?
>> No. I like these proposed cleanups as well, for which I'll be making
>> some further suggestions. [...]
>
> I just reread David-Sarah's message of 5/16, "Re: [Caja] Confinement
> in Caja at finer than module granularity (was: Change from
> ___OUTERS___ to IMPORTS___)", which I'm distressed to be unable to
> find in the google-caja-discuss archive. In any case, the further
> suggestions I have in mind agree with his message on all points. I
> include his message in full below.
>
> One further change we should make, that would both provide greater
> safety, and simplify the cajoling rules. If 'foo' is used freely in
> the module, rather than rewriting 'foo' to 'IMPORTS___.foo', we should
> generate a prelude at the beginning of the module function like
>
> ___.loadModule({
> moduleFunction: function(___, IMPORTS___) {
> var foo = IMPORTS___.foo;
> // ... and likewise for all other free variables ...

[later corrected to

  var foo = ___.readPub(IMPORTS___, 'foo');

]

> // translated module body
> },
> importNames: {foo: {}, ...}
> });
>
> Within the translated body of the module function, 'foo' can just
> translate to 'foo', as we do for other variables. We should then be
> able to remove the expandReferenceToOuters logic from the translator.

Actually, the "var foo = ___.readPub(IMPORTS___, 'foo');" bit is not needed,
if you have 'importNames'.

The security problem with allowing free identifiers to be used without
rewriting in a module body is that:

a) the global scope contains all sorts of "polluting" properties that
we don't want the module to be able to access.

b) we don't know what the names of these polluting properties are, because:
- some of them are 'DontEnum';
- some of them are inherited from objects on the prototype chain that
can't be reached on JS implementations that don't support __proto__;
- additional polluting properties might appear during execution.

b) is actually the bigger problem than a). If not for b), we could create
an object 'moduleScope' that has "blocking" properties corresponding to each
polluting property, and then run the module code (including event handlers)
in the scope of 'with (moduleScope)'.

However, we don't need to know what the names of polluting properties
are, if we know what free identifiers the module might access. In that
case, we can use the same approach but with a blocking property for each
free identifier.
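
A minimal sketch of the blocking-property idea, assuming the module's free identifiers are known in advance. `makeModuleScope` and `runInScope` are hypothetical names, not part of Caja or Jacaranda. The body is compiled with the Function constructor because `with` is a syntax error in strict code, and functions built that way are non-strict unless their own body opts in.

```javascript
// Hypothetical helper: build a scope object with an own property for every
// free identifier the module may use. Granted names carry their import value;
// every other listed name gets a "blocking" undefined that shadows any global.
function makeModuleScope(freeNames, granted) {
  var scope = {};
  freeNames.forEach(function (name) {
    scope[name] = Object.prototype.hasOwnProperty.call(granted, name)
      ? granted[name]
      : undefined;
  });
  return scope;
}

// Hypothetical helper: run module source inside 'with (moduleScope)'.
function runInScope(scope, bodySource) {
  var run = new Function('moduleScope', 'with (moduleScope) {' + bodySource + '}');
  return run(scope);
}

var scope = makeModuleScope(['foo', 'Date'], { foo: 7 });
console.log(runInScope(scope, 'return foo;'));         // 7
console.log(runInScope(scope, 'return typeof Date;')); // "undefined"
```

A name that is missing from the free-identifier list is not blocked at all (for instance, `Math` is still reachable in this sketch), so the approach depends on that list being complete.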

In Jacaranda, I require each module to declare all of its imports in a
way that can be read programmatically. This isn't too onerous; it's
just the same as a /*global ... */ declaration in JSLint
<http://www.jslint.com/lint.html>, but written as an array of strings
rather than in a comment. Just like in JSLint, there might be an implied
set of imports for a given environment, that don't need to be explicitly
declared.

Actually there are two imports lists, one for required imports (where
the module cannot run at all if they are not present), and one for
optional imports (where the module will fall back to not using the
import if it is 'undefined'). So a module definition looks something
like (this is not set in stone yet):

  module$({
    name: 'MyModule',
    ecmascript_version: 3.0,
    imports: ['foo', ...],
    optionalImports: ['bar', ...],

    code: function() {
      ...module body...
    }
  });

In the case of Caja, the import list(s) could be generated automatically.

Mark Miller wrote:
> An unfortunate consequence of this simpler rewrite is that an
> unresolved free variable will become an actual variable with value
> undefined. This differs from JavaScript's behavior of throwing.

In Jacaranda this is only the case for optional imports. If a required
import is missing, the module will not run at all. I consider this behaviour
to be quite reasonable: by putting something in the optional imports list,
the programmer is committing to checking that it is not undefined.
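
The required/optional distinction could be enforced by a loader along these lines. This is a sketch under assumptions: `resolveImports` and the `available` object are hypothetical, and the Jacaranda design was explicitly not final.

```javascript
// Hypothetical loader check: refuse to run the module when a required import
// is missing; leave a missing optional import as undefined for the module
// itself to test.
function resolveImports(moduleDef, available) {
  var has = function (name) {
    return Object.prototype.hasOwnProperty.call(available, name);
  };
  var resolved = {};
  (moduleDef.imports || []).forEach(function (name) {
    if (!has(name)) {
      throw new Error('Missing required import: ' + name);
    }
    resolved[name] = available[name];
  });
  (moduleDef.optionalImports || []).forEach(function (name) {
    resolved[name] = has(name) ? available[name] : undefined;
  });
  return resolved;
}

var moduleDef = { name: 'MyModule', imports: ['foo'], optionalImports: ['bar'] };
var resolved = resolveImports(moduleDef, { foo: 1 });
console.log(resolved.foo, resolved.bar); // 1 undefined
```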

--
David-Sarah Hopwood

David-Sarah Hopwood

May 18, 2008, 2:30:13 AM
to google-ca...@googlegroups.com

Also, a module's powerbox (not shown in the above example) is granted the
authority to replace its imports. So the powerbox could replace a missing
import with a fallback object, before the rest of the module code runs.

--
David-Sarah Hopwood

Ben Laurie

May 19, 2008, 8:39:09 AM
to google-ca...@googlegroups.com
On Sun, May 18, 2008 at 2:00 AM, Mark Miller <eri...@gmail.com> wrote:
>
> On Sat, May 17, 2008 at 5:55 PM, <ihab...@gmail.com> wrote:
>> By the "wart", I take it you mean the fact (and all its downstream
>> implications) that references to global variables, even those which are
>> declared using a "var", are assigned to IMPORTS___?
>
> No. I like these proposed cleanups as well, for which I'll be making
> some further suggestions. But here, I was referring only to the
> specific issue that we currently allow "this" used at global scope to
> reify the IMPORTS___ object itself. Instead, we should statically
> reject all such uses of "this" with an error message that explains how
> they should fix their code.

+1

David-Sarah Hopwood

Jul 17, 2008, 6:32:14 PM
to google-ca...@googlegroups.com
I made a suggestion in May on how to handle top-level imports. It turns
out that this suggestion doesn't work, for an interesting reason that
sheds some light on the security properties of scoping in JavaScript.
The translation currently used by Caja avoids the mistake I made.

David-Sarah Hopwood wrote:

> [...] we could create
> an object 'moduleScope' that has "blocking" properties corresponding to each
> polluting property, and then run the module code (including event handlers)
> in the scope of 'with (moduleScope)'.
>
> However, we don't need to know what the names of polluting properties
> are, if we know what free identifiers the module might access. In that
> case, we can use the same approach but with a blocking property for each
> free identifier.
>
> In Jacaranda, I require each module to declare all of its imports in a
> way that can be read programmatically. [...]

Here's the problem: the free identifiers of a module are a lexical property
of that module. But 'with' implements dynamic scoping, not lexical scoping.

Suppose that the above approach is used, and one module instance, Alice,
creates an object 'foo' that is passed to another module instance, Bob.
When a method of foo is called by Bob, it will be evaluated in the dynamic
scope of the 'with' statement for Bob's moduleScope. This is wrong; there
is no guarantee that 'foo' won't use identifiers that are not in Bob's
import list, and besides, the actual imports will be Bob's imports, not
Alice's imports as we need for lexical scoping. Dynamic scoping is insecure
for a capability language, because it allows an object (the function 'foo'
in this case), to use authority available to the module that calls it, that
has not been explicitly granted to it under capability rules.

Conclusion: 'with' cannot be used safely even in the internal implementation
of a safe JavaScript subset; it implements entirely the wrong scoping
semantics.

The translation that MarkM suggested above (and that I think is currently
used by the Caja implementation) avoids this problem because it puts a
module's imports into the lexical scope of its module function.

Jacaranda will be changed so that it also binds imports lexically.
I don't know of a way to directly simulate the lexical equivalent of
'with', but it is possible to do something that still allows a module
definition to be just as concise, and does not require rewriting:

  (function($init) {
    var a, b, c;  // declare imports
    eval($init);  // initialize imports

    // rest of module code
  })();

(where 'eval' can only appear exactly as in this boilerplate).

--
David-Sarah Hopwood

David-Sarah Hopwood

Jul 17, 2008, 9:18:23 PM
to google-ca...@googlegroups.com
David-Sarah Hopwood wrote:
> Jacaranda will be changed so that it also binds imports lexically.
> I don't know of a way to directly simulate the lexical equivalent of
> 'with', but it is possible to do something that still allows a module
> definition to be just as concise, and does not require rewriting:
>
> (function($init) {
> var a, b, c; // declare imports
> eval($init); // initialize imports
>
> // rest of module code
> })();
>
> (where 'eval' can only appear exactly as in this boilerplate).

That doesn't quite work because $init doesn't know what imports were
declared.

Anyway, I worked out how to fix this problem with only very minimal
changes to the Jacaranda design. A module definition will look something
like:

  (function() { eval($module({
    name: 'Foo',
    imports: ['a','b','c'],
    ...
  }));})();

Now the object literal passed to $module is in the lexical scope of
the enclosing anonymous function, and the 'eval' can introduce arbitrary
variables into that scope.

(Apologies if this is obscure given that I still haven't posted the
spec for Jacaranda. I think the fact that it's possible to simulate a
lexical version of 'with' in JavaScript is independently interesting,
though.)

A more complete, although still not fully realistic demonstration
(this code doesn't check that names are declarable identifiers,
for example):

  var modules = {}, moduleImports = {};

  function $module(literal) {
    modules[literal.name] = literal;
    var imports = literal.imports;
    var s = 'var ___imports___ = moduleImports.'+literal.name+';\n';
    for (var i = 0; i < imports.length; i++) {
      var id = imports[i];
      s = s + 'var '+id+' = ___imports___.'+id+';\n';
    }
    return s;
  }

  moduleImports.Foo = {a:77};

  (function() { eval($module({
    name: 'Foo',
    imports: ['a','b','c'],
    main: function() {
      print(a);
    }
  }));})();

  modules.Foo.main();
  // 77
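
The demonstration notes that it does not check that names are declarable identifiers. One possible check is sketched here; `isDeclarableIdentifier` is a hypothetical helper, and restricting names to ASCII is a deliberately conservative assumption that rejects some legal Unicode identifiers. Since `literal.name` is also spliced into the generated source, it would need the same check as each entry of `imports`.

```javascript
// ECMAScript 3 keywords, future reserved words, and literals; none of these
// may be declared with 'var'.
var RESERVED = (
  'break case catch continue default delete do else finally for function if in ' +
  'instanceof new return switch this throw try typeof var void while with ' +
  'abstract boolean byte char class const debugger double enum export extends ' +
  'final float goto implements import int interface long native package private ' +
  'protected public short static super synchronized throws transient volatile ' +
  'null true false'
).split(' ');

// Hypothetical check: an ASCII-only identifier that is not a reserved word.
function isDeclarableIdentifier(name) {
  return typeof name === 'string' &&
    /^[A-Za-z_$][A-Za-z0-9_$]*$/.test(name) &&
    RESERVED.indexOf(name) === -1;
}

console.log(isDeclarableIdentifier('a'));          // true
console.log(isDeclarableIdentifier('a;alert(1)')); // false
console.log(isDeclarableIdentifier('var'));        // false
```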

--
David-Sarah Hopwood

David-Sarah Hopwood

Jul 18, 2008, 11:41:47 PM
to google-ca...@googlegroups.com
David-Sarah Hopwood wrote:
> David-Sarah Hopwood wrote:
>> Jacaranda will be changed so that it also binds imports lexically.
>> I don't know of a way to directly simulate the lexical equivalent of
>> 'with', but it is possible to do something that still allows a module
>> definition to be just as concise, and does not require rewriting:
>>
[...]

> Anyway, I worked out how to fix this problem with only very minimal
> changes to the Jacaranda design. A module definition will look something
> like:
>
> (function() { eval($module({
> name: 'Foo',
> imports: ['a','b','c'],
> ...
> }));})();

I said that 'with' implemented dynamic scoping. Turns out that's wrong,
it's approximately lexical (but not always). The reason it looked like
dynamic scoping in some of the examples I tried, is that the functions
defined in the module were accessing the global scope (I think).

To reliably get lexical scoping using 'with', the module definition has
to be explicitly in the scope of a 'with' statement that is inside a
function body, for example:

  $module(function(_) {with(_) return {
    name: 'Foo',
    imports: ['a','b','c'],
    ...
  }});

which avoids using the 'eval' sledgehammer, and is potentially a little
more efficient.

Anyway, JavaScript scoping in its full, baroque generality is absolutely
insane. It's worse than natural languages, never mind any other programming
languages.

--
David-Sarah Hopwood

Felix

Jul 19, 2008, 3:05:23 AM
to google-ca...@googlegroups.com
David-Sarah Hopwood wrote:
> I said that 'with' implemented dynamic scoping. Turns out that's wrong,
> it's approximately lexical (but not always). The reason it looked like
> dynamic scoping in some of the examples I tried, is that the functions
> defined in the module were accessing the global scope (I think).

I'm confused by your description. 'with' interposes dynamic scope
lookup before lexical scope lookup.

given:
var y = {};
var f = function(x) { with (x) { return y; } };

sometimes f(x)===y
sometimes f(x)===x.y
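
Both outcomes of the example above can be reproduced as follows. This is a sketch: the function is compiled with the Function constructor only so that the `with` statement is accepted even where the surrounding file is treated as strict code, and the outer `y` is passed in as a parameter to stand for the enclosing binding.

```javascript
var y = {};

// Equivalent to: var f = function(x) { with (x) { return y; } };
// with the outer y supplied by the enclosing (generated) function.
var makeF = new Function('y', 'return function (x) { with (x) { return y; } };');
var f = makeF(y);

var withoutY = {};
var withY = { y: 'from x' };

console.log(f(withoutY) === y);    // true: lookup falls through to the outer y
console.log(f(withY) === withY.y); // true: the property x.y shadows the outer y
```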

David-Sarah Hopwood

Jul 19, 2008, 5:21:30 PM
to google-ca...@googlegroups.com
Felix wrote:
> David-Sarah Hopwood wrote:
>> I said that 'with' implemented dynamic scoping. Turns out that's wrong,
>> it's approximately lexical (but not always). The reason it looked like
>> dynamic scoping in some of the examples I tried, is that the functions
>> defined in the module were accessing the global scope (I think).
>
> I'm confused by your description. 'with' interposes dynamic scope
> lookup before lexical scope lookup.

No, it doesn't, as shown by the second example below.

> given:
> var y = {};
> var f = function(x) { with (x) { return y; } };
>
> sometimes f(x)===y
> sometimes f(x)===x.y

Suppose that f is always called with an x such that x.y exists. Then,
the scoping is lexical, i.e. the 'with' statement is equivalent in this
case to 'var y = x.y; return y;'.

It might help to compare the following three examples:

1)
  function $module(m) {
    var a = m({y:42});
    a.code();
  }

  $module(function(_) { with(_) { return {
    code: function() {print(y);}
  }}});
  // 42

2)
  (function () {
    var y = 123;
    function $module2(m) {
      with ({y:42}) { m().code(); }
    }

    $module2(function() { return {
      code:function() {print(y);}
    }});
  })();
  // 123

3)
  y = 123;
  function $module3(m) {
    with ({y:42}) { m().code(); }
  }

  $module3(function() { return {
    code:function() {print(y);}
  }});
  // 123

If 'with' implemented dynamic scoping in the conventional sense, then
at least the third example would print 42 (perhaps the second would as
well, depending on how lexical and dynamic scoping interact). That is
because the call to 'code' is in the dynamic scope of the 'with' statement,
and that binding "should" (for dynamic scoping) have been found before
the outer binding of y = 123.

The language in ECMA-262 section 12.10 about adding a scope object to
the "front of the scope chain" can easily lead to the misimpression
that both you and I had. I think the explanation is as follows: although
the object will be at the front of the scope chain at step 4 of the
algorithm in that section, evaluating 'Statement' in step 5 will itself
add further scopes to the front of the scope chain, corresponding to the
lexical environment of the code within the statement. So any closures
that are created within the statement will behave as though the 'with'
had just introduced a 'var' declaration for each of the properties of
the scope object (except that these declarations only have effect in
the 'with', not the enclosing function body).

The first example illustrates (but does not prove) that 'with' actually
provides lexical scoping, *provided* that all free identifiers in the
scope of the 'with' exist as properties of the supplied scope object.

[All examples tested on IE7 and FF2 in the squarefree shell.]

--
David-Sarah Hopwood

David-Sarah Hopwood

Jul 19, 2008, 5:34:02 PM
to google-ca...@googlegroups.com
David-Sarah Hopwood wrote:
> Felix wrote:
>> David-Sarah Hopwood wrote:
>>> I said that 'with' implemented dynamic scoping. Turns out that's wrong,
>>> it's approximately lexical (but not always). The reason it looked like
>>> dynamic scoping in some of the examples I tried, is that the functions
>>> defined in the module were accessing the global scope (I think).
>>
>> I'm confused by your description. 'with' interposes dynamic scope
>> lookup before lexical scope lookup.
>
> No, it doesn't, as shown by the second example below.
>
>> given:
>> var y = {};
>> var f = function(x) { with (x) { return y; } };
>>
>> sometimes f(x)===y
>> sometimes f(x)===x.y

That is true, but...

I should also have mentioned that if you remove the 'y = 123', you get
a ReferenceError for the second and third examples. I.e. the y = 42
binding will still not be found in that case, because the closure m()
is not created within the lexical scope of the 'with'.
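
The ReferenceError case can be checked with a sketch like the following, which assumes no global variable named `y` exists. Both snippets are compiled with the Function constructor so that `with` is accepted regardless of how the surrounding file is loaded; `callInsideWith` and `createInsideWith` are names made up for this illustration.

```javascript
// Closure created OUTSIDE the 'with' and merely called inside it:
// the {y: 42} scope object is not on the closure's scope chain.
var callInsideWith = new Function([
  'function $module2(m) { with ({y: 42}) { return m().code(); } }',
  'return $module2(function () {',
  '  return { code: function () { return y; } };',
  '});'
].join('\n'));

var outsideThrew = false;
try {
  callInsideWith();
} catch (e) {
  outsideThrew = e instanceof ReferenceError;
}

// Closure created INSIDE the 'with': y resolves to the scope object's property.
var createInsideWith = new Function(
  'with ({y: 42}) { return (function () { return y; })(); }'
);

console.log(outsideThrew);       // true
console.log(createInsideWith()); // 42
```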

> [All examples tested on IE [7.0.6000.16681] and FF [2.0.0.16] in the
> squarefree shell.]

and as a <script> tag in plain HTML, just in case.

--
David-Sarah Hopwood

Felix

Jul 19, 2008, 6:17:33 PM
to google-ca...@googlegroups.com
ah, yeah. I see.
with(x) is similar to Python's

  from x import *

The effect on the lexical scope cannot be determined until runtime, but
it doesn't affect bindings outside the lexical scope.