Module loading discussion

ihab...@gmail.com

Nov 19, 2009, 12:27:35 AM
to Google Caja Discuss
Caja fans,

A couple days ago, I presented to the Caja group some material that I had previously presented to the ECMA TC39 committee, describing the module system we have implemented in Caja. The slides are here:


     *  *   *   *   *

At the Cajita level, we implement a module loader such that calling:

  load('foo')

where "foo" is a module ID, returns a "module function". This is a closed function (i.e., the function itself has no free variables) and is instantiated by calling it with an object literal providing bindings for the free variables of the module's source code. So if "foo.js" contained, say:

  x + y;

then the following expression:

  load('foo')({ x: 3, y: 4 });

would evaluate to 7. So far so good.
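
As a rough illustration of the module-function idea (not Cajita's actual implementation), the sketch below uses a hypothetical in-memory `sources` table in place of fetching "foo.js", and a toy evaluator built from `new Function` and `with`. That evaluator is an assumption made for brevity and provides none of Caja's confinement.

```javascript
// Hypothetical in-memory module store standing in for fetching "foo.js".
const sources = { foo: 'x + y;' };

// Toy evaluator (assumption): runs `src` with the properties of `imports`
// visible as variables. It is NOT secure; globals remain reachable.
const evaluateWith = new Function(
    'imports', 'src', 'with (imports) { return eval(src); }');

// load(id) returns a "module function": calling it with bindings
// instantiates the module and yields the value of its last expression.
function load(id) {
  if (!Object.prototype.hasOwnProperty.call(sources, id)) {
    throw new Error('unknown module: ' + id);
  }
  const src = sources[id];
  return function moduleFunction(imports) {
    return evaluateWith(imports, src);
  };
}

const result = load('foo')({ x: 3, y: 4 });
```

The same module function can be instantiated again with different bindings, since loading and instantiation are separate steps.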

One desideratum is that the IDs of modules are "self-relative". Let's say module "a/b/c.js" contains the following expression:

  load('../d/e');

This should be relative to its _own_ ID; hence, the result of this should be to load:

  a/d/e.js

This means that a module must "know" its own ID, in some sense, and hand it to its own loaders so that they can compute IDs relative to that.
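
A minimal sketch of the ID arithmetic described above. The function name `resolveId` is hypothetical, and the rule that only IDs starting with "./" or "../" are treated as relative is an assumption borrowed from CommonJS conventions rather than something stated in this message.

```javascript
// Resolve a possibly-relative module ID against the ID of the module
// that asked for it. The ".js" extension is not part of the ID.
function resolveId(id, baseId) {
  // Assumption: IDs not starting with "./" or "../" are already absolute.
  if (!/^\.\.?\//.test(id)) return id;

  // Start from the "directory" of the requesting module: a/b/c -> [a, b].
  const parts = baseId.split('/').slice(0, -1);
  for (const seg of id.split('/')) {
    if (seg === '.' || seg === '') continue;
    if (seg === '..') {
      if (parts.length === 0) {
        throw new Error('ID escapes the module root: ' + id);
      }
      parts.pop();
    } else {
      parts.push(seg);
    }
  }
  return parts.join('/');
}

const resolved = resolveId('../d/e', 'a/b/c');
```

Here `resolved` is the ID `a/d/e`, which a loader would then map to the file "a/d/e.js".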

The way we did that in Cajita was to just _give_ the module access to its own loader. But Mark Miller correctly pointed out that this violated the assumption that module functions are transitively immutable -- i.e., powerless. If a module function is connected to something that, at the mere utterance of "load()", goes out to the internets and fetches guff, that is definitely an ambient authority. So, what's to do?

The simplest way to fix this is to allow a module to know its own ID (i.e., the ID it was loaded by). Let's say each module function is given a well-known constant in its lexical scope; for this description, we will call it:

  _thisModuleId_

A module can then load something relative to its own ID by saying:

  load('foo', _thisModuleId_);

and thus the module function does not have to close over its loader. Capability security regained. The specific syntax of this sort of thing remains to be hashed out.
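
One way to picture this, as a sketch under stated assumptions: the `sources` table, the toy `run` evaluator, and the helper `resolveId` are all hypothetical, and passing `load` in through the instantiation bindings is just one possible arrangement. The point shown is that the module function itself holds only its text and its ID, while the loader arrives at instantiation time.

```javascript
// Hypothetical module texts keyed by module ID (stand-in for real fetching).
const sources = {
  'a/b/c': "load('../d/e', _thisModuleId_)({ n: 20 }) + 1;",
  'a/d/e': 'n * 2;'
};

// Toy evaluator (assumption for this sketch; it offers no confinement).
const run = new Function('scope', 'src', 'with (scope) { return eval(src); }');

// Resolve "./" and "../" IDs against the directory part of baseId.
function resolveId(id, baseId) {
  if (!/^\.\.?\//.test(id)) return id;
  const parts = baseId.split('/').slice(0, -1);
  id.split('/').forEach(function (seg) {
    if (seg === '..') parts.pop();
    else if (seg !== '.') parts.push(seg);
  });
  return parts.join('/');
}

// One shared loader with no per-module state: the caller supplies the base ID.
function load(id, baseId) {
  const fullId = baseId === undefined ? id : resolveId(id, baseId);
  if (!Object.prototype.hasOwnProperty.call(sources, fullId)) {
    throw new Error('unknown module: ' + fullId);
  }
  const src = sources[fullId];
  return function moduleFunction(imports) {
    // Each instance sees its own ID as a constant binding.
    const scope = Object.assign({}, imports, { _thisModuleId_: fullId });
    return run(scope, src);
  };
}

// The loader is handed over explicitly when the module is instantiated.
const value = load('a/b/c')({ load: load });
```

If the caller omitted `load` from the bindings, the module "a/b/c" would have no way to reach "a/d/e", even though it still knows its own ID.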

     *   *   *   *   *

Mark Miller made some concrete suggestions as well. He stipulated two "load()" forms, which we name for the purposes of this discussion only. "loadf" stands for "load function" and is the basic "load()" we have now. To load a module function, do:

  moduleFunction = loadf('a/b/c');

which will load "a/b/c.js" as before. He also proposed "loadi", which stands for "load instance" and actually instantiates the module, in addition to loading its module function. To use it, do:

  moduleInstance = loadi('a/b/c', { x: 3, y: 4 });

The trick with "loadi" is that it has two conveniences: (a) it loads a module function and instantiates it in one shot; and (b) it desugars to:

  moduleInstance = loadf('a/b/c')({ loadf: loadf.for('a/b/c'), x: 3, y: 4});

In other words, it automatically passes down to the module being loaded a version of the current loader that is pre-configured to search relative to the path "a/b/c".
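
The desugaring can be sketched as follows. This is an illustration only: modules are plain functions in a hypothetical `registry` rather than loaded text, `makeLoadf` and `resolveId` are invented helper names, and `loadi` takes the current loader as an explicit first argument because a library function, unlike a language-level form, cannot pick it up implicitly.

```javascript
// Hypothetical registry of module functions keyed by ID.
const registry = {
  'a/b/c': function (imports) {
    return imports.x + imports.y + imports.loadf('../d/e')({});
  },
  'a/d/e': function () { return 100; }
};

function resolveId(id, baseId) {
  if (baseId === null || !/^\.\.?\//.test(id)) return id;
  const parts = baseId.split('/').slice(0, -1);
  for (const seg of id.split('/')) {
    if (seg === '..') parts.pop();
    else if (seg !== '.') parts.push(seg);
  }
  return parts.join('/');
}

// Build a loader that resolves IDs relative to baseId (null = top level).
function makeLoadf(baseId) {
  function loadf(id) {
    const fullId = resolveId(id, baseId);
    if (!Object.prototype.hasOwnProperty.call(registry, fullId)) {
      throw new Error('module not found: ' + fullId);
    }
    return registry[fullId];
  }
  // loadf.for(id): the same loader, pre-configured to search relative to id.
  loadf.for = function (id) {
    return makeLoadf(resolveId(id, baseId));
  };
  return Object.freeze(loadf);
}

// loadi(loadf, id, imports) ~ loadf(id)({ loadf: loadf.for(id), ...imports })
function loadi(loadf, id, imports) {
  return loadf(id)(Object.assign({ loadf: loadf.for(id) }, imports));
}

const rootLoadf = makeLoadf(null);
const moduleInstance = loadi(rootLoadf, 'a/b/c', { x: 3, y: 4 });
```

In this sketch "a/b/c" reaches "a/d/e" through the relative ID "../d/e" because the loader it received was produced by `loadf.for('a/b/c')`.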

The reason why conveniences (a) and (b) are mixed together is that I may call any given module function with two different loaders, so providing a loader happens at instantiation time, not at loading time. So say I load some module function:

  mf = loadf('a/b/c');

I can instantiate this with two different loaders:

  mi_1 = mf({ loadf: theFirstLoader, x: 3, y: 4 });
  mi_2 = mf({ loadf: theSecondLoader, x: 3, y: 4 });

and the object graphs created in mi_1 and mi_2, including the code they are transitively connected to, may be wildly different because -- well -- they were instantiated with different loaders.
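
To make the difference concrete, here is a small sketch in which the two loaders are backed by different hypothetical registries. The module ID "util/fmt" and the helper `makeLoader` are invented for illustration, and `mf` is written as a plain function rather than loaded from text.

```javascript
// Build a loader over a fixed table of module functions (hypothetical helper).
function makeLoader(registry) {
  return function loadf(id) {
    if (!Object.prototype.hasOwnProperty.call(registry, id)) {
      throw new Error('module not found: ' + id);
    }
    return registry[id];
  };
}

// The module function: formats x + y using whatever "util/fmt" it is given.
const mf = function (imports) {
  const fmt = imports.loadf('util/fmt')({});
  return fmt(imports.x + imports.y);
};

// Two loaders that map the same ID to different code.
const theFirstLoader = makeLoader({
  'util/fmt': function () { return function (n) { return 'dec:' + n.toString(10); }; }
});
const theSecondLoader = makeLoader({
  'util/fmt': function () { return function (n) { return 'bin:' + n.toString(2); }; }
});

const mi_1 = mf({ loadf: theFirstLoader, x: 3, y: 4 });
const mi_2 = mf({ loadf: theSecondLoader, x: 3, y: 4 });
```

The same `mf` and the same `x` and `y` give "dec:7" in one instance and "bin:111" in the other, purely because of the loader supplied.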

There is a final wrinkle in this. Note that we implement synchronous "load()" on top of an async loader by stipulating that (i) the topmost module loading is always async; and (ii) sync dependencies are declared in the Caja module record, and are thus prefetched prior to calling the module. Thus the sync dependencies are already in the loader's cache when the code is running. Now, what if an instantiating entity switches loaders along the way, and the loader provided does _not_ have the requested module in the cache? The answer is that this is a predictable failure mode: an exception is thrown. This is, after all, a fairly uncommon case.
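
The prefetch-then-sync-load arrangement, and the cache-miss failure, can be sketched like this. The record shape (`deps`, `moduleFunction`), the helper `makeLoader`, and the in-memory `records` table are assumptions for illustration, not the actual Caja module record.

```javascript
// fetchRecord(id) is assumed to return a promise for { deps, moduleFunction }.
function makeLoader(fetchRecord) {
  const cache = new Map();

  // Asynchronously fetch a record and, recursively, its declared sync deps.
  async function prefetch(id) {
    if (cache.has(id)) return;
    const record = await fetchRecord(id);
    cache.set(id, record); // set before recursing so cycles terminate
    for (const dep of record.deps) await prefetch(dep);
  }

  // Synchronous load: served from the cache only, otherwise it throws.
  function load(id) {
    if (!cache.has(id)) throw new Error('module not in cache: ' + id);
    return cache.get(id).moduleFunction;
  }

  // The topmost load is always asynchronous.
  async function loadTop(id) {
    await prefetch(id);
    return load(id);
  }

  return { load: load, loadTop: loadTop };
}

const records = {
  main: { deps: ['util'], moduleFunction: (imports) => imports.load('util')({}) + 1 },
  util: { deps: [], moduleFunction: () => 41 }
};
const fetchRecord = async function (id) {
  if (!Object.prototype.hasOwnProperty.call(records, id)) {
    throw new Error('no such module: ' + id);
  }
  return records[id];
};

const warm = makeLoader(fetchRecord);
const cold = makeLoader(fetchRecord); // never prefetches anything

const outcome = warm.loadTop('main').then(function (mainFn) {
  const ok = mainFn({ load: warm.load }); // "util" is already cached
  let failure = null;
  try {
    mainFn({ load: cold.load }); // switched loader, empty cache
  } catch (e) {
    failure = e.message;
  }
  return { ok: ok, failure: failure };
});
```

Instantiating with the loader that did the prefetch succeeds, while instantiating the same module function with the other loader throws on the first synchronous `load`.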

Cheers,

Ihab

-- 
Ihab A.B. Awad, Palo Alto, CA

kkasravi

Nov 19, 2009, 2:44:29 PM
to Google Caja Discuss
Hi Ihab

Would there be a way to determine the IDs of module instances other
than the module being loaded, e.g. something like
moduleBNamespace = resolve(moduleBInstance)? Also, would there be a
benefit in letting a module query the loader for the set of
namespaces within its context? Both approaches introduce reflection
capabilities to modules via their loader, and I'm wondering if this
is within the scope of modules and module loaders. I can think of
various use cases where a module would want to resolve a namespace
given an instance, or to know what namespaces are available.

Assuming _thisModuleId_ is exposed to every module, would its
property attributes be:

  _thisModuleId_ = {
    value: "a/b/c",
    writable: false,
    enumerable: false,
    configurable: false
  }

Would the property be exposed outside of the module or be considered
private?

In terms of switching loaders (the last paragraph), would there be an
advantage to building a loader hierarchy similar to Java's, where a
loader may ask a parent loader to find the module reference, if child
and parent loaders are mutually accessible within a sandbox?


Thanks
Kam

ihab...@gmail.com

Nov 19, 2009, 9:10:33 PM
to Discussion of E and other capability languages, Google Caja Discuss
Retweeting a message from google-caja-discuss to e-lang.

Mark Miller

Dec 3, 2009, 9:51:28 AM
to Google Caja Discuss
---------- Forwarded message ----------
From: Jonathan Rees <j...@mumble.net>
Date: Thu, Dec 3, 2009 at 5:50 AM
Subject: Re: [e-lang] Module loading discussion
To: Discussion of E and other capability languages <e-l...@mail.eros-os.org>


I wrote a reply and put it here:
http://odontomachus.wordpress.com/2009/12/03/javascript-modules/
I'm not on google-caja-discuss so did not attempt to cc: there. (maybe
I should be.)
Best
Jonathan
_______________________________________________
e-lang mailing list
e-l...@mail.eros-os.org
http://www.eros-os.org/mailman/listinfo/e-lang



--
Text by me above is hereby placed in the public domain

Cheers,
--MarkM

ihab...@gmail.com

Dec 3, 2009, 6:26:42 PM
to Discussion of E and other capability languages, Google Caja Discuss
Hi Jonathan,

On Thu, Dec 3, 2009 at 5:50 AM, Jonathan Rees <j...@mumble.net> wrote:
> I wrote a reply and put it here:
> http://odontomachus.wordpress.com/2009/12/03/javascript-modules/

Thanks for the remarks.

> I'm not on google-caja-discuss so did not attempt to cc: there. (maybe I
> should be.)

Please let me know if I should add you. :)

* * * * *

It turns out that we _do_ some early-phase stuff. It's so well hidden,
however, that you would not see it if you didn't know where to look.
One outcome of your remarks could be that we should make it clearer.

The Caja 'load()' (and CommonJS 'require()', fwiw, works similarly)
follows a convention that 'load()', when invoked where 'load' is a
free variable of the module, can only be called with a string literal
as its argument. Our compiler enforces this, effectively making 'load'
a built-in operator.

To dynamically load, we provide 'load.async()' which returns a promise
for the result.

We take advantage of this in our compiler. As you might know, we
compile each module to an object literal of the form:

  {
    instantiate: function(___, IMPORTS___) { ... },
    includedModules: [ 'module1', 'module2', ... ]
  }

where 'includedModules' shows precisely the static dependencies. A
build or deployment system can then take advantage of this
information. Hopefully, we can work through ECMA TC39 to find a way to
bake this information into the JS syntax in a more easily parseable
manner than just looking for a function call with a string literal.
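
As a sketch of how a build or deployment tool might use 'includedModules', the function below walks the records to collect every statically required module. The function name `collectStaticDeps` and the sample `compiled` table are hypothetical; only the `instantiate` / `includedModules` shape comes from the description above.

```javascript
// Hypothetical compiled output: module ID -> compiled module record.
const compiled = {
  main: {
    instantiate: function (___, IMPORTS___) { /* module body */ },
    includedModules: ['module1', 'module2']
  },
  module1: {
    instantiate: function (___, IMPORTS___) { /* module body */ },
    includedModules: ['module2']
  },
  module2: {
    instantiate: function (___, IMPORTS___) { /* module body */ },
    includedModules: []
  }
};

// Depth-first walk over includedModules, visiting each module once.
function collectStaticDeps(rootId, records) {
  const seen = new Set();
  (function visit(id) {
    if (seen.has(id)) return;
    if (!Object.prototype.hasOwnProperty.call(records, id)) {
      throw new Error('missing module record: ' + id);
    }
    seen.add(id);
    records[id].includedModules.forEach(function (dep) { visit(dep); });
  })(rootId);
  return Array.from(seen);
}

const bundle = collectStaticDeps('main', compiled);
```

The resulting list is what would need to be prefetched or packaged before "main" can run its synchronous loads.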

At the moment, on CommonJS, we are trying to spec out "packaging"
where a group of modules (module = individual JS file) form a package
(e.g. as a ZIP file). The idea is to embed metadata in the package
headers that can securely specify where to fetch a required package
and how to verify its contents.

Kris Kowal

Dec 3, 2009, 7:25:32 PM
to google-ca...@googlegroups.com
On Thu, Dec 3, 2009 at 3:26 PM, <ihab...@gmail.com> wrote:
> The Caja 'load()' (and CommonJS 'require()', fwiw, works similarly)
> follows a convention that 'load()', when invoked where 'load' is a
> free variable of the module, can only be called with a string literal
> as its argument. Our compiler enforces this, effectively making 'load'
> a built-in operator.
>
> To dynamically load, we provide 'load.async()' which returns a promise
> for the result.

The architecture I'm shooting for starts with an "evaluate(text,
fileName_opt, lineNo_opt)" function, provided at the lowest level from
the engine. This evaluator function takes the text of a module and
returns a module factory function that in turn accepts a scope object
for names to inject in its scope.

  var factory = evaluate(text); // compiles the module program
  factory({}); // executes the module program with the given owned
               // variables added to its scope, in a hermetically
               // sealed primordial context

In a secure context, the factory would be frozen and could be reused
safely in any context. It is just code that computes with the
capabilities given.

Mix the evaluator with the capability to grab the text of modules to
produce the "load(id)" and "load.async(id)" capabilities.

  var baseId = "…";
  var load = Object.freeze(function (id) {
    id = resolve(id, baseId);
    return evaluate(fetch(id), id, 1);
  });

  load(id)({x: 10, y: 20, load: load});

We can then construct a per-module "load" that closes on its baseId.

Then, we can construct require.

http://github.com/kriskowal/narwhal/blob/refactor/lib/sandbox.js

I think there was a concern that having the "load", "load.async",
"require", and "require.async" methods constructed per-module to close
on the calling module's id for the purpose of resolving relative
module identifiers would leak a capability. The intent is that the
"load" function provides the ability to load any module that the
module system has been given access to through an attenuated "fetch".
That is to say, the ability to load a module with a relative
identifier does not grant any additional capabilities. This supposes
that module identifiers are common knowledge. Knowing an identifier
does not give you the capability to load a module; the "fetch"
provided to your system of modules is the exclusive channel for the
ability to load a module by its text. Furthermore, the ability to
load a module only gives you the ability to compute; all other
capabilities must be expressly provided to the module system through
the "load" injection argument.
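
A sketch of that argument, under loud assumptions: `evaluate` here is a toy built from `new Function` and `with` (the real one would come from the engine and be hermetically sealed), and `makeFetch`, `makeLoad`, `instantiate`, and the prefix-based attenuation rule are all hypothetical names and policies chosen for illustration.

```javascript
// Toy stand-in for the engine-level evaluate(text, fileName_opt, lineNo_opt).
const run = new Function('scope', 'text', 'with (scope) { return eval(text); }');
function evaluate(text, fileName_opt, lineNo_opt) {
  return Object.freeze(function factory(scope) { return run(scope, text); });
}

// Hypothetical module texts; "secret/keys" is outside what the app may fetch.
const allTexts = {
  'app/main': "load('./helper')({ n: 20 }) + 1;",
  'app/helper': 'n * 2;',
  'secret/keys': '"do not load me";'
};

// Attenuated fetch: only IDs under allowedPrefix are readable.
function makeFetch(texts, allowedPrefix) {
  return function fetch(id) {
    const allowed = id.indexOf(allowedPrefix) === 0 &&
        Object.prototype.hasOwnProperty.call(texts, id);
    if (!allowed) throw new Error('fetch denied: ' + id);
    return texts[id];
  };
}

function resolve(id, baseId) {
  if (!/^\.\.?\//.test(id)) return id;
  const parts = baseId.split('/').slice(0, -1);
  for (const seg of id.split('/')) {
    if (seg === '..') parts.pop();
    else if (seg !== '.') parts.push(seg);
  }
  return parts.join('/');
}

// A per-module "load" that closes on its baseId (null = top level).
function makeLoad(fetch, baseId) {
  return Object.freeze(function load(id) {
    const fullId = baseId === null ? id : resolve(id, baseId);
    return evaluate(fetch(fullId), fullId, 1);
  });
}

// Instantiate a module, injecting a "load" relative to that module's own ID.
function instantiate(fetch, id, scope) {
  const factory = makeLoad(fetch, null)(id);
  return factory(Object.assign({}, scope, { load: makeLoad(fetch, id) }));
}

const appFetch = makeFetch(allTexts, 'app/');
const value = instantiate(appFetch, 'app/main', {});

// A relative ID that climbs out of "app/" still goes through the same fetch.
let denied = false;
try {
  makeLoad(appFetch, 'app/main')('../secret/keys');
} catch (e) {
  denied = true;
}
```

Knowing the identifier "secret/keys", or reaching it through "../", makes no difference here: the only thing that decides whether a module text can be obtained is the `fetch` the loader was built with.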

Here's a loader proposal for CommonJS:

http://wiki.commonjs.org/wiki/Modules/Loaders/B

That outlines the API but not the security layer. The intent is for
the API to be common between secure and permissive implementations.
It does not yet mention require.async and load.async because CommonJS
does not yet have a ratified promise object, but Narwhal implements
require.async for browsers with the working Promise draft and Tyler
Close's ref-send.

Kris Kowal