Re: Module system


Kris Kowal

Sep 7, 2008, 2:36:39 AM
to ihab...@gmail.com, helm...@googlegroups.com
Ihab,

For the last week, I've been participating in a long discussion with
people from the "Helma-NG" server-side JavaScript project, which seems
well positioned to take advantage of a variation on the module system
I've drafted. That would let us share modules between client- and
server-side module systems, potentially with an easy migration path to
ES-H. There's a Wiki page on the Helma-NG site where the standard
proposal is evolving with the discussion. Please join in, if you have
the time.

http://groups.google.com/group/helma-ng/browse_thread/thread/6d002cb42a47ae42

http://dev.helma.org/wiki/ModuleSystem/


For the benefit of the list, which I have CCed, I'm reviewing Ihab's
ideas and arguments for a standard ES-Harmony module system.

http://google-caja.googlecode.com/svn/trunk/experimental/doc/html/harmonyModules/index.html


2.1 Informal Description - Example Module

There's been some discussion over whether to use Python/Java-style
identifiers or URI's to reference modules. I'm starting in the URI
camp, but willing to concede to the desires of the Helma-NG crowd.
However, your proposal contains some ideas that I think strongly favor
using URI's. I particularly like the idea of using "urn:" URI's to
identify modules not corresponding to HTTP resources. Chiron's module
loader, modules.js, already provides certain resources that would be
better served from "urn:" identifiers since they're preloaded. I
imagine that many module loader systems would have need of
special-case URI's. Just as an example, Helma-NG could benefit from a
Java URN scheme for loading names from Java modules like
"urn:java:module.name".

I'm not a fan of preserving eval's mongrel exec/eval semantics with
the module loader. In my opinion, eval should only evaluate
expressions and exec should only evaluate statements. Implicitly
returning the module exports reminds me of Perl's convention for
modules where the last evaluated expression reports whether the module
successfully loaded or not (exceptions, Larry?). I very much prefer
using the Object constructor notation where the module author adds
exports to "this" and uses "var" or "let" for locals.
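
A minimal sketch of this convention, assuming a hypothetical loader helper named instantiate (it is not part of either proposal): the module body runs like a constructor, exports are properties added to "this", and anything declared with "var" stays private.

```javascript
// Hypothetical helper: runs a module body with a fresh exports object as "this".
function instantiate(moduleBody) {
    var exports = {};
    moduleBody.call(exports);
    return exports;
}

var greeter = instantiate(function () {
    var greeting = "Hello, ";           // local: never leaves the module
    this.greet = function (name) {      // export: added to "this"
        return greeting + name;
    };
});
```

Nothing is returned implicitly here; the loader hands back whatever the module body attached to "this".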


2.2 - Loading the module

I think that it would be a shame if module loader functions were
anything but top-scope functions. I strongly favor "import" for the
ES3.1 standard, since it's already a reserved keyword with behavior on
which no-one depends.


2.4 - Asynchronous module loading

This is a point where I think that an ES-3 native implementation can
help a lot. Plain and simple: JavaScript developers shouldn't have to
code in continuations to manage asynchronous module loads. A native
implementation should sleep a JavaScript thread on an "import"
statement and perhaps speculatively load any modules imported later
down in the program. In a Userland implementation, I find that using
exclusively blocking "import" directives greatly simplifies the
program, to the point that I don't have the time or mind-share to do
it any other way. Furthermore, I find that there is no need to do it
any other way. When I am operating in a debug mode, I do not mind
going to an HTTP request for each individual module file
synchronously. When I am operating in production mode, one of the
steps I have to perform is a server-side compilation of my modules
that bundles all of my scripts and their dependencies into a single
module file with no need to change any of my source code or HTML. So,
in this situation, the module loader never blocks on an HTTP request:
the module constructor has already been downloaded whenever you arrive
at an "import" call.


3.2 - Standardization Issues - Loader Support

You mention that scope chain manipulation would require native
support. I've found that I can get around needing a lot of native
support simply by fiddling with the scope chain at very low
performance cost. On the flip-side, I am very skeptical about doing
any pre-processing on the client-side. As Mark said when we last met,
it would invite a problem of cascading virtualization.

With scope chain manipulation, porting jQuery to Chiron is a two-line
patch, with which I completely box jQuery in a module.

Let me illuminate some of the techniques.

To prevent "eval" from capturing any variables from the module
loader's scope chain, I declare an anonymous global-scope eval
function that binds _no_ names except "arguments".

function () {
return eval(arguments[0]);
}

I then pass the global eval function into my module loader's
enclosure, binding the function to an argument of the module loader's
scope (which is to say, not available from the scope in which
evalGlobal is declared).

(function (evalGlobal) {
})(function () {
return eval(arguments[0]);
});

Then, to construct the scope chain I described in my proposal, I
create an evalGlobalWith function inside my module loader that injects
scopes into the scope chain inside evalGlobal using another eval in
the text of the evalGlobal, a with block, and an anonymous function:

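(function (evalGlobal) {
    // text:   source of the module program
    // locals: object whose properties form the innermost injected scope
    // module: object whose properties form the next scope out
    var evalGlobalWith = function (text, locals, module) {
        return evalGlobal(
            // JavaScript has no triple-quoted strings, so the wrapper
            // program is assembled from individual lines.
            [
                "with (arguments[3]) {",
                "    with (arguments[2]) {",
                "        (function () {",
                "            eval(arguments[0]);",
                "        }).call(this, arguments[1])",
                "    }",
                "}"
            ].join("\n"),
            text,
            locals,
            module
        );
    };
})(function () {
    return eval(arguments[0]);
});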

I think you'll find that this technique effectively, completely, and
predictably controls the scope chain of the module program, all with
contemporary JavaScript with very little performance cost. I hope
that helps turn the balance of your value judgments. We actually can
do a lot without client-side parsing.
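
A runnable variation on the technique above, offered as a sketch and not as the modules.js implementation. Two things are assumptions: the evaluator is built with the Function constructor (so it is non-strict and closes over nothing but global scope, even when the surrounding code is strict), and "this" is bound to the module object so that exports land there. The name moduleObject is illustrative.

```javascript
// Assumption: Function-constructed so the body is non-strict and sees
// only the global scope plus its own "arguments".
var evalGlobal = new Function("return eval(arguments[0]);");

// Wrapper program: injects two scopes, then evaluates the module text
// inside an enclosure so that "var" declarations stay local.
var wrapper = [
    "with (arguments[3]) {",
    "    with (arguments[2]) {",
    "        (function () {",
    "            return eval(arguments[0]);",
    "        }).call(arguments[3], arguments[1]);",
    "    }",
    "}"
].join("\n");

function evalGlobalWith(text, locals, moduleObject) {
    return evalGlobal(wrapper, text, locals, moduleObject);
}

// Usage: "helper" comes from locals, "base" from the module object,
// "secret" stays private, and "answer" is exported through "this".
var exampleModule = { base: 2 };
evalGlobalWith(
    "var secret = 1; this.answer = helper() + base;",
    { helper: function () { return 40; } },
    exampleModule
);
```

The module text never sees the loader's own variables, because the only function scope between it and the global object is the enclosure inside the wrapper.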


3.3 - Module Format

Consider the module environment I'm proposing, where adding properties
to "this" expresses an export. This permits a very clean migration
path from current ES, to current ES with a module loader
implementation I'm describing, to ES with a module loader provided by
the browser.

Take jQuery, for example. The only pity with modern JavaScript is
that "window", "self", and "this" all refer to the same object, even
though their meanings are distinct. Even with this problem, jQuery
manages to almost always use "window" when it wants "window" and
"this" when it wants to export a variable (to global scope). Adding
the jQuery object to "this" works today. When I subvert jQuery into
my module system, I simply change two occurrences of "window" to
"this", and suddenly jQuery has no impact on global scope when it's
used in the module loader, but still works if it's imported as a
global <script> tag. By extension, if we preserve these semantics in
a native module loader, most libraries will not have to change. As
older browsers fall out of fashion, the maintainers of those modules
will be able to remove their module-pattern boilerplate and explicate
their dependencies as a gradual optimization.


3.4 - Module metadata

I believe that the kind of metadata you're describing (dependencies
and versioning I presume) could easily be generated through static
analysis of "import" statements inside the program and assignment to
standard variables, like "local.version". To that end and in the
interest of not requesting that module authors repeat themselves
needlessly, I think that module metadata should be generated
programmatically and can safely be implementation specific.


4.1 - Alternatives and rationale - Explicit Module Syntax

I actually have need for explicit module exports in Chiron. I use a
script bundler that "concatenates" modules to reduce HTTP requests.
In my proposal, I've reserved "register" for this purpose. ES-H could
easily use its reserved "export" keyword to perform the same
function, thus providing a migration path from user-space module
loaders to native module loaders. I imagine it would look like:

export "urn:myModule" {
var {a} = import("module.js");
this.b = 10;
}

desugaring to the same notation as a Userland implementation:

register("urn:myModule", function () {
var {a} = import("module.js");
this.b = 10;
});
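
As a sketch of what could sit behind such a "register" call in user space (the registry layout is an assumption, and importModule is a hypothetical stand-in because "import" is reserved and cannot be used as a plain function name):

```javascript
var factories = {};   // module id -> registered module function
var instances = {};   // module id -> memoized exports object

function has(table, key) {
    return Object.prototype.hasOwnProperty.call(table, key);
}

// Records a module function under an id without running it.
function register(id, factory) {
    factories[id] = factory;
}

// Runs a registered module on first use and memoizes its exports.
function importModule(id) {
    if (!has(instances, id)) {
        if (!has(factories, id)) {
            throw new Error("module not registered: " + id);
        }
        var exports = instances[id] = {};
        factories[id].call(exports);
    }
    return instances[id];
}

register("module.js", function () {
    this.a = 32;
});

register("urn:myModule", function () {
    var a = importModule("module.js").a;
    this.b = a + 10;
});
```

Because a bundled file only calls register, the order of modules inside the bundle does not matter; nothing executes until the first importModule call.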

To get the most ! for our $, we probably should ask Brendan for a
Function.detach(), or something of the kind, to null out the closure's
scope reference, thus permitting us to guarantee that the registered
functions have laundered scope chains, but for our transient,
potentially insecure purposes, living without this is acceptable.

For Caja's purposes, I imagine that "register" would be a microkernel
function that, like the use of "eval" and "with", is restricted to the
cajoler. This would put Caja in a place of responsibility to perform
script bundling and compression server-side. You're well placed to do
that anyway, until ES-3.1-H, as I believe we all hope, obviates the
need for Caja in the long-term.


4.2.1 - Import and Export Mechanisms

This particular counter-proposal is very similar to mine. The only
difference is the introduction of a head-most function block scope
that intercepts "var" and "function" declarations, making them
implicitly local to the module and preventing them from being
exported. Very early on, I tried to make the scope chain you are
proposing work in Userland. However, the only way to capture exports
from the head of the scope chain on the client-side is to use a "with"
block. Now, most unfortunately, inner with blocks don't work
consistently across all browsers, and more importantly, in the
browsers that are standards compliant, you can't capture a "var" on
the head of the scope chain if it's a "with" block: the declaration
forwards to the head-most function block scope (usually above the with
block) and the assignment gets evaluated in the context of the "with".
This leads to some VERY strange behavior. Here's an excerpt of me
talking to myself:

Kris: alright. I'm saying "var a = 10" inside a with block. sounds good.
Kris: so, what do you want? "a" to be 10 in the function block scope
or "a" to be 10 for the with object?
Kris: the "with" object, of course.
Kris: oh, well, it's going to depend you see. If you already have an
"a" on the with object, that's going to be set to 10. If you don't,
the "a" variable in the function block scope is going to be 10. Have
a nice day.
Kris: and, by the way, the "var" statement will always add "a" to the
function block scope, even if it's left "undefined", but it won't
redeclare it or reset it to undefined if it was already there. It's
high time for some carbonated high-fructose corn-syrup. Who else
wants some?

On the bright side, if you put an enclosure inside the with block,
suddenly everything works exactly the same way in all browsers. So I
do. The cost is that I can't implicitly export head-scope functions
and vars. I figure that's okay since constructor functions certainly
don't export local variables either. The browser bugs and spec bugs
are ironically forcing you to "do the right thing".
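
The behaviour described in the dialogue, and the enclosure that avoids it, can be reproduced with a short sketch. The function bodies are built with the Function constructor only so that "with" is accepted even if the surrounding code is strict; all names are illustrative.

```javascript
// "var a = 10" directly inside a with block.
var varInsideWith = new Function("scope", [
    "with (scope) {",
    "    var a = 10;",
    "}",
    "return a;"                 // reads the function-scope "a"
].join("\n"));

var hasA = { a: 1 };
var lacksA = {};
// The with object already has "a": the assignment lands on the object
// and the function-scope "a" is declared but left undefined.
var functionScopeWhenPresent = varInsideWith(hasA);
// The with object has no "a": the assignment lands in the function scope.
var functionScopeWhenAbsent = varInsideWith(lacksA);

// The same declaration inside an enclosure within the with block.
var varInsideEnclosure = new Function("scope", [
    "with (scope) {",
    "    (function () {",
    "        var a = 10;",      // always local to the enclosure
    "        this.b = a + 1;",  // exports must be explicit
    "    }).call(scope);",
    "}"
].join("\n"));

var boxed = { a: 1 };
varInsideEnclosure(boxed);      // boxed.a is untouched, boxed.b is exported
```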

The other downside is that I'm using "eval" and "with" a lot. I wish
I could convince people to just use modules.js without reading it. I
figure the only way is to get something that works the same way in a
native module loader ;-)


4.2.2 - Explicit calls to import

I believe I've addressed this: with the proposal I've outlined,
"import" calls can be emulated in user-space JavaScript until a
standard solution provides a migration path. I made a post to
Helma-NG that explicitly draws out the migration path for import
statements.


4.2.3 - Allowing return at the top-level

I agree that an explicit, top-level return would be very good in the
long run. It might just be good if, in ES-3H, we were to have in
addition to Function.detach (which would give us evalGlobal through
eval.detach()), an eval variant that evaluated its contents in its
own function block scope so that "var" and "let" declarations were
forcibly localized to the inner program and "return" would be a
freebie.


Kris Kowal

Peter Michaux

Sep 7, 2008, 1:50:32 PM
to helm...@googlegroups.com, ihab...@gmail.com
On Sat, Sep 6, 2008 at 11:36 PM, Kris Kowal <cowber...@gmail.com> wrote:

[snip]

> I'm not a fan of preserving eval's mongrel exec/eval semantics with
> the module loader. In my opinion, eval should only evaluate
> expressions and exec should only evaluate statements. Implicitly
> returning the module exports reminds me of Perl's convention for
> modules where the last evaluated expression reports whether the module
> successfully loaded or not (exceptions, Larry?). I very much prefer
> using the Object constructor notation where the module author adds
> exports to "this" and uses "var" or "let" for locals.

I don't like augmenting "this" with the exports. It is very cryptic
and would certainly confuse a non-expert who doesn't understand that
"this" can mean the global object, module object, or an OOP object and
needs to figure out which it means in each particular case. I believe
that is part of the reason for adding the "global" global variable
(similar to window) to access the global object. It is much better to
read

(function(){
// ...
global.fn = function(){};
})()

Than it is to read

(function(){
// ...
this.fn = function(){};
})()

and the above is why I currently write the same first line inside
almost all of my (function(){})() expressions.

(function() {
var global = this;
//...
global.fn = function(){};
})();

> 2.2 - Loading the module
>
> I think that it would be a shame if module loader functions were
> anything but top-scope functions. I strongly favor "import" for the
> ES3.1 standard, since it's already a reserved keyword with behavior on
> which no-one depends.

It seems the loader functionality contains two things: getting the
code (e.g. HTTP, etc) and executing the code in the correct scope of
the program. I thought Brendan made it clear somewhere that the
getting-the-code part will not be part of ECMAScript.


> 2.4 - Asynchronous module loading
>
> This is a point where I think that an ES-3 native implementation can
> help a lot. Plain and simple: JavaScript developers shouldn't have to
> code in continuations to manage asynchronous module loads.

Module loads should not be asynchronous. It is too complex and all of
this loading business is not really related to the idea that a module
is a language construct related to variable scope.

> A native
> implementation should sleep a JavaScript thread on an "import"
> statement and perhaps speculatively load any modules imported later
> down in the program.

Very hard to determine in a dynamic language as the modules to be
imported later may not be statically coded as expected.

> In a Userland implementation, I find that using
> exclusively blocking "import" directives greatly simplifies the
> program, to the point that I don't have the time or mind-share to do
> it any other way. Furthermore, I find that there is no need to do it
> any other way. When I am operating in a debug mode, I do not mind
> going to an HTTP request for each individual module file
> synchronously. When I am operating in production mode, one of the
> steps I have to perform is a server-side compilation of my modules
> that bundles all of my scripts and their dependencies into a single
> module file with no need to change any of my source code or HTML. So,
> in this situation, the module loader never blocks on an HTTP request:
> the module constructor has already been downloaded whenever you arrive
> at an "import" call.

Bundling scripts is an essential capability. When you bundle the
scripts, the modules suddenly are self-naming. You posted the
following code before.

# register("module1.js", function () {
# with (this.moduleScope) {
# with (this) {
# (function () {
# …
# }).call(this);
# }
# }
# });
#
# register("module2.js", function () {
# with (this.moduleScope) {
# with (this) {
# (function () {
# …
# }).call(this);
# }
# }
# });

If modules need to be self-naming in any way (e.g. like above) at any
time (e.g. in production where it matters most) then I think it is
better to admit that self-naming is inescapable and make them self
naming all the time.

[snip]

> To get the most ! for our $, we probably should ask Brendan for a
> Function.detach(), or something of the kind, to null out the closure's
> scope reference, thus permitting us to guarantee that the registered
> functions have laundered scope chains, but for our transient,
> potentially insecure purposes, living without this is acceptable.

This is undoing lexical scoping. I don't think that is a good idea.
When a reader is looking at a function definition he should be able to
depend on the surrounding code's usual capture in the closure. If the
detach call happens far away then it would lead to very difficult
debugging.

[snip]

-------------

There is an assumption that some problem that developers are having
currently can be solved with "a module system". Unfortunately there is
no detailed description of that problem developers are having
currently in either of the following proposals. Without such
motivation, there is no way to judge if a particular proposal achieves
its goals as simply as possible or if the language can already do what
is needed.

http://dev.helma.org/wiki/ModuleSystem/

http://google-caja.googlecode.com/svn/trunk/experimental/doc/html/harmonyModules/index.html

I would really like to know exactly what problems these proposals are
trying to solve for the average developer. The Caja proposal
essentially looks like a file is a constructor function which the
language already has. Kris' proposal looks like it can essentially be
done simply by using (function(){})() and not forgetting any "var" as
the whole idea of an anonymous module breaks down when bundling
scripts for production. As far as I know, loading (e.g. HTTP, etc.)
is out of scope for ECMAScript (though it could still be implemented in
libraries) and complex for authors (worrying about async), and loading
tiny bits of code at a time is expensive in connections and could cause
delays in responses to user actions. A really clear statement of which
problems cannot already be solved easily and are solved by the
proposals would be very helpful.

I'm not anti-module. As little as I understand Perl modules, I like
how they seem to work.

I also like the module pattern in JavaScript and use it all the time.
I can write the following in JavaScript and then continue on my merry
way having all the namespacing-collision protection that I need.

var ca_michaux_fooModule = {};
(function() {
ca_michaux_fooModule.a = function(){};
})();

or

var ca_michaux_fooModule = {};
(function() {
var mod = ca_michaux_fooModule;
mod.a = function(){};
})();

That gives me private scope, explicit exports, naturally easy script
bundling, no text-processing of the code, it is less than 20
characters (i.e. almost nothing) of boilerplate for the entire concept
of variable scoping, and nothing new to learn. Because a module is
usually large (>100 lines) I don't see the desperate need to reduce
the above boilerplate. This is not a situation like the desugaring of
classes where the desugared version is very bulky and complex to read
and the sugared version is very light and easy to read.

Some concepts of "module" in other languages allow for reopening a
module and modifying it (i.e. monkey patching). That is something I
have not seen suggested for JavaScript modules. I'm not necessarily in
favor of reopening modules. I haven't seen anything in the proposals
for JavaScript modules that is as revolutionary as being able to
reopen a module. It seems like the few lines I wrote above with the
module pattern basically cover all the bases trying to be covered.

Peter

ihab...@gmail.com

Sep 7, 2008, 3:38:09 PM
to Peter Michaux, Kris Kowal, helm...@googlegroups.com
Hi folks,

Wow! :)

Here's my position regarding module systems. I will try to reply in
detail to your other, quite thought-provoking posts separately. And
please note that my proposal was and remains an early draft.

As for whether we need a standard ES module system at all, I am fully
in support of not having one. Most pressingly, I simply want to make
sure that, _if_ any module proposal is adopted, it does not get in the
way of features that I consider important. These are:

1. Module code evaluated in a strictly isolated context, with imports
and exports under client control. This allows secure isolation of
untrusted code, and clean isolation of dependencies via dependency
injection patterns.

2. Modules multiply instantiable, allowing one parsed version of the
code to be re-used to create several instances with guaranteed
isolation between them. This is further in support of secure isolation
and dependency injection.

3. Modules manipulated as first-class entities independently of how
they are named. This allows for code to manipulate the code and
instances of more than one version of a module. That in turn stems
from wanting an ES system to be able to deal gracefully with more and
more complex mashups of independently developed code, as it is already
beginning to do.
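
A small sketch of points 1 and 2, under the assumption that a module can be modelled as a plain function from client-supplied imports to a fresh instance. The names counterModule and log are hypothetical and not taken from the draft.

```javascript
// Hypothetical module: everything it can reach arrives through "imports".
function counterModule(imports) {
    var count = 0;                          // private to this instance
    return {
        increment: function () {
            count += 1;
            imports.log("count is " + count);
            return count;
        }
    };
}

var messages = [];

// One piece of code, two instances, each with its own injected dependency.
var first = counterModule({
    log: function (text) { messages.push("first: " + text); }
});
var second = counterModule({
    log: function (text) { messages.push("second: " + text); }
});

first.increment();
first.increment();
second.increment();
```

The client decides what each instance may talk to, and the two counters share no state even though they come from the same parsed code.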

Again, as noted by Peter and Kris, there is a lot that can be done
without a native modules spec in ES (though I would note that ES 3.1
strict does not permit the 'with' construct; that might get in the way
of some of the techniques).

Cheers,

Ihab

--
Ihab A.B. Awad, Palo Alto, CA

Kris Kowal

Sep 7, 2008, 7:14:32 PM
to helm...@googlegroups.com
Peter has called for a statement of purpose. Ihab's statement of
purpose is a good place to start. I'll iterate on it here instead of
starting from scratch.

---

As I was writing up this "Statement of Purpose", one word came up
pretty frequently: "sovereignty". Since I'm using it to imply
"desirability", I will define it here. Sovereignty is the right of
self-determination. In the context of a name space, be it the local
scope of a function, the scope of a module, the global scope, or the
names of files in a directory tree, sovereignty is the ability to
choose all of the names that are in your scope. Sovereignty, to
whatever extent it is possible to grant, is desirable because it
creates an operational box where an author can choose names that do not
collide with one another. It also creates a security box, one where
every name was explicitly created in that domain such that it cannot
be subverted by an attacker.

Module systems allow module authors to completely control their local
scope. This doesn't just mean isolating private variables. This
means that module authors control the variable names by which other
modules and their contents are referred. For example, with a module
system, two modules can both export "encode" and "decode" functions.
If a module author only needs to use one of these modules, they have
the option of bringing them directly into the local scope. If the
programmer is using both modules, they can import them both and bind
them to different names such that, say, "base64.encode" and
"utf8.encode" can be referenced in the same program. A module system
gives you ease and flexibility of controlling the estuary of
variables.
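
To make the naming point concrete, here is a sketch with two stand-in modules that both export "encode" and "decode". The hex and percent modules are illustrative substitutes for the base64 and utf8 modules mentioned above, and the hex one only handles characters up to U+00FF.

```javascript
// Stand-in module 1: hexadecimal encoding of character codes.
var hex = {
    encode: function (text) {
        return text.split("").map(function (ch) {
            return ("0" + ch.charCodeAt(0).toString(16)).slice(-2);
        }).join("");
    },
    decode: function (digits) {
        return digits.replace(/../g, function (pair) {
            return String.fromCharCode(parseInt(pair, 16));
        });
    }
};

// Stand-in module 2: percent-encoding via the built-in URI functions.
var percent = {
    encode: encodeURIComponent,
    decode: decodeURIComponent
};

// A program that needs only one module can bind its names directly...
var encode = percent.encode;
var directResult = encode("a b");            // "a%20b"

// ...while a program that needs both keeps them apart under names it chose.
var hexResult = hex.encode("Hi");            // "4869"
var percentResult = percent.encode("a b");   // "a%20b"
```

Neither module had to know about the other or agree on a global name; the importing code alone decided what each one is called.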

A module loader should, given value judgements between internal
tensions, perform the following duties:

* Linearize the execution of recursive module dependencies. A module
should be sovereign over its direct dependencies. This means that a
module author should NOT need to inspect or mention the recursive
dependency tree of its own dependencies. The topological
linearization of the execution of a module and all of its recursive
dependencies should be the sole purview of the module system. In
order for a module to be sovereign over its own dependencies, it must
not be beholden to the open range of modules that depend upon it to
maintain a particular dependency tree.
* Orthogonalize name-spaces into three, sovereign domains:
* file names relative to the module root directory,
* file names inside a module directory,
* variable names inside a module.
* Encapsulation of private, local variables. This is necessary for
modules to be secure "capability objects". This is also necessary to
ensure sovereignty of variable names inside a module.
* Encapsulation of other private module objects. This is necessary
for modules to receive and not share capabilities in a security model.
This is also necessary to ensure sovereignty over variable names
inside a module.
* Flexible dependency injection patterns. A module author should
have the ability to easily extract names from a foreign, untrusted
module object and bind them to names of their choice. This is
desirable because it ensures that people will take advantage of their
ability to control the sovereign name-space of a module. It also
permits module authors to make their code as brief as possible while
also balancing that need against being as explicit as necessary to
avoid name collisions. Also, injecting module objects or properties
of module objects into the scope chain can improve performance because
it reduces lengthy property lookups. A module should never depend on
the name another module chose to expose its API on through a common
global scope. Doing so invites opportunities for malicious modules to
perform "man-in-the-middle" attacks by subverting names exposed by
other modules. Global object sharing also increases coupling; since a
module author does not control its scope, they are subject to other
module authors' fickle naming conventions and inter-module name
collisions.
* Modules provided as capability objects. This permits dependent
modules to isolate untrusted code and control the passage of
information to that module through its exported API.
* Module versioning. It should be possible to request modules by
version or version range, in response to the need to mash up scripts
from multiple sources each requiring multiple versions of the same
library. A strong versioning system would require modules to request
modules by version range and limit those versions to ones that have
already been published such that new versions must be explicitly
qualified.
* Modules should be stateless. Module objects should not be a
vector for untrusted modules to obtain data from other modules. In
module systems that can guarantee security invariants, module objects
should be instantiated uniquely for each dependent module. If possible, the
module object should be unique and any internal state should be
"copy-on-write" to ensure good performance and separation, much like
virtual memory shared between a process and its forked children.
* A module system should abstract the process of loading, parsing,
managing execution context, and executing module code. There should
be no constraint on the mechanism by which these are performed, in
particular:
* whether a particular transport like local files, HTTP, or
in-memory objects is used to host or produce modules.
* whether a scope chain or prototype chain is used to produce a
particular lookup order for the top scope object.
* whether a particular function is used to evaluate code.
* whether modules are pre-parsed.
* whether module objects are singleton, memoized, or
re-instantiated. These issues depend on performance and whether
security invariants can be maintained in the host environment.
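
The first duty in the list, linearizing recursive dependencies, can be sketched as a depth-first walk in which each module names only its direct dependencies. The function name linearize and the shape of the dependency table are assumptions made for illustration.

```javascript
// Returns module names in an order where every module appears after
// the modules it depends on. "dependenciesOf" maps a module name to
// the names of its direct dependencies only.
function linearize(root, dependenciesOf) {
    var order = [];
    var visited = Object.create(null);

    (function visit(name) {
        if (visited[name]) {
            return;             // already scheduled (this also stops cycles)
        }
        visited[name] = true;
        (dependenciesOf[name] || []).forEach(function (dependency) {
            visit(dependency);
        });
        order.push(name);       // all dependencies are in place by now
    })(root);

    return order;
}

var executionOrder = linearize("main", {
    main: ["a", "b"],
    a: ["c"],
    b: ["c"],
    c: []
});
```

The author of "main" never mentions "c"; the loader discovers it through "a" and "b" and schedules it exactly once.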

# :vim:wrap:lbr:co=80:

Kris Kowal

Sep 7, 2008, 7:23:05 PM
to helm...@googlegroups.com
Ihab wrote:

> Again, as noted by Peter and Kris, there is a lot that can be done
> without a native modules spec in ES (though I would note that ES 3.1
> strict does not permit the 'with' construct; that might get in the way
> of some of the techniques).

Whoo! That makes my heart flutter a bit. Without "with", I think the
importance of having the browser subsume the responsibility of
managing modules ticks up a few notches! I hope that ECMAScript
enforces strict compliance of the "with" block dynamically, so I can
try it and catch an exception as a signal to defer to the native module
system, which hopefully provides the same interface as a Userland
implementation.
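
A sketch of that feature test, assuming the engine reports a disallowed "with" as an exception when the source text is compiled. The function names are hypothetical. The strict-mode probe reflects how ES5 strict mode was eventually specified, where "with" is a SyntaxError.

```javascript
// True when non-strict code compiled at run time may use "with".
function withIsAvailable() {
    try {
        new Function("with ({}) {}");
        return true;
    } catch (e) {
        return false;
    }
}

// True when the same statement is rejected under a "use strict" directive.
function strictWithIsRejected() {
    try {
        new Function("'use strict'; with ({}) {}");
        return false;
    } catch (e) {
        return e instanceof SyntaxError;
    }
}

// A userland loader could use the first probe to decide whether to
// fall back to a native module system, if one exists.
var useUserlandLoader = withIsAvailable();
```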

Kris

Peter Michaux

Sep 7, 2008, 9:21:05 PM
to helm...@googlegroups.com
On Sun, Sep 7, 2008 at 4:14 PM, Kris Kowal <cowber...@gmail.com> wrote:

I think it is a good idea to deal with the following issue first and I
think it is a show stopper to the anonymous modules and similar ideas.
(It is not my fault an HTTP connection and download is the most
expensive part of building a client-side app.)

> A module should never depend on the name another module chose to
> expose its API on through a common global scope.

When a module chooses its own name to expose its API, I call this a
"self-naming module." Modules are self-naming in many/most(/all?)
languages.

How can the above quoted requirement even be done efficiently? As I've
written several times now, multiple modules need to be able to exist
in one file for efficient download. "Module" cannot equal "file" and
Brendan agreed on the es-discuss list as though it was obvious and
essential that they are not equal. If multiple modules can be in a
single file then the modules need to be self-naming, don't they? The
name may be a URL like in Kris' register function or just a regular
JavaScript name like used in the conventional module pattern.

Does everyone agree that, for a client-side app, multiple modules must
be able to exist in a single file?

Peter

ihab...@gmail.com

Sep 7, 2008, 10:26:12 PM
to helm...@googlegroups.com
Hi Peter,

On Sun, Sep 7, 2008 at 6:21 PM, Peter Michaux <peterm...@gmail.com> wrote:
> On Sun, Sep 7, 2008 at 4:14 PM, Kris Kowal <cowber...@gmail.com> wrote:
> (It is not my fault an HTTP connection and download is the most
> expensive part of building a client-side app.)

};->

> When a module chooses its own name to expose its API, I call this a
> "self-naming module." Modules are self-naming in many/most(/all?)
> languages.

True; but, for reasons I discussed earlier, this should not be a
requirement on compatible implementations of ES.

> ... multiple modules need to be able to exist
> in one file for efficient download. "Module" cannot equal "file" and
> Brendan agreed on the es-discuss list as though it was obvious and
> essential that they are not equal. If multiple modules can be in a
> single file then the modules need to be self-naming, don't they?

I'll introduce some terminology --

* Fetching -- getting the program text of a module from the internets.

* Parsing -- parsing the program text and generating a set of ASTs in
the language.

* Instantiating -- evaluating the ASTs in a client-supplied
environment to yield an instance of the module's data structures.

We are concerned with optimizing the fetching operation. This means
that we should be able to statically locate all points where modules
are fetched. So in --

var m = loader.fetchAndParse('com.example.widgets', '1.03')
var aMod = myLoader.fetchParsed('http://example.com/widgets.js')

Optimization to put the content into one file involves recognizing the
necessary calls, prefetching the program text, and replacing the calls
with structures that evaluate to the same value:

var m = (function() { ... })();
var aMod = with (...) { ... };

This does not depend on whether modules are or are not self-naming. It
*does* require that calls to the loader be statically visible. Whether
this means that ES needs to standardize the loader interface or not is
up in the air -- it may or may not:

* On the pro side, it is good because I can grab any third-party code
and transitively fetch its dependencies from the internets, hence
building a single-file application

* On the con side, there are all sorts of ways to load modules (as I
outlined eariler) and a standard does not help us examine these.

Making module fetch calls statically analyzable means we need
*keywords* (or their moral equivalent) and not just standard
functions, since standard functions can be aliased.
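
A deliberately naive sketch of why aliasing matters: a bundler that scans source text for calls to a known loader function (here the hypothetical name importModule) finds only the calls spelled out literally. The regular expression is an illustration, not a real parser.

```javascript
// Collects the string-literal arguments of direct importModule(...) calls.
function findStaticImports(source) {
    var pattern = /\bimportModule\(\s*(["'])([^"']+)\1\s*\)/g;
    var found = [];
    var match;
    while ((match = pattern.exec(source)) !== null) {
        found.push(match[2]);
    }
    return found;
}

var sampleSource = [
    'var a = importModule("a.js");',
    'var load = importModule;',        // alias: invisible to the scan
    'var b = load("b.js");'
].join("\n");

var visibleImports = findStaticImports(sampleSource);   // only "a.js"
```

The dependency on "b.js" is real but cannot be prefetched by this scan, which is the gap a keyword, being impossible to alias, would close.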

Kris Kowal

Sep 7, 2008, 11:29:18 PM9/7/08
to helm...@googlegroups.com
Peter,

On Sun, Sep 7, 2008 at 6:21 PM, Peter Michaux <peterm...@gmail.com> wrote:

> On Sun, Sep 7, 2008 at 4:14 PM, Kris Kowal <cowber...@gmail.com> wrote:
>
> When a module chooses its own name to expose its API, I call this a
> "self-naming module." Modules are self-naming in many/most(/all?)
> languages.

Programming languages where one explicitly names the module name in
the file that contains its code are actually in the minority. Only
Java, Perl, and Haskell come to mind.

Java at least comes with a concession that autonomous (that is,
"self-named") modules come with a curse. The Eclipse IDE comes with
special refactoring tools that allow you to reparent a tree of modules
somewhere else on the file-system. It's the kind of thing that can be
done in one "mv" command for most other languages.

Precedent aside, you are _absolutely right_ to point out this tension
between security and performance, and I don't think Ihab's point that
the process of module loading can be split along parsing, fetching,
and instantiating addresses the security problem completely. Perhaps
I'm wrong. Allow me to propose a man-in-the-middle-attack.

Module Main depends on modules A and B.
File B is trustworthy.
File A is untrustworthy.
File A declares both module A and B.
The true Module B is never fetched, parsed, nor instantiated.
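
The scenario can be sketched as follows, assuming a hypothetical loader that keeps one shared, writable registry and lets any file call register. Both the registry and the register/load functions are assumptions for illustration, not an existing API.

```javascript
// Hypothetical shared registry: any file may register any module name.
var registry = {};

function register(name, factory) {
  // First registration wins; later ones are ignored.
  if (!registry.hasOwnProperty(name)) {
    registry[name] = factory;
  }
}

function load(name) {
  return registry[name]();
}

// Untrustworthy File A is evaluated first and declares both A and B.
register('A', function () { return { name: 'A' }; });
register('B', function () { return { name: 'B', forged: true }; });

// Trustworthy File B arrives later (or is never fetched at all);
// its registration has no effect.
register('B', function () { return { name: 'B', forged: false }; });

// Main asks for B and silently receives the forged module.
var b = load('B');
```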

So yeah, there is a tension. I propose that the ECMAScript standard,
since there is a notion that the system might be able to host modules
as capability objects without the Caja Cajoler and that the purpose of
security might be attainable, should specify that a directory _tree_
of modules is sovereign over its module name-space. Thus, any module
can only register modules that are on the same domain and under the
same directory as itself.

I emphasize that bundling modules should be a build step for
JavaScript, not the modus operandi for authoring them. I also
acknowledge that it's the kind of thing a hacker might take advantage
of. Autonomous modules fall in the same security nightmare realm as
"with" and "eval".

> Does everyone agree that, for a client-side app, multiple modules must
> be able to exist in a single file?

I for one agree, but also agree that permitting autonomous modules is
a security hole. Thus there is a tension and it'll take all of our
collective wit to find a solution and resolve the tension amicably.


= Migration Path =

I also would like to clarify that I am proposing that there be two
standards: one for contemporary JavaScript wherein security is an
illusion and a myth and thus cannot rule design decisions, and one for
ECMAScript. It's my desire that these standards be compatible but not
the same, to the end that there is an easy migration path from
Userland module loaders to a native module loader. I think it's
acceptable for that migration to require refactoring, but I also feel
that there should be defined transition periods where module files
should work in two realms at once: a migration path.

PHASE 1: The Userland module loader proposal should be written such
that a module can be used as a global <script> tag or a <module>
assuming that it does not use the import mechanism, defines its
exports by attaching to "this", and uses an anonymous function block
called once to isolate its local scope. During this period, modules
can slowly be ported. Eventually the need to support old-fashioned
<script> tag modules will expire and these module authors will be able
to add module loading calls and remove their module pattern
boilerplate.
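
As a rough sketch of what a Phase 1 module might look like, the module text below attaches its exports to "this" and isolates its locals in an anonymous function called once. The loadAsModule helper is a hypothetical stand-in for a Userland loader; in a plain <script> tag the same text would attach to the global object instead.

```javascript
// Module text written in the Phase 1 style: no imports, exports attached
// to "this", local scope isolated by an anonymous function called once.
var moduleText =
  "(function () {\n" +
  "  var secret = 41;\n" +
  "  this.answer = function () { return secret + 1; };\n" +
  "}).call(this);";

// Hypothetical Userland loader: evaluates the text with "this" bound
// to a fresh exports object instead of the global object.
function loadAsModule(text) {
  var exports = {};
  new Function(text).call(exports);
  return exports;
}

var mod = loadAsModule(moduleText);
var result = mod.answer(); // 42
```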

PHASE 2: The ECMAScript module loader proposal should be written such
that a module can be used as both a Userland module and a Native
module. During this period, modules can slowly be ported to the
native loader by changing the function names used to import and export
modules into Native module-loader keywords. Also, during this period,
the Cajoler could assume the role of a Userland module loader and thus
maintain its security invariants using the new ECMAScript syntax for
module loading. Eventually the need for Userland module loaders will
expire.

Kris

Peter Michaux

Sep 8, 2008, 12:12:29 AM9/8/08
to helm...@googlegroups.com
On Sun, Sep 7, 2008 at 7:26 PM, <ihab...@gmail.com> wrote:
> On Sun, Sep 7, 2008 at 6:21 PM, Peter Michaux <peterm...@gmail.com> wrote:
>> On Sun, Sep 7, 2008 at 4:14 PM, Kris Kowal <cowber...@gmail.com> wrote:

[snip]

> I'll introduce some terminology --
>
> * Fetching -- getting the program text of a module from the internets.
>
> * Parsing -- parsing the program text and generating a set of ASTs in
> the language.
>
> * Instantiating -- evaluating the ASTs in a client-supplied
> environment to yield an instance of the module's data structures.

I'm not so keen on instantiating modules. JavaScript already has
constructors that can be instantiated and will likely have classes
that use Object.freeze a lot to make the objects immutable. I think of
modules as single objects just as the module pattern provides.


> We are concerned with optimizing the fetching operation. This means
> that we should be able to statically locate all points where modules
> are fetched. So in --
>
> var m = loader.fetchAndParse('com.example.widgets', '1.03')
> var aMod = myLoader.fetchParsed('http://example.com/widgets.js')
>
> Optimization to put the content into one file involves recognizing the
> necessary calls, prefetching the program text, and replacing the calls
> with structures that evaluate to the same value:
>
> var m = (function() { ... })();
> var aMod = with (...) { ... };

The distributed production code just became huge with a lot of
recursive, repetitive "inlining" of modules. I don't think that is ok.
What if there is a cycle in which module A depends on module B, which
depends on module A? This is something Brendan said was necessary to
support. I'm not sure it is necessary to support.

> This does not depend on whether modules are or are not self-naming. It
> *does* require that calls to the loader be statically visible.

[snip]

> Making module fetch calls statically analyzable means we need
> *keywords* (or their moral equivalent) and not just standard
> functions, since standard functions can be aliased.

As long as "eval" is around is static analysis ever possible?

Peter

Peter Michaux

Sep 8, 2008, 12:12:45 AM9/8/08
to helm...@googlegroups.com
On Sun, Sep 7, 2008 at 8:29 PM, Kris Kowal <cowber...@gmail.com> wrote:
>
> Peter,
>
> On Sun, Sep 7, 2008 at 6:21 PM, Peter Michaux <peterm...@gmail.com> wrote:
>> On Sun, Sep 7, 2008 at 4:14 PM, Kris Kowal <cowber...@gmail.com> wrote:

> Programming languages where one explicitly names the module name in
> the file that contains its code are actually in the minority.
> Only Java, Perl, and Haskell come to mind.

That doesn't seem like watertight evidence, and I don't know what the
facts are about this. There are a lot of languages out there. I know Ruby
and Scheme also have modules named in the code. By lines of code, Java
and Perl probably make a pretty good bulk of web programs. In the end,
I don't think it should matter too much what other languages do other
than learning from their mistakes. JavaScript is in a very
weird/unique niche.

[snip]

> Allow me to propose a man-in-the-middle-attack.

If we are going to show holes in proposals (which I think is a very
good idea) then there is a man-in-the-middle attack that, as far as I
know, cannot be fixed. A proxy server can rewrite content and a proxy
server is the actual physical man-in-the-middle. Therefore no
JavaScript will be secure. I am tempted to write "totally secure" but
in the security world things should either be secure or they are not
and the extreme cases matter. JavaScript, distributed as text over
HTTP, is not. So any statement of why a module system is being
introduced would need to address this issue and at least state that
JavaScript is not secure no matter what. I don't think security is an
issue for modules to solve. JavaScript applications are insecure. If
JavaScript was distributed as compiled code perhaps the situation
would be a little brighter.

This may seem quite dramatic but see my next note below...

[snip]

>> Does everyone agree that, for a client-side app, multiple modules must
>> be able to exist in a single file?
>
> I for one agree, but also agree that permitting autonomous modules is
> a security hole. Thus there is a tension and it'll take all of our
> collective wit to find a solution and resolve the tension amicably.

Before bothering to try solving this problem, it is probably a good
idea to see if it is the only security hole and that solving this
problem will then make JavaScript secure.

[snip]

> PHASE 1: The Userland module loader proposal should be written such
> that a module can be used as a global <script> tag or a <module>
> assuming that it does not use the import mechanism, defines its
> exports by attaching to "this", and uses an anonymous function block
> called once to isolate its local scope. During this period, modules
> can slowly be ported. Eventually the need to support old-fashioned
> <script> tag modules will expire and these module authors will be able
> to add module loading calls and remove their module pattern
> boilerplate.

In a web page, there will always be the need for at least one script
tag to bootstrap. That is how HTML works. In production that one tag
will load all the code so I don't see that there is any phasing out of
<script>.

> PHASE 2: The ECMAScript module loader proposal

According to Brendan, a module loader is not in scope for the
ECMAScript language. Each host can define its own loading system
appropriate to the host environment. Makes sense.

[snip]

Peter

ihab...@gmail.com

Sep 8, 2008, 12:14:59 AM9/8/08
to helm...@googlegroups.com
On Sun, Sep 7, 2008 at 9:12 PM, Peter Michaux <peterm...@gmail.com> wrote:
> According to Brendan, a module loader is not in scope for the
> ECMAScript language. Each host can define its own loading system
> appropriate to the host environment. Makes sense.

My reading of es-discuss is that this is not yet concluded, by any means.

ihab...@gmail.com

Sep 8, 2008, 12:29:36 AM9/8/08
to helm...@googlegroups.com
On Sun, Sep 7, 2008 at 8:29 PM, Kris Kowal <cowber...@gmail.com> wrote:
> Precedent aside, you are _absolutely right_ to point out this tension
> between security and performance, and I don't think Ihab's point that
> the process of module loading can be split along parsing, fetching,
> and instantiating addresses the security problem completely.

I'm saying that you are vulnerable to your loader, but not necessarily
to the modules it loads. In other words, if you build a loader that
you trust, you can be assured that modules cannot circumvent that
loader.

> Allow me to propose a man-in-the-middle-attack.
> Module Main depends on modules A and B.
> File B is trustworthy.
> File A is untrustworthy.
> File A declares both module A and B.
> The true Module B is never fetched, parsed, nor instantiated.

I'm not sure what you are saying here -- if File A declares modules A
and B, that is equivalent to "self-named" modules, no? In any case,
your example assumes a global writeable namespace into which File A
places a reference to a corrupt Module B. My whole point is that a
secure loader implementation would allow no such thing.

> ... a directory _tree_ of modules is sovereign over its module name-space.
> Thus, any module can only register modules that are on the same domain
> and under the same directory as itself.

Again, you imply a global namespace into which things are
"registered". I claim no such thing:

- There is a loader that you (some module) are given by your parent.
You must trust it since you are vulnerable to your parent regardless.
Imagine therefore that your parent is the "operating system" -- in
other words, part of your Trusted Computing Base (TCB).

- The loader allows you to load some modules depending on some data
you give it. It may include supplying names or URIs, like --

var aModule = loader.load('http://foo/bar.js');

or it may be something else like --

var theModule = loader.getTheWidgetModule();

You are in turn responsible for supplying a loader to your child
modules. The recursion continues.
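
A minimal sketch of this arrangement, under stated assumptions: makeLoader, the sources table, and the allow-list are all hypothetical names invented for illustration. The point is only that there is no shared registry to write to; each module sees whatever loader its parent chose to hand it.

```javascript
// Hypothetical sketch: a loader is just an object handed down by the
// parent. "sources" stands in for whatever fetch mechanism the host has.
function makeLoader(sources, allowed) {
  return {
    load: function (name) {
      if (allowed.indexOf(name) === -1) {
        throw new Error('not permitted: ' + name);
      }
      // Each module receives a loader chosen by its parent; here the
      // child gets an attenuated loader that can load nothing further.
      var childLoader = makeLoader(sources, []);
      return sources[name](childLoader);
    }
  };
}

var sources = {
  widgets: function (loader) {
    return { draw: function () { return 'drawn'; } };
  },
  rogue: function (loader) {
    // Tries to reach a module its parent never granted.
    return loader.load('widgets');
  }
};

var rootLoader = makeLoader(sources, ['widgets', 'rogue']);
var widgets = rootLoader.load('widgets');

var rogueBlocked = false;
try {
  rootLoader.load('rogue');
} catch (e) {
  rogueBlocked = true;
}
```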

> I emphasize that bundling modules should be a build step for
> JavaScript, not the modus operandi for authoring them.

Yes, agreed.

ihab...@gmail.com

Sep 8, 2008, 12:32:37 AM9/8/08
to helm...@googlegroups.com
On Sun, Sep 7, 2008 at 9:12 PM, Peter Michaux <peterm...@gmail.com> wrote:
> I don't think security is an
> issue for modules to solve. JavaScript applications are insecure. If
> JavaScript was distributed as compiled code perhaps the situation
> would be a little brighter.

I'm not sure what compilation would do but, in any case, a good module
system would permit one to execute foreign code while placing a
boundary around the effects it can cause.

ihab...@gmail.com

Sep 8, 2008, 12:42:27 AM9/8/08
to helm...@googlegroups.com
On Sun, Sep 7, 2008 at 9:12 PM, Peter Michaux <peterm...@gmail.com> wrote:
> I'm not so keen on instantiating modules. JavaScript already has
> constructors that can be instantiated and will likely have classes
> that use Object.freeze a lot to make the objects immutable. I think of
> modules as single objects just as the module pattern provides.

The important invariant here is that, if we consider a module to be a
function, then the module function does not close over any variables
outside its body. This means that calling the module function twice
results in two completely independent object graphs. That, in turn, is
important for security and dependency control.

It may well be that not everyone needs it. Yet, as I mentioned
earlier, any standardized module system should not make this usage
unduly difficult.
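
The invariant can be illustrated with a small sketch. The counterModule function is an invented example, not taken from any proposal; it closes over nothing outside its own body, so each call yields an independent object graph.

```javascript
// A module expressed as a function that closes over nothing outside
// its own body (an illustrative example, not from any proposal).
function counterModule() {
  var count = 0;
  return {
    increment: function () {
      count += 1;
      return count;
    }
  };
}

// Two instantiations share no state.
var first = counterModule();
var second = counterModule();
first.increment();
first.increment();
var firstCount = first.increment();   // 3
var secondCount = second.increment(); // 1
```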

ihab...@gmail.com

Sep 8, 2008, 12:56:24 AM9/8/08
to Kris Kowal, helm...@googlegroups.com
On Sat, Sep 6, 2008 at 11:36 PM, Kris Kowal <cowber...@gmail.com> wrote:
> 2.1 Informal Description - Example Module
> There's been some discussion over whether to use Python/Java-style
> identifiers or URI's to reference modules.

That should imho be up to the environment. I know of no *single* good
answer that works for all cases and, as I argue elsewhere, it is not
beneficial to assume a global namespace. Versioning and caching, for
one, cause the assumption of name-based equivalence to fail.

> 2.2 - Loading the module
> I think that it would be a shame if module loader functions were
> anything but top-scope functions. I strongly favor "import" for the
> ES3.1 standard, since it's already a reserved keyword with behavior on
> which no-one depends.

Pursuant to my terminology of "fetch, parse, instantiate", the "fetch"
function can be anything. The "parse" function may need to be native
(viz. removal of "with" in ES 3.1 strict). The "instantiate" function
can be a direct function call on the module itself.

> 2.4 - Asynchronous module loading
> This is a point where I think that an ES-3 native implementation can
> help a lot. Plain and simple: JavaScript developers shouldn't have to
> code in continuations to manage asynchronous module loads.

I'm not sure I like the async solution either, yet I think handling
modules as first-class, multiply instantiable objects is important, so
I would love to see a solution that preserves this.

> In a Userland implementation, I find that using
> exclusively blocking "import" directives greatly simplifies the
> program ...

Do you use XMLHttpRequest to get module code? If so, how do you manage to block?

> 3.4 - Module metadata
> I believe that the kind of metadata you're describing (dependencies
> and versioning I presume) could easily be generated through static
> analysis of "import" statements inside the program and assignment to
> standard variables, like "local.version". To that end and in the
> interest of not requesting that module authors repeat themselves
> needlessly, I think that module metadata should be generated
> programatically and can safely be implementation specific.

Perhaps... Hm.

> export "urn:myModule" {
> var {a} = import("module.js");
> this.b = 10;
> }

Cool, but does that write to a global namespace that the clients of
the module rely on? How many copies of this module are running? How
many versions? Who trusts it? What if it tramples on the namespace of
the "real" urn:myModule?

> ... we probably should ask Brendan ...

Ahem, you mean ECMA TC39, right? };->

> For Caja's purposes, I imagine that "register" would be a microkernel
> function ...

No, Caja just builds first-class modules out of whatever loader you
give it. There is no global registration of anything.

> It's high time for some carbonated high-fructose corn-syrup.

That's always true.

ihab...@gmail.com

Sep 8, 2008, 1:01:31 AM9/8/08
to Peter Michaux, helm...@googlegroups.com
On Sun, Sep 7, 2008 at 10:50 AM, Peter Michaux <peterm...@gmail.com> wrote:
> I don't like augmenting "this" with the exports. ...

> (function(){
> // ...
> global.fn = function(){};
> })()

Interesting!

> I thought Brendan made it clear somewhere that the
> getting-the-code part will not be part of ECMAScript.

If I understand correctly, that was the way people on es-discuss were leaning.

> Module loads should not be asynchronous. It is too complex and all of
> this loading business is not really related to the idea that a module
> is a language construct related to variable scope.

Well not really -- if that's all it is, you can declare everything in
one big file and be done with it, and use lexical scoping like it was
meant to be used. Anything different, to the extent of the difference,
is just a "#include" in your code that you can easily write a
pre-processor to resolve.

As I see it, a module system for the broader ES effort is about how
independently written, perhaps mutually suspicious pieces of code can
be composed at run-time safely.

Kris Kowal

Sep 8, 2008, 2:24:20 AM9/8/08
to helm...@googlegroups.com
On Sun, Sep 7, 2008 at 9:42 PM, <ihab...@gmail.com> wrote:
> The important invariant here is that, if we consider a module to be a
> function, then the module function does not close over any variables
> outside its body. This means that calling the module function twice
> results in two completely independent object graphs. That, in turn, is
> important for security and dependency control.
>
> It may well be that not everyone needs it. Yet, as I mentioned
> earlier, any standardized module system should not make this usage
> unduly difficult.

I agree.

I think that the desired effect could be produced in many ways,
dependent on the implementation of the module loader:

* shared memoized singleton for performance when security is not achievable
* copy-on-write snapshots for performance where native support is possible
* instantiate on demand module objects, when security is achievable
but native support is not available

I think that contemporary JavaScript modules could be written in such
a way that they do not depend on whether the module loader opts to
give them a unique object, a snapshot, or a singleton. In
contemporary JavaScript, it's important that modules be stateless so
that they are agnostic to this feature. It's also important that
contemporary JavaScript be fast and memory-light. To that end, in
contemporary JavaScript, where security is not worth trying to attain
strictly client-side and performance is paramount, a memoized, shared,
singleton object suffices for anyone who wants it. In a module
provided by a Caja microkernel module loader, a completely unique
object graph would suffice, or it could enforce a rule that module
capability objects are shallow and frozen to be immutable. It would
then be the purview of the module to not be a vector for transferring
shared state among mutually untrusting dependencies. Then, in a
native module loader provided by a future version of JavaScript, the
modules could be copy-on-write snapshots.
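
Two of these strategies can be sketched side by side. The helper names below are hypothetical, and copy-on-write snapshots are left out because they would need native support. A stateless module behaves the same under either loader, which is the property argued for above.

```javascript
// Hypothetical loader that hands every caller the same memoized object.
function makeMemoizingLoader(factories) {
  var cache = {};
  return function (name) {
    if (!cache.hasOwnProperty(name)) {
      cache[name] = factories[name]();
    }
    return cache[name];
  };
}

// Hypothetical loader that builds a fresh object graph on every request.
function makeInstantiatingLoader(factories) {
  return function (name) {
    return factories[name]();
  };
}

// A stateless module: it works identically under both strategies.
var factories = {
  math: function () {
    return { double: function (x) { return x * 2; } };
  }
};

var shared = makeMemoizingLoader(factories);
var fresh = makeInstantiatingLoader(factories);

var sameObject = shared('math') === shared('math');    // true
var distinctObjects = fresh('math') !== fresh('math'); // true
var sameBehavior = shared('math').double(4) === fresh('math').double(4); // true
```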

Kris

Kris Kowal

Sep 8, 2008, 2:47:51 AM9/8/08
to ihab...@gmail.com, helm...@googlegroups.com
On Sun, Sep 7, 2008 at 9:56 PM, <ihab...@gmail.com> wrote:
> Do you use XMLHttpRequest to get module code? If so, how do you manage to block?

The third parameter of XMLHttpRequest.open is whether to be
asynchronous. true = nonblocking, false = blocking. If you opt to
block, the send method waits. Whether onreadystatechange fires in
blocking mode is implementation dependent, but it's considered
standard for it to be called at least once.
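
A minimal sketch of such a blocking fetch follows. The fetchModuleTextSync name and the optional XhrCtor parameter are assumptions for illustration; the parameter exists only so the sketch can be exercised outside a browser, and the status check assumes a plain HTTP response.

```javascript
// Sketch of a blocking fetch. XhrCtor defaults to the browser's
// XMLHttpRequest; it is a parameter only so the sketch can be
// exercised outside a browser.
function fetchModuleTextSync(url, XhrCtor) {
  var Ctor = XhrCtor || XMLHttpRequest;
  var request = new Ctor();
  request.open('GET', url, false); // third argument false = blocking
  request.send(null);              // returns only once the response is in
  if (request.status !== 200) {
    throw new Error('could not fetch ' + url + ': ' + request.status);
  }
  return request.responseText;
}
```

In a browser, a call such as fetchModuleTextSync('module.js') would return the program text directly, with no callback or continuation.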

>> export "urn:myModule" {
>> var {a} = import("module.js");
>> this.b = 10;
>> }
>
> Cool, but does that write to a global namespace that the clients of
> the module rely on? How many copies of this module are running? How
> many versions? Who trusts it? What if it tramples on the namespace of
> the "real" urn:myModule?

I agree that this is a major problem. I don't think we can live
without it though. Minimizing round-trip-times for HTTP requests is
still really important especially as the web shifts to centralized
services and potentially TCP jumbo-frames. In a high-latency
environment, getting all of your data in one round-trip can save a lot
of time.

>> ... we probably should ask Brendan ...
>
> Ahem, you mean ECMA TC39, right? };->

*shy* yeah.


>> For Caja's purposes, I imagine that "register" would be a microkernel
>> function ...
>
> No, Caja just builds first-class modules out of whatever loader you
> give it. There is no global registration of anything.

We have to either reconcile the need for module bundling for
performance and the need for module isolation for security, or we must
choose one or the other. (I'm pretty sure that's a tautology; I would
not want to set up a false dilemma.)

Perhaps we could provide strong location-based convention for trust.
I know that's bitten us in the butt before and we haven't quite gotten
it right with any particular technology like the various Flash
conventions or various XHR conventions.

Perhaps we should just ditch "registration" for environments in which
security is possible to attain, that is to say, contemporary
JavaScript module loaders using Caja, and future native module
loaders. The register function could be the exclusive purview of
contemporary JavaScript module loaders. Sounds good to me. In fact,
I may have already provisioned for the possibility by specifying that
"register" and "publish" MAY be provided by some module loaders but
not others.

Kris Kowal

Kris Kowal

Sep 8, 2008, 2:57:12 AM9/8/08
to helm...@googlegroups.com
On Sun, Sep 7, 2008 at 10:01 PM, <ihab...@gmail.com> wrote:
>
> On Sun, Sep 7, 2008 at 10:50 AM, Peter Michaux <peterm...@gmail.com> wrote:
>> I don't like augmenting "this" with the exports. ...
>> (function(){
>> // ...
>> global.fn = function(){};
>> })()
>
> Interesting!

Yeah. In the specification I've outlined on the wiki, I recommend
using "module" and "local" to explicitly localize or export a
function. "module" is the same as "this" in the top-scope. I agree
that "module", "global", and "local" are all much more explicit than
"this", and I would encourage their use too. Sparing that the
closure-called-once is free in the specified module system, that
particular expression would be equivalent to:

global.fn = function () {};

And, if you wanted to follow the rule of not modifying the global
object and rather export functions by adding them to the module:

module.fn = function () {};
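
One way a Userland loader could supply these names is sketched below. This is an assumption-laden illustration rather than the wiki specification: the instantiate helper is hypothetical, and "local" is modeled as a plain object that is simply not returned to the caller.

```javascript
// Hypothetical evaluation step of a Userland loader: the module text
// runs inside a function whose parameters are "module", "global" and
// "local", and whose "this" is the module object.
function instantiate(text, globalObject) {
  var moduleObject = {};
  var localObject = {};
  var body = new Function('module', 'global', 'local', text);
  body.call(moduleObject, moduleObject, globalObject, localObject);
  return moduleObject;
}

var fakeGlobal = {};
var exported = instantiate(
  "module.fn = function () { return 'exported'; };\n" +
  "global.leak = function () { return 'global'; };\n" +
  "local.helper = function () {};\n" +
  "this.viaThis = (this === module);",
  fakeGlobal
);
```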

I'm only holding out for the provision of "this" == "module" still
because of that neat migration path for jQuery I described.

>> I thought Brendan made it clear somewhere that the
>> getting-the-code part will not be part of ECMAScript.
>
> If I understand correctly, that was the way people on es-discuss were leaning.

I think we're all in agreement. "fetch" should be implementation-specific.


>> Module loads should not be asynchronous. It is too complex and all of
>> this loading business is not really related to the idea that a module
>> is a language construct related to variable scope.
>
> Well not really -- if that's all it is, you can declare everything in
> one big file and be done with it, and use lexical scoping like it was
> meant to be used. Anything different, to the extent of the difference,
> is just a "#include" in your code that you can easily write a
> pre-processor to resolve.
>
> As I see it, a module system for the broader ES effort is about how
> independently written, perhaps mutually suspicious pieces of code can
> be composed at run-time safely.

Well said!

Kris

Kris Kowal

Sep 8, 2008, 3:01:51 AM9/8/08
to helm...@googlegroups.com
> In the end,
> I don't think it should matter too much what other languages do other
> than learning from their mistakes. JavaScript is in a very
> weird/unique niche.

Agreed. Let's stick to citing requirements rather than language
precedent from now on.

> A proxy server can rewrite content and a proxy
> server is the actual physical man-in-the-middle. Therefore no
> JavaScript will be secure

This kind of attack requires DNS cracking. It isn't impossible, but
it is pretty hard, and it's a separate matter.

> I don't think security is an
> issue for modules to solve.

Not to belabor the point or anything, but I think we can move beyond
discord for these basic issues.

* Contemporary JavaScript is not secure at all.
* It can be made secure with server-side static analysis and injected
client-side assertions.
* Future JavaScript should make it easier to maintain security
invariants client-side.
* In the long term, future JavaScript should obviate the need for
server-side static analysis and injected client-side assertions.

1. For the purpose of developing a client-side module loader with
contemporary JavaScript, security is not a concern.
2. For the purpose of standardizing a server-side module system for
future versions of JavaScript, not introducing new security holes and
difficulties for static-analysis or runtime security assertions are
valid concerns.
3. In so far as that a client-side module loader with contemporary
JavaScript can provide an easy path of migration to a future, native
JavaScript module loader, the client-side module loader standard
should be patterned after the future JavaScript module loader
standard.
4. By premises (2) and (3), we can infer that security issues should
effect patterns that influence the contemporary client-side module
loader standard, even though they will not afford any level of
security in contemporary JavaScript without static analysis and
runtime assertions.

> In a web page, there will always be the need for at least one script
> tag to bootstrap. That is how HTML works. In production that one tag
> will load all the code so I don't see that there is any phasing out of
> <script>.

I believe I deliberately addressed this issue in the first afterword of
the draft specification. The specification does not
constrain the manner in which "main" modules are invoked. This can be
done with a <script>, as it is with modules.js. I don't believe that
I've suggested that <script> should be phased out, merely that its use
for loading modules should be reduced to a single latch-point with
contemporary JavaScript Userland module loaders. I imagine that
future versions of JavaScript could replace the <script> tag with a
<module> tag or <script variant> to enforce a consistent use of
JavaScript modules and avoid the can of worms that globally evaluated
<script> tags, especially inline <script> tags, introduce, but it
isn't a pragmatic option, nor necessary if web developers choose to
use secure native modules.

Kris

ihab...@gmail.com

Sep 8, 2008, 10:27:56 AM9/8/08
to helm...@googlegroups.com
On Sun, Sep 7, 2008 at 11:24 PM, Kris Kowal <cowber...@gmail.com> wrote:
> I think that contemporary JavaScript modules could be written in such
> a way that they do not depend on whether the module loader opts to
> give them a unique object, a snapshot, or a singleton.

Hm. But contemporary JS modules aren't. :)

> In
> contemporary JavaScript, it's important that modules be stateless so
> that they are agnostic to this feature.

Again, contemporary modules aren't.

> Then, in a
> native module loader provided by a future version of JavaScript, the
> modules could be copy-on-write snapshots.

The whole idea is to come up with that native module loader's
semantics (or say that we don't need to standardize them). All else is
worthy -- but nonstandard -- solutions.

ihab...@gmail.com

Sep 8, 2008, 12:52:09 PM9/8/08
to helm...@googlegroups.com
Fwiw --

On Mon, Sep 8, 2008 at 12:01 AM, Kris Kowal <cowber...@gmail.com> wrote:
> * Contemporary JavaScript is not secure at all.
> * It can be made secure with server-side static analysis and injected
> client-side assertions.
> * Future JavaScript should make it easier to maintain security
> invariants client-side.
> * In the long term, future JavaScript should obviate the need for
> server-side static analysis and injected client-side assertions.
>
> 1. For the purpose of developing a client-side module loader with
> contemporary JavaScript, security is not a concern.
> 2. For the purpose of standardizing a server-side module system for
> future versions of JavaScript, not introducing new security holes and
> difficulties for static-analysis or runtime security assertions are
> valid concerns.
> 3. In so far as that a client-side module loader with contemporary
> JavaScript can provide an easy path of migration to a future, native
> JavaScript module loader, the client-side module loader standard
> should be patterned after the future JavaScript module loader
> standard.
> 4. By premises (2) and (3), we can infer that security issues should
> effect patterns that influence the contemporary client-side module
> loader standard, even though they will not afford any level of
> security in contemporary JavaScript without static analysis and
> runtime assertions.

I agree with all the above; it's well stated. Thanks.

ihab...@gmail.com

Sep 8, 2008, 1:00:49 PM9/8/08
to Kris Kowal, helm...@googlegroups.com
On Sun, Sep 7, 2008 at 11:47 PM, Kris Kowal <cowber...@gmail.com> wrote:
> The third parameter of XMLHttpRequest.open is whether to be
> asynchronous.

Ah.

>> Cool, but does that write to a global namespace that the clients of
>> the module rely on? How many copies of this module are running? How
>> many versions? Who trusts it? What if it tramples on the namespace of
>> the "real" urn:myModule?
>
> I agree that this is a major problem. I don't think we can live
> without it though. Minimizing round-trip-times for HTTP requests is
> still really important especially as the web shifts to centralized
> services and potentially TCP jumbo-frames. In a high-latency
> environment, getting all of your data in one round-trip can save a lot
> of time.

These are orthogonal. If you want to pre-fetch modules, you go through
your code looking for calls to the module fetcher schmivit and
replacing them with other schmivits that return a closure or an object
or what not that is initialized from literal text that you autoslurp
from whatever the fetcher schmivit would have fetched at run time.
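
As a rough sketch of that rewriting step: the inlineLoads helper, the loader.load(...) call form, and the "module" exports object are all assumptions made up for illustration. Each literal call is replaced by a function expression built from prefetched text, so no name is written into any shared namespace.

```javascript
// Hypothetical build step: replace each literal loader.load('name')
// call with an immediately-invoked function wrapping the prefetched
// text of that module. "prefetched" maps names to program text.
function inlineLoads(source, prefetched) {
  return source.replace(
    /\bloader\.load\(\s*(['"])([^'"]+)\1\s*\)/g,
    function (call, quote, name) {
      if (!prefetched.hasOwnProperty(name)) {
        return call; // unknown module: leave the call for run time
      }
      return '(function () { var module = {}; ' +
        prefetched[name] + ' return module; })()';
    }
  );
}

var prefetched = {
  'greet.js': "module.hello = function () { return 'hi'; };"
};
var app = "var greet = loader.load('greet.js'); return greet.hello();";

var bundled = inlineLoads(app, prefetched);
var output = new Function(bundled)(); // runs without any loader object
```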

The following --

export "urn:myModule" {
var {a} = import("module.js");
this.b = 10;
}

seems to define in a *standard* the idea that the urn: namespace or
what not is hard-coded as a global space. My point is that you don't
need it and shouldn't do it.

Does this explain things adequately? If not we can try again over chat
or whatever. I may be missing something.

> Perhaps we should just ditch "registration" for environments in which
> security is possible to attain, that is to say, contemporary
> JavaScript module loaders using Caja, and future native module
> loaders. The register function could be the exclusive purview of
> contemporary JavaScript module loaders. Sounds good to me. In fact,
> I may have already provisioned for the possibility by specifying that
> "register" and "publish" MAY be provided by some module loaders but
> not others.

Well, I would restate that: the module fetcher takes *some* data
telling it what to fetch. Each module is vulnerable to its fetcher.
Each application is vulnerable to its static analysis. But the
parameters of the fetcher need not be standardized at this point -- at
least not by ES.

Peter Michaux

Sep 8, 2008, 1:14:46 PM9/8/08
to helm...@googlegroups.com
On Sun, Sep 7, 2008 at 9:32 PM, <ihab...@gmail.com> wrote:
>
> a good module
> system would permit one to execute foreign code while placing a
> boundary around the effects it can cause.

So you want there to be something like a "Caja mode", more restrictive
than "strict", where the loader would limit the environment radically?

Do you envision this as part of the ECMAScript standard or as part of
a host standard (e.g. W3C)? I think this would be a host standard as
many of the features needing to be controlled are outside the
ECMAScript standard.

Peter

ihab...@gmail.com

Sep 8, 2008, 1:29:53 PM9/8/08
to helm...@googlegroups.com
On Mon, Sep 8, 2008 at 10:14 AM, Peter Michaux <peterm...@gmail.com> wrote:
> On Sun, Sep 7, 2008 at 9:32 PM, <ihab...@gmail.com> wrote:
>> a good module
>> system would permit one to execute foreign code while placing a
>> boundary around the effects it can cause.
>
> So you want there to be something like a "Caja mode", more restrictive
> than "strict", where the loader would limit the environment radically?

That is a useful way to think about it; the point is, what is the default?

> Do you envision this as part of the ECMAScript standard or as part of
> a host standard (e.g. W3C)? I think this would be a host standard as
> many of the features needing to be controlled are outside the
> ECMAScript standard.

Well, the specific *objects* passed around are outside ES, but the
fact of whether the loaded module automatically inherits its client's
globals is definitely in ES. Again, what's the default, and how can it
be made to work for everyone?

Do we standardize two native loaders, a "strict" and a "loose" one,
such that security critical environments just require the "strict"
version?

How does that inform the issue of how we pass imports and exports
across the loading boundary?
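
To make the question concrete, here is a rough sketch in which both
loaders share one calling convention (imports in, exports out) and
differ only in the default. Everything named here (clientGlobals, load,
looseLoad, strictLoad) is hypothetical, and clientGlobals merely stands
in for the client's globals. In contemporary JavaScript the loaded text
can still reach the real global object, so this shows the shape of the
interface and not actual isolation:

```javascript
// Hypothetical stand-in for the names visible to the client.
var clientGlobals = {
  log: function (msg) { return "log: " + msg; },
  secret: 42
};

// Shared convention: imports become parameters, exports come back.
function load(source, imports) {
  var names = Object.keys(imports);
  var values = names.map(function (name) { return imports[name]; });
  var exports = {};
  var body = new Function(["exports"].concat(names).join(","), source);
  body.apply(undefined, [exports].concat(values));
  return exports;
}

// "Loose" default: the module inherits everything the client can see.
function looseLoad(source) {
  return load(source, clientGlobals);
}

// "Strict" default: the module sees only what is passed explicitly.
function strictLoad(source, imports) {
  return load(source, imports || {});
}

var probe =
  "exports.sawSecret = typeof secret !== 'undefined';" +
  "exports.line = log('hi');";

var looseExports = looseLoad(probe);
var strictExports = strictLoad(probe, { log: clientGlobals.log });
```

Here looseExports.sawSecret is true and strictExports.sawSecret is
false, while both can call log.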

Peter Michaux

Sep 8, 2008, 2:18:37 PM9/8/08
to helm...@googlegroups.com
On Mon, Sep 8, 2008 at 10:29 AM, <ihab...@gmail.com> wrote:
>
> On Mon, Sep 8, 2008 at 10:14 AM, Peter Michaux <peterm...@gmail.com> wrote:
>> On Sun, Sep 7, 2008 at 9:32 PM, <ihab...@gmail.com> wrote:
>>> a good module
>>> system would permit one to execute foreign code while placing a
>>> boundary around the effects it can cause.
>>
>> So you want there to be something like a "Caja mode", more restrictive
>> than "strict", where the loader would limit the environment radically?
>
> That is a useful way to think about it; the point is, what is the default?

I don't know that there should be a default. In a server-side
environment there is really no need for a "Caja mode" or some other
security mode. It is on the client-side where there seems to be
concern about security.

>> Do you envision this as part of the ECMAScript standard or as part of
>> a host standard (e.g. W3C)? I think this would be a host standard as
>> many of the features needing to be controlled are outside the
>> ECMAScript standard.
>
> Well, the specific *objects* passed around are outside ES, but the
> fact of whether the loaded module automatically inherits its client's
> globals is definitely in ES.

Why is it "definitely"? If the module loader is outside of ECMAScript
it should do anything the host decides it should do, shouldn't it?

> Again, what's the default, and how can it
> be made to work for everyone?
>
> Do we standardize two native loaders, a "strict" and a "loose" one,
> such that security critical environments just require the "strict"
> version?

If a single loader is to be used by every host then maybe it is
solvable and maybe not. Will we know all loading concerns for all
types of hosts?

If the module loader is outside of ECMAScript then, of course, it
doesn't have to work for every host.

Why should a single loader be used by every host? In Java, I believe
different runtimes have different class loaders appropriate to the
environment. There is just some standard class loader interface, isn't
there? I have never really looked into this part of Java.

> How does that inform the issue of how we pass imports and exports
> across the loading boundary?

I'm not sure.

----

The idea that a loader is more of a browser-specific, W3C-type spec
than an ECMAScript spec is appealing. Script loading has always been
in the host realm anyway.

Peter

ihab...@gmail.com

Sep 8, 2008, 2:37:40 PM9/8/08
to helm...@googlegroups.com
On Mon, Sep 8, 2008 at 11:18 AM, Peter Michaux <peterm...@gmail.com> wrote:
> In a server-side
> environment there is really no need for a "Caja mode" or some other
> security mode. It is on the client-side where there seems to be
> concern about security.

I would argue otherwise. The need to run suspicious code in a sandbox
is true on the server or the client. How do you host plug-ins to your
server-side framework without implicitly adding them to your TCB?

>> Well, the specific *objects* passed around are outside ES, but the
>> fact of whether the loaded module automatically inherits its client's
>> globals is definitely in ES.
>
> Why is it "definitely"? If the module loader is outside of ECMAScript
> it should do anything the host decides it should do, shouldn't it?

I mis-spoke: Definitely in ES to the extent that any module stuff is.
:) That said, the module *fetching* is outside ES. Parsing to a module
object and creating the calling convention for using it could likely
end up in ES, given (among other things) lack of 'with' in ES 3.1
strict.

> If the module loader is outside of ECMAScript then, of course, it
> doesn't have to work for every host.

True.

> Why should a single loader be used by every host?

It doesn't. As I mentioned before, if the solution is simply to say
that ES should not standardize any of this business, then that's cool.

... Again, of course, modulo the lack of 'with' in ES 3.1 strict and
how that affects the feasibility of userland loaders. ...
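
For anyone who has not tried it, a small sketch of the kind of userland
loader at stake. The loadWith name and the scope object are
illustrative assumptions, not anyone's actual loader. The loader relies
on 'with' to make the imports look like free variables to the module
text, and the same trick is rejected once the module body is strict
code:

```javascript
// Hypothetical userland loader: 'with' turns the properties of scope
// into free variables for the module text.
function loadWith(source, scope) {
  var body = new Function("scope", "with (scope) { " + source + " }");
  body(scope);
  return scope;
}

var scope = { greeting: "hello", result: undefined };
loadWith("result = greeting + ' world';", scope);
// scope.result is now "hello world"

// The same construction with a strict body does not even parse.
var strictRejected = false;
try {
  new Function("scope", "'use strict'; with (scope) { result = 1; }");
} catch (e) {
  strictRejected = e.name === "SyntaxError";
}
```

Without 'with', a userland loader is left with passing imports as
function parameters or rewriting the module text.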

Peter Michaux

Sep 8, 2008, 9:40:33 PM9/8/08
to helm...@googlegroups.com
On Mon, Sep 8, 2008 at 11:37 AM, <ihab...@gmail.com> wrote:
>
> On Mon, Sep 8, 2008 at 11:18 AM, Peter Michaux <peterm...@gmail.com> wrote:
>> In a server-side
>> environment there is really no need for a "Caja mode" or some other
>> security mode. It is on the client-side where there seems to be
>> concern about security.
>
> I would argue otherwise. The need to run suspicious code in a sandbox
> is true on the server or the client. How do you host plug-ins to your
> server-side framework without implicitly adding them to your TCB?

True if the server-side plug-ins are from an untrusted source.

I suppose if you want users to generate server-side code (i.e.
untrusted) then you would need a sandbox. I don't think I'd trust any
sandbox to be well enough constructed for this purpose on the server.

If they are plugins from say, http://helma.org/plugins/, then I would
probably just trust them. I trust code from CPAN and from the Debian
main package repository and so do thousands of programmers and system
administrators. At some point there has to be trust or I will be
reading an awful lot of source code I'm not qualified to evaluate for
security.

Peter

Mark Miller

Sep 8, 2008, 10:03:57 PM9/8/08
to helm...@googlegroups.com


The word "trust" often causes confusing intuitions. Our use of the term
in computer security differs from its natural-language use in various
ways. I often find it clarifying to rephrase statements involving
"trust" in terms of vulnerability relationships. Before I respond,
would you agree that the following rephrasing adequately captures your
intended meaning? If not, could you try to rephrase without using the
word "trust"? Thanks.


True if I do not wish to be vulnerable to the source of the server-side plug-ins.


I suppose if you want users to generate server-side code (i.e.
code that the server is not vulnerable to) then you would need a sandbox. I don't think I'd be willing to be vulnerable to any
sandbox being well enough constructed for this purpose on the server.


If they are plugins from say, http://helma.org/plugins/, then I would
probably just be vulnerable to them. I am vulnerable to code from CPAN
and from the Debian main package repository and so are thousands of
programmers and system administrators. At some point we have to risk
vulnerability or I will be reading an awful lot of source code I'm not
qualified to evaluate for security.


--
Text by me above is hereby placed in the public domain

Cheers,
--MarkM

ihab...@gmail.com

Sep 8, 2008, 11:01:40 PM9/8/08
to helm...@googlegroups.com
On Mon, Sep 8, 2008 at 6:40 PM, Peter Michaux <peterm...@gmail.com> wrote:
> I suppose if you want users to generate server-side code (i.e.
> untrusted) then you would need a sandbox. I don't think I'd trust any
> sandbox to be well enough constructed for this purpose on the server.

Really? What if you were sure that each module -- and all its
descendants -- only had access to the symbols you explicitly provide;
you spend a long time working out the public interface of these
symbols; and no two instances of the module have any communication
channel between them that you did not provide?

> If they are plugins from say, http://helma.org/plugins/, then I would
> probably just trust them. I trust code from CPAN and from the Debian
> main package repository and so do thousands of programmers and system
> administrators. At some point there has to be trust or I will be
> reading an awful lot of source code I'm not qualified to evaluate for
> security.

Indeed. But sandboxing the plugins gives you the ability to reason
about the worst case of the effects they can have.

Peter Michaux

Sep 8, 2008, 11:38:21 PM9/8/08
to helm...@googlegroups.com

I'd say that has the same general intent.

Peter

Peter Michaux

Sep 8, 2008, 11:53:28 PM9/8/08
to helm...@googlegroups.com
On Mon, Sep 8, 2008 at 8:01 PM, <ihab...@gmail.com> wrote:
>
> On Mon, Sep 8, 2008 at 6:40 PM, Peter Michaux <peterm...@gmail.com> wrote:
>> I suppose if you want users to generate server-side code (i.e.
>> untrusted) then you would need a sandbox. I don't think I'd trust any
>> sandbox to be well enough constructed for this purpose on the server.
>
> Really? What if you were sure that each module -- and all its
> descendants -- only had access to the symbols you explicitly provide;
> you spend a long time working out the public interface of these
> symbols; and no two instances of the module have any communication
> channel between them that you did not provide?

I'd still be nervous. "A long time" doesn't mean I did a good job.

Peter

ihab...@gmail.com

Sep 9, 2008, 12:07:20 AM9/9/08
to helm...@googlegroups.com
On Mon, Sep 8, 2008 at 8:53 PM, Peter Michaux <peterm...@gmail.com> wrote:
>> Really? What if you were sure that each module -- and all its
>> descendants -- only had access to the symbols you explicitly provide;
>> you spend a long time working out the public interface of these
>> symbols; and no two instances of the module have any communication
>> channel between them that you did not provide?
>
> I'd still be nervous. "A long time" doesn't mean I did a good job.

True. So you use a few strategies to mitigate your risk --

- The Principle of Least Authority (POLA). You give untrusted code the
Least Authority it needs to do its job. Since even trusted code can be
subverted into doing evil things by sufficiently evil untrusted code,
you should really apply POLA to all code. Fortunately, this leads to
precisely the same sorts of designs that the Dependency Injection
people say are good for maintenance and testability.

- Third party security reviews. The Caja team had one recently, and
the reviewers filed truly many dozens of bugs against us. Yet we fared
better than we feared we would, and I recommend this process to anyone
else writing critical code.
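
As a rough illustration of the POLA point, here is a sketch in the
dependency-injection style. The store, readOnlyView and greetingPlugin
names are invented for the example. It shows the design only: without
something like Caja verifying the plug-in, nothing stops it from
reaching for globals instead of using what it was handed.

```javascript
// Hypothetical full-authority object owned by the host.
function makeStore() {
  var data = {};
  return {
    read: function (key) { return data[key]; },
    write: function (key, value) { data[key] = value; },
    clear: function () { data = {}; }
  };
}

// Attenuation: a view that can only read one chosen key.
function readOnlyView(store, key) {
  return Object.freeze({
    read: function () { return store.read(key); }
  });
}

// Hypothetical plug-in: it works only with the object it is given.
function greetingPlugin(config) {
  return "Hello, " + config.read();
}

var store = makeStore();
store.write("userName", "Ada");
store.write("password", "hunter2");

var view = readOnlyView(store, "userName");
var message = greetingPlugin(view);  // "Hello, Ada"
```

The plug-in never receives write, clear or the password key, so the
worst case to reason about is whatever read() on one key allows. The
same narrow parameter also makes the plug-in easy to test with a stub.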
