For the last week, I've been participating in a long discussion with people from the "Helma-NG" server-side JavaScript project, which seems well positioned to take advantage of a variation on the module system I've drafted. As such, it would be able to share modules between client- and server-side module systems, potentially with an easy migration path to ES-H. There's a wiki page on the Helma-NG site where the standard proposal is evolving with the discussion. Please join in if you have the time.
There's been some discussion over whether to use Python/Java-style identifiers or URIs to reference modules. I'm starting in the URI camp, but willing to concede to the desires of the Helma-NG crowd. However, your proposal contains some ideas that I think strongly favor using URIs. I particularly like the idea of using "urn:" URIs to identify modules not corresponding to HTTP resources. Chiron's module loader, modules.js, already provides certain resources that would be better served from "urn:" identifiers since they're preloaded. I imagine that many module loader systems would have need of special-case URIs. Just as an example, Helma-NG could benefit from a Java URN scheme for loading names from Java modules like "urn:java:module.name".
I'm not a fan of preserving eval's mongrel exec/eval semantics with the module loader. In my opinion, eval should only evaluate expressions and exec should only evaluate statements. Implicitly returning the module exports reminds me of Perl's convention for modules where the last evaluated expression reports whether the module successfully loaded or not (exceptions, Larry?). I very much prefer using the Object constructor notation where the module author adds exports to "this" and uses "var" or "let" for locals.
2.2 - Loading the module
I think that it would be a shame if module loader functions were anything but top-scope functions. I strongly favor "import" for the ES3.1 standard, since it's already a reserved keyword with behavior on which no-one depends.
2.4 - Asynchronous module loading
This is a point where I think that an ES-3 native implementation can help a lot. Plain and simple: JavaScript developers shouldn't have to code in continuations to manage asynchronous module loads. A native implementation should sleep a JavaScript thread on an "import" statement and perhaps speculatively load any modules imported later down in the program. In a Userland implementation, I find that using exclusively blocking "import" directives greatly simplifies the program, to the point that I don't have the time or mind-share to do it any other way. Furthermore, I find that there is no need to do it any other way. When I am operating in a debug mode, I do not mind going to an HTTP request for each individual module file synchronously. When I am operating in production mode, one of the steps I have to perform is a server-side compilation of my modules that bundles all of my scripts and their dependencies into a single module file with no need to change any of my source code or HTML. So, in this situation, the module loader never blocks on an HTTP request: the module constructor has already been downloaded whenever you arrive at an "import" call.
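A rough sketch of the kind of build step described above, under stated assumptions: the `bundle` function, the `register(name, constructor)` call it emits, and the sample module names are hypothetical and are not taken from Chiron's actual bundler. It only shows that wrapping each module's text in a registration call lets one file carry many modules, so a later "import" finds the constructor already present.

```javascript
// Hypothetical build step: wrap each module's source text in a register()
// call so that one file can carry many modules.
function bundle(sources) {
    return Object.keys(sources).sort().map(function (name) {
        return "register(" + JSON.stringify(name) + ", function () {\n" +
            sources[name] + "\n});";
    }).join("\n");
}

// Two tiny modules, keyed by the name a loader would request them under.
var bundled = bundle({
    "a.js": "this.a = 1;",
    "b.js": "this.b = 2;"
});

// A stand-in loader records the constructors instead of fetching anything.
var constructors = {};
new Function("register", bundled)(function (name, constructor) {
    constructors[name] = constructor;
});
```

Nothing in the module sources changes between debug and production; only the way their text reaches the loader differs.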
3.2 - Standardization Issues - Loader Support
You mention that scope chain manipulation would require native support. I've found that I can get around needing a lot of native support simply by fiddling with the scope chain at very low performance cost. On the flip-side, I am very skeptical about doing any pre-processing on the client-side. As Mark said when we last met, it would invite a problem of cascading virtualization.
With scope chain manipulation, porting jQuery to Chiron is a two-line patch, with which I completely box jQuery in a module.
Let me illuminate some of the techniques.
To prevent "eval" from capturing any variables from the module loader's scope chain, I declare an anonymous global-scope eval function that binds _no_ names except "arguments".
function () {
    return eval(arguments[0]);
}
I then pass the global eval function into my module loader's enclosure, binding the function to an argument of the module loader's scope (which is to say, not available from the scope in which evalGlobal is declared).
(function (evalGlobal) {
    // module loader body goes here
})(function () {
    return eval(arguments[0]);
});
Then, to construct the scope chain I described in my proposal, I create an evalGlobalWith function inside my module loader that injects scopes into the scope chain inside evalGlobal using another eval in the text of the evalGlobal, a with block, and an anonymous function:
(function (evalGlobal) {
    var evalGlobalWith = function (text, locals, module) {
        // arguments[0] is this program text; [1] text, [2] locals, [3] module
        return evalGlobal(
            "with (arguments[3]) {" +
            "    with (arguments[2]) {" +
            "        (function () {" +
            "            eval(arguments[0]);" +
            "        }).call(this, arguments[1]);" +
            "    }" +
            "}",
            text, locals, module
        );
    };
})(function () {
    return eval(arguments[0]);
});
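A small demonstration of the lookup order this produces, as a sketch rather than Chiron's actual loader. Two things here are assumptions made for the example: the Function constructor stands in for the top-level function declaration (so the sketch behaves the same wherever it is pasted, including strict contexts), and the inner function returns the eval result so the outcome can be observed. The name `loaderSecret` is hypothetical.

```javascript
// Assumption: a Function-constructor function closes over the global scope
// only, which is the property the anonymous global-scope function provides.
var evalGlobal = new Function("return eval(arguments[0]);");

var evalGlobalWith = (function (evalGlobal) {
    var loaderSecret = "hidden"; // private to the loader's closure
    return function (text, locals, module) {
        return evalGlobal(
            "with (arguments[3]) {" +
            "  with (arguments[2]) {" +
            "    (function () { return eval(arguments[0]); })" +
            "      .call(this, arguments[1]);" +
            "  }" +
            "}",
            text, locals, module
        );
    };
})(evalGlobal);

// "a" is found in locals before module; "b" falls through to module;
// the loader's own variable is not reachable from the module text.
var seen = evalGlobalWith(
    "[typeof loaderSecret, a, b]",
    { a: 1 },
    { a: 0, b: 2 }
);
```

Here `seen` holds the three values the module text observed: the type of the loader's private variable, then `a` and `b` as resolved through the injected scopes.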
I think you'll find that this technique effectively, completely, and predictably controls the scope chain of the module program, all with contemporary JavaScript with very little performance cost. I hope that helps turn the balance of your value judgments. We actually can do a lot without client-side parsing.
3.3 - Module Format
Consider the module environment I'm proposing, where adding properties to "this" expresses an export. This permits a very clean migration path from current ES, to current ES with a module loader implementation I'm describing, to ES with a module loader provided by the browser.
Take jQuery, for example. The only pity with modern JavaScript is that "window", "self", and "this" all refer to the same object, even though their meanings are distinct. Even with this problem, jQuery manages to almost always use "window" when it wants "window" and "this" when it wants to export a variable (to global scope). Adding the jQuery object to "this" works today. When I subvert jQuery into my module system, I simply change two occurrences of "window" to "this", and suddenly jQuery has no impact on global scope when it's used in the module loader, but still works if it's imported as a global <script> tag. By extension, if we preserve these semantics in a native module loader, most libraries will not have to change. As older browsers fall out of fashion, the maintainers of those modules will be able to remove their module-pattern boilerplate and explicate their dependencies as a gradual optimization.
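A minimal sketch of that dual use, with a made-up library in place of jQuery. The `librarySource` text, the `fakeGlobal` object standing in for the browser's global object, and the use of the Function constructor to evaluate the text are all assumptions made for illustration.

```javascript
// Hypothetical library source that exports by assigning to "this".
var librarySource =
    "var secret = 42;" +
    "this.answer = function () { return secret; };";

// Evaluate the text as a function body, outside of our own local scope.
var library = new Function(librarySource);

// As a global <script>, "this" would be the global object (simulated here).
var fakeGlobal = {};
library.call(fakeGlobal);

// Inside a module loader, "this" is a fresh exports object instead, so
// nothing lands on the global object.
var moduleExports = {};
library.call(moduleExports);
```

The library text is identical in both cases; only the object bound to "this" differs.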
3.4 - Module metadata
I believe that the kind of metadata you're describing (dependencies and versioning, I presume) could easily be generated through static analysis of "import" statements inside the program and assignment to standard variables, like "local.version". To that end, and in the interest of not requesting that module authors repeat themselves needlessly, I think that module metadata should be generated programmatically and can safely be implementation specific.
4.1 - Alternatives and rationale - Explicit Module Syntax
I actually have need for explicit module exports in Chiron. I use a script bundler that "concatenates" modules to reduce HTTP requests. In my proposal, I've reserved "register" for this purpose. ES-H could easily use its reserved "export" keyword to perform the same function, thus providing a migration path from user-space module loaders to native module loaders. I imagine it would look like:
export "urn:myModule" {
    var {a} = import("module.js");
    this.b = 10;
}
desugaring to the same notation as a Userland implementation:
register("urn:myModule", function () {
    var {a} = import("module.js");
    this.b = 10;
});
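For readers who want to try the Userland form, here is a minimal sketch of a loader that could back it. It is an illustration under assumptions, not the proposal's implementation: since "import" is a reserved word, the lookup function is called `importModule` here, and the `registry` and `instances` tables are hypothetical names. Module objects are memoized, and the exports object is stored before the constructor runs.

```javascript
// Hypothetical Userland loader. "import" is a reserved word, so the lookup
// function is named importModule in this sketch.
var registry = {};   // name -> module constructor
var instances = {};  // name -> memoized exports object

function register(name, constructor) {
    registry[name] = constructor;
}

function importModule(name) {
    if (!Object.prototype.hasOwnProperty.call(registry, name)) {
        throw new Error("module not registered: " + name);
    }
    if (!Object.prototype.hasOwnProperty.call(instances, name)) {
        var exports = instances[name] = {};
        registry[name].call(exports); // exports are added to "this"
    }
    return instances[name];
}

register("module.js", function () {
    this.a = 5;
});

register("urn:myModule", function () {
    var a = importModule("module.js").a; // local, not exported
    this.b = a * 2;
});

var myModule = importModule("urn:myModule");
```

Because registration order does not matter and nothing runs until the first import, a bundled file can list its modules in any order.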
To get the most ! for our $, we probably should ask Brendan for a Function.detach(), or something of the kind, to null out the closure's scope reference, thus permitting us to guarantee that the registered functions have laundered scope chains, but for our transient, potentially insecure purposes, living without this is acceptable.
For Caja's purposes, I imagine that "register" would be a microkernel function that, like the use of "eval" and "with", is restricted to the cajoler. This would put Caja in a place of responsibility to perform script bundling and compression server-side. You're well placed to do that anyway, until ES-3.1-H, as I believe we all hope, obviates the need for Caja in the long-term.
4.2.1 - Import and Export Mechanisms
This particular counter-proposal is very similar to mine. The only difference is the introduction of a head-most function block scope that intercepts "var" and "function" declarations, making them implicitly local to the module and preventing them from being exported. Very early on, I tried to make the scope chain you are proposing work in Userland. However, the only way to capture exports from the head of the scope chain on the client-side is to use a "with" block. Now, most unfortunately, inner with blocks don't work consistently across all browsers, and more importantly, in the browsers that are standards compliant, you can't capture a "var" on the head of the scope chain if it's a "with" block: the declaration forwards to the head-most function block scope (usually above the with block) and the assignment gets evaluated in the context of the "with". This leads to some VERY strange behavior. Here's an excerpt of me
...
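The "var" inside "with" behavior described above can be reproduced in isolation. This is an independent sketch, not the excerpt the message refers to; the names `runWith`, `declared`, and `existing` are made up, and the Function constructor is used only because it yields non-strict code, which "with" requires.

```javascript
// The Function constructor yields non-strict code, which "with" requires.
var runWith = new Function("scope", [
    "with (scope) {",
    "    var declared = 1;   // 'scope' has no such property",
    "    var existing = 2;   // 'scope' already has this property",
    "}",
    "return { declared: declared, existing: existing };"
].join("\n"));

var scope = { existing: 0 };
var locals = runWith(scope);
```

Both declarations hoist to the enclosing function. The assignment to `declared` lands on the function's variable because the "with" object has no such property, so the object never captures it. The assignment to `existing` lands on the "with" object instead, leaving the function's variable of the same name undefined.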
On Sat, Sep 6, 2008 at 11:36 PM, Kris Kowal <cowbertvon...@gmail.com> wrote:
[snip]
> I'm not a fan of preserving eval's mongrel exec/eval semantics with the module loader. In my opinion, eval should only evaluate expressions and exec should only evaluate statements. Implicitly returning the module exports reminds me of Perl's convention for modules where the last evaluated expression reports whether the module successfully loaded or not (exceptions, Larry?). I very much prefer using the Object constructor notation where the module author adds exports to "this" and uses "var" or "let" for locals.
I don't like augmenting "this" with the exports. It is very cryptic and would certainly confuse a non-expert who doesn't understand that "this" can mean the global object, module object, or an OOP object and needs to figure out which it means in each particular case. I believe that is part of the reason for adding the "global" global variable (similar to window) to access the global object. It is much better to read
(function(){
    // ...
    global.fn = function(){};
})()
Than it is to read
(function(){
    // ...
    this.fn = function(){};
})()
and the above is why I currently write the same first line inside almost all of my (function(){})() expressions.
(function() {
    var global = this;
    //...
    global.fn = function(){};
})();
> 2.2 - Loading the module
> I think that it would be a shame if module loader functions were anything but top-scope functions. I strongly favor "import" for the ES3.1 standard, since it's already a reserved keyword with behavior on which no-one depends.
It seems the loader functionality contains two things: getting the code (e.g. HTTP, etc) and executing the code in the correct scope of the program. I thought Brendan made it clear somewhere that the getting-the-code part will not be part of ECMAScript.
> 2.4 - Asynchronous module loading
> This is a point where I think that an ES-3 native implementation can help a lot. Plain and simple: JavaScript developers shouldn't have to code in continuations to manage asynchronous module loads.
Module loads should not be asynchronous. It is too complex and all of this loading business is not really related to the idea that a module is a language construct related to variable scope.
> A native implementation should sleep a JavaScript thread on an "import" statement and perhaps speculatively load any modules imported later down in the program.
Very hard to determine in a dynamic language as the modules to be imported later may not be statically coded as expected.
> In a Userland implementation, I find that using exclusively blocking "import" directives greatly simplifies the program, to the point that I don't have the time or mind-share to do it any other way. Furthermore, I find that there is no need to do it any other way. When I am operating in a debug mode, I do not mind going to an HTTP request for each individual module file synchronously. When I am operating in production mode, one of the steps I have to perform is a server-side compilation of my modules that bundles all of my scripts and their dependencies into a single module file with no need to change any of my source code or HTML. So, in this situation, the module loader never blocks on an HTTP request: the module constructor has already been downloaded whenever you arrive at an "import" call.
Bundling scripts is an essential capability. When you bundle the scripts, the modules suddenly are self-naming. You posted the following code before.
# register("module1.js", function () {
#     with (this.moduleScope) {
#         with (this) {
#             (function () {
#                 …
#             }).call(this);
#         }
#     }
# });
#
# register("module2.js", function () {
#     with (this.moduleScope) {
#         with (this) {
#             (function () {
#                 …
#             }).call(this);
#         }
#     }
# });
If modules need to be self-naming in any way (e.g. like above) at any time (e.g. in production where it matters most) then I think it is better to admit that self-naming is inescapable and make them self naming all the time.
[snip]
> To get the most ! for our $, we probably should ask Brendan for a Function.detach(), or something of the kind, to null out the closure's scope reference, thus permitting us to guarantee that the registered functions have laundered scope chains, but for our transient, potentially insecure purposes, living without this is acceptable.
This is undoing lexical scoping. I don't think that is a good idea. When a reader is looking at a function definition he should be able to depend on the surrounding code's usual capture in the closure. If the detach call happens far away then it would lead to very difficult debugging.
[snip]
-------------
There is an assumption that some problem that developers are having currently can be solved with "a module system". Unfortunately there is no detailed description of that problem in either of the following proposals. Without such motivation, there is no way to judge if a particular proposal achieves its goals as simply as possible or if the language can already do what is needed.
I would really like to know exactly what problems these proposals are trying to solve for the average developer. The Caja proposal essentially looks like a file is a constructor function, which the language already has. Kris' proposal looks like it can essentially be done simply by using (function(){})() and not forgetting any "var", as the whole idea of an anonymous module breaks down when bundling scripts for production. As far as I know, the loading (e.g. HTTP, etc) is out of scope for ECMAScript (though it could still be implemented in libraries) and complex for authors (worrying about async), and loading tiny bits of code at a time is expensive in connections and could cause delays in responses to user actions. A really clear statement of what problems cannot already be solved easily and are solved by the proposals would be very helpful.
I'm not anti-module. As little as I understand Perl modules, I like how they seem to work.
I also like the module pattern in JavaScript and use it all the time. I can write the following in JavaScript and then continue on my merry way having all the namespacing-collision protection that I need.
var ca_michaux_fooModule = {};
(function() {
    ca_michaux_fooModule.a = function(){};
})();
or
var ca_michaux_fooModule = {};
(function() {
    var mod = ca_michaux_fooModule;
    mod.a = function(){};
})();
That gives me private scope, explicit exports, naturally easy script bundling, no text-processing of the code, it is less than 20 characters (i.e. almost nothing) of boilerplate for the entire concept of variable scoping, and nothing new to learn. Because a module is usually large (>100 lines) I don't see the desperate need to reduce the above boilerplate. This is not a situation like the desugaring of classes where the desugared version is very bulky and complex to read and the sugared version is very light and easy to read.
Some concepts of "module" in other languages allow for reopening a module and modifying it (i.e. monkey patching). That is something I have not seen suggested for JavaScript modules. I'm not necessarily in favor of reopening modules. I haven't seen anything in the proposals for JavaScript modules that is as revolutionary as being able to reopen a module. It seems like the few lines I wrote above with the module pattern basically cover all the bases trying to be covered.
Here's my position regarding module systems. I will try to reply in detail to your other, quite thought provoking posts separately. And please note that my proposal was and remains an early draft.
As for whether we need a standard ES module system at all, I am fully in support of not having one. Most pressingly, I simply want to make sure that, _if_ any module proposal is adopted, it does not get in the way of features that I consider important. These are:
1. Module code evaluated in a strictly isolated context, with imports and exports under client control. This allows secure isolation of untrusted code, and clean isolation of dependencies via dependency injection patterns.
2. Modules multiply instantiable, allowing one parsed version of the code to be re-used to create several instances with guaranteed isolation between them. This is further in support of secure isolation and dependency injection.
3. Modules manipulated as first-class entities independently of how they are named. This allows for code to manipulate the code and instances of more than one version of a module. That in turn stems from wanting an ES system to be able to deal gracefully with more and more complex mashups of independently developed code, as it is already beginning to do.
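To make the second and third points concrete, here is a minimal sketch of one piece of module code instantiated twice with client-supplied imports. The `counterModule` function and its `start` and `step` imports are hypothetical, and a plain closure like this does not provide the strict isolation the first point asks for; it only illustrates the shape of the interface.

```javascript
// Hypothetical module: its only view of the world is the "imports" object.
var counterModule = function (imports) {
    var count = imports.start; // private to this instance
    return {
        next: function () {
            count += imports.step;
            return count;
        }
    };
};

// One parsed function, two instances with client-chosen imports.
var byOnes = counterModule({ start: 0, step: 1 });
var byTens = counterModule({ start: 100, step: 10 });

byOnes.next();
byOnes.next();
var tens = byTens.next(); // byTens keeps its own count, unaffected by byOnes
```

The module value is passed around like any other object, so a program could hold two versions of the same module under whatever local names it likes.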
Again, as noted by Peter and Kris, there is a lot that can be done without a native modules spec in ES (though I would note that ES 3.1 strict does not permit the 'with' construct; that might get in the way of some of the techniques).
Peter has called for a statement of purpose. Ihab's statement of purpose is a good place to start. I'll iterate it here instead of starting from scratch.
---
As I was writing up this "Statement of Purpose", one word came up pretty frequently: "sovereignty". Since I'm using it to imply "desirability", I will define it here. Sovereignty is the right of self-determination. In the context of a name space, be it the local scope of a function, the scope of a module, the global scope, or the names of files in a directory tree, sovereignty is the ability to choose all of the names that are in your scope. Sovereignty, to whatever extent it is possible to grant, is desirable because it creates an operational box where an author can choose names that do not collide with one another. It also creates a security box, one where every name was explicitly created in that domain such that it cannot be subverted by an attacker.
Module systems allow module authors to completely control their local scope. This doesn't just mean isolating private variables. This means that module authors control the variable names by which other modules and their contents are referred. For example, with a module system, two modules can both export "encode" and "decode" functions. If a module author only needs to use one of these modules, they have the option of bringing them directly into the local scope. If the programmer is using both modules, they can import them both and bind them to different names such that, say, "base64.encode" and "utf8.encode" can be referenced in the same program. A module system gives you ease and flexibility of controlling the estuary of variables.
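A minimal sketch of the "encode" example. Everything here is a stand-in: the `modules` table, the `importModule` function, and the encoder bodies are placeholders made up for illustration and do not perform real Base64 or UTF-8 encoding.

```javascript
// Stand-in module objects; the encoder bodies are placeholders, not codecs.
var modules = {
    "base64.js": { encode: function (s) { return "base64(" + s + ")"; } },
    "utf8.js": { encode: function (s) { return "utf8(" + s + ")"; } }
};

// Hypothetical import function returning a module's exports object.
function importModule(name) {
    return modules[name];
}

// Using one module: bring the name straight into local scope.
var encode = importModule("base64.js").encode;

// Using both: bind each module to a local name of the importer's choosing.
var base64 = importModule("base64.js");
var utf8 = importModule("utf8.js");
var both = base64.encode(utf8.encode("hi"));
```

Neither module had to pick a globally unique name for its function; the importing code decides what each one is called locally.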
A module loader should, given value judgements between internal tensions, perform the following duties:
* Linearize the execution of recursive module dependencies. A module should be sovereign over its direct dependencies. This means that a module author should NOT need to inspect or mention the recursive dependency tree of its own dependencies. The topological linearization of the execution of a module and all of its recursive dependencies should be the sole purview of the module system. In order for a module to be sovereign over its own dependencies, it must not be beholden to the open range of modules that depend upon it to maintain a particular dependency tree.
* Orthogonalize name-spaces into three sovereign domains:
  * file names relative to the module root directory,
  * file names inside a module directory,
  * variable names inside a module.
* Encapsulation of private, local variables. This is necessary for modules to be secure "capability objects". This is also necessary to ensure sovereignty of variable names inside a module.
* Encapsulation of other private module objects. This is necessary for modules to receive and not share capabilities in a security model. This is also necessary to ensure sovereignty over variable names inside a module.
* Flexible dependency injection patterns. A module author should have the ability to easily extract names from a foreign, untrusted module object and bind them to names of their choice. This is desirable because it ensures that people will take advantage of their ability to control the sovereign name-space of a module. It also permits module authors to make their code as brief as possible while also balancing that need against being as explicit as necessary to avoid name collisions. Also, injecting module objects or properties of module objects into the scope chain can improve performance because it reduces lengthy property lookups. A module should never depend on the name another module chose to expose its API on through a common global scope. Doing so invites opportunities for malicious modules to perform "man-in-the-middle" attacks by subverting names exposed by other modules. Global object sharing also increases coupling; since a module author does not control its scope, they are subject to other module authors' fickle naming conventions and inter-module name collisions.
* Modules provided as capability objects. This permits dependent modules to isolate untrusted code and control the passage of information to that module through its exported API.
* Module versioning. It should be possible to request modules by version or version range, in response to the need to mash up scripts from multiple sources each requiring multiple versions of the same library. A strong versioning system would require modules to request modules by version range and limit those versions to ones that have already been published such that new versions must be explicitly qualified.
* Modules should be stateless. Module objects should not be a vector for untrusted modules to obtain data from other modules. In module systems that can guarantee security invariants, module objects should be instantiated uniquely for each dependee. If possible, the module object should be unique and any internal state should be "copy-on-write" to ensure good performance and separation, much like virtual memory shared between a process and its forked children.
* A module system should abstract the process of loading, parsing, managing execution context, and executing module code. There should be no constraint on the mechanism by which these are performed, in particular:
  * whether a particular transport like local files, HTTP, or in-memory objects is used to host or produce modules.
  * whether a scope chain or prototype chain is used to produce a particular lookup order for the top scope object.
  * whether a particular function is used to evaluate code.
  * whether modules are pre-parsed.
  * whether module objects are singleton, memoized, or re-instantiated.
These issues depend on performance and whether security invariants can be maintained in the host environment.
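The first duty, linearizing recursive dependencies, can be sketched as a depth-first topological ordering. This is one possible approach under assumptions, not a description of any existing loader: the `dependencies` table and the `linearize` function are hypothetical, and rejecting cycles outright is a simplification made for the example.

```javascript
// Hypothetical dependency table: module name -> direct dependencies only.
var dependencies = {
    "main": ["a", "b"],
    "a": ["c"],
    "b": ["c"],
    "c": []
};

// Depth-first linearization: every module appears after its dependencies.
function linearize(root, table) {
    var order = [];
    var state = {}; // name -> "visiting" | "done"
    (function visit(name) {
        if (state[name] === "done") return;
        if (state[name] === "visiting") {
            throw new Error("dependency cycle through " + name);
        }
        state[name] = "visiting";
        (table[name] || []).forEach(visit);
        state[name] = "done";
        order.push(name);
    })(root);
    return order;
}

var order = linearize("main", dependencies);
```

Each module lists only its direct dependencies; "main" never mentions "c", yet "c" is scheduled before both modules that need it.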
> Again, as noted by Peter and Kris, there is a lot that can be done without a native modules spec in ES (though I would note that ES 3.1 strict does not permit the 'with' construct; that might get in the way of some of the techniques).
Whoo! That makes my heart flutter a bit. Without "with", I think the importance of having the browser subsume the responsibility of managing modules ticks up a few notches! I hope that ECMAScript enforces strict compliance of the "with" block dynamically so I can try it and catch an exception as a signal to defer to the native module system, which hopefully provides the same interface as a Userland implementation.
On Sun, Sep 7, 2008 at 4:14 PM, Kris Kowal <cowbertvon...@gmail.com> wrote:
I think it is a good idea to deal with the following issue first and I think it is a show stopper to the anonymous modules and similar ideas. (It is not my fault an HTTP connection and download is the most expensive part of building a client-side app.)
> A module should never depend on the name another module chose to expose its API on through a common global scope.
When a module chooses its own name to expose its API, I call this a "self-naming module." Modules are self-naming in many/most(/all?) languages.
How can the above quoted requirement even be done efficiently? As I've written several times now, multiple modules need to be able to exist in one file for efficient download. "Module" cannot equal "file" and Brendan agreed on the es-discuss list as though it was obvious and essential that they are not equal. If multiple modules can be in a single file then the modules need to be self-naming, don't they? The name may be a URL like in Kris' register function or just a regular JavaScript name like used in the conventional module pattern.
Does everyone agree that, for a client-side app, multiple modules must be able to exist in a single file?
On Sun, Sep 7, 2008 at 6:21 PM, Peter Michaux <petermich...@gmail.com> wrote:
> On Sun, Sep 7, 2008 at 4:14 PM, Kris Kowal <cowbertvon...@gmail.com> wrote:
> (It is not my fault an HTTP connection and download is the most expensive part of building a client-side app.)
};->
> When a module chooses its own name to expose its API, I call this a "self-naming module." Modules are self-naming in many/most(/all?) languages.
True; but, for reasons I discussed earlier, this should not be a requirement on compatible implementations of ES.
> ... multiple modules need to be able to exist in one file for efficient download. "Module" cannot equal "file" and Brendan agreed on the es-discuss list as though it was obvious and essential that they are not equal. If multiple modules can be in a single file then the modules need to be self-naming, don't they?
I'll introduce some terminology --
* Fetching -- getting the program text of a module from the internets.
* Parsing -- parsing the program text and generating a set of ASTs in the language.
* Instantiating -- evaluating the ASTs in a client-supplied environment to yield an instance of the module's data structures.
We are concerned with optimizing the fetching operation. This means that we should be able to statically locate all points where modules are fetched. So in --
var m = loader.fetchAndParse('com.example.widgets', '1.03')
var aMod = myLoader.fetchParsed('http://example.com/widgets.js')
Optimization to put the content into one file involves recognizing the necessary calls, prefetching the program text, and replacing the calls with structures that evaluate to the same value:
var m = (function() { ... })();
var aMod = with (...) { ... };
This does not depend on whether modules are or are not self-naming. It *does* require that calls to the loader be statically visible. Whether this means that ES needs to standardize the loader interface or not is up in the air -- it may or may not:
* On the pro side, it is good because I can grab any third-party code and transitively fetch its dependencies from the internets, hence building a single-file application
* On the con side, there are all sorts of ways to load modules (as I outlined earlier) and a standard does not help us examine these.
Making module fetch calls statically analyzable means we need *keywords* (or their moral equivalent) and not just standard functions, since standard functions can be aliased.
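A small illustration of why aliasing defeats purely textual analysis. The `findFetches` scanner below is a deliberately naive, hypothetical tool written for this example; it recognizes only literal `loader.fetchAndParse('name', ...)` calls of the form used earlier in this message.

```javascript
// Naive scanner: finds literal loader.fetchAndParse('name', ...) calls only.
function findFetches(source) {
    var pattern = /\bloader\.fetchAndParse\(\s*'([^']+)'/g;
    var names = [];
    var match;
    while ((match = pattern.exec(source)) !== null) {
        names.push(match[1]);
    }
    return names;
}

// The same fetch written directly and through an alias.
var direct = "var m = loader.fetchAndParse('com.example.widgets', '1.03');";
var aliased = "var f = loader.fetchAndParse;" +
    " var m = f.call(loader, 'com.example.widgets', '1.03');";

var foundDirect = findFetches(direct);
var foundAliased = findFetches(aliased);
```

The scanner reports the module for the direct call and nothing for the aliased one, even though both programs fetch the same module at run time. A keyword cannot be rebound this way, which is the property the argument above relies on.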
On Sun, Sep 7, 2008 at 6:21 PM, Peter Michaux <petermich...@gmail.com> wrote:
> On Sun, Sep 7, 2008 at 4:14 PM, Kris Kowal <cowbertvon...@gmail.com> wrote:
> When a module chooses its own name to expose its API, I call this a "self-naming module." Modules are self-naming in many/most(/all?) languages.
Programming languages where one explicitly names the module name in the file that contains its code are actually in the minority. Only Java, Perl, and Haskell come to mind.
Java at least comes with a concession that autonomous (that is, "self-named") modules come with a curse. The Eclipse IDE comes with special refactoring tools that allow you to reparent a tree of modules somewhere else on the file-system. It's the kind of thing that can be done in one "mv" command for most other languages.
Precedent aside, you are _absolutely right_ to point out this tension between security and performance, and I don't think Ihab's point that the process of module loading can be split along parsing, fetching, and instantiating addresses the security problem completely. Perhaps I'm wrong. Allow me to propose a man-in-the-middle attack.
Module Main depends on modules A and B. File B is trustworthy. File A is untrustworthy. File A declares both module A and B. The true Module B is never fetched, parsed, nor instantiated.
So yeah, there is a tension. I propose that the ECMAScript standard, since there is a notion that the system might be able to host modules as capability objects without the Caja Cajoler and that the purpose of security might be attainable, should specify that a directory _tree_ of modules is sovereign over its module name-space. Thus, any module can only register modules that are on the same domain and under the same directory as itself.
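A sketch of how a loader might enforce that rule. The `mayRegister` function and the example URLs are hypothetical; the check relies on the standard `URL` class to normalize paths, and treats "under the same directory" as a path-prefix test, which is one reading of the rule rather than a settled definition.

```javascript
// Hypothetical policy check: may the file at fileUrl register moduleUrl?
function mayRegister(fileUrl, moduleUrl) {
    var file = new URL(fileUrl);
    var target = new URL(moduleUrl);
    // Directory of the registering file, always ending in "/".
    var directory = file.pathname.slice(0, file.pathname.lastIndexOf("/") + 1);
    return target.origin === file.origin &&
        target.pathname.indexOf(directory) === 0;
}

// The untrusted file may name modules in its own tree...
var ownTree = mayRegister(
    "http://example.com/untrusted/A.js",
    "http://example.com/untrusted/helpers/util.js"
);

// ...but may not declare a module that lives in the trusted tree.
var hijack = mayRegister(
    "http://example.com/untrusted/A.js",
    "http://example.com/trusted/B.js"
);
```

Under this check, File A from the attack above could still register Module A but not Module B, so the loader would go on to fetch the true Module B.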
I emphasize that bundling modules should be a build step for JavaScript, not the modus operandi for authoring them. I also acknowledge that it's the kind of thing a hacker might take advantage of. Autonomous modules fall in the same security nightmare realm as "with" and "eval".
> Does everyone agree that, for a client-side app, multiple modules must be able to exist in a single file?
I for one agree, but also agree that permitting autonomous modules is a security hole. Thus there is a tension and it'll take all of our collective wit to find a solution and resolve the tension amicably.
= Migration Path =
I also would like to clarify that I am proposing that there be two standards: one for contemporary JavaScript wherein security is an illusion and a myth and thus cannot rule design decisions, and one for ECMAScript. It's my desire that these standards be compatible but not the same, to the end that there is an easy migration path from Userland module loaders to a native module loader. I think it's acceptable for that migration to require refactoring, but I also feel that there should be defined transition periods where module files should work in two realms at once: a migration path.
PHASE 1: The Userland module loader proposal should be written such that a module can be used as a global <script> tag or a <module> assuming that it does not use the import mechanism, defines its exports by attaching to "this", and uses an anonymous function block called once to isolate its local scope. During this period, modules can slowly be ported. Eventually the need to support old-fashioned <script> tag modules will expire and these module authors will be able to add module loading calls and remove their module pattern boilerplate.
PHASE 2: The ECMAScript module loader proposal should be written such that a module can be used as both a Userland module and a Native module. During this period, modules can slowly be ported to the native loader by changing the function names used to import and export modules into Native module-loader keywords. Also, during this period, the Cajoler could assume the role of a Userland module loader and thus maintain its security invariants using the new ECMAScript syntax for module loading. Eventually the need for Userland module loaders will expire.
On Sun, Sep 7, 2008 at 7:26 PM, <ihab.a...@gmail.com> wrote:
> On Sun, Sep 7, 2008 at 6:21 PM, Peter Michaux <petermich...@gmail.com> wrote:
>> On Sun, Sep 7, 2008 at 4:14 PM, Kris Kowal <cowbertvon...@gmail.com> wrote:
[snip]
> I'll introduce some terminology --
> * Fetching -- getting the program text of a module from the internets.
> * Parsing -- parsing the program text and generating a set of ASTs in > the language.
> * Instantiating -- evaluating the ASTs in a client-supplied > environment to yield an instance of the module's data structures.
I'm not so keen on instantiating modules. JavaScript already has constructors that can be instantiated and will likely have classes that use Object.freeze a lot to make the objects immutable. I think of modules as single objects just as the module pattern provides.
> We are concerned with optimizing the fetching operation. This means > that we should be able to statically locate all points where modules > are fetched. So in --
> var m = loader.fetchAndParse('com.example.widgets', '1.03')
> var aMod = myLoader.fetchParsed('http://example.com/widgets.js')
> Optimization to put the content into one file involves recognizing the > necessary calls, prefetching the program text, and replacing the calls > with structures that evaluate to the same value:
> var m = (function() { ... })();
> var aMod = with (...) { ... };
The distributed production code just became huge with a lot of recursive, repetitive "inlining" of modules. I don't think that is ok. What if there is a cycle in which module A depends on module B, which in turn depends on module A? This is something Brendan said was necessary to support. I'm not sure it is necessary to support.
> This does not depend on whether modules are or are not self-naming. It > *does* require that calls to the loader be statically visible.
[snip]
> Making module fetch calls statically analyzable means we need > *keywords* (or their moral equivalent) and not just standard > functions, since standard functions can be aliased.
As long as "eval" is around, is static analysis ever possible?
On Sun, Sep 7, 2008 at 8:29 PM, Kris Kowal <cowbertvon...@gmail.com> wrote:
> Peter,
> On Sun, Sep 7, 2008 at 6:21 PM, Peter Michaux <petermich...@gmail.com> wrote:
>> On Sun, Sep 7, 2008 at 4:14 PM, Kris Kowal <cowbertvon...@gmail.com> wrote:
> Programming languages where one explicitly names the module name in the file that contains its code are actually in the minority. Only Java, Perl, and Haskell come to mind.
That doesn't seem like watertight evidence, and I don't know what the facts are about this. There are a lot of languages out there. I know Ruby and Scheme also have modules named in the code. By lines of code, Java and Perl probably make up a pretty good bulk of web programs. In the end, I don't think it should matter too much what other languages do, other than learning from their mistakes. JavaScript is in a very weird/unique niche.
[snip]
> Allow me to propose a man-in-the-middle-attack.
If we are going to show holes in proposals (which I think is a very good idea) then there is a man-in-the-middle attack that, as far as I know, cannot be fixed. A proxy server can rewrite content and a proxy server is the actual physical man-in-the-middle. Therefore no JavaScript will be secure. I am tempted to write "totally secure" but in the security world things should either be secure or they are not and the extreme cases matter. JavaScript, distributed as text over HTTP, is not. So any statement of why a module system is being introduced would need to address this issue and at least state that JavaScript is not secure no matter what. I don't think security is an issue for modules to solve. JavaScript applications are insecure. If JavaScript was distributed as compiled code perhaps the situation would be a little brighter.
This may seem quite dramatic but see my next note below...
[snip]
>> Does everyone agree that, for a client-side app, multiple modules must >> be able to exist in a single file?
> I for one agree, but also agree that permitting autonomous modules is a security hole. Thus there is a tension and it'll take all of our collective wit to find a solution and resolve the tension amicably.
Before bothering to try solving this problem, it is probably a good idea to see if it is the only security hole and that solving this problem will then make JavaScript secure.
[snip]
> PHASE 1: The Userland module loader proposal should be written such > that a module can be used as a global <script> tag or a <module> > assuming that it does not use the import mechanism, defines its > exports by attaching to "this", and uses an anonymous function block > called once to isolate its local scope. During this period, modules > can slowly be ported. Eventually the need to support old-fashioned > <script> tag modules will expire and these module authors will be able > to add module loading calls and remove their module pattern > boilerplate.
In a web page, there will always be the need for at least one script tag to bootstrap. That is how HTML works. In production that one tag will load all the code so I don't see that there is any phasing out of <script>.
> PHASE 2: The ECMAScript module loader proposal
According to Brendan, a module loader is not in scope for the ECMAScript language. Each host can define its own loading system appropriate to the host environment. Makes sense.
On Sun, Sep 7, 2008 at 9:12 PM, Peter Michaux <petermich...@gmail.com> wrote: > According to Brendan, a module loader is not in scope for the > ECMAScript language. Each host can define its own loading system > appropriate to the host environment. Makes sense.
My reading of es-discuss is that this is not yet concluded, by any means.
On Sun, Sep 7, 2008 at 8:29 PM, Kris Kowal <cowbertvon...@gmail.com> wrote:
> Precedent aside, you are _absolutely right_ to point out this tension between security and performance, and I don't think Ihab's point that the process of module loading can be split along parsing, fetching, and instantiating addresses the security problem completely.
I'm saying that you are vulnerable to your loader, but not necessarily to the modules it loads. In other words, if you build a loader that you trust, you can be assured that modules cannot circumvent that loader.
> Allow me to propose a man-in-the-middle-attack.
> Module Main depends on modules A and B.
> File B is trustworthy.
> File A is untrustworthy.
> File A declares both module A and B.
> The true Module B is never fetched, parsed, nor instantiated.
I'm not sure what you are saying here -- if File A declares modules A and B, that is equivalent to "self-named" modules, no? In any case, your example assumes a global writeable namespace into which File A places a reference to a corrupt Module B. My whole point is that a secure loader implementation would allow no such thing.
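To make the two positions concrete, here is a rough sketch. The first half assumes a global writable registry with a hypothetical register(name, factory) function, which is what the attack needs. The second half assumes a loader that keys modules by the location it chose to fetch, so a file can never speak for another file. None of these names come from either proposal.

```javascript
// Hypothetical global registry of self-named modules (the thing under attack).
var registry = {};
function register(name, factory) {
    registry[name] = factory; // last writer wins, regardless of origin
}

// Untrustworthy File A declares module A and also a counterfeit module B.
function evaluateFileA() {
    register("A", function () { return { name: "A" }; });
    register("B", function () { return { name: "B", counterfeit: true }; });
}

// Main asks for B by name and receives whatever sits in the registry.
evaluateFileA();
var b = registry["B"]();

// Alternative without global registration: each result is keyed by the
// location the loader itself fetched, so File A cannot declare B.
function makeLoader(files) {
    return function load(location) {
        return files[location](); // a file yields exactly one module
    };
}
var load = makeLoader({
    "a.js": function () { return { name: "A" }; },
    "b.js": function () { return { name: "B" }; }
});
var trustedB = load("b.js");
```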
> ... a directory _tree_ of modules is sovereign over its module name-space. > Thus, any module can only register modules that are on the same domain > and under the same directory as itself.
Again, you imply a global namespace into which things are "registered". I claim no such thing:
- There is a loader that you (some module) are given by your parent. You must trust it since you are vulnerable to your parent regardless. Imagine therefore that your parent is the "operating system" -- in other words, part of your Trusted Computing Base (TCB).
- The loader allows you to load some modules depending on some data you give it. It may include supplying names or URIs, like --
On Sun, Sep 7, 2008 at 9:12 PM, Peter Michaux <petermich...@gmail.com> wrote: > I don't think security is an > issue for modules to solve. JavaScript applications are insecure. If > JavaScript was distributed as compiled code perhaps the situation > would be a little brighter.
I'm not sure what compilation would do but, in any case, a good module system would permit one to execute foreign code while placing a boundary around the effects it can cause.
On Sun, Sep 7, 2008 at 9:12 PM, Peter Michaux <petermich...@gmail.com> wrote: > I'm not so keen on instantiating modules. JavaScript already has > constructors that can be instantiated and will likely have classes > that use Object.freeze a lot to make the objects immutable. I think of > modules as single objects just as the module pattern provides.
The important invariant here is that, if we consider a module to be a function, then the module function does not close over any variables outside its body. This means that calling the module function twice results in two completely independent object graphs. That, in turn, is important for security and dependency control.
It may well be that not everyone needs it. Yet, as I mentioned earlier, any standardized module system should not make this usage unduly difficult.
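A small sketch of that invariant, assuming a module is an ordinary function that receives everything it needs through an argument. The names counterModule, imports, and start are illustrative only.

```javascript
// A module written as a function whose body refers to nothing outside
// itself except the "imports" argument supplied by the client.
function counterModule(imports) {
    var count = imports.start;
    return {
        next: function () {
            count += 1;
            return count;
        }
    };
}

// Two calls give two object graphs that share no state.
var first = counterModule({ start: 0 });
var second = counterModule({ start: 100 });

first.next();
first.next();
```

Advancing "first" has no effect on "second", which is the property that matters for security and dependency control.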
On Sat, Sep 6, 2008 at 11:36 PM, Kris Kowal <cowbertvon...@gmail.com> wrote: > 2.1 Informal Description - Example Module > There's been some discussion over whether to use Python/Java-style > identifiers or URI's to reference modules.
That should imho be up to the environment. I know of no *single* good answer that works for all cases and, as I argue elsewhere, it is not beneficial to assume a global namespace. Versioning and caching, for one, cause the assumption of name-based equivalence to fail.
> 2.2 - Loading the module > I think that it would be a shame if module loader functions were > anything but top-scope functions. I strongly favor "import" for the > ES3.1 standard, since it's already a reserved keyword with behavior on > which no-one depends.
Pursuant to my terminology of "fetch, parse, instantiate", the "fetch" function can be anything. The "parse" function may need to be native (viz. removal of "with" in ES 3.1 strict). The "instantiate" function can be a direct function call on the module itself.
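Roughly, the three stages could look like the following in contemporary JavaScript. This is only a sketch under assumptions: fetchText reads from an in-memory table instead of the network, parse uses the Function constructor as a stand-in for a native parser (it does not hide globals, so it is not secure), and the names sources, fetchText, parse, imports, and exports are all hypothetical.

```javascript
// Stage 1, fetching: anything that yields program text. Here, a table
// standing in for the network (hypothetical location and contents).
var sources = {
    "widgets.js": "exports.label = imports.prefix + 'widget';"
};
function fetchText(location) {
    return sources[location];
}

// Stage 2, parsing: turn text into a function of its arguments. The
// Function constructor is an approximation; the resulting code can still
// reach the global scope, which a secure parser would have to prevent.
function parse(text) {
    return new Function("imports", "exports", text + "\nreturn exports;");
}

// Stage 3, instantiating: a direct call with a client-supplied environment.
var widgets = parse(fetchText("widgets.js"));
var one = widgets({ prefix: "red " }, {});
var two = widgets({ prefix: "blue " }, {});
```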
> 2.4 - Asynchronous module loading > This is a point where I think that an ES-3 native implementation can > help a lot. Plain and simple: JavaScript developers shouldn't have to > code in continuations to manage asynchronous module loads.
I'm not sure I like the async solution either, yet I think handling modules as first-class, multiply instantiable objects is important, so I would love to see a solution that preserves this.
> In a Userland implementation, I find that using > exclusively blocking "import" directives greatly simplifies the > program ...
Do you use XMLHttpRequest to get module code? If so, how do you manage to block?
> 3.4 - Module metadata
> I believe that the kind of metadata you're describing (dependencies and versioning I presume) could easily be generated through static analysis of "import" statements inside the program and assignment to standard variables, like "local.version". To that end and in the interest of not requesting that module authors repeat themselves needlessly, I think that module metadata should be generated programmatically and can safely be implementation specific.
Cool, but does that write to a global namespace that the clients of the module rely on? How many copies of this module are running? How many versions? Who trusts it? What if it tramples on the namespace of the "real" urn:myModule?
> ... we probably should ask Brendan ...
Ahem, you mean ECMA TC39, right? };->
> For Caja's purposes, I imagine that "register" would be a microkernel > function ...
No, Caja just builds first-class modules out of whatever loader you give it. There is no global registration of anything.
> It's high time for some carbonated high-fructose corn-syrup.
On Sun, Sep 7, 2008 at 10:50 AM, Peter Michaux <petermich...@gmail.com> wrote:
> I don't like augmenting "this" with the exports. ...
> (function(){
> // ...
> global.fn = function(){};
> })()
Interesting!
> I thought Brendan made it clear somewhere that the > getting-the-code part will not be part of ECMAScript.
If I understand correctly, that was the way people on es-discuss were leaning.
> Module loads should not be asynchronous. It is too complex and all of > this loading business is not really related to the idea that a module > is a language construct related to variable scope.
Well not really -- if that's all it is, you can declare everything in one big file and be done with it, and use lexical scoping like it was meant to be used. Anything different, to the extent of the difference, is just a "#include" in your code that you can easily write a pre-processor to resolve.
As I see it, a module system for the broader ES effort is about how independently written, perhaps mutually suspicious pieces of code can be composed at run-time safely.
On Sun, Sep 7, 2008 at 9:42 PM, <ihab.a...@gmail.com> wrote: > The important invariant here is that, if we consider a module to be a > function, then the module function does not close over any variables > outside its body. This means that calling the module function twice > results in two completely independent object graphs. That, in turn, is > important for security and dependency control.
> It may well be that not everyone needs it. Yet, as I mentioned > earlier, any standardized module system should not make this usage > unduly difficult.
I agree.
I think that the desired effect could be produced in many ways, dependent on the implementation of the module loader:
* shared memoized singleton, for performance when security is not achievable
* copy-on-write snapshots, for performance where native support is possible
* instantiate-on-demand module objects, when security is achievable but native support is not available
I think that contemporary JavaScript modules could be written in such a way that they do not depend on whether the module loader opts to give them a unique object, a snapshot, or a singleton. In contemporary JavaScript, it's important that modules be stateless so that they are agnostic to this feature. It's also important that contemporary JavaScript be fast and memory-light. To that end, in contemporary JavaScript, where security is not worth trying to attain strictly client-side and performance is paramount, a memoized, shared, singleton object suffices for anyone who wants it. In a module provided by a Caja microkernel module loader, a completely unique object graph would suffice, or it could enforce a rule that module capability objects are shallow and frozen to be immutable. It would then be the purview of the module to not be a vector for transferring shared state among mutually untrusting dependencies. Then, in a native module loader provided by a future version of JavaScript, the modules could be copy-on-write snapshots.
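As a sketch of how a stateless module stays agnostic to the loader's policy, here are two hypothetical loaders wrapped around the same factory. Copy-on-write snapshots are left out because they would need native support. The names mathModule, memoizingLoader, and instantiatingLoader are invented for this example.

```javascript
// A stateless module factory (hypothetical).
function mathModule() {
    return {
        double: function (x) { return x * 2; }
    };
}

// Policy 1: shared memoized singleton. Every client gets the same object.
function memoizingLoader(factory) {
    var cached;
    var made = false;
    return function () {
        if (!made) {
            cached = factory();
            made = true;
        }
        return cached;
    };
}

// Policy 2: instantiate on demand. Every client gets its own object graph.
function instantiatingLoader(factory) {
    return function () {
        return factory();
    };
}

var shared = memoizingLoader(mathModule);
var fresh = instantiatingLoader(mathModule);
```

A client calling double(2) gets 4 from either loader and cannot tell which policy is in force unless it compares object identities.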
On Sun, Sep 7, 2008 at 9:56 PM, <ihab.a...@gmail.com> wrote: > Do you use XMLHttpRequest to get module code? If so, how do you manage to block?
The third parameter of XMLHttpRequest.open is whether to be asynchronous. true = nonblocking, false = blocking. If you opt to block, the send method waits. Whether onreadystatechange fires in blocking mode is implementation dependent, but it's considered standard for it to be called at least once.
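A minimal sketch of a blocking fetch along those lines. The function name fetchTextSync is made up, and the XMLHttpRequest constructor is passed in as a parameter (an assumption of this sketch, not of modules.js) so that the function is not tied to a browser global.

```javascript
// Blocking fetch of module text.
function fetchTextSync(url, XHR) {
    var request = new XHR();
    request.open("GET", url, false); // third argument false = blocking
    request.send(null);              // does not return until the response arrives
    if (request.status !== 200) {
        throw new Error("Could not load " + url + ": " + request.status);
    }
    return request.responseText;
}
```

In a browser this would be called as fetchTextSync("widgets.js", XMLHttpRequest), where "widgets.js" is a placeholder URL.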
> Cool, but does that write to a global namespace that the clients of > the module rely on? How many copies of this module are running? How > many versions? Who trusts it? What if it tramples on the namespace of > the "real" urn:myModule?
I agree that this is a major problem. I don't think we can live without it though. Minimizing round-trip-times for HTTP requests is still really important especially as the web shifts to centralized services and potentially TCP jumbo-frames. In a high-latency environment, getting all of your data in one round-trip can save a lot of time.
>> ... we probably should ask Brendan ...
> Ahem, you mean ECMA TC39, right? };->
*shy* yeah.
>> For Caja's purposes, I imagine that "register" would be a microkernel >> function ...
> No, Caja just builds first-class modules out of whatever loader you > give it. There is no global registration of anything.
We have to either reconcile the need for module bundling for performance and the need for module isolation for security, or we must choose one or the other. (I'm pretty sure that's a tautology; I would not want to set up a false dilemma.)
Perhaps we could provide strong location-based convention for trust. I know that's bitten us in the butt before and we haven't quite gotten it right with any particular technology like the various Flash conventions or various XHR conventions.
Perhaps we should just ditch "registration" for environments in which security is possible to attain, that is to say, contemporary JavaScript module loaders using Caja, and future native module loaders. The register function could be the exclusive purview of contemporary JavaScript module loaders. Sounds good to me. In fact, I may have already provisioned for the possibility by specifying that "register" and "publish" MAY be provided by some module loaders but not others.
On Sun, Sep 7, 2008 at 10:01 PM, <ihab.a...@gmail.com> wrote:
> On Sun, Sep 7, 2008 at 10:50 AM, Peter Michaux <petermich...@gmail.com> wrote:
>> I don't like augmenting "this" with the exports. ...
>> (function(){
>> // ...
>> global.fn = function(){};
>> })()
> Interesting!
Yeah. In the specification I've outlined on the wiki, I recommend using "module" and "local" to explicitly localize or export a function. "module" is the same as "this" in the top-scope. I agree that "module", "global", and "local" are all much more explicit than "this", and I would encourage their use too. Since the closure-called-once comes for free in the specified module system, that particular expression would be equivalent to:
global.fn = function () {};
And, if you wanted to follow the rule of not modifying the global object and rather export functions by adding them to the module:
module.fn = function () {};
I'm only holding out for the provision of "this" == "module" still because of that neat migration path for jQuery I described.
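For illustration, a Userland loader could arrange for "this" and "module" to be the same object along these lines. This is a sketch, not the code in modules.js: evaluateModule is a hypothetical name, and the Function constructor stands in for however the loader actually evaluates module text.

```javascript
// Hypothetical loader step: run module text with "module", "global", and
// "local" in scope, and with "this" bound to the module object.
function evaluateModule(text, globalObject) {
    var moduleObject = {};
    var localObject = {};
    var body = new Function("module", "global", "local", text);
    body.call(moduleObject, moduleObject, globalObject, localObject);
    return moduleObject;
}

var fakeGlobal = {}; // stands in for the real global object
var viaModule = evaluateModule("module.fn = function () { return 'm'; };", fakeGlobal);
var viaThis = evaluateModule("this.fn = function () { return 't'; };", fakeGlobal);
evaluateModule("global.fn = function () { return 'g'; };", fakeGlobal);
```

A library that only ever writes to "this" would therefore export to the module under such a loader and to the global object under a <script> tag.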
>> I thought Brendan made it clear somewhere that the >> getting-the-code part will not be part of ECMAScript.
> If I understand correctly, that was the way people on es-discuss were leaning.
I think we're all in agreement. "fetch" should be implementation-specific.
>> Module loads should not be asynchronous. It is too complex and all of >> this loading business is not really related to the idea that a module >> is a language construct related to variable scope.
> Well not really -- if that's all it is, you can declare everything in > one big file and be done with it, and use lexical scoping like it was > meant to be used. Anything different, to the extent of the difference, > is just a "#include" in your code that you can easily write a > pre-processor to resolve.
> As I see it, a module system for the broader ES effort is about how > independently written, perhaps mutually suspicious pieces of code can > be composed at run-time safely.
> In the end, > I don't think it should matter too much what other language do other > than learning from their mistakes. JavaScript is in a very > weird/unique niche.
Agreed. Let's stick to citing requirements rather than language precedent from now on.
> A proxy server can rewrite content and a proxy > server is the actual physical man-in-the-middle. Therefore no > JavaScript will be secure
This kind of attack requires DNS cracking. It isn't impossible, but it is pretty hard, and it's a separate matter.
> I don't think security is an > issue for modules to solve.
Not to belabor the point or anything, but I think we can move beyond discord for these basic issues.
* Contemporary JavaScript is not secure at all.
* It can be made secure with server-side static analysis and injected client-side assertions.
* Future JavaScript should make it easier to maintain security invariants client-side.
* In the long term, future JavaScript should obviate the need for server-side static analysis and injected client-side assertions.
1. For the purpose of developing a client-side module loader with contemporary JavaScript, security is not a concern.
2. For the purpose of standardizing a server-side module system for future versions of JavaScript, not introducing new security holes and difficulties for static-analysis or runtime security assertions are valid concerns.
3. Insofar as a client-side module loader with contemporary JavaScript can provide an easy path of migration to a future, native JavaScript module loader, the client-side module loader standard should be patterned after the future JavaScript module loader standard.
4. By premises (2) and (3), we can infer that security issues should affect patterns that influence the contemporary client-side module loader standard, even though they will not afford any level of security in contemporary JavaScript without static analysis and runtime assertions.
> In a web page, there will always be the need for at least one script > tag to bootstrap. That is how HTML works. In production that one tag > will load all the code so I don't see that there is any phasing out of > <script>.
I believe I deliberately addressed this issue in the first afterword of the draft specification. The specification does not constrain the manner in which "main" modules are invoked. This can be done with a <script>, as it is with modules.js. I don't believe that I've suggested that <script> should be phased out, merely that its use for loading modules should be reduced to a single latch-point with contemporary JavaScript Userland module loaders. I imagine that future versions of JavaScript could replace the <script> tag with a <module> tag or a <script> variant to enforce a consistent use of JavaScript modules and avoid the can of worms that globally evaluated <script> tags, especially inline <script> tags, introduce, but it isn't a pragmatic option, nor is it necessary if web developers choose to use secure native modules.
On Sun, Sep 7, 2008 at 11:24 PM, Kris Kowal <cowbertvon...@gmail.com> wrote: > I think that contemporary JavaScript modules could be written in such > a way that they do not depend on whether the module loader opts to > give them a unique object, a snapshot, or a singleton.
Hm. But contemporary JS modules aren't. :)
> In > contemporary JavaScript, it's important that modules be stateless so > that they are agnostic to this feature.
Again, contemporary modules aren't.
> Then, in a > native module loader provided by a future version of JavaScript, the > modules could be copy-on-write snapshots.
The whole idea is to come up with that native module loader's semantics (or say that we don't need to standardize them). All else is worthy -- but nonstandard -- solutions.
On Mon, Sep 8, 2008 at 12:01 AM, Kris Kowal <cowbertvon...@gmail.com> wrote:
> * Contemporary JavaScript is not secure at all.
> * It can be made secure with server-side static analysis and injected client-side assertions.
> * Future JavaScript should make it easier to maintain security invariants client-side.
> * In the long term, future JavaScript should obviate the need for server-side static analysis and injected client-side assertions.
> 1. For the purpose of developing a client-side module loader with contemporary JavaScript, security is not a concern.
> 2. For the purpose of standardizing a server-side module system for future versions of JavaScript, not introducing new security holes and difficulties for static-analysis or runtime security assertions are valid concerns.
> 3. Insofar as a client-side module loader with contemporary JavaScript can provide an easy path of migration to a future, native JavaScript module loader, the client-side module loader standard should be patterned after the future JavaScript module loader standard.
> 4. By premises (2) and (3), we can infer that security issues should affect patterns that influence the contemporary client-side module loader standard, even though they will not afford any level of security in contemporary JavaScript without static analysis and runtime assertions.
I agree with all the above; it's well stated. Thanks.
On Sun, Sep 7, 2008 at 11:47 PM, Kris Kowal <cowbertvon...@gmail.com> wrote: > The third parameter of XMLHttpRequest.open is whether to be > asynchronous.
Ah.
>> Cool, but does that write to a global namespace that the clients of >> the module rely on? How many copies of this module are running? How >> many versions? Who trusts it? What if it tramples on the namespace of >> the "real" urn:myModule?
> I agree that this is a major problem. I don't think we can live > without it though. Minimizing round-trip-times for HTTP requests is > still really important especially as the web shifts to centralized > services and potentially TCP jumbo-frames. In a high-latency > environment, getting all of your data in one round-trip can save a lot > of time.
These are orthogonal. If you want to pre-fetch modules, you go through your code looking for calls to the module fetcher schmivit and replacing them with other schmivits that return a closure or an object or what not that is initialized from literal text that you autoslurp from whatever the fetcher schmivit would have fetched at run time.
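A toy version of such a pre-fetching step might look like this. It is only a sketch: it assumes the fetcher is spelled fetchModule("...") with a double-quoted string literal, it uses a regular expression where a real tool would need a parser, and the names inlineFetches, readText, and files are hypothetical.

```javascript
// Hypothetical build step. It finds calls of the form fetchModule("...")
// whose argument is a string literal and replaces each with a closure
// built from the text that the call would have fetched at run time.
function inlineFetches(programText, readText) {
    var pattern = /fetchModule\(\s*"([^"]+)"\s*\)/g;
    return programText.replace(pattern, function (match, location) {
        return "(function () { var exports = {}; " +
            readText(location) +
            " return exports; })()";
    });
}

// Stand-in for whatever the fetcher would have fetched.
var files = { "greet.js": "exports.hello = function () { return 'hi'; };" };
var bundled = inlineFetches(
    'var greet = fetchModule("greet.js"); result = greet.hello();',
    function (location) { return files[location]; }
);
```

The rewrite only works when the call is statically visible. An aliased fetcher, or a location computed at run time, would slip past the pattern.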
seems to define in a *standard* the idea that the urn: namespace or what not is hard-coded as a global space. My point is that you don't need it and shouldn't do it.
Does this explain things adequately? If not we can try again over chat or whatever. I may be missing something.
> Perhaps we should just ditch "registration" for environments in which > security is possible to attain, that is to say, contemporary > JavaScript module loaders using Caja, and future native module > loaders. The register function could be the exclusive purview of > contemporary JavaScript module loaders. Sounds good to me. In fact, > I may have already provisioned for the possibility by specifying that > "register" and "publish" MAY be provided by some module loaders but > not others.
Well, I would restate that: the module fetcher takes *some* data telling it what to fetch. Each module is vulnerable to its fetcher. Each application is vulnerable to its static analysis. But the parameters of the fetcher need not be standardized at this point -- at least not by ES.
On Sun, Sep 7, 2008 at 9:32 PM, <ihab.a...@gmail.com> wrote:
> a good module > system would permit one to execute foreign code while placing a > boundary around the effects it can cause.
So you want there to be something like a "Caja mode", more restrictive than "strict", where the loader would limit the environment radically?
Do you envision this as part of the ECMAScript standard or as part of a host standard (e.g. W3C)? I think this would be a host standard as many of the features needing to be controlled are outside the ECMAScript standard.
On Mon, Sep 8, 2008 at 10:14 AM, Peter Michaux <petermich...@gmail.com> wrote: > On Sun, Sep 7, 2008 at 9:32 PM, <ihab.a...@gmail.com> wrote: >> a good module >> system would permit one to execute foreign code while placing a >> boundary around the effects it can cause.
> So you want there to be something like a "Caja mode", more restrictive > than "strict", where the loader would limit the environment radically?
That is a useful way to think about it; the point is, what is the default?
> Do you envision this as part of the ECMAScript standard or as part of > a host standard (e.g. W3C)? I think this would be a host standard as > many of the features needing to be controlled are outside the > ECMAScript standard.
Well, the specific *objects* passed around are outside ES, but the fact of whether the loaded module automatically inherits its client's globals is definitely in ES. Again, what's the default, and how can it be made to work for everyone?
Do we standardize two native loaders, a "strict" and a "loose" one, such that security critical environments just require the "strict" version?
How does that inform the issue of how we pass imports and exports across the loading boundary?