Call for votes: name of the system-provided "virtual" package


ihab...@gmail.com

Sep 29, 2009, 5:22:04 PM
to comm...@googlegroups.com
This is to determine the name we will give the package representing
the system's PATH of modules. Current proposals are:

"default"
"global"

Any others? Any votes on the above ones?

Ihab

--
Ihab A.B. Awad, Palo Alto, CA

Daniel Friesen

Sep 29, 2009, 5:24:56 PM
to comm...@googlegroups.com
+1 global

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]

Mark S. Miller

Sep 29, 2009, 6:54:42 PM
to comm...@googlegroups.com
+1 "default". I just verified with Ihab that it is not global. A
global namespace is a bug. Period.
--
Cheers,
--MarkM

ihab...@gmail.com

Sep 29, 2009, 6:59:50 PM
to comm...@googlegroups.com
On Tue, Sep 29, 2009 at 3:54 PM, Mark S. Miller <eri...@google.com> wrote:
> +1 "default". I just verified with Ihab that it is not global. A
> global namespace is a bug. Period.

I should clarify here --

What I originally intended is probably what MarkM would call a bug.
The "default" package (calling it that for the moment...) is a view of
what's on the filesystem, perhaps subject to the veto power of the
CommonJS platform if it so chooses.

What MarkM is arguing for is that the "default" package is something
that an individual loader provides, and which can be overridden on a
per-loader basis.

What is not clear to me -- MarkM please explain -- is this:

Assume the "default" package is, *by default*, simply a view of the
packages on the current filesystem, as installed by apt-get or local
sysadmin edits or whatever. This forms a "global" namespace on that
machine. A programmer *could* override that, but let's assume that, in
the common case, they don't.

Do you *now* have objections to this state of affairs? Because if you
do, we should discuss why: this was identified as a common and
important use case here.

Mark S. Miller

Sep 29, 2009, 7:12:27 PM
to comm...@googlegroups.com
On Tue, Sep 29, 2009 at 3:59 PM, <ihab...@gmail.com> wrote:
>
> On Tue, Sep 29, 2009 at 3:54 PM, Mark S. Miller <eri...@google.com> wrote:
>> +1 "default". I just verified with Ihab that it is not global. A
>> global namespace is a bug. Period.
>
> I should clarify here --
>
> What I originally intended is probably what MarkM would call a bug.
> The "default" package (calling it that for the moment...) is a view of
> what's on the filesystem, perhaps subject to the veto power of the
> CommonJS platform if it so chooses.
>
> What MarkM is arguing for is that the "default" package is something
> that an individual loader provides, and which can be overridden on a
> per-loader basis.
>
> What is not clear to me -- MarkM please explain -- is this:
>
> Assume the "default" package is, *by default*, simply a view of the
> packages on the current filesystem, as installed by apt-get or local
> sysadmin edits or whatever. This forms a "global" namespace on that
> machine. A programmer *could* override that, but let's assume that, in
> the common case, they don't.

As long as they can override it, or rather, as long as access to it must be
granted by the importing environment (according to normal ocap rules),
then my main objection goes away.


> Do you *now* have objections to this state of affairs? Because if you
> do, we should discuss why: this was identified as a common and
> important use case here.

My apologies, I have only been skimming this thread. Before giving an
informed answer I will need to review the messages identifying this as
an important use case.



> Ihab
>
> --
> Ihab A.B. Awad, Palo Alto, CA
>



--
Cheers,
--MarkM

Wes Garland

Sep 29, 2009, 9:16:27 PM
to comm...@googlegroups.com
If I have a correct grasp on this: "default" in this configuration is exactly equivalent to "what we currently have".

Given that -- "default" is a much better name than "global".  Global in this case is really non-descriptive; it is already a pretty overloaded label, as well. "default" is actually descriptive, if the literal string "default" yields the same behaviour as having only one argument to require().

Wes

Daniel Friesen

Sep 29, 2009, 9:53:37 PM
to comm...@googlegroups.com
Eh?
I was under the interpretation that "default" was the package for all
modules that aren't in a package.

ie: require(module); is local to your current package.
If you require(module); from the default/global package you get modules
from the default/global package.
If you require(module); from the foo package you get modules from the
foo package.
If you require(module, "default"); from anywhere you get modules from
the default/global package.

"default" != same behavior as one argument

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]

ihab...@gmail.com

Sep 29, 2009, 11:01:46 PM
to comm...@googlegroups.com
On Tue, Sep 29, 2009 at 6:53 PM, Daniel Friesen
<nadir.s...@gmail.com> wrote:
>
> Eh?
> I was under the interpretation that "default" was the package for all
> modules that aren't in a package.
>
> ie: require(module); is local to your current package.
> If you require(module); from the default/global package you get modules
> from the default/global package.
> If you require(module); from the foo package you get modules from the
> foo package.
> If you require(module, "default"); from anywhere you get modules from
> the default/global package.
>
> "default" != same behavior as one argument

Daniel's interpretation is the correct description of our current proposal.

Kris Kowal

Sep 30, 2009, 1:01:25 AM
to comm...@googlegroups.com
On Tue, Sep 29, 2009 at 8:01 PM, <ihab...@gmail.com> wrote:
> Daniel's interpretation is the correct description of our current proposal.

Thanks for clarifying, Daniel.

I think "default" will be misconstrued as equivalent to the single
argument case.

I think "global" is equally bad.

Perhaps we should consider "common", "commonjs", "common-js" or
"system" as the name space for the administered and managed package
name space.

This would necessitate require("file", "commonjs") to be the syntax
for requiring the CommonJS File module from the common name space when
outside the "commonjs" "package".

Although this is clean and clear, I am not entirely satisfied. I
would like require("file") to consistently deliver the CommonJS file
module. I'd like require("jack") to consistently deliver the "jack"
module if it is installed in "commonjs" or named in the present
package's package.json. Perhaps we should consider a scheme with
chained module loaders, where require(id) would first ask the package
module for the module "id", then its parent, the "commonjs" package
loader, such that the common cases are simple. Then, perhaps the two
argument form could be used to resolve conflicts.

One unfortunate ramification of this scheme is that the
loader.resolve(id, baseId) would no longer be a pure-text calculation;
it would require knowledge of the present packages in order to
construct a canonical module identifier, including both the {package}
and the {top-id} given a base {package}, {top-id}, and the package
inheritance chain.

So, I'm torn between desires.

If we go with the two-argument form, I would recommend that the two
arguments be internally normalized into a canonical module identifier
of the form {scheme}://{package}/{top-id}, such that
loader.resolve(id, baseId) would accept that form for baseId and
return a new baseId of that form. Thus, require(id) would be able to
handle all canonical module identifiers without having to switch
between the single and double argument forms.
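
A minimal sketch of that normalization (the "pkg" scheme name is invented here for illustration, not part of the proposal):

```javascript
// Normalize the two require() arguments into a single canonical id of
// the {scheme}://{package}/{top-id} shape suggested above.
function canonicalize(id, packageName) {
  if (id.indexOf("://") !== -1) return id; // already canonical: pass through
  return "pkg://" + packageName + "/" + id;
}

canonicalize("file", "commonjs");     // "pkg://commonjs/file"
canonicalize("pkg://commonjs/file");  // unchanged
```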

Kris Kowal

ihab...@gmail.com

Sep 30, 2009, 1:18:52 AM
to comm...@googlegroups.com
On Tue, Sep 29, 2009 at 10:01 PM, Kris Kowal <cowber...@gmail.com> wrote:
> Perhaps we should consider "common", "commonjs", "common-js" or
> "system" as the name space for the administered and managed package
> name space.

+1 for "system".

> Although this is clean and clear, I am not entirely satisfied.  I
> would like require("file") to consistently deliver the CommonJS file

> module.  ... One unfortunate ramification of this scheme is that the


> loader.resolve(id, baseId) would no longer be a pure-text calculation;

Thanks for the analysis.

> it would require knowledge of the present packages in order to
> construct a canonical module identifier, including both the {package}
> and the {top-id} given a base {package}, {top-id}, and the package
> inheritance chain.

Indeed. The whole package inheritance chain thing is scary to me.

One thing to think about here is: where are you going to use modules
at all? I like your require('file') example because it illustrates the
difference between what we are doing and Python. In Python, you do:

import file

which gives you a module that comes wrapped up with a bunch of
"ambient" authority that it gains via its underlying links to the OS
via C/C++ bindings. In CommonJS, when you:

require('file');

you are granting yourself *no more authority* than that already
available to your sandbox. Hence there had to be *something* that gave
you that authority *already* present in your sandbox. Following the
precedent of (say) the browser model, that something should be a set
of *objects* that are provided by some more-powerful parent, as in:

sys.file

that give you the API you need.

There is a body of opinion that sandboxes should be built with
pre-populated "modules" that are actually links to external
capabilities, not really just pure modules per se. I think that this
is mixing namespaces, and the present dilemma is a symptom of this
mixing, rather than a deficiency in the ease of use of the require()
syntax.

> If we go with the two-argument form, I would recommend that the two
> arguments be internally normalized into a canonical module identifier
> of the form {scheme}://{package}/{top-id}, such that
> loader.resolve(id, baseId) would accept that form for baseId and
> return a new baseId of that form.

But these canonical identifiers are not valid *across* package
contexts. For example, package 1 might assign the name "foo" to --

{ locate: [ "http://a.com/foo.zip" ], verify: { "signed-by": ... } }

and package 2 may assign that name to --

{ locate: [ "https://b.com/theFoo.zip" ] }

and these are *not* the same. Unless the package URI contains the
*entire* JSON descriptor, it is not canonical.

Kris Kowal

Sep 30, 2009, 1:30:06 AM
to comm...@googlegroups.com
On Tue, Sep 29, 2009 at 10:18 PM, <ihab...@gmail.com> wrote:
> But these canonical identifiers are not valid *across* package
> contexts. For example, package 1 might assign the name "foo" to --
>
>  { locate: [ "http://a.com/foo.zip" ], verify: { signed-by: ... } }
>
> and package 2 may assign that name to --
>
>  { locate: [ "https://b.com/theFoo.zip" ] }
>
> and these are *not* the same. Unless the package URI contains the
> *entire* JSON descriptor, it is not canonical.

Right, right. So we need a function that can resolve a canonical
identifier from a relative or top module identifier, a
package-relative package identifier, a reference to the containing
package, and thereby a way to map package-relative package identifiers
to package locations. How about a canonical module identifier of:

{location}#{top-id}

Would it be safe to treat the verification information as non-normal
data? Would there possibly be two references to the same location
with conflicting signatures?

Kris Kowal

Kris Kowal

Sep 30, 2009, 1:34:20 AM
to comm...@googlegroups.com
> Right, right.  So we need a function that can resolve a canonical
> identifier from a relative or top module identifier, a
> package-relative package identifier, a reference to the containing
> package, and thereby a way to map package-relative package identifiers
> to package locations.  How about a canonical module identifier of:

Correction and clarification. The input to a stateless resolver would be:

* a top-level or relative module identifier (first require argument)
* a package-relative package identifier, or undefined (second require argument)
* a top-level base module identifier
* a canonical base package identifier
* a mapping of canonical package identifiers to package locations
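
A hypothetical shape for that resolver, combining the inputs above with the {location}#{top-id} form floated earlier (every name here is invented for illustration; resolving a relative "./x" id against the base module identifier is elided):

```javascript
// Sketch only: inputs mirror the list above; output is a
// {location}#{top-id} string.
function resolveCanonical(id, pkgRef, baseId, basePkg, locations) {
  var pkg = pkgRef === undefined ? basePkg : pkgRef; // no 2nd arg: same package
  if (!(pkg in locations)) throw new Error("unknown package: " + pkg);
  return locations[pkg] + "#" + id;
}

resolveCanonical("file", "commonjs", "main", "foo",
                 { commonjs: "/usr/local/commonjs" });
// -> "/usr/local/commonjs#file"
```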

Kris Kowal

ihab...@gmail.com

Sep 30, 2009, 1:54:38 AM
to comm...@googlegroups.com
On Tue, Sep 29, 2009 at 10:30 PM, Kris Kowal <cowber...@gmail.com> wrote:
> Would it be safe to treat the verification information as non-normal
> data?  Would there possibly be two references to the same location
> with conflicting signatures?

The source location is non-unique, so the verification may be what
disambiguates which source is deemed acceptable and actually used.

Kris Kowal

Sep 30, 2009, 2:01:02 AM
to comm...@googlegroups.com

How about:

{location}?{top-id}#{signature}

Do we agree that we *need* a textual canonical reference to modules
that can be accepted by the one-argument form of require? If this
proposal does not close the gap, can you recommend another?

Kris Kowal

ihab...@gmail.com

Sep 30, 2009, 2:02:33 AM
to comm...@googlegroups.com
On Tue, Sep 29, 2009 at 11:01 PM, Kris Kowal <cowber...@gmail.com> wrote:
> Do we agree that we *need* a textual canonical reference to modules
> that can be accepted by the one-argument form of require?

I'm not sure I agree. What are the use cases that demand it?

Kris Kowal

Sep 30, 2009, 2:17:22 AM
to comm...@googlegroups.com
On Tue, Sep 29, 2009 at 11:02 PM, <ihab...@gmail.com> wrote:
>
> On Tue, Sep 29, 2009 at 11:01 PM, Kris Kowal <cowber...@gmail.com> wrote:
>> Do we agree that we *need* a textual canonical reference to modules
>> that can be accepted by the one-argument form of require?
>
> I'm not sure I agree. What are the use cases that demand it?

The programmatic case, where you receive a module identifier and need
to require it. This is a common pattern in my experience with
Narwhal, where frequently modules are used to export application
configurations, like Jack config files, Jakefiles (sortof), Bogart
configurations (from what I recall), and probably others. It would
not be possible to divine a canonical module identifier from a
[top-id, package] duple given that there is other context in the mix,
so require.apply(require, pair) would not be pretty, and furthermore,
would not work.

I think we need to be able to channel a value returned by
loader.resolve into require to keep sandbox designs simple; by most
people's measure, the sandbox design is already too complicated.

Kris Kowal

Daniel Friesen

Sep 30, 2009, 2:26:29 AM
to comm...@googlegroups.com
ihab...@gmail.com wrote:
> On Tue, Sep 29, 2009 at 10:01 PM, Kris Kowal <cowber...@gmail.com> wrote:
>
>> Perhaps we should consider "common", "commonjs", "common-js" or
>> "system" as the name space for the administered and managed package
>> name space.
>>
>
> +1 for "system".
>
Some of those sound better than default or global.
-1 on common-js since it sounds abnormal, afaik we've never referred to
the group that way
+1 on either common or commonjs, they both sound reasonable

system..., meh, I'm not keen on:
require("system", "system"); / require.package("system").module("system");

Perhaps I'll +1 common, +.5 commonjs.
common seems to be better, as not all of the said modules in the default
package will be commonjs standardized.
I for one already include a repl module to start up an interactive
shell. And it's also clear that these will include packages like those
installed at a system level.

>> Although this is clean and clear, I am not entirely satisfied. I
>> would like require("file") to consistently deliver the CommonJS file
>> module. ... One unfortunate ramification of this scheme is that the
>> loader.resolve(id, baseId) would no longer be a pure-text calculation;
>>
>
> Thanks for the analysis.
>

I don't like the idea of require(???); silently crossing borders. I like
the fact that packages isolate themselves. It gives a safe ground of "Am
I worried about conflicting names with something? Am I worried about
missing a file and silently including a module that has nothing to do
with my library? Ok, it's in a package so I don't have to
worry about magic lying on the system."

However, I would not object to an alternate form that would basically be
a one-line or two-line simplification of:
try {
    var file = require("file");
} catch ( e if e ... ) {
    var file = require.package("common").module("file");
}

If you want, we could also create a special case and make
`require.common = require.package("common");` to shorten the most common
use case. Or even make `require.common = function(mod) { return
require.package("common").module(mod); };`

Actually, on that note: that try/catch block has one issue now that I
think about it. Afaik, we have not standardized what kind of error
require should throw if it cannot load a module. I'd prefer something
like a LoadError error class, or perhaps RequireError would be a better
name. If we don't do something like that, then trying to catch bad loads
and provide an alternate is going to catch errors thrown by modules that
aren't technically load errors, but rather flat out bugs inside the
module we shouldn't be hiding.
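
The distinction Daniel asks for might look like this (a sketch; LoadError and requireWithFallback are invented names, not standardized anywhere):

```javascript
// A distinct error class for load failures, so callers can fall back to
// another package without swallowing genuine bugs thrown by module bodies.
function LoadError(id) {
  this.name = "LoadError";
  this.id = id;
  this.message = "cannot load module: " + id;
}
LoadError.prototype = Object.create(Error.prototype);
LoadError.prototype.constructor = LoadError;

function requireWithFallback(require, id, fallback) {
  try {
    return require(id);
  } catch (e) {
    if (e instanceof LoadError) return fallback(id); // missing module: retry
    throw e; // a bug inside the module itself: do not hide it
  }
}
```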

For this reason among others, I think we need to think more about package
management when we're drafting ideas for packages.

I for one am NOT going to implement CommonJS where it will fetch
dependencies on the fly. This is inherently insecure for the same reason
someone on the wikitech-l explained why WordPress is insecure (it has no
problem making http requests and downloading things
without confirmation right out of the box). Dropping an app into your
filesystem, viewing it through the webserver it's connected to, then
having it silently fetch libraries from third parties (even if the
developer thinks they're ok) install them on the system, and execute
them in a semi-trusted environment with filesystem and even mail access,
is not something I'm going to support.
Another reason, of course, is that this does not work nicely with
privileges: since this is done by the webserver user, it cannot install
global packages (unless you are naive enough to run your webserver as
root), so you either start making incoherent user-level package stores
behind the user's back, or fetch packages every time you restart the
app server.

When I found out that packages would not have a unique canonical name, I
did some thinking on how I would allow commonjs packages to be installed
using banana. (banana has moved from being my package system, likely to a
commonjs package installer, perhaps with an extension or two.)
I came up with a few ways to install packages:
$ banana install http://foodev.net/commonjs/modules/foobar.zip
$ banana install ./foobar/manifest.json foo # note that I'll probably
make it so you can omit manifest.json from the name
Packages would likely be stored on the system based on where they were
downloaded.
The first line is where banana is given the url to download a package
from then install it.
In the latter, banana opens up the manifest.json from a package and installs
the package identified by the package-local name foo by reading its
package descriptor.
^_^ This has a very nice bit that fits in line with one thing I was
going to do with my old banana package setup. An app can include a
manifest with dependencies inside of it and `banana
install-dependencies` or whatever will basically look at that app's
dependencies and install them onto the system. This makes for a very
nice and handy way of telling people "cd into the app directory and run
`banana install-dependencies` to install all the libraries that this
application depends on" instead of saying "Install this, and this, and
this, this to, and this over here" or being forced to make the app a
package itself. (I might extend package descriptor with an optional:
true/false key as well)

This is one reason why I -1 the idea of package descriptors being
included inline in modules inside of an .async call. It's no longer nice
and easy to do the latter. I need to go and suddenly error when a
package is found missing, save that descriptor with some random name and
tell the user to run some command to install that descriptor onto the
system. It also means that at install time I can't give the user a nice
list of "Hey, this application may depend on these things, you may wish
to install them now."

Course all this still leads to one problem:
<foo> depends on <bar> which depends on <baz> which depends on <hello world>

I stick the foo application on my system. I check and banana tells me it
depends on bar. I have it install bar. Now it tells me that bar depends
on baz, so I allow it to install it. Now it says baz also depends on
hello world, so I allow it to install it.
I can't give a nice complete list of dependencies unless I prefetch the
entirety of all the packages (which isn't something I like the idea of
doing behind the user's back) just to find the list of dependencies. And
as a result I need to talk to the user about dependencies multiple times.
Not to mention that this could become a tedious chore as the depth of
dependencies grows... Which is an issue considering a really well built
system is best divided up into small components so that they can be shared.

I'm not completely worried about this issue. To mitigate it I'm probably
going to set up a banana server and support banana: urls inside of
locate:; other implementations can ignore a url they don't
understand, but banana can use it to download the commonjs package
from a banana server that can read package manifests and give me a list
of most of the dependencies (as long as those dependencies don't use
anything outside of the banana server, those will just fall back to the
case before) to present to the user up front.

Wes Garland

Sep 30, 2009, 5:37:37 AM
to comm...@googlegroups.com
How did we get from a single-argument, module-relative or search-pathed module loader to a requirement by the developer to know where specific modules he didn't write are installed?

Extra sugar tastes great but when it's core system functionality, we really need to think twice about whether we need to add the weight.

From my perspective, single-argument require solves all of the resolution problems when the module is named either relative to the current module, or on the system-wide module path.  (Note that I believe that Narwhal's implementation, which allows path-based resolution of module-relative addressing -- e.g. require("./abc") naming multiple possible source files -- is simply wrong)

The completeness of single-argument require() means that the second argument to require is only necessary to resolve module names relative to another package.  How are these packages loaded/resolved themselves? Do we need a separate package loader/resolver? If packages were simply modules, exporting the module's own require function would solve this problem without introducing any new syntax or resolution requirements.

Let's play with some pseudo code for a moment, I think it will help document my understanding of things:

 - module path contains only /usr/local/commonjs/libexec
 - file /usr/local/commonjs/libexec/string_package.js:

exports.require = function(name) { return require("../string_package/" + name); };

 - file /usr/local/commonjs/libexec/string_package/base64.js

exports.encode = function base64_encode() { .... };
exports.decode = function base64_decode(charset) { ....; return new (require("binary").ByteString)(charset, ...) };

 - file /var/www/cgi-bin/decode_b64.js

#! /usr/bin/gsr
print("Content-Type: text/plain; charset=utf-8\n");

const stringPackage = require("string_package");
var query = require("cgi").query;

print("Your base64 string decodes to: " + stringPackage.require("base64").decode("utf-8", query));

A side note to Ihab: The "thing" that grants authority in the sandbox, from my POV, is the require function itself. How require communicates outside of the sandbox is where all of the authority flows.

Wes

ihab...@gmail.com

Sep 30, 2009, 11:06:25 AM
to comm...@googlegroups.com
On Tue, Sep 29, 2009 at 11:17 PM, Kris Kowal <cowber...@gmail.com> wrote:
> The programmatic case, where you receive a module identifier and need
> to require it.

So just pass along the JSON package descriptor and the string module
path. I still don't get it.

Wes Garland

Sep 30, 2009, 11:09:52 AM
to comm...@googlegroups.com
On Wed, Sep 30, 2009 at 11:06 AM, <ihab...@gmail.com> wrote:

> On Tue, Sep 29, 2009 at 11:17 PM, Kris Kowal <cowber...@gmail.com> wrote:
>> The programmatic case, where you receive a module identifier and need
>> to require it.
>
> So just pass along the JSON package descriptor and the string module
> path. I still don't get it.

I believe you have just changed module.id into the pair [module.id,module.packageName]

Is this desirable?  What problem does it solve?

Wes

--
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102

ihab...@gmail.com

Sep 30, 2009, 11:12:50 AM
to comm...@googlegroups.com
On Tue, Sep 29, 2009 at 11:26 PM, Daniel Friesen
<nadir.s...@gmail.com> wrote:
> dependencies on the fly. This is inherently insecure for the same reason
> someone on the wikitech-l explained why WordPress is insecure (it has no
> problem starting to start making http requests and downloading things
> without confirmation right out of the box). Dropping a app into your
> filesystem, viewing it through the webserver it's connected to, then
> having it silently fetch libraries from third parties (even if the
> developer thinks they're ok) install them on the system, and execute
> them in a semi-trusted environment with filesystem and even mail access,
> is not something I'm going to support.

Then you should not be executing them with open filesystem access. :)

In a properly sandboxed environment, all you have to worry about is
the possibility of communication -- the code can signal the outside
world by require()-ing.

If you support only *static* require(), and with a bit of care, you
can be sure this is not an issue since the computation of require()-ed
modules could not have been done based on runtime data.

Otherwise, if you support *dynamic* require(), you can pull your
packages from a CDN (say) to ensure that the require()-s do not
signal.

ihab...@gmail.com

Sep 30, 2009, 11:15:46 AM
to comm...@googlegroups.com
On Wed, Sep 30, 2009 at 8:09 AM, Wes Garland <w...@page.ca> wrote:
> I believe you have just changed module.id into the pair
> [module.id,module.packageName]

Not module.packageName. It's module.packageDescriptor.

> Is this desirable?  What problem does it solve?

A nontrivial package descriptor allows verification that the correct
module was loaded. It is in a sense a "secure name".

* * * * *

Would everyone be happier if we said, the canonical module ID *can* be
represented as a string, and that string is JSON?

Wes Garland

Sep 30, 2009, 12:13:35 PM
to comm...@googlegroups.com
> Would everyone be happier if we said, the canonical module ID *can* be
> represented as a string, and that string is JSON?

That is tempting: it introduces very little new syntax and preserves the require(module.id) idiom. JSON is, in fact, a permissible module identifier in the current scheme of things, provided that the require() implementation can resolve it.

I am still, however, not convinced that packages cannot be wholly represented as plain modules.


> A nontrivial package descriptor allows verification that the correct
> module was loaded. It is in a sense a "secure name".

I'm not sure this argument holds water.  I've pointed out before that I don't believe that narwhal's module resolver is sufficiently deterministic and Kris Kowal spun off a thread about that this morning.

In my view, the "secure name" problem is better solved by improving require() determinism than adding another layer of abstraction.

ihab...@gmail.com

Sep 30, 2009, 12:19:56 PM
to comm...@googlegroups.com
On Wed, Sep 30, 2009 at 9:13 AM, Wes Garland <w...@page.ca> wrote:
> In my view, the "secure name" problem is better solved by improving
> require() determinism than adding another layer of abstraction.

The current proposal is *all* about making require() deterministic. :)

Please counterpropose -- I'm interested to hear your thoughts.
Specifically, how do you solve the "diverse and sundry versions of the
warez on the internets" problem? Do you have out-of-band package
management? A global namespace? Or -- ?

Wes Garland

Sep 30, 2009, 1:10:40 PM
to comm...@googlegroups.com
On Wed, Sep 30, 2009 at 12:19 PM, <ihab...@gmail.com> wrote:

> On Wed, Sep 30, 2009 at 9:13 AM, Wes Garland <w...@page.ca> wrote:
>> In my view, the "secure name" problem is better solved by improving
>> require() determinism than adding another layer of abstraction.
>
> The current proposal is *all* about making require() deterministic. :)

AH! We're now on the same wavelength!

 I believe these rules are needed to make resolution of installed modules deterministic:
  - require("a") loads a.js on the module path (require.paths)
  - require("./a") loads a.js from the same container (e.g. dir) as the loading module
  - require("../a") loads a.js from the loading module's parent's container
  - programs are modules
  - module.id returns a string which is a valid argument to require() for the lifetime of the program (or sandbox)
  - module.id is unique across modules visible from the current program (or sandbox)
  - module.id does not depend on module paths
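
The first three rules above can be sketched as a pure-text resolver (an illustrative sketch, assuming "/"-separated ids; resolveId is an invented name, not part of any spec):

```javascript
// Text-only resolution of the rules listed above: top-level ids pass
// through to the module-path lookup; "./" and "../" resolve against the
// requiring module's own id.
function resolveId(id, baseId) {
  if (id.charAt(0) !== ".") return id; // top-level: search require.paths
  var parts = baseId.split("/").slice(0, -1).concat(id.split("/"));
  var out = [];
  for (var i = 0; i < parts.length; i++) {
    if (parts[i] === "..") out.pop();
    else if (parts[i] !== ".") out.push(parts[i]);
  }
  return out.join("/");
}

resolveId("a", "x/y");      // "a": found on the module path
resolveId("./a", "x/y");    // "x/a": same container as the loading module
resolveId("../a", "x/y/z"); // "x/a": the loading module's parent's container
```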

> Specifically, how do you solve the "diverse and sundry versions of the
> warez on the internets" problem? Do you have out-of-band package
> management? A global namespace? Or -- ?

Likely out-of-band package management.  I consider these problems to be effectively identical: "Where do I find package.json" and "Where do I find module.js" -- and I haven't seen a solution yet for "where do I find package.json" (I may have missed it).

Recall that I've shown earlier in this thread that modules themselves can take the place of module containers (e.g. packages), provided that modules can be require()d deterministically.  This could be done by either exporting the containing module's require, or the containing module can export the submodules' exports objects.

If a module X needs a module Y that ships with X, module X can require("./Y"); provided require is made deterministic.

Now, if a module A needs a specific version of another module B that doesn't ship with A: we have to solve this with a decent naming convention (whether we are talking json-defined packages OR modules-which-are-packages).

So: globally-available modules/packages have to have unique names in a given CommonJS installation. That should be managed by who/whatever is installing them.  A convention like module-packages start with "packages/" might help, but should not need to be rigidly specified.

Locally-available modules -- i.e. those that ship with other modules -- can be deterministically located relative to the globally-available module that needs it, via the ./ and ../ relative module syntax; provided require-determinism is fixed.

So, let's take another example, all the way to the end case. I'll invent stuff about the package manager as I go along to make sure everything works.  (Man, sorry I'm being so verbose here - but it helps me think):

Say a program needs the package org/apache/commons/pool.

The program will need to ship with some kind of a JSON file describing the program's requirements and maybe where to get them, e.g.

[ { name: "org/apache/commons/pool", url: "http://apache.org/commons/pool.car" }, ... ]

The installer then checks, say, /usr/local/commonjs/packages/org/apache/commons and finds that there is no pool.js installed, so it fetches http://apache.org/commons.org/pool.car, which is a zip file containing pool.js and pool/lib/factory.js.

- When the pool module needs stuff from the factory, it will require("./lib/factory.js");
- When the program wants the pool module, it will require("org/apache/commons/pool");
- Since /usr/local/commonjs/packages/ is on the module path, that module will get loaded.
- The implementation might make module.id be /usr/local/commonjs/packages/org/apache/commons/pool, but any other unique identifier would do just fine
- If another package in /usr/local/commonjs/packages/org/apache/commons called require("./pool"), it would work as expected

The only time the program would get non-deterministic behaviour is when some joker goes and installs org/apache/commons/pool.js in two different places on the module path.  I consider this an error in package management.

I'm not sure how much time you've spent in C-land, but I consider the module.path to be equivalent to LD_LIBRARY_PATH and relative-pathed modules to be like Darwin binaries loaded based on @install_path from install_name_tool.  This works very well for MacOSX; Mac packages tend to be very transportable. I wish I could get that level of accuracy out the runtime linkers shipped by GNU or Sun.

Wes

Christoph Dorn

Sep 30, 2009, 4:00:44 PM
to comm...@googlegroups.com

> Say a program needs the package org/apache/commons/pool.
>
> The program will need to ship with some kind of a JSON file describing
> the program's requirements and maybe where to get them, e.g.
>
> [ { name: "org/apache/commons/pool", url:
> "http://apache.org/commons/pool.car" }, ... ]
>
> The installer then checks, say,
> /usr/local/commonjs/packages/org/apache/commons and finds that there is
> no pool.js installed, so it fetches
> http://apache.org/commons.org/pool.car, which is a zip file containing
> pool.js and pool/lib/factory.js.
>
> - When the pool module needs stuff from the factory, it will
> require("./lib/factory.js");
> - When the program wants the pool module, it will
> require("org/apache/commons/pool");
> - Since /usr/local/commonjs/packages/ is on the module path, that module
> will get loaded.
> - The implementation might make module.id be
> /usr/local/commonjs/packages/org/apache/commons/pool, but any other
> unique identifier would do just fine
> - If another package in /usr/local/commonjs/packages/org/apache/commons
> called require("./pool"), it would work as expected
>
> The only time the program would get non-deterministic behaviour is when
> some joker goes and installs org/apache/commons/pool.js in two different
> places on the module path. /I consider this an error in package
> management./

>
> I'm not sure how much time you've spent in C-land, but I consider the
> module.path to be equivalent to LD_LIBRARY_PATH and relative-pathed
> modules to be like Darwin binaries loaded based on @install_path from
> install_name_tool. This works very well for MacOSX; Mac packages tend
> to be very transportable. I wish I could get that level of accuracy out
> the runtime linkers shipped by GNU or Sun.

If I got this right you are saying that any module identifier not
starting with a relative path ("./" or "../") will match against the
sandbox require.paths.

This only works if your sandbox has only one "org/apache/commons/pool"
on require.paths (as mentioned). What happens if the "org/foo" package
in your program relies on a different version of
"org/apache/commons/pool"? How are package versions represented in the
module paths?

I have spent some time working through this issue while working on new
catalog functionality for tusk. I posted an outline of my solution here [1].

The proposed solution maps package aliases to module paths via
package.json in a similar fashion as you described above.

Without the tusk-specific URL and catalog functionality dependencies in
foo/package.json could be declared as:

{
  "dependencies": [
    ["template"],
    ["org.apache.commons.pool", {
      "url": "http://apache.org/commons/pool.car",
      "revision": "<version>"
    }]
  ]
}

where the dependencies are located at:

foo/packages/template
foo/packages/dependencies/org.apache.commons.pool/<revision>

and the require.paths for the sandbox are:

foo/packages/template/lib
foo/packages/dependencies/

and:

/* called from foo/lib/module.js */
require("factory", "org.apache.commons.pool");

is equivalent to:

require("org.apache.commons.pool/<revision>/factory");

The difference between the "template" and "org.apache.commons.pool"
dependencies is that "template" ships with the "foo" package where it is
versioned with the "foo" package while "org.apache.commons.pool" is an
external dependency.
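In sketch form, the aliased require() above might be wired like this; the dependency table is a stand-in for what package.json declares, and the "3.8" revision is purely illustrative:

```javascript
// Map (moduleId, packageAlias) onto a versioned module path, mirroring
// the foo/package.json dependency declarations described above.
var dependencies = {
  "org.apache.commons.pool": { path: "org.apache.commons.pool", revision: "3.8" }
};

function resolveAliased(moduleId, packageAlias) {
  if (!packageAlias) return moduleId;           // plain top-level require
  var dep = dependencies[packageAlias];
  if (!dep) throw new Error("undeclared dependency: " + packageAlias);
  return dep.path + "/" + dep.revision + "/" + moduleId;
}
```

So resolveAliased("factory", "org.apache.commons.pool") yields "org.apache.commons.pool/3.8/factory", matching the expansion shown above.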

Now my argument with the catalog functionality is that we need a unique
namespace for packages for the require.paths. We have two options to
accomplish this:

1) Place external dependencies of package "foo" into "foo/packages/" and
if package "foo" depends on package "bar" place bar's dependencies into
"foo/packages/bar/packages/...

2) Place external dependencies of package "foo" into the *program's*
"packages" directory not "foo/packages". Same goes for bar's dependencies.

Option (1) will give us deeply nested directory trees that are hard to
work with on many levels. Option (2) gives us a relatively flat
directory structure.

If a program places all external dependencies into
"packages/dependencies" it puts the emphasis on the program's own packages
at "packages/*", without all the clutter of the dependencies, which is
typically desired.

Now with option (2) there can easily be two:

/packages/dependencies/org.apache.commons.pool/

which is why I am advocating a further level of indirection via a
"catalog" or "collection identifier" that together with a package name
and revision points to a unique set of code.

I used to do a lot of work in Java where a naming scheme for imports of
org.apache.commons.* is sufficient to avoid conflicts between packages
because it is understood that it follows a reverse hostname convention.

Recently I have been doing a lot of work in PHP and client-side
javascript where there are no real package naming conventions. If
CommonJS is intended for server as well as client environments we will
attract a lot of developers from the PHP and browserJS camps that have
no real understanding of proper package naming conventions.

To mitigate these risks of package naming conflicts I have introduced
the "collection identifier" which follows the java package naming
scheme. This forces a developer to identify where a package is from and
since only one package can come from the same location it enforces a
unique naming scheme.

For example the "collection identifier" for all my packages on github is:

com.github.cadorn

I am not sure how much of this is applicable to what we are specifically
discussing, but it seems to have a big impact on how programs are going
to be composed. I envision programs made up of hundreds of interlinked
packages and IMHO without a reasonable directory structure that is
relatively flat it is going to be a nightmare to work with.

Christoph

[1] -
http://groups.google.com/group/narwhaljs/browse_thread/thread/9cd1758fd41631c0?hl=en

Neville Burnell

Sep 30, 2009, 8:03:02 PM
to CommonJS
>> "collection identifier"

perhaps "publisher" would be nicer

On Oct 1, 6:00 am, Christoph Dorn <christoph...@christophdorn.com>
wrote:
> [1] -http://groups.google.com/group/narwhaljs/browse_thread/thread/9cd1758...

ihab...@gmail.com

Oct 1, 2009, 5:33:00 PM
to comm...@googlegroups.com
[ Catching up ... boy are you folks prolific! ]

On Wed, Sep 30, 2009 at 10:10 AM, Wes Garland <w...@page.ca> wrote:
>  I believe these rules are needed to make resolution of installed modules
> deterministic:
>   - require("a") loads a.js on the module path (require.paths)
>   - require("./a") loads a.js from the same container (e.g. dir) as the
> loading module
>   - require("../a") loads a.js from the loading module's parent's container

This is only deterministic to the extent that the module path is deterministic.

The proposal on the table provides two levels of determinism:

* The "common" package (if that's what we call it). Modules in the
common package can refer to one another just as you wish. Any
management of the path semantics is outside the scope of the current
proposal.

* The rest of the packages, where the correctness model (which the
proposal intends to defend in the face of deliberate malice) is that a
package can verify its dependencies with great precision, without
having to involve a local sysadmin in the decision making.

>> Specifically, how do you solve the "diverse and sundry versions of the
>> warez on the internets" problem?
>

> Likely out-of-band package management.

Ok, the current proposal does not prevent you from doing so -- in the
"common" package. Perhaps your use case does not require the level of
verifiability of dependencies in a zero-admin situation in the face of
malice. The proposal does not get in your way.

> I consider these problems to be
> effectively identical: "Where do I find package.json" and "Where do I find
> module.js" -- and I haven't seen a solution yet for "where do I find
> package.json" (I may have missed it).

See http://wiki.commonjs.org/wiki/File:Packages-5.png.

Each package contains its own "manifest.json".

Consider a module "a" from package "A" which loads module "b" from
package "B". The loader of "a" has access to the manifest.json of
"a"'s parent package, "A". That manifest.json specifies, to the
necessary degree of precision, how to locate and verify package "B".
This loader will then dig into package "B" and find module "b".
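That walk can be sketched as follows; the manifest shape, the in-memory registry standing in for the network, and the checksum values are all invented for illustration:

```javascript
// Stand-in for remote package archives; a real loader would download
// the archive from spec.locate and checksum it before use.
var registry = {
  "http://sw.com/theFoo.zip": { checksum: "abc123", modules: { b: "exports of b" } }
};

function fetchAndVerify(spec) {
  var pkg = registry[spec.locate];
  if (!pkg || pkg.checksum !== spec.verify.checksum)
    throw new Error("verification failed for " + spec.locate);
  return pkg;
}

// Loading "b" from package B on behalf of package A consults A's own
// manifest for how to locate and verify B.
function requireFrom(pkg, depName, moduleId) {
  var spec = pkg.manifest.dependencies[depName];
  return fetchAndVerify(spec).modules[moduleId];
}

var packageA = {
  manifest: {
    dependencies: {
      B: { locate: "http://sw.com/theFoo.zip", verify: { checksum: "abc123" } }
    }
  }
};
```

Here requireFrom(packageA, "B", "b") succeeds, while a manifest carrying a different checksum fails loudly instead of silently picking up whatever happens to be installed.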

> Recall that I've shown earlier in this thread that modules themselves can
> take the place of module containers (e.g. packages), provided that modules
> can be require()d deterministically.  This could be done by either exporting
> the containing module's require, or the containing module can export the
> submodules' exports objects.

A package is merely a convenience, on the assumption that a
"manifest.json" is too much of a pain in the neck to administer for
each "*.js" file individually. That said, and *given* that there is a
need for the contents of a "manifest.json", are you really suggesting
that we repeat these contents in each and every individual "*.js"
file? If not, then we need packages.

> If a module X needs a module Y that ships with X, module X can
> require("./Y"); provided require is made deterministic.

The "ships with" case is uninteresting. Modules bundled together in
the same package, under the proposal currently on the table, can refer
to one another just as you describe.

> Now, if a module A needs a specific version of another module B that doesn't
> ship with A: we have to solve this with decent naming convention  (whether
> we are talking json-defined packages OR modules-which-are-packages).

Yes.

> So: globally-available modules/packages have to have unique names in a given
> CommonJS installation. That should be managed by who/whatever is installing
> them.  A convention like module-packages start with "packages/" might help,
> but should not need to be rigidly specified.

Not in a *given* CommonJS installation. That does not give the "zero
admin reliability" property. I claim we need uniqueness in the *world*.
The *universe*. In all *seven* of the internets, not just the most
commonly used four or five.

Is that the crux of our misunderstanding?

> Locally-available modules -- i.e. those that ship with other modules -- can
> be deterministically located relative to the globally-available module that
> needs it, via the ./ and ../ relative module syntax; provided
> require-determinism is fixed.

Yes, as noted above, this is taken care of.

> So, let's take another example, all the way to the end case. I'll invent
> stuff about the package manager as I go along to make sure everything
> works.  (Man, sorry I'm being so verbose here - but it helps me think):

Np - thanks for being concrete! :)

> Say a program needs the package org/apache/commons/pool.
>
> The program will need to ship with some kind of a JSON file describing the
> program's requirements and maybe where to get them, e.g.
>
> [ { name: "org/apache/commons/pool", url:
> "http://apache.org/commons/pool.car" }, ... ]

Excellent.

(So again, as an aside: the "package" level of abstraction is simply
the realization that most folks will want to write out such
information for a *group* of *.js modules at a time, rather than one
by one. If each *.js file had to include this info for each of its
dependencies, it gets repetitious and error-prone in a hurry. The idea
is a "package" represents a set of *.js files that are developed and
versioned together.)

> The installer then checks, say,
> /usr/local/commonjs/packages/org/apache/commons and finds that there is no
> pool.js installed, so it fetches http://apache.org/commons.org/pool.car,
> which is a zip file containing pool.js and pool/lib/factory.js.

The key point is what you *mean* when you say, "there is no pool.js
installed". Which pool.js? What if there were another module somewhere
that happened to ask for:

[ { name: "org/apache/commons/pool", url:
"http://evil.org/fakeApache/fakeCommons/fakePool.car" }, ... ]

Which pool.js gets installed? Is it now a race between the program
that require()-ed the evil version versus the program that
require()-ed the benign version?

> - When the pool module needs stuff from the factory, it will
> require("./lib/factory.js");

Nice. That is supported by the current proposal, btw.

> - When the program wants the pool module, it will
> require("org/apache/commons/pool");

Ok. Except in the current proposal -- and distinctly (I think) from
your counterproposal -- "org/apache/commons/pool" is now a *local*
name to each program that included it. So the good program gets the
good pool.js that it asked for, and the evil program gets the evil
pool.js that it asked for, but neither one corrupts the other.

> - Since /usr/local/commonjs/packages/ is on the module path, that module
> will get loaded.

Ok. Again, in the current proposal, that "module path" is private to
the program (package) that issued the description of what that meant
(via the JSON you showed earlier).

> - The implementation might make module.id be
> /usr/local/commonjs/packages/org/apache/commons/pool, but any other unique
> identifier would do just fine

Yep. Any module ID is fine. A hexadecimal number is fine.

> - If another package in /usr/local/commonjs/packages/org/apache/commons
> called require("./pool"), it would work as expected

Sure.

> The only time the program would get non-deterministic behaviour is when some
> joker goes and installs org/apache/commons/pool.js in two different places
> on the module path.  I consider this an error in package management.

Great. So we want to make sure that independently downloaded, possibly
untrusted programs cannot act as jokers in your scenario.

> I'm not sure how much time you've spent in C-land, but I consider the
> module.path to be equivalent to LD_LIBRARY_PATH and relative-pathed modules
> to be like Darwin binaries loaded based on @install_path from
> install_name_tool.  This works very well for MacOSX; Mac packages tend to be
> very transportable. I wish I could get that level of accuracy out the
> runtime linkers shipped by GNU or Sun.

Mac packages are nowhere near as transportable as the current proposal
proposes that JavaScript packages be. Mac packages, for one thing, are
written in C and run with the full authority of the currently
logged-in user. Even the smallest program is therefore naturally part
of the trusted computing base of the user on whose behalf it runs.
Thus package installation is done only with care.

In other words, here Wes, I have this DMG that I want you to install.
It's a program written in C, and it's called Win.app. Run it to win a
million dollars in the Elbonian National Lottery with lucky number
12345 and lot number 67890. What have you to fear...?

Clearly, you have a lot to fear. In the securable modules proposal,
you have *nothing* to fear, as long as the package I send you is
written in pure JS. In this new state of affairs, people will (and
should!) "install" and run packages with impunity. (Again, consider
the zero-admin hosted Web server case.) But this induces a somewhat
different take on the basic architecture.

Daniel Friesen

Oct 1, 2009, 5:56:53 PM10/1/09
to comm...@googlegroups.com
ihab...@gmail.com wrote:
> ...

>> I'm not sure how much time you've spent in C-land, but I consider the
>> module.path to be equivalent to LD_LIBRARY_PATH and relative-pathed modules
>> to be like Darwin binaries loaded based on @install_path from
>> install_name_tool. This works very well for MacOSX; Mac packages tend to be
>> very transportable. I wish I could get that level of accuracy out the
>> runtime linkers shipped by GNU or Sun.
>>
>
> Mac packages are nowhere near as transportable as the current proposal
> proposes that JavaScript packages be. Mac packages, for one thing, are
> written in C and run with the full authority of the currently
> logged-in user. Even the smallest program is therefore naturally part
> of the trusted computing base of the user on whose behalf it runs.
> Thus package installation is done only with care.
>
> In other words, here Wes, I have this DMG that I want you to install.
> It's a program written in C, and it's called Win.app. Run it to win a
> million dollars in the Elbonian National Lottery with lucky number
> 12345 and lot number 67890. What have you to fear...?
>
> Clearly, you have a lot to fear. In the securable modules proposal,
> you have *nothing* to fear, as long as the package I send you is
> written in pure JS. In this new state of affairs, people will (and
> should!) "install" and run packages with impunity. (Again, consider
> the zero-admin hosted Web server case.) But this induces a somewhat
> different take on the basic architecture.
>
> Ihab
>
>
*cough* Since when was there nothing to fear?
Since when did JS make things magically safe when we are specifying
filesystem access, socket access, and whatnot. As well as the fact that
most implementations provide some sort of LiveConnect or FFI (GPSEE and
any Rhino based impl).
Last I checked we specified nothing about CommonJS in its entirety
being run in some secure sandboxed environment, nor that to implement
packages one must implement sandboxes or some sort of security and run
packages in them.

Kris Kowal

Oct 1, 2009, 6:01:10 PM10/1/09
to comm...@googlegroups.com
On Thu, Oct 1, 2009 at 2:56 PM, Daniel Friesen
<nadir.s...@gmail.com> wrote:
> *cough* Since when was there nothing to fear?
> Since when did JS make things magically safe when we are specifying
> filesystem access, socket access, and whatnot. As well as the fact that
> most implementations provide some sort of LiveConnect or FFI (GPSEE and
> any Rhino based impl).
> Last I checked we specified nothing about CommonJS in it's entirety
> being run in some secure sandboxed environment, nor that to implement
> packages one must implement sandboxes or some sort of security and run
> packages in them.

These features are intended to address the desire for safe access to
remote packages in secure subsets of the CommonJS platform, which will
be achievable with the introduction of ES5's Object.freeze and
potentially ES5+ hermetic evaluation, or with something like Caja,
which is supporting CommonJS modules.

Kris Kowal

ihab...@gmail.com

Oct 3, 2009, 1:14:18 AM
to comm...@googlegroups.com
[ more catching up ... ]

On Wed, Sep 30, 2009 at 1:00 PM, Christoph Dorn
<christ...@christophdorn.com> wrote:
> 1) Place external dependencies of package "foo" into "foo/packages/" and
> if package "foo" depends on package "bar" place bar's dependencies into
> "foo/packages/bar/packages/...

As you note below, I agree this is intractable. If for no other reason
than that, if two packages in different places *happen* to be
require()-ing the same, bit-identical package, this scheme duplicates
that package unnecessarily.

> Now [ ... ] there can easily be two:


>
>   /packages/dependencies/org.apache.commons.pool/
>
> which is why I am advocating a further level of indirection via a
> "catalog" or "collection identifier" that together with a package name
> and revision points to a unique set of code.

Ok, so this is actually a new thing that I'm going to try to hack into
my own package proposal. Some package importing another package can
specify either the complete set of meta-information required to
retrieve and verify the imported package --

{
  locate: ['http://foosoft.com/foo.zip', 'https://sw.com/theFoo.zip'],
  verify: { checksum: ... }
}

or it can accept the authority of a standard catalog site, and use the
name given to the package by the catalog site --

{
  catalog: 'https://trustedcatalog.com/', /* note use of HTTPS */
  { name: 'theFoo', version: '3.8', /* and whatever other attributes */ }
}

Note that the JSON I propose is *NOT* normative; there is a whole
'nother round of proposal I plan to do which would decide what JSON to
use and ensure that it is consistent with Tusk.

> I used to do a lot of work in Java where a naming scheme for imports of
> org.apache.commons.* is sufficient to avoid conflicts between packages
> because it is understood that it follows a reverse hostname convention.

Yup. Even in Java, though, it is understood :) until you have two
thingeys in the same JVM which want different versions of
org.apache.commons in which case the Java language does not help you
at all and you have to resort to classloader shenanigans; and these
shenanigans only get you *part* of the way there anyway.

ihab...@gmail.com

Oct 3, 2009, 1:20:33 AM
to comm...@googlegroups.com
On Wed, Sep 30, 2009 at 5:03 PM, Neville Burnell
<neville...@gmail.com> wrote:
>>> "collection identifier"
>
> perhaps "publisher" would be nicer

In the context in which Christoph used it, yes.

This "publisher" in order to be verifiably globally unique, needs to
be tied to some mechanism for ensuring uniqueness and authenticity; it
needs to encode more than just a plain text name. The ownership
guarantees of DNS and the MITM-resistance of HTTPS would be a good
start, so perhaps these identifiers should be HTTPS URLs. There are
less "centrally planned" techniques for doing this as well using
public key cryptography, but again, in this case, the publisher
identifier would still have to have more semantics than just a plain
string.

Neville Burnell

Oct 3, 2009, 1:36:50 AM
to CommonJS
HTTPS URLS certainly make sense for a verifiable publisher

On Oct 3, 3:20 pm, ihab.a...@gmail.com wrote:
> On Wed, Sep 30, 2009 at 5:03 PM, Neville Burnell
>

Ash Berlin

Oct 3, 2009, 6:05:36 AM
to comm...@googlegroups.com

On 3 Oct 2009, at 06:36, Neville Burnell wrote:

>
> HTTPS URLS certainly make sense for a verifiable publisher

Except that HTTPS URLs are a pain to actually have - it requires
someone to have a dedicated IP on which to host the webserver since
https cannot be vhosted.

This seems like a large barrier to entry for writing/publishing
modules which would kill any hopes for a large, CPAN like community of
modules.

Also worth considering is that different versions of the same open-
source package might end up being released legitimately by different
people. What ever scheme we come up with must allow this sort of thing
without requiring a single person to do the releases, or to require
something horrible like password sharing.

-ash

Neville Burnell

Oct 3, 2009, 7:05:24 AM
to CommonJS
> Except that HTTPs urls are a pain to actually have - it requires  
> someone to have a dedicated IP on which to host the webserver since  
> https cannot be vhosted.
>
> This seems like a large barrier to entry for writing/publishing  
> modules which would kill any hopes for a large, CPAN like community of  
> modules.

Yes, although we could certainly have HTTPS urls as *one* method of
verifiable publisher.

A large community will include unsigned publishers, with zero barrier
to entry, so I don't see a problem with that.

Christoph Dorn

Oct 3, 2009, 3:17:17 PM
to comm...@googlegroups.com

>> Now [ ... ] there can easily be two:
>>
>> /packages/dependencies/org.apache.commons.pool/
>>
>> which is why I am advocating a further level of indirection via a
>> "catalog" or "collection identifier" that together with a package name
>> and revision points to a unique set of code.
>
> Ok, so this is actually a new thing that I'm going to try to hack into
> my own package proposal. Some package importing another package can
> specify either the complete set of meta-information required to
> retrieve and verify the imported package --
>
> {
> locate: [ 'http://foosoft.com/foo.zip', 'https://sw.com/theFoo.zip'],
> verify: { checksum: ... }
> }
>
> or it can accept the authority of a standard catalog site, and use the
> name given to the package by the catalog site --
>
> {
> catalog: 'https://trustedcatalog.com/', /* note use of HTTPS */
> { name: 'theFoo', version: '3.8', /* and whatever other attributes */ }
> }

Where will the JSON above be placed? Into package.json?:

{
  "dependencies": [
    ["theFoo", {
      catalog: 'https://trustedcatalog.com/group/catalog.json',
      name: 'theFoo',
      version: '3.8'
    }]
  ]
}

> or it can accept the authority of a standard catalog site, and use the
> name given to the package by the catalog site --

I do not think the name of a "remote" package should *ever* be used. The
remote package name is intended for the purpose of identifying the
package within the catalog only.

When the remote dependent package is used within your package you *must*
give it a name *local to your package*. (See above)

Now with either "import statement" we need to decide how this translates
into an install path for the package. If we agree that external
dependencies should get installed into a sea-wide (narwhal terminology)
dependency tree we need a way to generate a unique path for a package
that is consistent across multiple imports targeting the same
package/version.

Using the import statements from above the only way to do this I can
think of is to generate a hash based on the "locate" or
"catalog"/"name"/"version":

sea/packages/dependencies/<md5_hash>

When a require(<module>, "theFoo") is issued from our importing package
we can identify the dependency and map it to the appropriate path.

Now a hash will work great for production systems but lacks
descriptiveness during development. This is where the tusk catalog
functionality comes in that I am proposing.

package.json ~ {
  "name": "foo",
  "dependencies": [
    ["theFoo", "tusk://com.trustedcatalog/group/theFoo/3.8"]
  ]
}

will translate to an install path of:

sea/packages/dependencies/com.trustedcatalog/group/theFoo/3.8

If trustedcatalog.com runs a spec-compliant package registry the catalog
will be accessible with:

https://trustedcatalog.com/group/catalog.json

and contain:

{
  "name": "com.trustedcatalog",
  "collection": "group",
  "packages": {
    "theFoo": [
      ["latest", {
        locate: [
          'http://foosoft.com/foo.zip'
        ],
        verify: { signature: ... }
      }],
      ["3.8", {
        locate: [
          'http://foosoft.com/foo-3.8.zip',
          'http://sw.com/theFoo-3.8.zip'
        ],
        verify: { checksum: ... }
      }],
      ["*", {
        locate: [
          'http://foosoft.com/foo-{*}.zip',
          'http://sw.com/theFoo-{*}.zip'
        ],
        verify: { signature: ... }
      }]
    ]
  }
}
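A sketch of how a client might resolve a revision against such an entry list; the entries variable mirrors the "theFoo" list above, and the "{*}" substitution is an assumption about the wildcard entry's intent:

```javascript
var entries = [
  ["latest", { locate: ["http://foosoft.com/foo.zip"], verify: {} }],
  ["3.8", { locate: ["http://foosoft.com/foo-3.8.zip",
                     "http://sw.com/theFoo-3.8.zip"], verify: {} }],
  ["*", { locate: ["http://foosoft.com/foo-{*}.zip",
                   "http://sw.com/theFoo-{*}.zip"], verify: {} }]
];

// Prefer an exact revision entry; otherwise fall back to the "*"
// wildcard, substituting the requested revision into its URLs.
function resolveRevision(entries, revision) {
  for (var i = 0; i < entries.length; i++)
    if (entries[i][0] === revision) return entries[i][1];
  for (var j = 0; j < entries.length; j++)
    if (entries[j][0] === "*") {
      var spec = entries[j][1];
      return {
        locate: spec.locate.map(function (u) { return u.replace("{*}", revision); }),
        verify: spec.verify
      };
    }
  return null;
}
```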

When the importing package is installed:

tusk package install foo

and the "tusk://com.trustedcatalog/group/theFoo/3.8" dependency is
encountered while the "com.trustedcatalog/group" catalog is not yet present
locally, it will download the catalog, verify its signature, etc.

If you want to override the "tusk://com.trustedcatalog/group/theFoo/3.8"
dependency with your own code you can issue:

tusk catalog overlay --catalog com.trustedcatalog/group com.mycatalog

Where your "com.mycatalog" declares the "theFoo" package.

I do not see a way to override dependencies centrally when using
"locate" in package.json other than with package.local.json files for
each applicable package.


Some of this is obviously tusk-specific but it impacts on the install
path for packages and the allowed syntax of "import statements". I think
we need to support both types of imports (catalog & locate).

A generic syntax for:

tusk://com.trustedcatalog/group/theFoo/3.8

could be:

catalog://com.trustedcatalog/group/theFoo/3.8

Christoph

Christoph Dorn

Oct 3, 2009, 3:31:54 PM
to comm...@googlegroups.com

>>>> "collection identifier"
>> perhaps "publisher" would be nicer
>
> In the context in which Christoph used it, yes.
>
> This "publisher" in order to be verifiably globally unique, needs to
> be tied to some mechanism for ensuring uniqueness and authenticity; it
> needs to encode more than just a plain text name. The ownership
> guarantees of DNS and the MITM-resistance of HTTPS would be a good
> start, so perhaps these identifiers should be HTTPS URLs. There are
> less "centrally planned" techniques for doing this as well using
> public key cryptography, but again, in this case, the publisher
> identifier would still have to have more semantics than just a plain
> string.

Ensuring uniqueness and authenticity is a matter that should be left to
the package manager implementation. I think we will want to support both:

http://domain.com/path/to/catalog.json ~ {
  "name": "myAweSomeCatalOg"
}

which will result in a local catalog name of: "myAweSomeCatalOg" as well as:

https://repo.jshq.org/narwhal/ext/catalog.json ~ {
  "name": "org.jshq.repo",
  "collection": "narwhal/ext"
}

which will result in a local catalog name of: "org.jshq.repo" and
provide packages for the "narwhal/ext" collection. i.e.

catalog://<catalog>/<collection>/<package>/<revision>
catalog://org.jshq.repo/narwhal/ext/fooPackage/3.2
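A hypothetical parser for that form, assuming the catalog name is the first segment, the revision the last, the package second-to-last, and everything between them the (possibly multi-segment) collection:

```javascript
function parseCatalogUrl(url) {
  var m = /^catalog:\/\/(.+)$/.exec(url);
  if (!m) throw new Error("not a catalog URL: " + url);
  var parts = m[1].split("/");
  return {
    catalog: parts[0],
    collection: parts.slice(1, -2).join("/"),   // e.g. "narwhal/ext"
    pkg: parts[parts.length - 2],
    revision: parts[parts.length - 1]
  };
}
```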

Ideally we have a spec for building compliant package managers and
registry servers for CommonJS.

Christoph

Ash Berlin

Oct 3, 2009, 3:39:03 PM
to comm...@googlegroups.com

On 3 Oct 2009, at 20:31, Christoph Dorn wrote:

>
> Ideally we have a spec for building compliant package managers and
> registry servers for CommonJS.


If we build more than one package registry server it will fail. Or
more specifically, if there isn't some central list of packages the
whole package ecosystem won't be as useful/successful as it could be.
There can be multiple search interfaces, but they should all index the
same packages.

-ash

Christoph Dorn

Oct 3, 2009, 4:04:01 PM
to comm...@googlegroups.com
>> Ideally we have a spec for building compliant package managers and
>> registry servers for CommonJS.
>
> If we build more than one package registry server it will fail. Or

I disagree. We are building infrastructure tools here that can be used
for public and private purposes. A company should be able to run its
own package registry internally and use this in combination with a
public one.


> more specifically, if there isn't some central list of packages the
> whole package ecosystem won't be as useful/successful as it could be.
> There can be multiple search interfaces, but they should all index the
> same packages.

This is a matter of organization not specs/code. We can easily have a
package registry at:

registry.jshq.org

that will mirror other registries.

For example you have a public registry for your company at:

packages.company.com

If you want to make this available via registry.jshq.org you follow some
procedure put in place by registry.jshq.org. The result is:

https://registry.jshq.org/packages.company.com/

which is equivalent to

https://packages.company.com/

When you push to your registry it can notify "registry.jshq.org" which
will mirror your catalogs.
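The mirroring convention described here can be sketched as a pair of URL rewrites, where a mirror URL embeds the origin registry's host as its first path segment. The function names are illustrative, not from any spec.

```javascript
// Sketch: map between an origin registry URL and its mirrored form on a
// central registry host, per the convention above.
function toMirrorUrl(originUrl, mirrorHost) {
  // "https://packages.company.com/" -> "https://registry.jshq.org/packages.company.com/"
  var m = /^https:\/\/([^\/]+)(\/.*)?$/.exec(originUrl);
  if (!m) throw new Error("expected an https URL: " + originUrl);
  return "https://" + mirrorHost + "/" + m[1] + (m[2] || "/");
}

function fromMirrorUrl(mirrorUrl, mirrorHost) {
  var prefix = "https://" + mirrorHost + "/";
  if (mirrorUrl.indexOf(prefix) !== 0) throw new Error("not on the mirror");
  var rest = mirrorUrl.slice(prefix.length);
  var slash = rest.indexOf("/");
  var host = slash === -1 ? rest : rest.slice(0, slash);
  var path = slash === -1 ? "/" : rest.slice(slash);
  return "https://" + host + path;
}

toMirrorUrl("https://packages.company.com/", "registry.jshq.org");
// -> "https://registry.jshq.org/packages.company.com/"
```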

Christoph

Neville Burnell

unread,
Oct 3, 2009, 7:31:44 PM10/3/09
to CommonJS

> > If we build more than one package registry server it will fail.

I don't think so.

Rubygems works perfectly well with private gem servers, github,
rubygems.org etc.

And adding a new server to your config is easy, so gem consumers are not
inconvenienced at all.

ihab...@gmail.com

unread,
Oct 4, 2009, 12:22:29 AM10/4/09
to comm...@googlegroups.com
On Sat, Oct 3, 2009 at 12:17 PM, Christoph Dorn
<christ...@christophdorn.com> wrote:
> Where will the JSON above be placed? Into package.json?:

Yes.

>   {
>     "dependencies": [
>       ["theFoo", {
>         catalog: 'https://trustedcatalog.com/group/catalog.json',
>         name: 'theFoo',
>         version: '3.8'
>       }]
>     ]
>   }

Yup, just like that. :)

> I do not think the name of a "remote" package should *ever* be used. The
> remote package name is intended for the purpose of identifying the
> package within the catalog only.

Indeed. Sorry that I got sloppy with my JSON. You are correct. In
fact, to make it more clear:

{
  "dependencies": [
    ["localFoo", {
      catalog: 'https://trustedcatalog.com/group/catalog.json',
      name: 'theFoo',
      version: '3.8'
    }]
  ]
}

"localFoo" is the local name given by the importing package. "theFoo"
is what the catalog calls it.
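The local-name vs. catalog-name split can be sketched as a resolver step: index the "dependencies" list by local name, keeping the catalog's own name only inside the descriptor. The function name is illustrative; the data shape follows the example above.

```javascript
// Sketch: index a package.json "dependencies" list by the *local* name
// given by the importing package. The catalog's name for the package
// ("name") stays inside the descriptor and is never used as a local key.
function indexDependencies(packageJson) {
  var byLocalName = {};
  (packageJson.dependencies || []).forEach(function (entry) {
    var localName = entry[0];
    var descriptor = entry[1];
    if (byLocalName.hasOwnProperty(localName)) {
      throw new Error("duplicate local name: " + localName);
    }
    byLocalName[localName] = descriptor;
  });
  return byLocalName;
}

indexDependencies({
  dependencies: [
    ["localFoo", {
      catalog: "https://trustedcatalog.com/group/catalog.json",
      name: "theFoo",
      version: "3.8"
    }]
  ]
});
// -> { localFoo: { catalog: ..., name: "theFoo", version: "3.8" } }
```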

> When the remote dependent package is used within your package you *must*
> give it a name *local to your package*. (See above)

Yes.

> Now with either "import statement" we need to decide how this translates

> into an install path for the package. ...


> Using the import statements from above the only way to do this I can
> think of is to generate a hash based on the "locate" or
> "catalog"/"name"/"version":
>   sea/packages/dependencies/<md5_hash>

Hashes are *one* way. It's an implementation detail. But you claim --

> Now a hash will work great for production systems but lacks
> descriptiveness during development.

That's a good concern. So long as any path or name or what not is a
*mnemonic*, there is nothing wrong with providing it. As long as the
generality of the overall system is not constrained by this
requirement.

> This is where the tusk catalog
> functionality comes in that I am proposing.
>
> package.json ~ {
>   "name": "foo",
>   "dependencies": [
>     ["theFoo", "tusk://com.trustedcatalog/group/theFoo/3.8"]
>   ]
> }
>
> will translate to an install path of:
>
>   sea/packages/dependencies/com.trustedcatalog/group/theFoo/3.8

Why invent another kind of URI? So --

> If trustedcatalog.com runs a spec-compliant package registry the catalog
> will be accessible with:
>   https://trustedcatalog.com/group/catalog.json

Why not [ "https://trustedcatalog.com/group/catalog.json",
"packageName" ]? Making HTTP[S] URIs explicit makes it easier to build
tools that work with the system, rather than having to crack custom
URIs. But this is probably bikeshedding at this point.... :)

> When the importing package is installed:
>
>   tusk package install foo
>
> and the "tusk://com.trustedcatalog/group/theFoo/3.8" dependency
> encountered and "com.trustedcatalog/group" catalog is not yet present
> locally it will download the catalog and verify the signature of the
> catalog etc...

Respecting HTTP cache control headers, right?

> If you want to override the "tusk://com.trustedcatalog/group/theFoo/3.8"
> dependency with your own code you can issue:
>
>   tusk catalog overlay --catalog com.trustedcatalog/group com.mycatalog
>
> Where your "com.mycatalog" declares the "theFoo" package.

Ok, overlays are an interesting use case.

We should be sure this all works with the require.async(<something
totally new>) as well. In other words, running code can trigger an
"install"....

> I do not see a way to override dependencies centrally when using
> "locate" in package.json other than with package.local.json files for
> each applicable package.

Sure -- well there can be both catalog-based and non-catalog-based imports.

> I think we need to support both types of imports (catalog & locate).

Agreed. Really, the syntax for "locate" should just be a "private
snippet of catalog matter inside the importing package" -- not a new
thing altogether.

> A generic syntax for:
>   tusk://com.trustedcatalog/group/theFoo/3.8
> could be:
>   catalog://com.trustedcatalog/group/theFoo/3.8

True. Again, I'm not cozied up to custom URI schemes yet but....

ihab...@gmail.com

unread,
Oct 4, 2009, 12:23:52 AM10/4/09
to comm...@googlegroups.com
On Sat, Oct 3, 2009 at 1:04 PM, Christoph Dorn
<christ...@christophdorn.com> wrote:
>> If we build more than one package registry server it will fail. Or
>
> I disagree. We are building infrastructure tools here that can be used
> for public and private purposes. A company should be able to run its
> own package registry internally and use this in combination with a
> public one.

+1

Kris Kowal

unread,
Oct 4, 2009, 12:26:45 AM10/4/09
to comm...@googlegroups.com
On Sat, Oct 3, 2009 at 9:22 PM, <ihab...@gmail.com> wrote:
>> will translate to an install path of:
>>
>>   sea/packages/dependencies/com.trustedcatalog/group/theFoo/3.8
>
> Why invent another kind of URI? So --
>
>> If trustedcatalog.com runs a spec-compliant package registry the catalog
>> will be accessible with:
>>   https://trustedcatalog.com/group/catalog.json
>
> Why not [ "https://trustedcatalog.com/group/catalog.json",
> "packageName" ]? Making HTTP[S] URIs explicit makes it easier to build
> tools that work with the system, rather than having to crack custom
> URIs.

+1. I think the internal cache locations ought to be mapped out of
real URLs. They're easier to tool.

Kris Kowal

ihab...@gmail.com

unread,
Oct 4, 2009, 12:41:55 AM10/4/09
to comm...@googlegroups.com
On Sat, Oct 3, 2009 at 12:31 PM, Christoph Dorn
<christ...@christophdorn.com> wrote:
> Ensuring uniqueness and authenticity is a matter that should be left to
> the package manager implementation. I think we will want to support both:
>
>   http://domain.com/path/to/catalog.json ~ {
>     "name": "myAweSomeCatalOg"
>   }
>
> which will result in a local catalog name of: "myAweSomeCatalOg" as well as:
>
>   https://repo.jshq.org/narwhal/ext/catalog.json ~ {
>     "name": "org.jshq.repo",
>     "collection": "narwhal/ext"
>   }
>
> which will result in a local catalog name of: "org.jshq.repo" and
> provide packages for the "narwhal/ext" collection. i.e.
>
>   catalog://<catalog>/<collection>/<package>/<revision>
>   catalog://org.jshq.repo/narwhal/ext/fooPackage/3.2

I'm afraid I really don't understand the semantics of these 2 snippets of JSON.

Ash Berlin

unread,
Oct 4, 2009, 8:00:12 AM10/4/09
to comm...@googlegroups.com

A private one sure, this makes perfect sense.


On 4 Oct 2009, at 00:31, Neville Burnell wrote:

>
>>> If we build more than one package registry server it will fail.
>

> I don't think so.
>
> Rubygems works perfectly well with private gem servers, github,
> rubygems.org etc.
>
> And adding a new server to your config is easy, so gem consumers are not
> inconvenienced at all.

My intention isn't that there *must* be only one server you can ever
use, but that things start off with *one* server which is
authoritative and knows about all the modules/packages (probably via
scraping and submission etc) so that users have one place to search
for 'is there already a module that does this'.

This is a typical failing of Java: there are probably lots of free and
open solutions out there for any given problem, but they are hard to
find, so to all intents and purposes they don't exist because no one
uses them.

-ash

Christoph Dorn

unread,
Oct 4, 2009, 3:01:54 PM10/4/09
to comm...@googlegroups.com
>> Now with either "import statement" we need to decide how this translates
>> into an install path for the package. ...
>> Using the import statements from above the only way to do this I can
>> think of is to generate a hash based on the "locate" or
>> "catalog"/"name"/"version":
>> sea/packages/dependencies/<md5_hash>
>
> Hashes are *one* way. It's an implementation detail. But you claim --
>
>> Now a hash will work great for production systems but lacks
>> descriptiveness during development.
>
> That's a good concern. So long as any path or name or what not is a
> *mnemonic*, there is nothing wrong with providing it. As long as the
> generality of the overall system is not constrained by this
> requirement.

What do you mean by "there is nothing wrong with providing it"? Where
would the path be provided?

>> This is where the tusk catalog
>> functionality comes in that I am proposing.
>>
>> package.json ~ {
>> "name": "foo",
>> "dependencies": [
>> ["theFoo", "tusk://com.trustedcatalog/group/theFoo/3.8"]
>> ]
>> }
>>
>> will translate to an install path of:
>>
>> sea/packages/dependencies/com.trustedcatalog/group/theFoo/3.8
>
> Why invent another kind of URI? So --

One of the original requirements/intents was to have catalog names that
can be sorted nicely on a file system and in lists (by simply flipping
the domain parts) ... see below ...


>> If trustedcatalog.com runs a spec-compliant package registry the catalog
>> will be accessible with:
>> https://trustedcatalog.com/group/catalog.json
>
> Why not [ "https://trustedcatalog.com/group/catalog.json",
> "packageName" ]? Making HTTP[S] URIs explicit makes it easier to build
> tools that work with the system, rather than having to crack custom
> URIs. But this is probably bikeshedding at this point.... :)

I think a catalog URL ought to:

* make it obvious it's a catalog URL
* support overlays
* imply location of install path
* fit into a "/" delimited namespace
* support addition of selectors to identify a package at a given revision

The tusk://<catalog>/<collection>/<package>/<revision> URL meets these
requirements because it is URL-based and due to its composition. We
could instead express it as:

package.json ~ {
  "name": "foo",
  "dependencies": [
    ["localBar", {
      "catalog": "https://trustedcatalog.com/group/catalog.json",
      "package": "theBar",
      "revision": "3.8",
      "verify": "checksum"
    }],
    ["localBaz", {
      "locate": ["https://trustedcatalog.com/group/theBaz-2.1.zip"],
      "package": "theBaz",
      "revision": "2.1",
      "verify": "checksum"
    }]
  ]
}

Which is more verbose but I am starting to agree that this is a better
way to go. These could translate to the following install paths
respectively:

sea/packages/dependencies/trustedcatalog.com/group/theBar/3.8
sea/packages/dependencies/trustedcatalog.com/group/theBaz/2.1
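The mapping from a catalog-based dependency to such a path can be sketched by stripping the scheme and the "catalog.json" filename from the catalog URL. This layout is the one proposed in this thread, not a settled spec, and the function name is mine.

```javascript
// Sketch: derive the install path shown above from a catalog-based
// dependency descriptor, using the catalog URL's host and path.
function catalogInstallPath(dep) {
  var m = /^https:\/\/([^\/]+)\/(.*)\/catalog\.json$/.exec(dep.catalog);
  if (!m) throw new Error("unexpected catalog URL: " + dep.catalog);
  return ["sea/packages/dependencies", m[1], m[2], dep.package, dep.revision].join("/");
}

catalogInstallPath({
  catalog: "https://trustedcatalog.com/group/catalog.json",
  package: "theBar",
  revision: "3.8"
});
// -> "sea/packages/dependencies/trustedcatalog.com/group/theBar/3.8"
```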

And as for overlays ... see below ...


>> and the "tusk://com.trustedcatalog/group/theFoo/3.8" dependency
>> encountered and "com.trustedcatalog/group" catalog is not yet present
>> locally it will download the catalog and verify the signature of the
>> catalog etc...
>
> Respecting HTTP cache control headers, right?

Right.


>> If you want to override the "tusk://com.trustedcatalog/group/theFoo/3.8"
>> dependency with your own code you can issue:
>>
>> tusk catalog overlay --catalog com.trustedcatalog/group com.mycatalog
>>
>> Where your "com.mycatalog" declares the "theFoo" package.
>
> Ok, overlays are an interesting use case.

Overlays work really nicely during development and I think they will
become an essential feature for distributed development of programs made
up of many small components. If you cannot maintain your own versions of
dependencies and easily contribute to improving existing components
(even ones used deeply nested in other dependencies) you are less likely
to become an active participant in a component ecosystem.

To issue overlays you need to identify the catalogs or packages. Using
URLs to identify catalogs is not ideal because:

* URLs imply the location of a catalog, which should be irrelevant
* URLs are quite verbose

The first point is an issue for local and remote catalogs. For remote
catalogs, say you have two mirrored catalogs at:

https://trustedcatalog.com/group/catalog.json
https://cdn.com/group/catalog.json

and you are installing dependencies for your program each of which use
the same package but load it via different catalogs. To issue an overlay
that targets both catalogs we need to find a commonality between them.


My original thinking was to do this by catalog name where catalog.json
defines a "name" property. To make this work in the above scenario one
copy of the catalog must be authoritative while the other is a mirror.

The name could be issued based on a reverse domain + path convention:

com.trustedcatalog.group
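Deriving that name from a catalog URL is mechanical: flip the dotted host segments and append the path segments with dots. A sketch, with the function name being my own:

```javascript
// Sketch of the reverse domain + path naming convention described above.
function catalogName(catalogUrl) {
  var m = /^https:\/\/([^\/]+)\/(.*)\/catalog\.json$/.exec(catalogUrl);
  if (!m) throw new Error("unexpected catalog URL: " + catalogUrl);
  var reversedHost = m[1].split(".").reverse().join(".");
  var pathPart = m[2].split("/").join(".");
  return reversedHost + "." + pathPart;
}

catalogName("https://trustedcatalog.com/group/catalog.json");
// -> "com.trustedcatalog.group"
```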


A situation where this still breaks down is with dynamic/composite
catalogs that combine sets of packages from other catalogs into one
(and with "located" (instead of "cataloged") packages). To be able to
overlay a package in this situation you really need to be able to target
the package directly.

The only way I can think of to target a package no matter where it comes
from or what it is called is by a unique ID in the package.json. The
unique ID could be anything including a URL, UUID, email address etc...
I think having an "id" field in package.json that contains a URL that is
unique to the package could make a lot of sense.

package.json ~ {
  "id": "http://domain.com/a/namespace/url/that/never/changes"
}

While a URL is still verbose, a package manager implementation can
provide shortcuts to issue overlays ...
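One way such id-keyed overlays could resolve is sketched below. The overlay table shape, the `locate` replacement, and the local path are all hypothetical; only the idea of keying overlays by the package's unique "id" URL comes from the discussion above.

```javascript
// Sketch: resolve a dependency descriptor through an overlay table keyed
// by the package's unique "id" URL. If an overlay exists for the id, the
// replacement descriptor wins regardless of which catalog named it.
function applyOverlays(descriptor, overlays) {
  var replacement = overlays[descriptor.id];
  return replacement ? replacement : descriptor;
}

// Hypothetical overlay: redirect a package to a local development copy.
var overlays = {
  "http://domain.com/a/namespace/url/that/never/changes": {
    id: "http://domain.com/a/namespace/url/that/never/changes",
    locate: ["file:///home/dev/theFoo"]
  }
};

applyOverlays(
  { id: "http://domain.com/a/namespace/url/that/never/changes", version: "3.8" },
  overlays
);
// -> the local development copy, whatever catalog the dependency came from
```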


Now you may argue that overlays are an implementation issue and I agree
except for the need to have sufficient meta information and structure
prescribed by a spec for catalog.json and package.json files to make
overlays even possible which is why I am bringing all this up.

I am hoping that CommonJS will standardize a packaging model that
accommodates modern workflows and IMHO overlays are an essential part of
this.


> We should be sure this all works with the require.async(<something
> totally new>) as well. In other words, running code can trigger an
> "install"....

The proposed syntax above should accommodate this.


>> I do not see a way to override dependencies centrally when using
>> "locate" in package.json other than with package.local.json files for
>> each applicable package.
>
> Sure -- well there can be both catalog-based and non-catalog-based imports.

The proposed solution above (package.json ~ "id") would make overlays
possible for located and cataloged packages.


>> I think we need to support both types of imports (catalog & locate).
>
> Agreed. Really, the syntax for "locate" should just be a "private
> snippet of catalog matter inside the importing package" -- not a new
> thing altogether.

+1

Christoph


Christoph Dorn

unread,
Oct 4, 2009, 3:03:25 PM10/4/09
to comm...@googlegroups.com
> I'm afraid I really don't understand the semantics of these 2 snippets of JSON.
>
> Ihab

Please ignore in favor of my last post.

Christoph
