Precise ID matching in AMD configs


James Burke

Jan 18, 2013, 4:59:22 PM
to amd-im...@googlegroups.com
The existing paths and map config prefix matching cannot express all
build cases.

What is possible now: routing IDs to a particular built file:

require.config({
    paths: {
        'a': 'layer2',
        'b': 'layer2',
        'c': 'layer2'
    }
});

If a module does a require for 'a', 'b', or 'c', then 'layer2.js' is fetched,
and only fetched once. This is good.

Where this breaks down:

1) paths config is prefix-based, but only some resources might be in a
built file.

Example:

The top module for an ID prefix may be inlined in a built file, but
support files may want to be dynamically loaded. In the above example,
perhaps 'c/util' is not included in 'layer2', and is only needed
occasionally.

Using the above config, 'c/util' would map to 'layer2/util.js', which
is not correct. While it is possible to add a 'c/util' paths entry,
this requires scanning the directory of 'c' to generate those lists,
and may lead to lots of config entries.

It would be better to have something that works off of known
information, without relying on directory scanning. Module IDs can be
reliably translated to paths, but given just a path, a loader cannot
reliably map that back to a module ID.
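
To make the prefix problem concrete, here is a minimal sketch of prefix-based paths resolution. It is not requirejs's actual implementation: resolveWithPrefixPaths is a hypothetical helper, and baseUrl handling is left out.

```javascript
// Hypothetical helper: resolves a module ID against a prefix-based paths
// config, trying the longest matching ID prefix first.
function resolveWithPrefixPaths(id, paths) {
    var segments = id.split('/');
    for (var i = segments.length; i > 0; i -= 1) {
        var prefix = segments.slice(0, i).join('/');
        if (Object.prototype.hasOwnProperty.call(paths, prefix)) {
            // Replace the matched prefix and keep the remaining segments.
            return [paths[prefix]].concat(segments.slice(i)).join('/') + '.js';
        }
    }
    return id + '.js';
}

var prefixPaths = { 'a': 'layer2', 'b': 'layer2', 'c': 'layer2' };

resolveWithPrefixPaths('c', prefixPaths);      // 'layer2.js'
resolveWithPrefixPaths('c/util', prefixPaths); // 'layer2/util.js', not the intended file
```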

2) Loader plugin resources have a similar problem. While it is
tempting to use "map" config for loader plugin resources, it also
suffers from the prefix rules used for map.

Example using the 'cs' coffeescript transpiler loader plugin, that
after a build can just point to a plain JS module, perhaps in a built
JS file:

{
    map: {
        '*': {
            'cs!some/thing': 'some/thingBuilt'
        }
    }
}

For 'cs!some/thing/else' that may be incorrectly translated as
'some/thingBuilt/else'.

## Possible Solution:

A way in the paths and map config to express "this is not for a
prefix, but for a precise ID".

I am currently favoring ending the config key with a "." So, for the
above examples:

require.config({
    paths: {
        'a': 'layer2',
        'b': 'layer2',
        'c.': 'layer2',
        'c': 'libs/c'
    }
});

where 'c.' means only match for full ID of 'c'. For any other module
ID with a 'c' prefix, use 'libs/c'. Properties that end in a '.' take
precedence over non-dot properties. So, 'c.' takes precedence over
'c'.
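
A rough sketch of how a loader might apply this rule, under the assumption that the dot key is checked before any prefix matching. resolveWithExactKeys is a hypothetical name, nothing here is implemented in requirejs, and baseUrl is ignored.

```javascript
// Hypothetical lookup for the proposed rule: a key ending in '.' matches
// only the full ID and takes precedence over the plain prefix key.
function resolveWithExactKeys(id, paths) {
    var has = Object.prototype.hasOwnProperty;
    if (has.call(paths, id + '.')) {
        return paths[id + '.'] + '.js';
    }
    // Fall back to ordinary longest-prefix matching.
    var segments = id.split('/');
    for (var i = segments.length; i > 0; i -= 1) {
        var prefix = segments.slice(0, i).join('/');
        if (has.call(paths, prefix)) {
            return [paths[prefix]].concat(segments.slice(i)).join('/') + '.js';
        }
    }
    return id + '.js';
}

var exactPaths = { 'a': 'layer2', 'b': 'layer2', 'c.': 'layer2', 'c': 'libs/c' };

resolveWithExactKeys('c', exactPaths);      // 'layer2.js'
resolveWithExactKeys('c/util', exactPaths); // 'libs/c/util.js'
```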

Same in map:

{
    map: {
        '*': {
            'cs!some/thing.': 'some/thingBuilt'
        }
    }
}

Other ideas? I have not implemented this yet. I could be missing
something, but there are a couple of open requirejs bugs that relate
to this issue, so this is my first thought on how to fix it.

James

Miller Medeiros

Jan 19, 2013, 12:37:50 AM
to amd-im...@googlegroups.com
I agree with the need of such feature. I would go with a trailing "$"
since that is what is used on RegExp to say it's the end of the string. It
is also easier to see and harder to mistype.

require.config({
    paths : {
        'asd' : 'layer2',
        'bar' : 'layer2',
        'foo$' : 'layer2',
        'foo' : 'libs/foo'
    }
});

The more features like paths/map/packages I see land into AMD the further
we get from node.js modules. The node.js community is not flexible at all,
which is very frustrating...

I know I'm pretty happy with AMD and that it makes my job way easier
because of all the extra flexibility that the configs provide. For now I
guess I will just ignore the node.js community and use what works for me.

For the kind of work I do "more flexible" == "better".




On 1/18/13 7:59 PM, "James Burke" <jrb...@gmail.com> wrote:

>The existing paths and map config prefix matching cannot express all
>build cases.

Guy Bedford

Jan 19, 2013, 5:40:05 AM
to amd-im...@googlegroups.com
I posted this earlier, not sure if it went through - 

I think it's worth keeping the paths and map configurations as simple and as predictable as possible. The fact that there aren't regular-expression style rules in these is a benefit because it keeps paths simpler.

One thing I'm struggling with is how this mixes path config with resolving layers for normalized module IDs.

For example:

{
  map: {
    '*': {
      'jquery': 'jquery/jquery',
      'jquery/jquery.': 'built/layer'
    }
  }
}

Here, if I request 'jquery', this gets normalized to 'jquery/jquery'. So a request to 'jquery' should also result in 'built/layer' being loaded.

But this seems to go against the fundamental principle of mapping, which is that maps don't get applied more than once.

So my worry is that the syntax is mixing the needs of normalized moduleIDs and unnormalized moduleIDs together. When layer maps should only really apply to normalized module IDs.

In this case, a separate layer map configuration makes a little more sense to me, as it takes the complexity out of map and paths configurations. Then effectively the layer map is the last step in this process:

moduleID -> normalized moduleID (map and paths config) -> layer map -> URL

So unless my reasoning above is wrong, I'd think it is simpler to go with a layerMap-type configuration.
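
The pipeline above can be sketched roughly as follows. The layerMap property name, the baseUrl value, and the helper functions are all assumptions made for illustration, and the map step only handles exact matches to keep the sketch short.

```javascript
// Hypothetical config: layerMap keys are normalized module IDs, values are
// the layer that contains them.
var pipelineConfig = {
    baseUrl: 'js/',
    map: { '*': { 'jquery': 'jquery/jquery' } },
    layerMap: { 'jquery/jquery': 'built/layer' }
};

// Step 1: moduleID -> normalized moduleID (exact-match map only here).
function normalizeId(id, config) {
    var star = config.map['*'];
    return Object.prototype.hasOwnProperty.call(star, id) ? star[id] : id;
}

// Steps 2 and 3: normalized moduleID -> layer map -> URL.
function idToUrl(id, config) {
    var normalized = normalizeId(id, config);
    var inLayer = Object.prototype.hasOwnProperty.call(config.layerMap, normalized);
    var target = inLayer ? config.layerMap[normalized] : normalized;
    return config.baseUrl + target + '.js';
}

idToUrl('jquery', pipelineConfig);        // 'js/built/layer.js'
idToUrl('jquery/jquery', pipelineConfig); // 'js/built/layer.js'
idToUrl('other', pipelineConfig);         // 'js/other.js'
```

Because the layer lookup happens after normalization, both spellings of the jquery ID reach the same layer without the map being applied twice.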

Daniel Dotsenko

Jan 20, 2013, 8:33:56 PM
to amd-im...@googlegroups.com
+1 to Guy Bedford. The need for this seems mostly imaginary, academic. It's so simple to get around this problem by naming things differently.

Still, if this pattern-matching thing is to spread to Paths, my thoughts (as a guy whose task it is to maintain an AMD tree build tool and a number of AMD-based projects.):

The question becomes (for other loaders, build tools) when to treat "hard stop" character (proposed "." or "$" or whatever) as "hard stop" or as part of the actual path string.

What I am trying to say is that you actually have 3 issues here:
1) what pattern-matching syntax to use?
2) how do you make loaders understand if a particular string is a literal string or a pattern-matching formula?
3) how do you convey order of application of pattern-matching formulas?

My feels for solutions to the above would be:
1) Use the pattern-matching syntax that is already known and used so the learning curve is low and bugs are worked out elsewhere.
2) Mark the pattern-matching formula string in such a way that it makes either very clear that it's **not** a literal string or makes it meaningless for literal string-matching.
3) If not set explicitly (i.e. formulas are in an ordered array) make unordered choices completely non-overlapping.

(1) I'd naturally say "just use RegEx object for key in paths object." But, alas, one knows Object keys can RegEx objects be not. ( Yoda voice )
Still, RegEx strings fit here naturally:

Your proposed "foo." compiles to "^foo$"
Your proposed (catch all) "foo" compiles to "^foo.*"

(2) Although "String starting with '^' is a start of a pattern-matching formula string as opposed to string-matching literal" is fairly robust rule that other loaders can easily adapt to, could it be too limiting? Let's say, once you go in pattern-matching way, you like it and you want to express pattern detection in the middle of the string, which would make "^" unneeded in the pattern-matching formula. 

If we assume that all pattern-matching formula strings must start with "^" the consumption of that string would be:
var regex = new RegExp('^foo$')

If you anticipate that we need to mark pattern formulas in a more explicit way that allows more flexible formulas, you could go for some more explicit markers of RegEx-iness like "(/"+pattern+"/)":

var regex = eval('(/^foo$/)')

Which is equivalent to 

var regex = (/^foo$/)

(While parens are deemed by some as superfluous, they are actually quite useful for JavaScript parsers to understand where /regex/ ends. In our case, however, they specifically help convey the solid fact that the string starting with "(/" and ending with "/)" is pattern-matching formula.)

Yet. eval is scary. And the formatting is too verbose to justify the gain... I'd treat starting "^" as a reliable marker of pattern-matching formula string.

(3) Conveying order is very easy when you take apart the constituent ambiguities into unambiguous constituents.

Your proposed "foo." compiles to "^foo$"
Your proposed (catch all) "foo" compiles separately to "^foo$" and "^foo.+"

Pseudologic:
- Convert all string-matching literal strings into non-overlapping pattern-matching formula string pairs. Store these in paths object such that new strings are keys and values are same original values that single string-matching literal had. (These are your "default" handlers.)
- Apply on top of the new paths object the paths that were originally defined as pattern-matching formula strings. This effectively overwrites the "default" handlers where supplied pattern-matching formula string === one derived in the step above.

Thus, the order becomes a non-issue, unless user messes up his RegEx. This way one could also decide which eventuality they need to dress up - ".+" or "$"

example:

var paths = {
'foo': 'bar'
,'^foo$': 'baz'
}

which is equivalent to

var paths = {
'^foo.+': 'bar'
,'foo': 'baz'
}

Because the only possible result of above pseudo-code is:

var paths = {
'^foo.+': 'bar'
,'^foo$': 'baz'
}

Which is derived (crudely) as such:

newPaths = {}
lastPass = []
for key in paths:
    if hasPatternMarker(key):
        lastPass.push(key)
    else:
        newPaths['^'+key+'$'] = paths[key]
        newPaths['^'+key+'.+'] = paths[key]
for key in lastPass:
    newPaths[key] = paths[key]


Application of the above becomes simple, unordered:
var path
var foundIt = Object.keys(newPaths).some(function (key) {
    if (someTestedPath.match(new RegExp(key))) {
        path = newPaths[key]
        return true
    }
})

if (foundIt) {
    // use the new path
}


Does it look more complex? Yes. Do you gain something from complexity? Yes - full power of RegEx in your routes. Is this complexity too much for the gain? Up to you.
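
For anyone who wants to try the idea, here is one way the pseudologic could look as runnable JavaScript. It assumes a leading "^" is the pattern marker, and it escapes literal keys before turning them into patterns, which the pseudologic does not mention. The function names are illustrative only.

```javascript
// Assumption: a key starting with '^' is a pattern; anything else is a literal.
function hasPatternMarker(key) {
    return key.charAt(0) === '^';
}

// Escape regex metacharacters so literal IDs such as 'jquery.ui' match as text.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Expand literal keys into two non-overlapping patterns, then let
// user-supplied patterns overwrite the derived defaults.
function expandPaths(paths) {
    var expanded = {};
    var patternKeys = [];
    Object.keys(paths).forEach(function (key) {
        if (hasPatternMarker(key)) {
            patternKeys.push(key);
        } else {
            var literal = escapeRegExp(key);
            expanded['^' + literal + '$'] = paths[key];
            expanded['^' + literal + '.+'] = paths[key];
        }
    });
    patternKeys.forEach(function (key) {
        expanded[key] = paths[key];
    });
    return expanded;
}

// Return the value of the first pattern that matches, or undefined.
function lookupPath(id, expanded) {
    var found;
    Object.keys(expanded).some(function (key) {
        if (new RegExp(key).test(id)) {
            found = expanded[key];
            return true;
        }
        return false;
    });
    return found;
}

var expandedPaths = expandPaths({ 'foo': 'bar', '^foo$': 'baz' });

lookupPath('foo', expandedPaths);      // 'baz'
lookupPath('foo/util', expandedPaths); // 'bar'
```

Note that '^foo.+' is not segment-aware: it also matches an unrelated ID such as 'foobar', which plain prefix matching on path segments would not.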

Even if full regex is not rolled out, I surely hope that "hard stop" is not set to "." as it is unclear whether that is still part of the literal path or a sign of pattern-matching. I can see uses for having "." at the end to signify "plugins" paths substitution.

Daniel

Ben Hockey

Jan 22, 2013, 11:04:49 AM
to amd-im...@googlegroups.com


On Sunday, January 20, 2013 7:33:56 PM UTC-6, Daniel Dotsenko wrote:
The need for this seems mostly imaginary, academic. It's so simple to get around this problem by naming things differently.

this was my first thought too.  is it a problem that really needs to be solved?

with the discussion so far, my preference is for "$" rather than "." mostly because of the visibility to a developer but to be honest neither really feels right.  i've held off responding for a few days to try and find words to describe why it doesn't "feel" right but so far i don't have anything more concrete to explain what i don't like about it.  

all i have so far is that given that the problem is trivial to work around (name layers differently or write your code in a way that the layer modules are part of your source code and there is appropriate application logic to load them when needed, whether built or not) and that i don't find the proposed solution appealing i think we should take our time to solve it (if we solve it at all).  when faced with modifications to AMD, i try to consider what might be possible if/when we have modules as a native concept in javascript and see how the ideas in AMD would translate - this one just doesn't seem to fit.  i guess that's why it doesn't feel right as it is right now.  maybe someone can suggest a parallel that would be convincing?

ben...

Guy Bedford

Jan 25, 2013, 8:56:54 AM
to amd-im...@googlegroups.com
One more thought I've had on this, is that there is currently no notation / strong distinction made for "normalized module IDs".

Normalized Module IDs would be:
  • When not a plugin ID, the final module path, just before adding the baseUrl prefix.
  • When a plugin ID, the normalized moduleID for the plugin module, with a normalized plugin name as passed through the plugin normalizer.
It is exactly these normalized module IDs on which the layer mapping would be done.

A notation for normalized module IDs may be more useful, because in most cases, when one uses this notation of "modulename." what one would really mean is that "this is the final normalized module ID, so don't apply map or paths configuration again".

In terms of notating a normalized module ID, I'm favouring "^" as it is the standard character for normalization in mathematics, and less intrusive than a "$" while still more visible than the ".".

So perhaps something like "^some/module" to indicate a fully-normalized module ID could be a useful variation on the suggestion here.

Note - this is also very much an academic point, so it would need to be weighed up against any use cases. Just a thought - hope I haven't muddled the ideas too much now!

Ben Hockey

Jan 25, 2013, 10:11:32 AM
to amd-im...@googlegroups.com

On Jan 25, 2013, at 7:56 AM, Guy Bedford <guybe...@googlemail.com> wrote:

> One more thought I've had on this, is that there is currently no notation / strong distinction made for "normalized module IDs".
>

do you have an example usage that demonstrates the need for it?

from the point of view of modules that are written, there is no need for normalized module ids. once you introduce this idea into a module then you lose portability somewhere. a module knowing the normalized id of one of its dependencies is just as wrong as a module being defined with an explicit ID.

perhaps internally within a loader (or some mechanism that extends a loader - e.g. a plugin) there might be a use for normalized ids but i'm cautious about introducing the concept anywhere external to a loader (even configuration) without some idea of the types of problems that could be solved but aren't solvable now.

ben...

Guy Bedford

Jan 25, 2013, 10:16:09 AM
to amd-im...@googlegroups.com
In optimized builds, normalized module IDs are what need to be mapped to layers to know where to find the named definition for a given module ID. The named definition itself is a normalized moduleID.

Then the dependencies in the definition file are currently somewhere between a normalized ID and a normal ID (partially normalized?). In an ideal world I think these would be normalized as well.

But yes, the notation is probably not necessary. It was just an academic suggestion around the ideas proposed.





Guy Bedford

Jan 27, 2013, 9:03:26 AM
to amd-im...@googlegroups.com, guybe...@googlemail.com
One adjustment to this - paths are not applied in the normalized module ID process, they are part of the url conversion process. Excluding paths config from the normalized module ID makes the relative module ID behaviour exactly as expected, and then this normalized module ID matches up with the requirejs.s.contexts._.defined values.

For plugins this means that ideally, the plugin normalization should only apply map configuration, and the toUrl method should only apply paths configuration not map configuration. I haven't done any tests though to verify this.

The point is that we then have a well-defined definition for the normalized moduleID to compare a plugin against when trying to match it to a layer.

Sorry, just been trying to get this clear in my own head... 

Daniel Dotsenko

Jan 28, 2013, 4:17:11 AM
to amd-im...@googlegroups.com, guybe...@googlemail.com
Re: "In optimized builds, normalized module IDs are what need ..."

Just a thought: Don't put "resolved" / "normalized" module IDs in optimized build. Build named defines in optimized file as if they still need "paths" to be applied to be consumed. At run-time have loader do the right thing and apply the paths conversion. (This can be and was done by regular mortals like me - PruneJS - so it's not just a smarty-pants suggestion. It can be done rather easily.)


While we are on the subject of pre-resolved IDs, could I poke James in the side and ask how come these:
  "alias/plain.js"
  "alias/plain" // <- plain JavaScript
in the presence of the following paths alias:
  "alias": "js/path/to/real/folder"
resolve to urls:
  "alias/plain.js"
  "js/path/to/real/folder/plain.js" // <- plain JavaScript  

and "js/path/to/real/folder/plain.js" actually runs without error or complaint from RequireJS

While:

  "//localhost/path/to/module"  // <- AMD module

resolves to URL

  "//localhost/path/to/module"

and fails to load due to missing ".js".

Confuses the hell out of me.

To be frank, I LOVE the fact that RequireJS properly resolves, loads and runs plain JS from a resource like "look/ma/no/extension" But, dang, all these special handling rules with and without ".js" at the end drive me up the wall. :)

Daniel

Guy Bedford

Jan 28, 2013, 5:15:22 AM
to amd-im...@googlegroups.com, Daniel Dotsenko
At least the paths rules when a '.js' extension or protocol is present are consistent, but yes one does have to watch out for them.

I've come around to the idea that things would be a lot simpler if one could load both a normalized module ID and a partial module ID through the same function call. And without a notation separation, the best thing here would be to ensure that map configurations can be applied recursively without issue.

So if we could focus on ensuring that we avoid these scenarios:

map: {
  '*': {
    'jquery': 'jquery/jquery',
    'path': 'path/full'
  }
}

'jquery' -> 'jquery/jquery' -> 'jquery/jquery/jquery'
'path/my/module' -> 'path/full/my/module' -> 'path/full/full/my/module'

Then we could ensure that map configurations can be applied recursively without issue.

The first example is solved by James's exact initial suggestion of setting that the map config applies to the entire moduleID.
The second example is solved by not allowing map configurations to map a folder into a sub folder within that folder.

If the above two protections could be implemented, then normalized module IDs (with map applied, not paths) can be required and used as dependencies without issue. Also plugins naturally apply map twice because there is normalization and then a require. So this would also be sorted out as well.
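
A small sketch of both failure scenarios, assuming simple prefix matching on the '*' map. applyPrefixMap and mapsIntoOwnSubfolder are hypothetical helpers written for this illustration, not loader APIs.

```javascript
// Hypothetical prefix-based map application (longest prefix first).
function applyPrefixMap(id, star) {
    var segments = id.split('/');
    for (var i = segments.length; i > 0; i -= 1) {
        var prefix = segments.slice(0, i).join('/');
        if (Object.prototype.hasOwnProperty.call(star, prefix)) {
            return [star[prefix]].concat(segments.slice(i)).join('/');
        }
    }
    return id;
}

// Hypothetical check for the second protection: flags entries whose target
// lives inside the folder named by the key.
function mapsIntoOwnSubfolder(key, target) {
    return target.indexOf(key + '/') === 0;
}

var starMap = { 'jquery': 'jquery/jquery', 'path': 'path/full' };

var mappedOnce = applyPrefixMap('jquery', starMap);    // 'jquery/jquery'
var mappedTwice = applyPrefixMap(mappedOnce, starMap); // 'jquery/jquery/jquery'

var pathOnce = applyPrefixMap('path/my/module', starMap); // 'path/full/my/module'
var pathTwice = applyPrefixMap(pathOnce, starMap);        // 'path/full/full/my/module'

mapsIntoOwnSubfolder('path', 'path/full'); // true, so a loader could reject this entry
```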

James Burke

Jan 30, 2013, 5:28:16 PM
to amd-im...@googlegroups.com
On Tue, Jan 22, 2013 at 8:04 AM, Ben Hockey <neonst...@gmail.com> wrote:
> On Sunday, January 20, 2013 7:33:56 PM UTC-6, Daniel Dotsenko wrote:
>> The need for this seems mostly imaginary, academic. It's so simple to get
>> around this problem by naming things differently.
>
> this was my first thought too. is it a problem that really needs to be
> solved?
>
> with the discussion so far, my preference is for "$" rather than "." mostly
> because of the visibility to a developer but to be honest neither really
> feels right. i've held off responding for a few days to try and find words
> to describe why it doesn't "feel" right but so far i don't have anything
> more concrete to explain what i don't like about it.
>
> all i have so far is that given that the problem is trivial to work around
> (name layers differently or write your code in a way that the layer modules
> are part of your source code and there is appropriate application logic to
> load them when needed, whether built or not) and that i don't find the
> proposed solution appealing i think we should take our time to solve it (if
> we solve it at all).

My initial thought was to also just say "lay out the code
differently". For instance, in the previous example, where 'c/util'
was needed later, and possibly some other 'c/...' modules, place them
in a 'c/sub' folder and then only a paths config for 'c/sub' is
needed.

However, that assumes the person doing the build is the one that also
authored 'c', which may not be the case.

I do not want to start down the path of regexp matching. That path
lies madness. I just wanted to indicate "exact match, not a module ID
prefix", and explicitly I would want to avoid using some indicator
that looks like a regexp.

Looking back over the bugs that triggered this initial post, one of
them was this one:

https://github.com/jrburke/requirejs/issues/497

which is basically about "how can I express that 'transpiler!some/id'
after a build can just be found at 'some/id', because the transpiler
has already run and built a pure JS resource at that location"?

If map supported this "exact match" syntax, then I could express that.
But maybe what is needed is a way to express that. In the bug, I
suggested a "pluginMap" that *only* matched on exact IDs.

However, since there was another bug report where they built 'c' but
wanted to load 'c/util' separate, generally allowing "exact match"
syntax for existing paths and map config would fix that.

If we do not want the exact match syntax, I would not mind ideas on
how to express the transpiler ID-to-JS exact ID matching.

James

Guy Bedford

Feb 12, 2013, 6:37:51 AM
to amd-im...@googlegroups.com
James, are you still looking for feedback with respect to the plugin layer mapping in production?

Ideally I think a new configuration property is needed to be able to handle dynamic layer loading in production:

Something like - 

{
  layers: {
    core: ['jquery', 'framework-core'],
    app: ['cs!app/home', 'cs!app/products'],
  }
}

If no layer is specified, normal loading rules would apply.

This way, a runtime request to 'cs!app/products' in production (when 'app' is not already loaded in the page) would resolve to the correct layer without causing duplicate loads as it does currently.
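
As a sketch of how a loader might consume such a config, the layer lists can be inverted once into an exact-match index. The "layers" property is only the proposal above, and buildLayerIndex and layerFor are hypothetical helper names.

```javascript
// The proposed (hypothetical) layers config from above.
var layersConfig = {
    layers: {
        core: ['jquery', 'framework-core'],
        app: ['cs!app/home', 'cs!app/products']
    }
};

// Invert layer -> [module IDs] into module ID -> layer for fast lookup.
function buildLayerIndex(layers) {
    var index = {};
    Object.keys(layers).forEach(function (layerId) {
        layers[layerId].forEach(function (moduleId) {
            index[moduleId] = layerId;
        });
    });
    return index;
}

// Exact match only: returns the layer ID, or null so normal loading applies.
function layerFor(moduleId, index) {
    return Object.prototype.hasOwnProperty.call(index, moduleId) ? index[moduleId] : null;
}

var layerIndex = buildLayerIndex(layersConfig.layers);

layerFor('cs!app/products', layerIndex);       // 'app'
layerFor('cs!app/products/extra', layerIndex); // null
```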





James Burke

Feb 13, 2013, 12:06:43 AM
to amd-im...@googlegroups.com
On Tue, Feb 12, 2013 at 3:37 AM, Guy Bedford <guybe...@googlemail.com> wrote:
> James, are you still looking for feedback with respect to the plugin layer
> mapping in production?
>
> Ideally I think a new configuration property is needed to be able to handle
> dynamic layer loading in production:
>
> Something like -
>
> {
> layers: {
> core: ['jquery', 'framework-core'],
> app: ['cs!app/home', 'cs!app/products'],
> }
> }

I am open to a change like this. It has the advantage of being a
smaller config block than a normal paths config, since the end layer
is specified as the property name with a list of module IDs after
that. And they would be full module IDs and exact matches, no prefixes
like in paths config.

The only thing, this is like a reverse paths config, with the paths
value as the property name. Normally the property names have been
module IDs. This may be a bit confusing if it is instead a path. For
instance, I would expect this would be possible if the property name
is like the normal paths property value:

{
    layers: {
        '//some.example.com/builds/core': ['jquery', 'framework-core']
    }
}

Hmm, so that feels too bizarro. Let's say that the property name is a
module ID, and that module ID can be configured to a different path
using paths config.

The difference between "layers" and "map": "map" deals with one-to-one
aliases, and allows module prefixes. "layers" is not an alias, just a
pointer to the module ID that contains the desired module; it does exact
matches and allows pointing many module IDs to one module ID.

I still kind of wish for a way to just say "do exact match" in paths
and module config though. Even though for these cases it would lead to
wordier configs, it felt more uniform. With "layers" it is something
that acts a bit differently.

That said, I will likely try implementing the layers config in
requirejs, as other implementers preferred to keep paths and map
intact, and it is hard to come up with a good enough string character
that indicates "exact match" without then starting to imply regexps.

James

Guy Bedford

Feb 13, 2013, 4:19:37 AM
to amd-im...@googlegroups.com
Just to clarify, would these fully normalized moduleIDs be with paths applied? (I would assume not, and only with map applied to match up with the "defined" id? This was the subject of my previous barrage of mails...).

If we remove the paths and baseUrl mapping, it ensures that we have a nicer looking, and well defined "Canonical ModuleID", and looks less odd as well in the "layer" config. The moduleID as a key really is the simplest way of writing it.

In the documentation for "layer" it would then be very clear exactly the form of ModuleID. I think it makes sense in this way as a separate config due to the different type of moduleID used.



James Burke

Feb 13, 2013, 12:38:57 PM
to amd-im...@googlegroups.com
On Wed, Feb 13, 2013 at 1:19 AM, Guy Bedford <guybe...@googlemail.com> wrote:
> Just to clarify, would these fully normalized moduleIDs be with paths
> applied? (I would assume not, and only with map applied to match up with the
> "defined" id? This was the subject of my previous barrage of mails…)

Correct, normalized module IDs are just IDs, nothing to do with paths.
To fetch a module ID, that normalized ID is then passed through logic
that applies baseUrl, paths, packages, map config.

James

Guy Bedford

Feb 14, 2013, 6:45:00 AM
to amd-im...@googlegroups.com
My only worry with taking a normalized moduleID before map configuration is that a `jquery` module could map to two different underlying version moduleIDs based on the map configuration for the specific module using it, while for layer mapping we need the ability to reference a "canonical" global module ID that doesn't vary between sub-modules. 



James Burke

Feb 14, 2013, 12:53:03 PM
to amd-im...@googlegroups.com
On Thu, Feb 14, 2013 at 3:45 AM, Guy Bedford <guybe...@googlemail.com> wrote:
> My only worry with taking a normalized moduleID before map configuration is
> that a `jquery` module could map to two different underlying version
> moduleIDs based on the map configuration for the specific module using it,
> while for layer mapping we need the ability to reference a "canonical"
> global module ID that doesn't vary between sub-modules.

The model should be that map config is applied first, then the
following is consulted for location of that module if it is not
already loaded: layers, then paths and packages. So, what is listed in
the layers config is the actual name of the modules that show up in
the layer files, the IDs used in the first argument to define(). Those
IDs (which are normalized, absolute IDs) are the ones that should be
in the layer config. So I believe it works out.
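
The order described above could be sketched like this. It is only an illustration: the "layers" shape is the proposal from this thread, 'jquery-1.9' is a made-up ID, the map step handles exact matches only, and moduleToFetch is a hypothetical helper.

```javascript
// Hypothetical config using the proposed "layers" shape.
var orderConfig = {
    map: { '*': { 'jquery': 'jquery-1.9' } },
    layers: { core: ['jquery-1.9', 'framework-core'] }
};

// Returns the module ID whose file should be fetched for requestedId,
// or null when the module is already defined.
function moduleToFetch(requestedId, config, defined) {
    var has = Object.prototype.hasOwnProperty;
    var star = config.map['*'];
    // 1. Map config is applied first (exact match only in this sketch).
    var id = has.call(star, requestedId) ? star[requestedId] : requestedId;
    if (has.call(defined, id)) {
        return null;
    }
    // 2. Layers: exact match against the IDs used in define() calls.
    var layerIds = Object.keys(config.layers);
    for (var i = 0; i < layerIds.length; i += 1) {
        if (config.layers[layerIds[i]].indexOf(id) !== -1) {
            return layerIds[i];
        }
    }
    // 3. Otherwise the module's own ID; paths and packages then turn it into a URL.
    return id;
}

moduleToFetch('jquery', orderConfig, {});                  // 'core'
moduleToFetch('jquery', orderConfig, { 'jquery-1.9': 1 }); // null
moduleToFetch('app/main', orderConfig, {});                // 'app/main'
```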

James

Guy Bedford

Feb 14, 2013, 1:05:07 PM
to amd-im...@googlegroups.com
Ok perfect, thanks for taking the time to go through it, that makes complete sense. 



James Burke

Jan 8, 2014, 8:09:05 PM
to amd-im...@googlegroups.com
Following up on Guy’s suggestion of a “layers” config, I went with
calling it “bundles” config that uses module IDs for the property
names and values:

http://requirejs.org/docs/api.html#config-bundles

If other loaders end up supporting it, we can look at putting it in to
the common config, but I am also open to other ways to solve the
problem. No immediate action or response is necessary, just giving a
status update.
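
For reference, a bundles config for the layer example discussed earlier in the thread might look like the sketch below. The bundle and module IDs are carried over from that example rather than taken from the requirejs docs, and whether a given loader plugin resource works inside a bundle should be checked against the linked documentation.

```javascript
// Keys are bundle module IDs, values list the module IDs found in that bundle.
var bundlesConfig = {
    bundles: {
        'core': ['jquery', 'framework-core'],
        'app': ['cs!app/home', 'cs!app/products']
    }
};

// Only call the loader when it is actually present on the page.
if (typeof requirejs !== 'undefined') {
    requirejs.config(bundlesConfig);
}
```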

James