URI vs URL module

Rasmus Andersson

Dec 28, 2009, 10:31:51 AM
to nod...@googlegroups.com
In response to the "uri" module discussion on GitHub
<http://github.com/ry/node/commit/2f9722cca0a72122aa03763c085f6b5aa7f0ada2#comments>:

A URI is a generic and uniform way to express identifiers. That's it.
Parsing a URL is not within the scope of URI parsing, since a URL is a
derivation — not a subset — of a URI.

Personally, I would rather not have a separate module for
something as simple as splitting a string (parsing a URI). On the
other hand, parsing URLs is a very common task which would make a
great addition. Why not make a "string" module, which contains all
sorts of string parsing? Like
[NSString](http://developer.apple.com/mac/library/documentation/Cocoa/Reference/Foundation/Classes/NSString_Class/Reference/NSString.html#jumpTo_5)
of the Cocoa/NextStep Objective-C library.

Another suggestion is to rename the module to "url" since it, at the
time of writing, specifically deals with URLs, not URIs. However this
suggestion seemed not to be very popular. A third suggestion would be
to rename the uri_* functions in uri.js to url_* since they deal with
URLs.

Also, I would consider taking inspiration from the Python standard
library module "urlparse" http://docs.python.org/library/urlparse.html
(which has a separate parse_qs function, etc.).
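
A minimal sketch of that separation, with query-string parsing kept apart from URL splitting. The names `splitUrl` and `parseQs` are hypothetical (they are not part of node's uri.js), and `parseQs` only loosely follows Python's `parse_qs` in that every key maps to an array of values:

```javascript
// Hypothetical helper: split a request URL into path, raw query and fragment.
// It does no decoding; that is left to parseQs below.
function splitUrl(url) {
  const hashAt = url.indexOf("#");
  const beforeHash = hashAt === -1 ? url : url.slice(0, hashAt);
  const queryAt = beforeHash.indexOf("?");
  return {
    path: queryAt === -1 ? beforeHash : beforeHash.slice(0, queryAt),
    query: queryAt === -1 ? "" : beforeHash.slice(queryAt + 1),
    fragment: hashAt === -1 ? "" : url.slice(hashAt + 1)
  };
}

// Hypothetical helper, loosely modeled on Python's parse_qs:
// each key maps to an array of decoded values.
function parseQs(query) {
  const result = Object.create(null); // no inherited keys to collide with
  query.split("&").forEach(function (pair) {
    if (pair === "") return;
    const eq = pair.indexOf("=");
    const rawKey = eq === -1 ? pair : pair.slice(0, eq);
    const rawValue = eq === -1 ? "" : pair.slice(eq + 1);
    // "+" means a space in form-encoded query strings
    const key = decodeURIComponent(rawKey.replace(/\+/g, " "));
    const value = decodeURIComponent(rawValue.replace(/\+/g, " "));
    if (!(key in result)) result[key] = [];
    result[key].push(value);
  });
  return result;
}

const parts = splitUrl("/search?q=node+js&tag=a&tag=b#top");
const params = parseQs(parts.query);
console.log(parts.path);  // /search
console.log(params.q);    // [ 'node js' ]
console.log(params.tag);  // [ 'a', 'b' ]
```

Callers that never look at the query string pay nothing for it, which is the main appeal of keeping the two functions separate.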

--
Rasmus Andersson

Ryan Dahl

Dec 28, 2009, 4:36:34 PM
to nod...@googlegroups.com
Yeah, I want to rename it all to "URL" and only focus on parsing URLs.
I've actually been meaning to do this for some time. Does anyone want to
volunteer a patch? It should have deprecation warnings for users
referring to the old API.

Felix Geisendörfer

Dec 28, 2009, 5:10:08 PM
to nodejs
> Does anyone want to
> volunteer a patch? It should have deprecation warnings for users
> referring to the old API.

Yip, I'd volunteer the patch. But since v8 has no getters/setters, I
don't know how to show a deprecation message for req.uri. Any ideas?

--fg

Chris Winberry

Dec 28, 2009, 7:00:48 PM
to nod...@googlegroups.com
Maybe I misunderstand you but V8 supports getter and setter methods.


Felix Geisendörfer

Dec 28, 2009, 7:15:46 PM
to nodejs
> Maybe I misunderstand you but V8 supports getter and setter methods.

Huh, I didn't know that - cool. I thought those were not part of
Ecma3, at least I can't find a reference for it. Do you have any more
details on their history?

--fg

Tautologistics

Dec 28, 2009, 8:24:08 PM
to nodejs
I think [gs]etters are not specified until the 5th edition of the spec
but, nonetheless, they are in V8 even though I thought V8 was
staying strictly 3rd edition.

Try this in Node.js:
-------------------------------------
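var sys = require("sys");

function Foo () {
  this._fun = function (value) { return value + 1; };
}

// Getter: reading foo.Bar returns the currently stored function.
// (The return value must start on the same line as "return".)
Foo.prototype.__defineGetter__("Bar", function () {
  return this._fun;
});

// Setter: assigning to foo.Bar swaps in the new function; other values are ignored.
Foo.prototype.__defineSetter__("Bar", function (value) {
  if (typeof value === "function") this._fun = value;
});

var foo = new Foo();
sys.debug(foo.Bar(10)); // 11
foo.Bar = function (value) { return value - 1; };
sys.debug(foo.Bar(10)); // 9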
-------------------------------------

...dunno if we can expect it to stay but there it is for now =)

Isaac Z. Schlueter

Dec 28, 2009, 8:41:08 PM
to nodejs
v8 tends to follow webkit's JavaScriptCore pretty closely. When JSC
implements an ES5 feature, v8 tends to do so as well.

I think we can expect any implemented ES5 features to stay.

--i

Isaac Z. Schlueter

Dec 28, 2009, 8:56:30 PM
to nodejs
Felix, I can do this, since I've monkeyed around a lot in that
module. Don't know if you saw the discussion in IRC, but I think the
consensus was that it's worthwhile to have a URL parsing module that
could be applied to the http's request.uri optionally, but have the
http module do almost nothing by default. Also, maybe the "url"
member should be something like "full" instead? That would remove
http's request.uri object, and replace it with just the URL requested,
and let the user chop it up with require("url") and/or require
("querystring") if they want.

How would you guys feel about something that mirrored the widely known
(albeit a bit weird) naming conventions of the browser's
window.location object? Ie: hash, host, hostname, href, pathname,
port, protocol, and search, with a toString member that returns the
href. (Maybe we could add an "auth" member for the user:pass@ bit.)
It seems like that could be done with a MUCH smaller regexp. It might
make thorough resolution a bit trickier in some edge cases, but the
advantage would be that JavaScript users are more likely to grok it
right off the bat.
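
A rough sketch of what such a parser could look like, assuming a single regexp and the window.location field names plus the proposed "auth" member. `parseLocation` and `URL_RE` are hypothetical names, and the pattern deliberately skips edge cases such as IPv6 host literals:

```javascript
// One pass: scheme, optional //auth@hostname:port, path, ?search, #hash.
const URL_RE = /^(?:([a-z][a-z0-9+.-]*:))?(?:\/\/(?:([^\/?#@]*)@)?([^\/?#:]*)(?::(\d*))?)?([^?#]*)(\?[^#]*)?(#[\s\S]*)?$/i;

// Hypothetical parser: field names mirror window.location, plus "auth".
function parseLocation(href) {
  const m = URL_RE.exec(href);
  const hostname = m[3] || "";
  const port = m[4] || "";
  return {
    href: href,
    protocol: m[1] || "",   // includes the trailing ":" like the browser
    auth: m[2] || "",       // the user:pass part, without the "@"
    hostname: hostname,
    port: port,
    host: port ? hostname + ":" + port : hostname,
    pathname: m[5] || "",
    search: m[6] || "",     // includes the leading "?"
    hash: m[7] || "",       // includes the leading "#"
    toString: function () { return this.href; }
  };
}

const loc = parseLocation("http://user:pass@example.com:8080/a/b?x=1#frag");
console.log(loc.host, loc.pathname, loc.search, loc.hash);
```

Because the authority group is optional, the same pattern also accepts relative references such as "/search?q=1" and schemes without "//" such as "mailto:".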

--i

Bluebie

Dec 28, 2009, 9:26:16 PM
to nodejs
I'm all for implementing the browser standard URL API. Though I would
like to see stuff like 'hash' also available aliased as 'fragment' for
instance, so that those of us who're more familiar with the way URIs
are generally described in spec, or server side languages, can use the
objects without needing to refer to the docs to find out the weird
name for things.

I really like the idea of providing the URL as an enhanced string of
sorts, with all these bonus parsing features as lazy loading getters.
If we could have matching setters, and a clear way to construct a
URLish String, that would be absolutely fantastic! ^_^

I'd also like to see a .set method added, for bulk setting aspects of
the URL. For instance:
var searchPage = new URL();
searchPage.set({port: 80, path: '/search', fragment: 'searchBox'});

or more simply:

var searchPage = new URL({port: 80, path: '/search', fragment:
'searchBox'});

For the sake of completeness, it would also be lovely to have a URI
object provided, with a subset of the functionality, for use mainly
with things like tag:, aim:, skype:, data:, and the likes.

Ideally, URL would include logic such that when the port number is
equal to the protocol's default, the port would be omitted when
stringified. This would require a list of common protocols to be
available somewhere, which would be rather nifty to have, as then we
could do stuff like tcpserver.listen("irc", "localhost") :)
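
As a sketch of the port rule only: the table below is a tiny illustrative subset, and `formatUrl`, `DEFAULT_PORTS`, and the `protocol` and `host` fields are hypothetical names chosen to sit alongside the `port`, `path`, and `fragment` fields from the example above.

```javascript
// Illustrative subset; a real table would cover many more protocols.
const DEFAULT_PORTS = { http: 80, https: 443, ftp: 21 };

// Hypothetical stringifier: leaves the port out when it is missing
// or equal to the protocol's default.
function formatUrl(parts) {
  const hasPort = parts.port !== undefined && parts.port !== null && parts.port !== "";
  const port = Number(parts.port);
  const isDefault = !hasPort || DEFAULT_PORTS[parts.protocol] === port;
  let out = parts.protocol + "://" + parts.host;
  if (!isDefault) out += ":" + port;
  out += parts.path || "/";
  if (parts.fragment) out += "#" + parts.fragment;
  return out;
}

console.log(formatUrl({ protocol: "http", host: "example.com", port: 80,
                        path: "/search", fragment: "searchBox" }));
// http://example.com/search#searchBox
console.log(formatUrl({ protocol: "http", host: "example.com", port: 8080,
                        path: "/search" }));
// http://example.com:8080/search
```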

Another great convenience with Ruby's URI object is the ability to
merge two URIs together, like http://google.com/ + /search =
http://google.com/search. This should also work the other way when
stringifying, so you could say something like
searchPage.toStringRelativeTo(homePage); and just get '/search',
provided details like the hostname, port, protocol, and auth were
identical in both or, where not identical, absent. This should handle
the '/../' de-facto unix standard in URLs correctly, and combining a
path like '/blah/foo' with 'bar' should result in '/blah/bar'.
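
For illustration, the merging behaviour described here matches what the WHATWG `URL` class does today. The sketch assumes a runtime where `URL` is a global (current Node.js and browsers), which postdates this thread. `toStringRelativeTo` is a hypothetical helper named after the method proposed above, and it only covers the "identical in both" case:

```javascript
// Assumes a global WHATWG URL class (current Node.js and browsers).
const homePage = new URL("http://google.com/");
const searchPage = new URL("/search", homePage);

// Hypothetical helper: drop scheme, auth, host and port when they match the base.
function toStringRelativeTo(target, base) {
  const sameSite =
    target.origin === base.origin &&
    target.username === base.username &&
    target.password === base.password;
  return sameSite ? target.pathname + target.search + target.hash : target.href;
}

console.log(searchPage.href);                                   // http://google.com/search
console.log(new URL("bar", "http://google.com/blah/foo").href); // http://google.com/blah/bar
console.log(new URL("/a/../b", homePage).pathname);             // /b
console.log(toStringRelativeTo(searchPage, homePage));          // /search
```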

Is there any technical reason we can't just tack this functionality
right on to an existing string object? Would the performance costs be
too high? I hope it would just be a matter of merging the contents of
URL.prototype in to a new blank object, and inheriting that object
from the original string, via Object.create or a similar mechanism.

I don't imagine there being many real world uses for a httpd where you
wouldn't want to access the various parts of the URL in a meaningful
way, so I don't feel the cost of extending the String to be too high,
given its frequent value, and API niftiness (the quality of
containing a large amount of nift.)

Dean Landolt

Dec 28, 2009, 10:17:13 PM
to nod...@googlegroups.com
On Mon, Dec 28, 2009 at 10:31 AM, Rasmus Andersson <ras...@notion.se> wrote:
> In response to the "uri" module discussion on GitHub
> <http://github.com/ry/node/commit/2f9722cca0a72122aa03763c085f6b5aa7f0ada2#comments>:
>
> A URI is a generic and uniform way to express identifiers. That's it.
> Parsing a URL is not within the scope of URI parsing, since a URL is a
> derivation -- not a subset -- of a URI.

From RFC 2396 1.2:

1.2. URI, URL, and URN

A URI can be further classified as a locator, a name, or both. The
term "Uniform Resource Locator" (URL) refers to the subset of URI
that identify resources via a representation of their primary access
mechanism (e.g., their network "location"), rather than identifying
the resource by name or by some other attribute(s) of that resource.
The term "Uniform Resource Name" (URN) refers to the subset of URI
that are required to remain globally unique and persistent even when
the resource ceases to exist or becomes unavailable.

So URLs explicitly subset URIs. This comment isn't (completely) pedantic -- leaving it as uri does leave the door open to offering up some URN utilities as well. Of course, that crap could always go in a separate module.

Dean Landolt

Dec 28, 2009, 10:21:50 PM
to nod...@googlegroups.com
On Mon, Dec 28, 2009 at 8:24 PM, Tautologistics <cpt.o...@gmail.com> wrote:
> I think [gs]etters are not specified until the 5th edition of the spec
> but, nonetheless, they are in V8 even though I thought V8 was
> staying strictly 3rd edition.
>
> [...]
>
> ...dunno if we can expect it to stay but there it is for now =)


As Isaac pointed out, getters and setters are there to stay, but who knows if they'll stay in __wunderbar__ form. It's possible to stub out Object.defineProperty and the other meta APIs (narwhal does this) -- this would probably be a Good Thing for node to do explicitly -- and feature testing can prevent V8 from pulling the rug out from under getter/setter users (as would happen if they pulled __defineGetter__/__defineSetter__).
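
A minimal sketch of that kind of feature test, applied to the req.uri deprecation question from earlier in the thread. `defineGetter` is a hypothetical helper, and the `req` object is a stand-in for an HTTP request, not node's actual one:

```javascript
// Hypothetical helper: prefer the ES5 API, fall back to the legacy form.
function defineGetter(obj, name, getter) {
  if (typeof Object.defineProperty === "function") {
    Object.defineProperty(obj, name, {
      get: getter,
      enumerable: true,
      configurable: true
    });
  } else if (typeof obj.__defineGetter__ === "function") {
    obj.__defineGetter__(name, getter);
  } else {
    throw new Error("getters are not supported by this engine");
  }
}

// Stand-in request object; "url" is assumed to be the new member name.
const req = { url: "/search?q=node" };
const warnings = [];
let warned = false;

// Reading the old member still works, but records a warning the first time.
defineGetter(req, "uri", function () {
  if (!warned) {
    warned = true;
    warnings.push("req.uri is deprecated; use req.url instead");
  }
  return this.url;
});

console.log(req.uri);   // /search?q=node
console.log(warnings);  // [ 'req.uri is deprecated; use req.url instead' ]
```

Warning only once keeps a hot request path from flooding the log while still pointing users at the new name.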

Felix Geisendörfer

Dec 29, 2009, 5:14:38 AM
to nodejs
> Felix, I can do this, since I've monkeyed around a lot in that
> module.

Go ahead, I'm just always looking for stuff to do : ).

--fg

inimino

Dec 29, 2009, 10:33:43 AM
to nod...@googlegroups.com
Dean Landolt wrote:
> On Mon, Dec 28, 2009 at 10:31 AM, Rasmus Andersson <ras...@notion.se> wrote:
>> [...] since a URL is a
>> derivation -- not a subset -- of a URI.

>
> From RFC 2396 1.2:
>
> 1.2. URI, URL, and URN
>
> A URI can be further classified as a locator, a name, or both. The
> term "Uniform Resource Locator" (URL) refers to the subset of URI
> that identify resources via a representation of their primary access
> mechanism (e.g., their network "location"), rather than identifying
> the resource by name or by some other attribute(s) of that resource.
> The term "Uniform Resource Name" (URN) refers to the subset of URI
> that are required to remain globally unique and persistent even when
> the resource ceases to exist or becomes unavailable.
>
> So URLs explicitly subset URIs.

Yes, as far as I can tell the distinction between URLs and URIs is
irrelevant to this discussion; the distinction that is relevant is
between HTTP URIs, which define such things as query strings, and
all the other kinds of URIs, most of which do not.

Renaming the module to URL doesn't help at all if what it actually
supports is only HTTP URLs and URIs. Better call it the HTTP-URL
module in that case.

--
http://inimino.org/~inimino/blog/

Bluebie

Dec 29, 2009, 11:02:53 PM
to nodejs
Why would it only work for HTTP URLs? The URL is a generic thing, a
superset of the URI. I think we should at least provide URIs and URLs,
and possibly at some point, URNs. It would be possible to then have a
superset of, say, URI which provides DataURI, with nifty accessors for
the data within. A HTTPURL might for instance provide bonus
functionality for decoding form encoded query strings. Prototypical
inheritance for the win?

I vote we call the module 'uri', and have it provide as its exports,
URI, URL, and whatever other subsets are deemed relevant. Things which
make use of URLs of some sort should use the most specific form
available. I'd quite like to see DataURIs be included, as they're such
useful little doodads, but annoying to create and decode.
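
To give a feel for the size of the job, here is a rough sketch of creating and decoding data: URIs. `parseDataUri` and `formatDataUri` are hypothetical names, the code relies on Node's `Buffer`, and it assumes non-base64 payloads are percent-encoded UTF-8:

```javascript
// Hypothetical helper: split a data: URI into its media type and decoded bytes.
function parseDataUri(uri) {
  const m = /^data:([^,]*),([\s\S]*)$/i.exec(uri);
  if (!m) throw new TypeError("not a data: URI");
  const meta = m[1].split(";");
  const isBase64 =
    meta.length > 1 && meta[meta.length - 1].toLowerCase() === "base64";
  if (isBase64) meta.pop();
  return {
    // RFC 2397 default when no media type is given
    mediaType: meta.join(";") || "text/plain;charset=US-ASCII",
    data: isBase64
      ? Buffer.from(m[2], "base64")
      : Buffer.from(decodeURIComponent(m[2]), "utf8")
  };
}

// Hypothetical helper: always emits the base64 form for simplicity.
function formatDataUri(mediaType, buffer) {
  return "data:" + mediaType + ";base64," + buffer.toString("base64");
}

const parsed = parseDataUri("data:text/plain;base64,SGVsbG8sIFdvcmxkIQ==");
console.log(parsed.mediaType);               // text/plain
console.log(parsed.data.toString("utf8"));   // Hello, World!
console.log(formatDataUri("text/plain", Buffer.from("hi", "utf8")));
```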

On Dec 30, 2:33 am, inimino <inim...@inimino.org> wrote:
> Dean Landolt wrote:
> > On Mon, Dec 28, 2009 at 10:31 AM, Rasmus Andersson <ras...@notion.se> wrote:
> >> [...] since a URL is a
> >> derivation -- not a subset -- of a URI.

inimino

Dec 30, 2009, 1:14:37 AM
to nod...@googlegroups.com
Bluebie wrote:
> Why would it only work for HTTP URLs?

Because that is what most people using node are most likely to want,
and a fully general URI API isn't going to be as convenient, or as
likely to get written, tested, etc.

> The URL is a generic thing, a
> superset of the URI.

Subset.

> I vote we call the module 'uri', and have it provide as its exports,
> URI, URL, and whatever other subsets are deemed relevant.

I'm happy to leave what goes in the module up to whoever ends up
writing it. A data: URI module would be great to have too.

--
http://inimino.org/~inimino/blog/

Steve Cook

Dec 30, 2009, 6:03:05 PM
to nod...@googlegroups.com
I don't know if there's interest in this, but before the Narwhal URI
handler got ganked, I ported the parser from Ruby's Addressable into
JS. It doesn't rely on "//":

{
"source": "http://mt0.google.com/vt/lyrs=m@114&hl=en&src=api&x=2&y=2&z=3&s=",
"protocol": "http",
"path": "/vt/lyrs=m@114&hl=en&src=api&x=2&y=2&z=3&s=",
"relative": "/vt/lyrs=m@114&hl=en&src=api&x=2&y=2&z=3&s=",
"authority": "mt0.google.com",
"host": "mt0.google.com",
"file": "lyrs=m@114&hl=en&src=api&x=2&y=2&z=3&s=",
"directory": "/vt/"
}
{
"source": "mailto:i...@izs.me",
"protocol": "mailto",
"path": "i...@izs.me",
"relative": "i...@izs.me",
"file": "i...@izs.me"
}

Can anyone detail precisely what is and isn't wanted in a URI handler
for Node? I'd be happy to contribute what I've written if it'd be
helpful.
