Why doesn't URL.parse decode the URI?


Matt

Apr 24, 2012, 4:25:59 PM
to nod...@googlegroups.com
{ protocol: 'http:',
  slashes: true,
  host: 'host',
  hostname: 'host',
  search: '',
  query: {},
  pathname: '/path/some+thing%20with%20spaces+in+it/bar',
  path: '/path/some+thing%20with%20spaces+in+it/bar' }

I kind of hoped it would set the path to '/path/some thing with spaces in it/bar' which would be appropriate for passing to the filesystem. How come it doesn't decode the path?
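For reference, a minimal sketch of doing the decode as an explicit second step after parsing. `safeDecode` is a hypothetical helper name, not part of Node; the only assumptions are the legacy `url.parse` used in this thread and the built-in `decodeURIComponent`, which throws a `URIError` on malformed escapes and leaves "+" untouched.

```javascript
const url = require('url');

// The URL from the question, parsed with Node's legacy url.parse.
const parsed = url.parse('http://host/path/some+thing%20with%20spaces+in+it/bar', true);

// safeDecode is a hypothetical helper, not a Node API.
// It returns null instead of throwing when the input has a malformed escape.
function safeDecode(component) {
  try {
    return decodeURIComponent(component);
  } catch (err) {
    if (err instanceof URIError) return null; // e.g. '/bad%zz'
    throw err;
  }
}

const decodedPathname = safeDecode(parsed.pathname);

console.log(parsed.pathname);  // still percent-encoded
console.log(decodedPathname);  // %20 decoded, '+' left as-is
```

Note that a decoded path can contain characters such as "/" (from %2F), so it probably needs further checking before it is handed to the filesystem.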

Matt Patenaude

Apr 24, 2012, 4:51:23 PM
to nod...@googlegroups.com
If Node automatically performed a URL decode on the pathname, you would lose data. For example, if your application depended on knowing whether someone encoded a space as a "+" or as %20 (it seems silly, but let's say you can contrive an example where that matters), you'd have no way to tell from the decoded value. URL parsing and decoding are conceptually separate steps, and the implementation in Node reflects that.

You should be able to get a correctly-decoded URL using something like:

decodeURIComponent(url.parse("http://host/path/some+thing%20with%20spaces+in+it/bar", true).path.replace(/\++/g, ' '))

Just make sure you always resolve the +'s first, because if someone percent-encodes a literal + sign, that would then be translated into a space if done in the wrong order.
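To illustrate why the order matters, here is a small sketch comparing the two orders on an input that contains both a bare "+" and a percent-encoded one (%2B). The function names `plusFirst` and `plusLast` and the sample path are made up for this example. It also assumes you actually want "+" to mean a space; that convention comes from form-encoded query strings, and in a path a "+" is usually just a literal plus.

```javascript
// Hypothetical helpers for illustration; only replace() and
// decodeURIComponent() are built-ins.

// Turn '+' into spaces first, then percent-decode.
function plusFirst(encoded) {
  return decodeURIComponent(encoded.replace(/\+/g, ' '));
}

// Percent-decode first, then turn '+' into spaces (the wrong order).
function plusLast(encoded) {
  return decodeURIComponent(encoded).replace(/\+/g, ' ');
}

// 'a+b' uses '+' as a space; '%2B' is a literal plus sign.
const encoded = '/path/a+b%2Bc%20d';

console.log(plusFirst(encoded)); // '/path/a b+c d'  (literal plus survives)
console.log(plusLast(encoded));  // '/path/a b c d'  (literal plus is lost)
```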

Hope that helps!

-Matt



Matt Patenaude

Apr 24, 2012, 4:54:55 PM
to nod...@googlegroups.com
Sorry, my mistake: that replace call collapses a run of plus signs into a single space, which is probably never what you want. It should be this:

decodeURIComponent(url.parse("http://host/path/some+thing%20with%20spaces+in+it/bar", true).path.replace(/\+/g, ' '))

Alternatively, there's this little trick, but I prefer the above personally!

decodeURIComponent(url.parse("http://host/path/some+thing%20with%20spaces+in+it/bar", true).path.split('+').join(' '))
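A quick sketch of how the three replacement approaches from this thread differ on an input with consecutive plus signs. The sample string and variable names are made up for illustration.

```javascript
// Sample input with two consecutive plus signs (made up for this example).
const input = 'a++b+c';

// /\++/g matches a whole run of pluses, so '++' becomes ONE space.
const collapsed = input.replace(/\++/g, ' ');

// /\+/g matches each plus individually, so '++' becomes TWO spaces.
const oneForOne = input.replace(/\+/g, ' ');

// split/join is equivalent to the one-for-one replacement.
const viaSplit = input.split('+').join(' ');

console.log(JSON.stringify(collapsed)); // "a b c"
console.log(JSON.stringify(oneForOne)); // "a  b c"
console.log(JSON.stringify(viaSplit));  // "a  b c"
```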

-Matt

Matt

Apr 24, 2012, 5:08:00 PM
to nod...@googlegroups.com
On Tue, Apr 24, 2012 at 4:51 PM, Matt Patenaude <ma...@mattpatenaude.com> wrote:
> If Node automatically performed a URL decode on the pathname, you would lose data. For example, if your application depended on knowing whether someone encoded a space as a "+" or as %20 (it seems silly, but let's say you can contrive an example where that matters), you'd have no way to tell from the decoded value. URL parsing and decoding are conceptually separate steps, and the implementation in Node reflects that.

But the encoding would still be present in the "path" part. Just decoded in "pathname".

I get the "don't throw away data" part.

And I'm also not suggesting fixing this - because I'm sure it would break a bunch of things. Just curious why it was done that way?

FWIW I was testing this because Ruby's Sinatra treats %3F in the URL as the end of the path and start of the querystring. How ridiculous is that? I wanted to make sure Node (express) didn't have the same bug.

Matt.

Brandon Benvie

Apr 26, 2012, 10:06:39 AM
to nod...@googlegroups.com
It's probably because almost half (!) of the global functions in JavaScript (mandated by spec) are dedicated to encoding and decoding exactly that type of string, and the Node developers didn't want to presuppose which of them you would use:

escape
unescape
encodeURI
decodeURI
encodeURIComponent
decodeURIComponent
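As a rough illustration of why the choice matters, the sketch below runs the same inputs through several of these built-ins. The sample strings and variable names are made up; the functions themselves are the standard globals listed above.

```javascript
// A percent-encoded slash followed by an encoded space (made-up sample).
const encodedPath = '%2Fa%20b';

// decodeURI leaves escapes for reserved characters such as '/' alone.
const viaDecodeURI = decodeURI(encodedPath);                   // '%2Fa b'

// decodeURIComponent decodes every valid escape.
const viaDecodeURIComponent = decodeURIComponent(encodedPath); // '/a b'

// 'é' encoded as UTF-8 bytes.
const utf8 = '%C3%A9';
const utf8Decoded = decodeURIComponent(utf8); // 'é'
// unescape treats each %XX as a separate code unit, so UTF-8 comes out garbled.
const utf8Unescaped = unescape(utf8);         // 'Ã©'

// The encoders differ in the same way as the decoders.
const plain = '/a b?c=d';
const viaEncodeURI = encodeURI(plain);                   // '/a%20b?c=d'
const viaEncodeURIComponent = encodeURIComponent(plain); // '%2Fa%20b%3Fc%3Dd'

console.log(viaDecodeURI, viaDecodeURIComponent);
console.log(utf8Decoded, utf8Unescaped);
console.log(viaEncodeURI, viaEncodeURIComponent);
```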

Isaac Schlueter

Apr 26, 2012, 11:23:55 AM
to nod...@googlegroups.com
It wouldn't be out of the question to add a decoded version of the
path. It'd have to be a new member, though, and perform adequately
(maybe a flag to enable it or something.)

But I don't think it's really that pressing of an issue. The question
of whether you wish to use decodeURI or unescape is relevant, but I
think we can probably just never use unescape.

Matt

Apr 26, 2012, 12:57:14 PM
to nod...@googlegroups.com
On Thu, Apr 26, 2012 at 11:23 AM, Isaac Schlueter <i...@izs.me> wrote:
> It wouldn't be out of the question to add a decoded version of the
> path.  It'd have to be a new member, though, and perform adequately
> (maybe a flag to enable it or something.)

Of course. I wasn't really asking for it to be changed; hopefully any libraries are already coping with this.

> But I don't think it's really that pressing of an issue.

No, the question was just whether there was logic behind it. I wanted to make sure I wasn't wildly off base :)

As long as express/connect don't interpret %3F as the start of the querystring like Sinatra does then we're all good. I'm fairly sure they won't given the following is parsed right:

{ protocol: 'http:',
  slashes: true,
  host: 'host',
  hostname: 'host',
  search: '?a=b',
  query: { a: 'b' },
  pathname: '/something%3F%20here',
  path: '/something%3F%20here?a=b' }
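A sketch of how output like the above can be reproduced. The input URL is an assumption reconstructed from the parsed fields shown; the original message does not include it. It uses Node's legacy `url.parse`, as elsewhere in this thread.

```javascript
const url = require('url');

// Assumed input, reconstructed from the output above.
const parsed = url.parse('http://host/something%3F%20here?a=b', true);

// The encoded '?' (%3F) stays inside the pathname; only the literal '?'
// starts the query string.
console.log(parsed.pathname); // '/something%3F%20here'
console.log(parsed.search);   // '?a=b'
console.log(parsed.query.a);  // 'b'

// Decoding is a separate, explicit step performed after the split.
const decoded = decodeURIComponent(parsed.pathname);
console.log(decoded);         // '/something? here'
```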


Matt.