How to parse a utf-8 url?


Thijs Koerselman

May 15, 2012, 11:58:14 AM
to nod...@googlegroups.com
Hi,

I'm trying to parse a utf-8 url, but the url.parse function doesn't
seem to like spaces.

var url = require("url");
var file_url = "http://myhost.nl:9090/some/path/myweirdfile æfqúsß.mp3";
console.log("parsed:" , url.parse(file_url));

This gives the following output:

parsed: { protocol: 'http:',
  slashes: true,
  host: 'myhost.nl:9090',
  port: '9090',
  hostname: 'myhost.nl',
  href: 'http://myhost.nl:9090/some/path/myweirdfile',
  pathname: '/some/path/myweirdfile',
  path: '/some/path/myweirdfile'
}

Where exactly do I need to tackle this and how?

Thijs

Matt

May 15, 2012, 12:24:34 PM
to nod...@googlegroups.com
Spaces are encoded as a + in URLs.
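
A short sketch of where the "+" convention applies, using only Node's built-in querystring module and the standard decodeURIComponent function. The sample strings are made up for illustration. As far as I know, "+" stands for a space only in form-encoded query strings; in a path, a space is written as %20.

```javascript
var querystring = require("querystring");

// In a form-encoded query string, "+" is decoded as a space.
var query = querystring.parse("title=my+weird+file");
console.log(query.title);

// decodeURIComponent does not treat "+" specially; only %20 becomes a space.
var decoded = decodeURIComponent("my+weird%20file");
console.log(decoded);
```

So replacing the space in a path with "+" would most likely leave a literal plus sign in the filename after decoding.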



--
Job Board: http://jobs.nodejs.org/
Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
You received this message because you are subscribed to the Google
Groups "nodejs" group.
To post to this group, send email to nod...@googlegroups.com
To unsubscribe from this group, send email to
nodejs+un...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/nodejs?hl=en?hl=en

Felipe Gasper

May 15, 2012, 12:48:25 PM
to nod...@googlegroups.com
Punycode?

Isaac Schlueter

May 15, 2012, 2:08:10 PM
to nod...@googlegroups.com
The URL parser doesn't parse spaces.

It would be good to maybe discuss this, as it's a common complaint.

Let's discuss it in this issue: https://github.com/joyent/node/issues/3270

Thijs Koerselman

May 15, 2012, 6:30:14 PM
to nod...@googlegroups.com
Thanks Isaac. That discussion makes it clear. I'm all for it. I will just
escape the spaces myself for now, but it would be great if this were
handled in url.parse. I would like to be able to retrieve the filename
from the parsed data as it is, with spaces and all.

Cheers,
Thijs
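
A minimal sketch of the manual workaround described above, assuming that literal spaces are the only characters that need escaping before parsing. The helper name escapeSpaces is hypothetical and not part of Node's url module.

```javascript
var url = require("url");

// Hypothetical helper (not part of Node): replace each literal space
// with %20 so the parser does not cut the URL off at the first space.
function escapeSpaces(str) {
  return str.split(" ").join("%20");
}

var file_url = "http://myhost.nl:9090/some/path/myweirdfile æfqúsß.mp3";
var parsed = url.parse(escapeSpaces(file_url));

// Take the last path segment and decode it to get the filename back
// with its spaces.
var filename = decodeURIComponent(parsed.pathname.split("/").pop());
console.log(filename);
```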

Isaac Schlueter

May 15, 2012, 8:44:06 PM
to nod...@googlegroups.com
Thijs,

Well, if we do this, you'll get it with the spaces converted to %20.
It'll be up to you to call decodeURIComponent on it.

Spaces are actually not allowed in URLs, after all.
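
A rough sketch of that round trip, assuming the input is a raw, not yet percent-encoded URL. It uses the standard encodeURI function to escape the spaces and non-ASCII characters up front, then calls decodeURIComponent on the path segment afterwards. The variable names are illustrative only.

```javascript
var url = require("url");

var raw_url = "http://myhost.nl:9090/some/path/myweirdfile æfqúsß.mp3";

// encodeURI escapes spaces as %20 and non-ASCII characters as UTF-8
// percent sequences, while leaving ":" and "/" untouched.
var encoded_url = encodeURI(raw_url);
var parsed_url = url.parse(encoded_url);
console.log(parsed_url.pathname);

// The parsed pathname stays encoded; decoding is left to the caller.
var segments = parsed_url.pathname.split("/");
var decoded_name = decodeURIComponent(segments[segments.length - 1]);
console.log(decoded_name);
```

Note that encodeURI would also escape the "%" in a URL that is already percent-encoded, so this only suits raw input.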

Matt

May 15, 2012, 11:15:07 PM
to nod...@googlegroups.com
Another argument for a parsed_path value. Though again we have to realise that comes with an overhead, which is worth considering.

Thijs Koerselman

May 16, 2012, 8:13:39 AM
to nod...@googlegroups.com
Hi Isaac,

> Well, if we do this, you'll get it with the spaces converted to %20.
> It'll be up to you to call decodeURIComponent on it.
>
> Spaces are actually not allowed in URLs, after all.
>
Yep that's all fine with me. At least I don't have to replace the " "
with "%20" manually before using the url.

Isaac Schlueter

May 16, 2012, 7:36:32 PM
to nod...@googlegroups.com
Landed on 9fc7283a403bb0dec096b76991226cba8e7b73c2 in master.