Why does url.Parse return err=nil for an apparently invalid URL?


Lin Lin

Feb 20, 2025, 7:07:13 AM
to golang-nuts
Hi, gophers

I ran into a URL parsing issue and am a little confused about the behavior of url.Parse. The doc says:

// Parse parses a raw url into a [URL] structure.
//
// The url may be relative (a path, without a host) or absolute
// (starting with a scheme). Trying to parse a hostname and path
// without a scheme is invalid but may not necessarily return an
// error, due to parsing ambiguities.

That is, url.Parse can return a nil error in some situations even when the input is malformed. The following code confirms that.

package main

import (
	"fmt"
	"net/url"
)

func main() {
	// Malformed for the http scheme: only one slash after "http:".
	u := "http:/127.0.0.1/index.html"
	obj, err := url.Parse(u)
	fmt.Printf("obj: %#v, error: %v\n", obj, err)
}

I think that conflicts a little with Go's convention: if the error is nil, one can be sure that any returned object is good. In this case, how can the caller trust the result? Alternatively, we could improve the doc to explain a bit more. I believe most Go developers will not notice this pitfall until they step into it. In my case, I ran into it by calling http.NewRequest with a bad URL; url.Parse is hidden inside, which makes it even less likely to be noticed.
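
A minimal sketch of how this surfaces through http.NewRequest, for readers who want to reproduce it. The helper name describeRequest is made up for illustration; only http.NewRequest and the req.URL fields come from the standard library.

```go
package main

import (
	"fmt"
	"net/http"
)

// describeRequest (hypothetical helper) builds a request and reports the
// host and path that the url.Parse call inside http.NewRequest produced.
func describeRequest(rawURL string) (host, path string, err error) {
	req, err := http.NewRequest(http.MethodGet, rawURL, nil)
	if err != nil {
		return "", "", err
	}
	return req.URL.Host, req.URL.Path, nil
}

func main() {
	host, path, err := describeRequest("http:/127.0.0.1/index.html")
	// No error is reported here; the host is empty and the intended
	// address ends up as the first path segment.
	fmt.Printf("host=%q path=%q err=%v\n", host, path, err)
}
```

Building the request succeeds, so the mistake would only show up later, when the request is actually sent.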

I've seen the issues below; they were closed.


Maybe we can improve that.

Thanks for your time, best regards.
  Lin Lin

Jan Mercl

Feb 20, 2025, 7:15:11 AM
to Lin Lin, golang-nuts
On Thu, Feb 20, 2025 at 1:06 PM Lin Lin <linsite...@gmail.com> wrote:
> I think that conflicts a little with Go's convention: if the error is nil, one can be sure that any returned object is good. In this case, how can the caller trust the result? Alternatively, we could improve the doc to explain a bit more. I believe most Go developers will not notice this pitfall until they step into it. In my case, I ran into it by calling http.NewRequest with a bad URL; url.Parse is hidden inside, which makes it even less likely to be noticed.

The result _is_ good.

jnml@e5-1650:~/tmp/url$ ls -la
total 8
drwxr-xr-x  2 jnml jnml 4096 Feb 20 13:09 .
drwxr-xr-x 22 jnml jnml 4096 Feb 20 13:08 ..
jnml@e5-1650:~/tmp/url$ mkdir -p http:/127.0.0.1          
jnml@e5-1650:~/tmp/url$ echo foo > http:/127.0.0.1/index.html
jnml@e5-1650:~/tmp/url$ ls -la http:/127.0.0.1/index.html
-rw-r--r-- 1 jnml jnml 4 Feb 20 13:10 http:/127.0.0.1/index.html
jnml@e5-1650:~/tmp/url$ cat http:/127.0.0.1/index.html
foo
jnml@e5-1650:~/tmp/url$

 

Dan Kortschak

Feb 21, 2025, 12:19:11 AM
to golan...@googlegroups.com
As others have noted, the URL there is valid. If it's not valid for
your use, you can perform additional validation like so
https://go.dev/play/p/v3Wjq6jzuYK
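
One possible shape for such additional validation is sketched below. This is not necessarily what the playground link contains; parseHTTPURL is a hypothetical helper, and the rules it enforces (scheme must be http or https, host name must be non-empty) are assumptions about what "valid for your use" means.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// parseHTTPURL is a hypothetical wrapper around url.Parse that adds
// checks url.Parse itself does not perform.
func parseHTTPURL(raw string) (*url.URL, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return nil, fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	// Hostname strips any port, so "http://:80/" is rejected as well.
	if u.Hostname() == "" {
		return nil, errors.New("missing host")
	}
	return u, nil
}

func main() {
	for _, raw := range []string{
		"http://127.0.0.1/index.html",
		"http:/127.0.0.1/index.html",
	} {
		_, err := parseHTTPURL(raw)
		fmt.Printf("%-30s err=%v\n", raw, err)
	}
}
```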

Dan

robert engels

Feb 21, 2025, 12:32:00 AM
to Dan Kortschak, golan...@googlegroups.com
I don’t think it is a valid url according to the rfc https://datatracker.ietf.org/doc/html/rfc3986#section-3.2

An http scheme url requires the //, see https://datatracker.ietf.org/doc/html/rfc2616#section-3.2.2

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/golang-nuts/e1f5dffff9984992799b4e01345d1f89d6783ff2.camel%40kortschak.io.

robert engels

Feb 21, 2025, 12:39:19 AM
to Dan Kortschak, golan...@googlegroups.com
Also, see https://datatracker.ietf.org/doc/html/rfc2396#section-3 for more details on the scheme + authority.

It all depends on what the URL parse is supposed to return, and based on the return structure - since it has elements like ‘host’ - it is supposed to be decoding a valid http url - using a single slash is not valid.

Kurtis Rader

Feb 21, 2025, 12:48:14 AM
to robert engels, Dan Kortschak, golan...@googlegroups.com
On Thu, Feb 20, 2025 at 9:39 PM robert engels <ren...@ix.netcom.com> wrote:
> Also, see https://datatracker.ietf.org/doc/html/rfc2396#section-3 for more details on the scheme + authority.
>
> It all depends on what the URL parse is supposed to return, and based on the return structure - since it has elements like ‘host’ - it is supposed to be decoding a valid http url - using a single slash is not valid.

I guess it depends on whether the old RFC aphorism "be conservative in what you send, be liberal in what you accept" is applicable in this situation. I tend to favor being strict regarding what is accepted as valid input. Especially given the mess that was created by early web browsers being arguably too liberal in their interpretation of malformed HTML. Nonetheless, I can see an argument for the current behavior.


--
Kurtis Rader
Caretaker of the exceptional canines Junior and Hank

Dan Kortschak

Feb 21, 2025, 1:05:17 AM
to golan...@googlegroups.com
On Thu, 2025-02-20 at 23:38 -0600, robert engels wrote:
> Also, see https://datatracker.ietf.org/doc/html/rfc2396#section-3 for
> more details on the scheme + authority.


The BNF is (including only the parts that are necessary to show
validity):

absoluteURI   = scheme ":" ( hier_part | opaque_part )
hier_part     = ( net_path | abs_path ) [ "?" query ]
abs_path      = "/" path_segments
path_segments = segment *( "/" segment )
segment       = *pchar *( ";" param )
param         = *pchar
pchar         = unreserved | escaped |
                ":" | "@" | "&" | "=" | "+" | "$" | ","

The original URI (I miswrote URL for URI in my last post — the
url.Parse documentation notes "technically, a URI reference") was
"http:/127.0.0.1/index.html" which can be broken into:

scheme = http
abs_path = /127.0.0.1/index.html

The url.Parse function doesn't examine semantics, that's the caller's
responsibility. The semantics here are clearly wrong, but it's up to
the caller to check and ensure that the passed value satisfies the
expectations.
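
The breakdown above can be checked against what url.Parse actually returns. A small sketch follows; the helper name components is made up for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// components (hypothetical helper) returns the scheme, host and path
// that url.Parse extracts from raw.
func components(raw string) (scheme, host, path string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", "", err
	}
	return u.Scheme, u.Host, u.Path, nil
}

func main() {
	for _, raw := range []string{
		"http:/127.0.0.1/index.html",  // scheme + abs_path, no authority
		"http://127.0.0.1/index.html", // scheme + net_path with authority
	} {
		scheme, host, path, err := components(raw)
		fmt.Printf("%s\n  scheme=%q host=%q path=%q err=%v\n", raw, scheme, host, path, err)
	}
}
```

With a single slash the host comes back empty and the whole remainder is the path; with two slashes 127.0.0.1 is the host and /index.html is the path.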


robert engels

Feb 21, 2025, 1:06:09 AM
to Kurtis Rader, Dan Kortschak, golan...@googlegroups.com
I agree. When parsing you should be strict according to specifications - otherwise system security can be more easily compromised.


robert engels

Feb 21, 2025, 1:18:52 AM
to Dan Kortschak, golan...@googlegroups.com
I think I disagree. The BNF you are quoting is from the section that says “For example, some URI schemes do not allow an <authority> component, and others do not use a <query> component.”

The http scheme requires the //.

And since the struct returned by url.Parse has fields like “host”, it implies semantic understanding; otherwise the only possible result would be

scheme, andtherest…

with no parsing of the rest.

The previous email I sent had a reference to the common format of many “scheme”s using the authority… but this is not required.

Axel Wagner

Feb 21, 2025, 1:51:47 AM
to golan...@googlegroups.com
I disagree with this logic. I also use "net/url" to parse file:// scheme URLs, which also don't have a valid interpretation of a "host".

Axel Wagner

Feb 21, 2025, 1:57:52 AM
to golan...@googlegroups.com
And FWIW, I'm not an expert, but was curious. Wikipedia certainly seems to think that "host" is a universal component of the URI syntax (in particular, it is a sub-component of the "authority" part): https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Syntax
It also says "A component is empty if it has no characters", which to me supports the behaviour of using an empty string, if there is no host in the URI to be parsed.

Of course, Wikipedia is no formal spec. But I assume the description is distilled from the respective RFCs and certainly intends to describe common understanding.
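
To illustrate the "empty component" reading with net/url, here is a small sketch. The file paths are arbitrary examples and hostAndPath is a made-up helper name.

```go
package main

import (
	"fmt"
	"net/url"
)

// hostAndPath (hypothetical helper) reports the Host and Path fields
// of the parsed URL.
func hostAndPath(raw string) (host, path string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	return u.Host, u.Path, nil
}

func main() {
	for _, raw := range []string{
		"file:///etc/hosts",          // authority present but empty
		"file://localhost/etc/hosts", // authority names a host
	} {
		host, path, err := hostAndPath(raw)
		fmt.Printf("%-28s host=%q path=%q err=%v\n", raw, host, path, err)
	}
}
```

In the first case the Host field is simply the empty string, which is how an absent or empty host is represented in the returned struct.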

robert engels

Feb 21, 2025, 2:16:09 AM
to Axel Wagner, golan...@googlegroups.com
The file scheme is not an http scheme. The RFC states that the http scheme must have the //, so the parsing should return an error in this case, UNLESS you believe that the parsing does not understand the semantics of the scheme. If that is the case, then having it return a ‘host’ is also incorrect, as there is nothing that states a ‘host’ is a required element of a basic URI; it is simply scheme:data…

So like a lot of the Go stdlib, there is a de facto “standard” it implements that is not based strictly on specifications. That is fine (most implementations of anything don’t do this), but then it should be better documented as to what it is doing and what it expects. The fact that in some cases it can parse and find a “host” means it has some understanding of the http scheme and authorities.

It is simply that it is underspecified how the parsing is performed and what conditions cause errors to be returned.


robert engels

Feb 21, 2025, 2:18:01 AM
to Axel Wagner, golan...@googlegroups.com
Related: if there are test cases that allow the URL in question to be parsed as valid, these test cases should be reflected in the documentation. Developers shouldn’t have to read the code to understand the invariants of an API call.

Axel Wagner

Feb 21, 2025, 2:33:31 AM
to robert engels, golan...@googlegroups.com
On Fri, 21 Feb 2025 at 08:15, robert engels <ren...@ix.netcom.com> wrote:
> The file scheme is not an http scheme.

My point exactly. Your assertion that net/url is meant exclusively for HTTP URLs is wrong. ISTM that if the docs don't say it is restricted to HTTP URLs and if people don't use it restricted to HTTP URLs and we can't change its behavior to return an error for non-HTTP URLs, then it's probably a duck.

> The RFC states that the http scheme must have the //, so the parsing should return an error in this case, UNLESS you believe that the parsing does not understand the semantics of the scheme. If that is the case, then having it return a ‘host’ is also incorrect, as there is nothing that states a ‘host’ is a required element of a basic URI; it is simply scheme:data…
>
> So like a lot of the Go stdlib, there is a de facto “standard” it implements that is not based strictly on specifications

But it is? It's just not the specification you claim it is.

Axel Wagner

Feb 21, 2025, 2:46:46 AM
to golan...@googlegroups.com
FWIW the RFCs mentioned in the docs are 3986 and 2396 (both "Uniform Resource Identifier (URI): Generic Syntax"). Your argument about HTTP URLs references RFC 2616 (HTTP/1.1). I see no claims in the docs, that net/url conforms to RFC 2616. Your claim that it implicitly does is based on the presence of the Host field, but that's a syntactic component of RFC 3986: https://www.rfc-editor.org/rfc/rfc3986.html#section-3.2.2
RFC 3986 also gives the option not to parse the authority sub structure:

   Non-validating parsers (those that merely separate a URI reference into
   its major components) will often ignore the subcomponent structure of
   authority, treating it as an opaque string from the double-slash to
   the first terminating delimiter, until such time as the URI is
   dereferenced.
 
But it by no means *requires* a conforming parser to not parse it.

I really don't see where net/url violates RFC 3986 here. Or where the inference comes from that it should conform to RFC 2616.

FWIW I could see the argument that ParseRequestURI perhaps should conform to 2616. But I'm not deep enough into the subject to understand its intent and its difference to Parse.
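
For anyone wanting to compare the two functions directly, a rough sketch follows. The helper names parseOK and requestURIOK are made up. As far as I can tell, ParseRequestURI rejects relative references such as a bare "index.html", but it does not appear to add any HTTP-specific check for the single-slash case either.

```go
package main

import (
	"fmt"
	"net/url"
)

// parseOK (hypothetical helper) reports whether url.Parse accepts raw.
func parseOK(raw string) bool {
	_, err := url.Parse(raw)
	return err == nil
}

// requestURIOK (hypothetical helper) reports whether url.ParseRequestURI
// accepts raw.
func requestURIOK(raw string) bool {
	_, err := url.ParseRequestURI(raw)
	return err == nil
}

func main() {
	inputs := []string{
		"index.html",                 // relative path
		"/index.html",                // absolute path
		"http:/127.0.0.1/index.html", // the URL from this thread
	}
	for _, raw := range inputs {
		fmt.Printf("%-30q Parse ok=%-5t ParseRequestURI ok=%t\n",
			raw, parseOK(raw), requestURIOK(raw))
	}
}
```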

Robert Engels

Feb 21, 2025, 2:47:59 AM
to Axel Wagner, golan...@googlegroups.com
There is nothing that prevents the code from being changed so that if an http scheme URL is passed without a // the code can return an error. It is an error.

Honestly I don’t understand if you are a troll or an apologist - but neither is helpful. 




Dan Kortschak

Feb 21, 2025, 2:49:06 AM
to golan...@googlegroups.com
On Fri, 2025-02-21 at 00:17 -0600, robert engels wrote:
> I think I disagree. The BNF you are quoting is from the section that
> says “For example, some URI schemes do not allow an <authority>
> component, and others do not use a <query> component.”

No. I part quoted the collected BNF.
https://datatracker.ietf.org/doc/html/rfc2396#appendix-A

> The http scheme requires the //.

I don't think you read what I wrote; yes, the http scheme does require
a host and so requires the "//", but, quoting myself, "The url.Parse
function doesn't examine semantics, that's the caller's
responsibility. The semantics here are clearly wrong, but it's up to
the caller to check and ensure that the passed value satisfies the
expectations."

I think it's important to look at the documentation for url.Parse and
url.URL. Particularly this text:

> A URL represents a parsed URL (technically, a URI reference).
>
> The general form represented is:
>
> [scheme:][//[userinfo@]host][/]path[?query][#fragment]
>
> URLs that do not start with a slash after the scheme are interpreted
> as:
>
> scheme:opaque[?query][#fragment]
>
> The Host field contains the host and port subcomponents of the URL.
> When the port is present, it is separated from the host with a colon.
> When the host is an IPv6 address, it must be enclosed in square
> brackets: "[fe80::1]:80". The net.JoinHostPort function combines a
> host and port into a string suitable for the Host field, adding
> square brackets to the host when necessary.
>
> Note that the Path field is stored in decoded form: /%47%6f%2f
> becomes /Go/. A consequence is that it is impossible to tell which
> slashes in the Path were slashes in the raw URL and which were %2f.
> This distinction is rarely important, but when it is, the code should
> use the URL.EscapedPath method, which preserves the original encoding
> of Path.
>
> The RawPath field is an optional field which is only set when the
> default encoding of Path is different from the escaped path. See the
> EscapedPath method for more details.
>
> URL's String method uses the EscapedPath method to obtain the path.

This is consistent with the RFC.
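
A few of the documented behaviours quoted above can be seen in a short sketch. The mailto address and example.com URL are arbitrary inputs, and mustParse, opaqueOf and paths are made-up helper names.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// mustParse (hypothetical helper) panics on a parse error so the
// examples below stay short.
func mustParse(raw string) *url.URL {
	u, err := url.Parse(raw)
	if err != nil {
		panic(err)
	}
	return u
}

// opaqueOf returns the Opaque field, set for scheme:opaque URLs.
func opaqueOf(raw string) string {
	return mustParse(raw).Opaque
}

// paths returns the decoded Path and the result of EscapedPath.
func paths(raw string) (decoded, escaped string) {
	u := mustParse(raw)
	return u.Path, u.EscapedPath()
}

func main() {
	// No slash after the scheme: interpreted as scheme:opaque.
	fmt.Printf("opaque=%q\n", opaqueOf("mailto:gopher@example.com"))

	// Path is stored decoded; EscapedPath keeps the original encoding.
	decoded, escaped := paths("http://example.com/%47%6f%2f")
	fmt.Printf("path=%q escaped=%q\n", decoded, escaped)

	// JoinHostPort adds the brackets an IPv6 host needs.
	fmt.Println(net.JoinHostPort("fe80::1", "80"))
}
```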

Axel Wagner

Feb 21, 2025, 2:58:07 AM
to Robert Engels, golan...@googlegroups.com
Then in the interest of being helpful: I think it would be reasonable to add a new function (probably to the HTTP package) that does extra validation beyond what generic URLs require.

Robert Engels

Feb 21, 2025, 2:59:42 AM
to Axel Wagner, golan...@googlegroups.com
And actually the Go docs state it is an error https://pkg.go.dev/net/url#Parse

“URLs that do not start with a slash after the scheme are interpreted as:..”

Which means the first clause is used which states:

The general form represented is:

[scheme:][//[userinfo@]host][/]path[?query][#fragment]
Which shows the // to be required.

Maybe next time stop being snotty like a spoiled child and put in some reading effort. 




Robert Engels

Feb 21, 2025, 3:05:50 AM
to Axel Wagner, golan...@googlegroups.com
I am going to apologize. I retract what I said. As the docs state the url would be able to be parsed. 




Axel Wagner

Feb 21, 2025, 3:55:10 AM
to Robert Engels, golan...@googlegroups.com
Thank you for apologizing.

It should go without saying (but I still want to point that out for future reference) that your tone was unacceptable regardless of whether or not what you or I said was correct.

Robert Engels

Feb 21, 2025, 8:22:05 AM
to Axel Wagner, golan...@googlegroups.com
When someone does a lot of work to help support others by researching RFCs etc., and you respond with “you’re wrong, because I do this”, you’re the one changing the tone. I’ll admit that I went over the top, mainly because I see you do this all the time to a lot of people and I was triggered by your flippancy and didn’t control my emotions. That’s on me, but the way you respond to people has a part in this.




kortschak

Feb 21, 2025, 2:36:03 PM
to golang-nuts
On Friday, 21 February 2025 at 18:29:42 UTC+10:30 Robert Engels wrote:
> The general form represented is:
> [scheme:][//[userinfo@]host][/]path[?query][#fragment]
> Which shows the // to be required.

The [//[userinfo@]host] part is optional; if the host is absent, the // is not required in that grammar. This reflects the BNF I quoted.
 
> Maybe next time stop being snotty like a spoiled child and put in some reading effort.

Please, that is unnecessary.

Lin Lin

Feb 23, 2025, 7:35:46 PM
to Dan Kortschak, golan...@googlegroups.com
Thanks a lot for the powerful and detailed explanation, Dan. Though one surely needs to be careful when coding, I still stand by my initial point that the documentation should be improved, or that some improvement like the one Robert suggested below should be made.

> developers shouldn’t have to read the code to understand the invariants of an api call.


Lin
