Jose,
You’re correct insofar as the various components in an HTTP request all come from well-defined sources (with the possible exception of determining the hostname of a request, which is a bit tricky). What isn’t so obvious, however, is how these may be combined by bad actors to create undesired request URIs. There are a number of attack vectors that exploit server URI parsing as a basis for further downstream exploits (see [1], [2], [3]).
My planned approach to managing this in Bandit is to build URIs roughly as follows:
1. Figure out the scheme used for the request - from the perspective of Bandit, this is either http or https depending on the underlying transport. Situations where this may be overridden by forwarding proxies (e.g. via `X-` headers) are explicitly outside the scope of Bandit; we’re only concerned with explicit HTTP semantics.
2. Determine the hostname & port used for the request by consulting a specific list of sources: the Host header, the `:authority` pseudo-header, and so on. Construct a URI from scheme, host & port & normalize it. Validate that the resulting path is “/” and that the query string is empty.
3. Determine the path & query string from the request by analyzing the request line / `:path` pseudo-header. Construct a URI from this & normalize it. Validate that the resulting scheme, host & port are empty.
4. Merge these two URIs together, resulting in one where all fields are known to come from the specific sources above (a rough sketch follows this list).
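To make that concrete, here’s a rough sketch in Elixir using the standard library’s `URI` module (whose `URI.new/1`, as of Elixir 1.13, already rejects strings that don’t match the RFC 3986 grammar). The module and function names here (`URIBuilder`, `authority_uri/2`, etc.) are purely illustrative and not Bandit’s actual API:

```elixir
defmodule URIBuilder do
  # Steps 1 & 2: scheme comes from the transport; host & port come from
  # the Host header / `:authority` pseudo-header. The result must be a
  # URI with an empty query and a "/" (or absent) path.
  def authority_uri(scheme, host_and_port) do
    case URI.new("#{scheme}://#{host_and_port}") do
      {:ok, %URI{path: path, query: nil} = uri} when path in [nil, "/"] -> {:ok, uri}
      _ -> {:error, :invalid_authority}
    end
  end

  # Step 3: path & query come from the request line / `:path`
  # pseudo-header. The result must have no scheme, host or port.
  def path_uri(path_and_query) do
    case URI.new(path_and_query) do
      {:ok, %URI{scheme: nil, host: nil, port: nil} = uri} -> {:ok, uri}
      _ -> {:error, :invalid_path}
    end
  end

  # Step 4: merge the two. URI.merge/2 resolves the second URI against
  # the first per RFC 3986's reference resolution rules.
  def request_uri(scheme, host_and_port, path_and_query) do
    with {:ok, base} <- authority_uri(scheme, host_and_port),
         {:ok, rel} <- path_uri(path_and_query) do
      {:ok, URI.merge(base, rel)}
    end
  end
end

# URIBuilder.request_uri("http", "example.com", "/foo?bar=1")
# #=> {:ok, %URI{scheme: "http", host: "example.com", path: "/foo", query: "bar=1", ...}}
# URIBuilder.request_uri("http", "example.com/smuggled", "/foo")
# #=> {:error, :invalid_authority}
```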
In truth I suspect that the full answer here is no doubt a lot longer and more nuanced than I’m able to appreciate. My (possibly naive) hope is to apply some well-defined heuristics to build & normalize a request as early as possible in the request lifecycle, so as to ensure that Plug users can rely on their request parameters at least being valid & sanitized at a protocol level.
In terms of specific validations, I would propose that each field be validated against the grammars defined in RFC 3986 [4]. Concerning normalization heuristics, a number are described in section 6 of the same RFC, though I can think of a few others that would likely be good to include. The specific normalization heuristics used should be called out in documentation.
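For illustration, here’s what a couple of the section 6 heuristics might look like; Elixir’s `URI` module doesn’t ship a normalizer, so this is hand-rolled and the module name is hypothetical:

```elixir
defmodule URINormalizer do
  def normalize(%URI{} = uri) do
    # Case normalization (6.2.2.1): scheme and host are case-insensitive,
    # so fold both to lowercase.
    scheme = uri.scheme && String.downcase(uri.scheme)
    host = uri.host && String.downcase(uri.host)

    # Scheme-based normalization (6.2.3): drop an explicit port that
    # matches the scheme's default, and give an absent path a "/" when
    # an authority is present.
    port = if scheme && uri.port == URI.default_port(scheme), do: nil, else: uri.port
    path = if host, do: uri.path || "/", else: uri.path

    # Percent-encoding normalization (6.2.2.2) and dot-segment removal
    # (6.2.2.3) are among the others worth including; omitted for brevity.
    %URI{uri | scheme: scheme, host: host, port: port, path: path}
  end
end
```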
The question of whether we would want to expose validation and normalization as discrete functions against a URI isn’t one I have a strong opinion on. My hunch is that expectations vary widely by use case, so it’s probably better to leave them separate.
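Concretely, leaving them separate might look something like this (names entirely hypothetical, building on the sketches above):

```elixir
defmodule RequestURI do
  # Validate against the RFC 3986 grammar by round-tripping through
  # URI.new/1, which rejects strings that don't match the grammar.
  @spec validate(URI.t()) :: {:ok, URI.t()} | {:error, String.t()}
  def validate(%URI{} = uri), do: URI.new(URI.to_string(uri))

  # Normalization remains a separate, opt-in step (per the URINormalizer
  # sketch above).
  @spec normalize(URI.t()) :: URI.t()
  def normalize(%URI{} = uri), do: URINormalizer.normalize(uri)
end
```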
m.