SDCHx RFC proposal

Vasily Chekalkin

Jun 30, 2016, 12:55:48 AM
to net-dev
Hello.

After the recent (fsvo) discussion about the state of SDCH and properly speccing it through the IETF [1], we (the Yandex net-dev team) decided to give it a go.

But instead of just fixing small things in the current SDCH spec [2] (a properly copyrighted version of [3]), such as "X-SDCH: 0" in the spec vs. "X-SDCH-Encoded: 0" in the actual implementation, we decided to improve it and fix some major pain points. The proposal also includes ideas from the "SDCH2" [4] work.

Here is an outline of the proposal; any feedback will be greatly appreciated.

1. The dictionary format is changed to be just a blob. This simplifies dynamic construction of dictionaries, including using an older version of a resource to encode a newer version.
2. All metadata about dictionaries/encoding/caching is passed via standard HTTP headers plus a handful of new ones. This helps solve the long-standing problem of decoding content when the dictionary is not available, because we can check for it right after fetching the HTTP headers from the cache, without fetching the body.
3. (Rough idea, but worth a shot.) Adopt the RFC 3229 [5] approach of using a distinct status code instead of 200 for encoded content. This should mitigate some crazy proxies that can screw up responses.
4. Use CORS Access-Control-Allow-Origin [6] rules for matching dictionary scope to applicable URLs.
5. Do not use the "X-SDCH-Encoded" header at all, because of item 3 and our own experience of deploying SDCH on a large cluster of different sites under a single domain.


Here are a couple of examples of how the HTTP sessions could look:

Example 1: Simple precomputed Dictionary

Request 1:

GET /foo.html HTTP/1.1
Host: example.com
Accept-Encoding: gzip, deflate, br, sdchx
SDCHx-Features: <semicolon separated list of supported features>

This is a request from a client with cold caches and empty dictionary storage. The "SDCHx-Features" header is there for extensibility. By default it will contain "encoding=vcdiff" to tell the server that vcdiff encoding is supported. In the future we can add more content-type-specific codecs such as Courgette.

Response 1:

HTTP/1.1 200 Ok
Set-Cookie: <sdch_tracking_id>
SDCHx-Get-Dictionary: https://cdn.example.com/common.dict
Access-Control-Allow-Origin: cdn.example.com

For precomputed dictionaries the flow is mostly the same as in the current SDCH implementation: the server issues an "SDCHx-Get-Dictionary" header with the URL of a static dictionary. The two other important headers are "Access-Control-Allow-Origin", which allows using the dictionary from "cdn.example.com", and the tracking cookie. The tracking cookie isn't SDCH-specific; it is just some unique identifier of the client. It's not relevant for the static dictionary case, but it is used in the next examples.

Request 2:

GET /common.dict HTTP/1.1
Host: cdn.example.com
Accept-Encoding: gzip, deflate, br, sdchx
SDCHx-Features: <semicolon separated list of supported features>
Cookie: <sdch_tracking_id=id>

Response 2:

HTTP/1.1 200 Ok
Content-Encoding: br
Content-Length: <length>
SDCHx-Server-Id: <sha256 checksum of Dictionary content>
SDCHx-Related-Pattern: <pattern>
SDCHx-Algo: <algorithm>
Cache-Control: max-age=<max-age>

<dictionary blob>

Description of the headers:

SDCHx-Server-Id serves a few purposes:
a) Identify the dictionary.
b) Verify the content against MITM proxies that corrupt responses, e.g. KIS.
c) Generate the client_id, which is the lower 48 bits of the server_id.
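
A minimal Python sketch of the three roles of the server id listed above. The proposal does not say how the ids are serialized, so the hex encoding and the reading of "lower 48 bits" as the last 12 hex digits of the digest are assumptions, and the function names are made up for illustration.

```python
import hashlib


def server_id(dictionary: bytes) -> str:
    """Hex-encoded SHA-256 of the raw dictionary blob (hex encoding is assumed)."""
    return hashlib.sha256(dictionary).hexdigest()


def client_id(server_id_hex: str) -> str:
    """Lower 48 bits of the server id: the last 12 hex digits, under the
    assumption that the digest is read as one big-endian integer."""
    return server_id_hex[-12:]


def verify_dictionary(body: bytes, advertised_server_id: str) -> bool:
    """Reject a dictionary whose body was altered in transit."""
    return server_id(body) == advertised_server_id.strip().lower()
```

A client would call verify_dictionary() on the fetched blob before storing it, and keep client_id() of the server id for later SDCHx-Avail-Dictionaries headers.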

SDCHx-Related-Pattern is used to determine the set of URLs this dictionary is applicable to. It must be validated using CORS semantics against the Access-Control-Allow-Origin header from the original response.

SDCHx-Algo: the algorithm to be used with this dictionary. The default is vcdiff, but this can be extended in the future with content-type-specific algorithms.

Cache-Control: max-age determines the Dictionary lifetime.

The Dictionary is just a blob of data without any particular format.


Request 3:

GET /bar.html HTTP/1.1
Host: example.com
Accept-Encoding: gzip, deflate, br, sdchx
Cookie: <sdch_tracking_id=id>
SDCHx-Features: <semicolon separated list of supported features>
SDCHx-Avail-Dictionaries: <comma separated list of client dictionary ids>


Response 3:

HTTP/1.1 242 Delta encoded
Content-Encoding: sdchx, br
SDCHx-Used-Dictionary-Id: <server_id>
SDCHx-Algo: vcdiff

<vcdiff encoded response>

SDCHx-Used-Dictionary-Id and SDCHx-Algo are used to decode the response. They can also be used to decode cached content (or reject it if the dictionary is no longer available).
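
A rough Python sketch of how a client might decide, from headers alone, whether a (possibly cached) response can be decoded. It relies on the standard HTTP rule that Content-Encoding lists codings in the order they were applied, so "sdchx, br" is undone as br first and sdchx second. The store layout (dictionary id mapped to bytes) and the function name are assumptions.

```python
def decoding_plan(headers, store):
    """headers: lower-cased header names -> values.
    store: dictionary id -> dictionary bytes (hypothetical client storage).

    Returns (codings in the order they must be undone, dictionary or None).
    Raises LookupError when the response needs a dictionary we no longer have.
    """
    applied = [c.strip().lower()
               for c in headers.get("content-encoding", "").split(",")
               if c.strip()]
    decode_order = list(reversed(applied))  # undo the last applied coding first
    if "sdchx" not in applied:
        return decode_order, None
    used = headers.get("sdchx-used-dictionary-id", "").strip()
    if used not in store:
        raise LookupError("dictionary %r is no longer available" % used)
    return decode_order, store[used]
```

Because only headers are inspected, the check can run before any body bytes are read from the cache or the network.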



Example 2: Dynamic Dictionary (SDCH2/RFC 3229 replacement)

Because the dictionary is just an opaque blob of data, we can easily use a response-as-dictionary approach.

Request 1:

GET /foo.html HTTP/1.1
Host: example.com
Accept-Encoding: gzip, deflate, br, sdchx
SDCHx-Features: <semicolon separated list of supported features>

Response 1:

HTTP/1.1 200 Ok
Content-Type: text/html
Content-Encoding: br
Content-Length: <length>
Set-Cookie: <sdch_tracking_id>
SDCHx-Server-Id: <sha256 checksum of Dictionary content>
SDCHx-Related-Pattern: <pattern>
SDCHx-Algo: <algorithm>
Cache-Control: max-age=<max-age>


In this case the server tells the client to use this response as a dictionary simply by providing the same set of headers as for a precomputed dictionary. Computing SDCHx-Server-Id requires the server to process the whole response body, which can be a memory-heavy operation. With HTTP/2, however, we can use a trailing headers block and send SDCHx-Server-Id in it without delaying the whole response.

Subsequent requests/responses are exactly the same as in the first example.



Semi-formal headers description:

Client headers

Accept-Encoding
Must contain sdchx to notify the server about SDCHx support.

SDCHx-Features
Semicolon-separated list of features the client supports.
encoding=<> -- comma-separated list of encodings supported by the client. By default the list consists of the single element vcdiff.
In future versions the list of features can be extended.
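
A small Python sketch of how a server could parse this header. The exact grammar is not specified in the proposal, so treating items without "=" as valueless flags and lower-casing feature names are assumptions; "some-flag" below is a hypothetical feature name.

```python
def parse_features(header_value):
    """Parse an SDCHx-Features value into {feature name: list of values}.

    Items are separated by ';'. An item of the form name=a,b carries a
    comma-separated list; an item without '=' is treated as a bare flag
    (assumption). 'encoding' defaults to ['vcdiff'] when absent.
    """
    features = {}
    for item in header_value.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';'
        name, has_value, value = item.partition("=")
        values = [v.strip() for v in value.split(",") if v.strip()] if has_value else []
        features[name.strip().lower()] = values
    features.setdefault("encoding", ["vcdiff"])
    return features
```

For example, parse_features("encoding=vcdiff; some-flag;") yields {"encoding": ["vcdiff"], "some-flag": []}.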


SDCHx-Avail-Dictionaries
Comma separated list of dictionary client ids.

Cookie
To properly support Dynamic Dictionaries the server needs some unique tracking of the client. We can use the standard cookie mechanism for this. The name of the cookie is not specified.

Server headers

SDCHx-Server-Id
The server id is calculated as the SHA-256 of the Dictionary content. It also serves as the base for the client id (its lower 48 bits) and for verification.

SDCHx-Related-Pattern
host/path to which this Dictionary is applicable. When the host part is empty, the dictionary is applicable to:
a) all hosts from which SDCHx-Get-Dictionary response headers were issued;
b) the original host in the case of a Dynamic Dictionary.
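
One possible reading of this rule, sketched in Python. The proposal does not define the matching semantics, so the plain path-prefix match, the exact host comparison, and the fallback_hosts parameter (the hosts that issued SDCHx-Get-Dictionary, or the original host for a Dynamic Dictionary) are all assumptions.

```python
def pattern_applies(pattern, url_host, url_path, fallback_hosts):
    """Return True if a dictionary with this related pattern covers the URL.

    pattern is 'host/path'; an empty host part means 'use fallback_hosts'.
    Matching is an exact host comparison plus a path-prefix test (assumed).
    """
    host, _, path = pattern.partition("/")
    path = "/" + path
    hosts = {host} if host else set(fallback_hosts)
    return url_host in hosts and url_path.startswith(path)
```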

SDCHx-Algo
Specifies either:
a) the algorithm this Dictionary can be used with. Default is vcdiff.
b) the algorithm used to encode the response. Default is vcdiff.

SDCHx-Used-Dictionary-Id
Dictionary server id used for encoding.

SDCHx-Remove-Dictionary
a) Comma-separated list of Dictionaries which the client should remove. Must be a subset of the SDCHx-Avail-Dictionaries header.
b) or "*" to remove all dictionaries for the origin domain.
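
A hedged Python sketch of how a client might apply this header. The store layout (dictionary id mapped to the origin it belongs to) and the decision to silently ignore ids the client did not advertise are assumptions, not part of the proposal.

```python
def apply_remove_dictionary(header_value, store, origin, advertised):
    """Remove dictionaries as requested by SDCHx-Remove-Dictionary.

    store: dictionary id -> origin domain (hypothetical client storage).
    advertised: ids the client sent in SDCHx-Avail-Dictionaries.
    Returns the list of ids that were actually removed.
    """
    value = header_value.strip()
    if value == "*":
        doomed = [i for i, o in store.items() if o == origin]
    else:
        requested = [v.strip() for v in value.split(",") if v.strip()]
        # Only ids the client advertised may be removed ("must be subset").
        doomed = [i for i in requested if i in advertised and i in store]
    for i in doomed:
        del store[i]
    return doomed
```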

Cache-Control
max-age=<seconds> determines the lifetime of the Dictionary. After that the client can re-validate it using standard HTTP caching semantics. Default is <N> seconds.
private - to avoid mis-caching of delta-encoded results on intermediate proxies, the server SHOULD set “private” in the Cache-Control header for SDCHx-encoded responses.
no-transform - to prevent MitM proxies from doing their stupid thingy.




[1] https://codereview.chromium.org/1876683002/
[2] https://docs.google.com/viewer?a=v&pid=forums&srcid=MDIwOTgxNDMwMTgyMjkzMTI2ODcBMDQ2MzU5NDU2MDA0MTg5NDE1MTkBTDZmaENoSG9BZ0FKATAuMQEBdjI&authuser=0
[3] http://lists.w3.org/Archives/Public/ietf-http-wg/2008JulSep/att-0441/Shared_Dictionary_Compression_over_HTTP.pdf
[4] https://docs.google.com/document/d/1UhN7-ZpDqR3vq9OcZ12LQsfhPaNmJNkCM7KRcPLy2VY/edit#
[5] https://tools.ietf.org/html/rfc3229
[6] https://www.w3.org/TR/cors/#access-control-allow-origin-response-header

-- 
Bacek

Matt Menke

Jun 30, 2016, 11:55:21 AM
to Vasily Chekalkin, net-dev
I'm supportive of scrapping SDCH as-is, and fixing the failings of the spec, but I'm not responsible for any servers that currently use SDCH, or any servers at all, for that matter.

Some initial questions about the spec:

* If we get a 200 response (as opposed to 242) with "Content-Encoding: sdchx", what do we do?
* Should "privacy mode" requests (i.e. anonymous, non-CORS, cookieless, whatever) also send CORS preflight requests?  I assume not, but that they should also send cookieless dictionary requests?  They currently do send cookies with dictionary requests, which is probably a bug.
* Is this over SSL only?

Also, it's my understanding that the vcdiff library we're using is currently unsupported.  Have you thought at all about what to do about this?  Not trying to put you on the spot here, it's just an issue that always comes up in our offline discussions of the status of SDCH.



Ben Maurer

Jun 30, 2016, 1:37:49 PM
to net-dev
Hey,

Really excited about this work. I think being able to use a response as a dictionary opens up a lot of powerful options.

Something I see being a bit tricky here is using related-pattern. A big use case for us at FB is diffing versions of static resources. For example, we might have a resource like https://www.facebook.com/rsrc.php/v2imYJ4/yR/l/en_US/T-z4odqlBJu.js. The next week that URL might change completely when we update some files. It may be tricky for sites with a scheme like this to ensure that their path structure leads to patterns that advertise a reasonable number of dictionaries. It also might be valuable to allow a server to use two dictionaries in a response (say you have a large dictionary that changes infrequently and a smaller more frequently updating dictionary). 

One thing I wonder is if it would be worth trying to implement SDCHx using a service worker. There's a lot of complexity to this concept -- cross-origin access, pattern matching, allowing a website to remotely delete things from the cache. The main advantage of a native implementation seems to be the native implementation of VCDIFF, but vcdiff decoding is fairly efficient. Even if it can't be done in raw JS, speccing out a VCDIFF API seems much more contained than speccing out SDCH. This would allow sophisticated sites like Yandex, Google and Facebook to experiment with the right API for SDCH. Eventually the client/server interface could be standardized (advanced users might always roll their own).

-b

Ryan Sleevi

Jun 30, 2016, 3:39:27 PM
to Ben Maurer, net-dev
On Thu, Jun 30, 2016 at 10:37 AM, Ben Maurer <ben.m...@gmail.com> wrote:
One thing I wonder is if it would be worth trying to implement SDCHx using a service worker. There's a lot of complexity to this concept -- cross origin access, pattern matching, allowing a website to remotely delete things from the cache. The main advantage of a native implementation seems like the native implementation of VCDIFF, but vcdiff decoding is fairly efficient. Even if it can't be done in raw JS specing out a VCDIFF api seems much more contained than specing out SDCH. This would allow sophisticated sites like Yandex, Google and Facebook to experiment with the right API for SDCH. Eventually the client/server interface could be standardized (advanced users might always roll their own). 

I'm super-supportive of this design approach (using a service worker), even if it's just to iterate on the SDCH spec capabilities. Further, I think the work for foreign-fetch here might yield more opportunities for experimentation.

In my mind, this is what SW is perfect for - to explore the platform, blaze new paths, and once the cowpaths are settled, to work through with the extensible web to explore standardizing them.

Vasily Chekalkin

Jun 30, 2016, 5:50:20 PM
to Matt Menke, net-dev
Hello.
 
01.07.2016, 01:55, "Matt Menke" <mme...@chromium.org>:
I'm supportive of scrapping SDCH as-is, and fixing the failings of the spec, but I'm not responsible for any servers that currently use SDCH, or any servers at all, for that matter.
 
Some initial questions about the spec:
 
* If we get a 200 response (as opposed to 242) with "Content-Encoding: sdchx", what do we do?
 
I think it should be treated as a proxy error, with SDCH blacklisted for the session, similar to the current "x-sdch-encoded: 0" handling.
 
* Should "privacy mode" requests (i.e. anonymous, non-CORS, cookiieless, whatever) also send CORS preflight requests?  I assume not, but that they should also send Cookieless dictionary requests?  They currently do send cookies with dictionary requests, which is probably a bug.
 
I'm open to discussion and I lean towards cookieless dictionary requests. OTOH
* we shouldn't have Get-Dictionary headers without a corresponding Access-Control-Allow-Origin header in the original response.
* I think we should restrict sdchx to "safe methods". In our case that means GET only, because HEAD is irrelevant.
 
* Is this over SSL only?
 
 
Not necessarily. Unfortunately a "secure channel" won't prevent some crazy MitM proxies from breaking stuff. We had problems with KIS. Some large organisations install a company-wide proxy that decodes all traffic, etc.
 
Also, it's my understanding that the vcdiff library we're using is currently unsupported.  Have you thought at all about what to do about this?  Not trying to put you on a spot here, it's just an issue that always comes up in our offline discussions of the status of SDCH.
 
I don't think it is a big problem. The VCDIFF spec is 14 years old and open-vcdiff supports it pretty well.
-- 
Bacek
 

Ben Maurer

Jun 30, 2016, 6:03:34 PM
to net-dev, mme...@chromium.org


On Thursday, June 30, 2016 at 2:50:20 PM UTC-7, Vasily Chekalkin wrote:
Hello.
 
01.07.2016, 01:55, "Matt Menke" <mme...@chromium.org>: 
* Is this over SSL only?
Not necessary. Unfortunately "secure channel" won't prevent some crazy MitM proxies from breaking stuff. We had problems with KIS. Some large organisations install company-wide proxy with decoding all traffic, etc.

FWIW we recently did a full deployment of Brotli for Facebook via SSL only. The only MitM proxy that has come to our attention has been Kaspersky. They were able to hotfix their code in under one week when it broke. Given that we're doing brotli deployments with no provisions for defending against MitM I think we could just take the same approach for SDCH.

-b

Matt Menke

Jun 30, 2016, 6:07:54 PM
to Ben Maurer, net-dev
I'm not sure "no issues / via SSL only" generalizes to "no issues over non-SSL".  The interwebs are a dark and scary place.  I'm most concerned about the non-SSL case.  If you're breaking SSL traffic with unusual content encodings, you / your company generally controls both the broken proxy and the client, which is very different from the HTTP case.

Matt Menke

Jun 30, 2016, 6:09:35 PM
to Vasily Chekalkin, net-dev
On Thu, Jun 30, 2016 at 5:50 PM, Vasily Chekalkin <ba...@yandex-team.ru> wrote:
Hello.
 
01.07.2016, 01:55, "Matt Menke" <mme...@chromium.org>:
I'm supportive of scrapping SDCH as-is, and fixing the failings of the spec, but I'm not responsible for any servers that currently use SDCH, or any servers at all, for that matter.
 
Some initial questions about the spec:
 
* If we get a 200 response (as opposed to 242) with "Content-Encoding: sdchx", what do we do?
 
I think it should be treated as proxy error with blacklisting SDCH for session. Similar to current "x-sdch-encoded: 0" handling.

And for the request with the weird response itself?  Just display a network error page, and let the user reload?  The meta-reload hack is really weird.  We could do something more sane, like re-send the original request, but I'm not sure we want to do that, without user consent. 

Ben Maurer

Jun 30, 2016, 6:22:25 PM
to Matt Menke, net-dev
Right, I was advocating for not even trying to do specs like this over non-HTTPS (like brotli did) but not trying to recover from MitM proxies that mess with content. As you mention in the MitM case you / your company controls the proxy and has some say over its actions. 

Vasily Chekalkin

Jun 30, 2016, 6:40:56 PM
to Ben Maurer, net-dev
 
 
01.07.2016, 03:37, "Ben Maurer" <ben.m...@gmail.com>:
Hey,
 
Really excited about this work. I think being able to use a response as a dictionary opens up a lot of powerful options.
 
 
Glad to hear it. Looks like we are on the right track.
 
Something I see being a bit tricky here is using related-pattern. A big use case for us at FB is diffing versions of static resources. For example, we might have a resource like https://www.facebook.com/rsrc.php/v2imYJ4/yR/l/en_US/T-z4odqlBJu.js. The next week that URL might change completely when we update some files. It may be tricky for sites with a scheme like this to ensure that their path structure leads to patterns that advertise a reasonable number of dictionaries. It also might be valuable to allow a
 
I can't offer any good solution for that type of URL structure apart from using "www.facebook.com/" as the related-pattern and selecting the proper dictionary on the server. We have a similar problem at Yandex to some extent.
 
 
server to use two dictionaries in a response (say you have a large dictionary that changes infrequently and a smaller more frequently updating dictionary). 
 
 
Interestingly, we had discussions about "combined dictionaries" internally. I'm slightly opposed to this idea, mainly because it can complicate the spec, client code, server side, etc. It can also increase memory usage on both clients and servers, which can be a roadblock to rolling it out. However! We do have extensibility built into the protocol, and use of combined dictionaries could look like this:
 
GET /some/semi/changing/resource HTTP/1.1
Accept-Encoding: br, sdchx
SDCHx-Features: encoding=vcdiff; combined-dictionaries;
SDCHx-Avail-Dictionaries: <id1>, <id2>, <id3>
 
 
HTTP 242 Delta Encoded
Content-Encoding: sdchx, br
SDCHx-Algo: vcdiff
SDCHx-Used-Dictionary-Id: <id1>
SDCHx-Used-Dictionary-Id: <id3>
 
 
In this exchange the client notifies the server about "combined dictionary" support. The server detects it and chooses to combine dictionaries <id1> and <id3> for encoding. The client builds the combined dictionary based on the "SDCHx-Used-Dictionary-Id" list and uses it for decoding.

This allows deploying the feature on a subset of clients, e.g. with a per-platform experiment. Or the client can decide whether to use it based on workload (and available resources, e.g. memory).
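
A rough Python sketch of the client side of such an exchange. How the dictionaries are actually combined is not defined in this thread; plain concatenation in header order is an assumption made only for illustration, as are the function name and the store layout.

```python
def combined_dictionary(header_lines, store):
    """Build the decoding dictionary from repeated SDCHx-Used-Dictionary-Id headers.

    header_lines: list of (name, value) pairs in the order received.
    store: dictionary id -> dictionary bytes (hypothetical client storage).
    Combination rule (assumed): concatenate the blobs in header order.
    Raises KeyError if any listed dictionary is missing.
    """
    ids = [value.strip()
           for name, value in header_lines
           if name.lower() == "sdchx-used-dictionary-id"]
    return b"".join(store[i] for i in ids)
```

With a single SDCHx-Used-Dictionary-Id header this degenerates to the ordinary one-dictionary case.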
 
One thing I wonder is if it would be worth trying to implement SDCHx using a service worker. There's a lot of complexity to this concept -- cross origin access, pattern matching, allowing a website to remotely delete things from the cache. The main advantage of a native implementation seems like the native implementation of VCDIFF, but vcdiff decoding is fairly efficient. Even if it can't be done in raw JS specing out a VCDIFF api seems much more contained than specing out SDCH. This would allow sophisticated sites like Yandex, Google and Facebook to experiment with the right API for SDCH. Eventually the client/server interface could be standardized (advanced users might always roll their own). 
 
-b
 
 
Unfortunately I know almost nothing about Service Worker capabilities. But if it's possible to implement sdchx using Service Workers, I'm all in. A quick search found https://github.com/plotnikoff/vcdiff.js, a vcdiff encoding/decoding library in pure JavaScript. It's not complete (e.g. there is no Adler checksum) but looks like a good start.
 
 
-- 
Bacek
 

Vasily Chekalkin

Jun 30, 2016, 6:43:09 PM
to Matt Menke, Ben Maurer, net-dev
No objections. Let's put in an explicit note that sdchx should be applicable to secure connections only.
 
 
 
-- 
Bacek
 

Ben Maurer

Jun 30, 2016, 6:49:06 PM
to Vasily Chekalkin, net-dev
On Thu, Jun 30, 2016 at 3:40 PM, Vasily Chekalkin <ba...@yandex-team.ru> wrote:
Something I see being a bit tricky here is using related-pattern. A big use case for us at FB is diffing versions of static resources. For example, we might have a resource like https://www.facebook.com/rsrc.php/v2imYJ4/yR/l/en_US/T-z4odqlBJu.js. The next week that URL might change completely when we update some files. It may be tricky for sites with a scheme like this to ensure that their path structure leads to patterns that advertise a reasonable number of dictionaries. It also might be valuable to allow a
I can't offer any good solution for that type of url structure apart from using "www.facebook.com/" as related-pattern and select proper dictionary on server. We have similar problem at Yandex for some extent.

FWIW this is why I think SW is a strength -- it lets the site implement any algorithm it chooses. For example, maybe we send a X-SDCHx-Tag with every JS/CSS file and then have a dictionary that says "file X is likely to match things with tags A or B".
server to use two dictionaries in a response (say you have a large dictionary that changes infrequently and a smaller more frequently updating dictionary). 
 
 
Interestingly we had discussions about "combined dictionaries" internally. I'm slightly oppose of this idea. Mainly because it can complicate spec, client code, server side, etc. It also can increase memory usage on both clients and server which can be a road block of rolling it out. However! We do have extensibility built-in in protocol and use of combined dictionaries can look like this:
 
GET /some/semi/changing/resource HTTP/1.1
Accept-Encoding: br, sdchx
SDCHx-Features: encoding=vcdiff; combined-dictionaries;
SDCHx-Avail-Dictionary: <id1>, <id2>, <id3>
 
 
HTTP 242 Delta Encoded
Content-Encoding: sdchx, br
SDCHx-Algo: vcdiff
SDCHx-Used-Dictionary-Id: <id1>
SDCHx-Used-Dictionary-Id: <id3>
 
 
In this exchange client notifies server about "combined dictionary" support. Server detects it and chooses to combine dictionaries <id1> and <id3> for encoding. Client build combined dictionary based on "SDCHx-Used-Dictionary-Id" list and uses it for decoding.
 
This will allow deploy this feature on subset of clients. E.g. with experiment-per-platform. Or client can decide to use it based on workload (and available resources e.g. memory).

We'd probably use this for the use case of JS/CSS (for example if we decided to combine two packages together we might want to do a diff from both of them). 
One thing I wonder is if it would be worth trying to implement SDCHx using a service worker. There's a lot of complexity to this concept -- cross origin access, pattern matching, allowing a website to remotely delete things from the cache. The main advantage of a native implementation seems like the native implementation of VCDIFF, but vcdiff decoding is fairly efficient. Even if it can't be done in raw JS specing out a VCDIFF api seems much more contained than specing out SDCH. This would allow sophisticated sites like Yandex, Google and Facebook to experiment with the right API for SDCH. Eventually the client/server interface could be standardized (advanced users might always roll their own). 
 
-b
 
 
Unfortunately I know almost nothing about Service Workers capabilities. But if it's possible to implement sdchx using Service Workers I'm all in. Quick search found https://github.com/plotnikoff/vcdiff.js vcdiff encoding/decoding library in pure javascript. It's not complete (e.g. there is no Adler checksum) but looks like a good start.

In theory it should be totally possible to implement sdch in SW. It gives you enough control over the cache, etc to implement it. 

Vasily Chekalkin

Jun 30, 2016, 6:51:45 PM
to Matt Menke, net-dev
 
 
01.07.2016, 08:09, "Matt Menke" <mme...@chromium.org>:
On Thu, Jun 30, 2016 at 5:50 PM, Vasily Chekalkin <ba...@yandex-team.ru> wrote:
Hello.
 
01.07.2016, 01:55, "Matt Menke" <mme...@chromium.org>:
I'm supportive of scrapping SDCH as-is, and fixing the failings of the spec, but I'm not responsible for any servers that currently use SDCH, or any servers at all, for that matter.
 
Some initial questions about the spec:
 
* If we get a 200 response (as opposed to 242) with "Content-Encoding: sdchx", what do we do?
 
I think it should be treated as proxy error with blacklisting SDCH for session. Similar to current "x-sdch-encoded: 0" handling.
 
And for the request with the weird response itself?  Just display a network error page, and let the user reload?  The meta-reload hack is really weird.  We could do something more sane, like re-send the original request, but I'm not sure we want to do that, without user consent. 
 
Why not? We can just restart the request without sdchx.

1. We detect breakage early enough to do it. We need only the (cached) response headers and the list of currently available dictionaries, compared to current SDCH, where we need the response body to detect it.
2. sdchx is applicable to GET only, which should be safe to restart.
3. With HTTP/2 we can send RST_STREAM as soon as we detect breakage.
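
The restart logic described in these points could be sketched in Python roughly as follows. The three outcome labels, the function names, and the choice to drop every SDCHx-* request header on retry are assumptions for illustration.

```python
def classify_response(status, headers, available_ids):
    """Decide from response headers alone what to do with a response.

    headers: lower-cased header names -> values.
    Returns 'use' (not SDCHx-encoded), 'decode', or 'retry'.
    """
    codings = [c.strip().lower()
               for c in headers.get("content-encoding", "").split(",")
               if c.strip()]
    if "sdchx" not in codings:
        return "use"
    if status != 242:
        return "retry"  # e.g. a proxy rewrote 242 to 200
    if headers.get("sdchx-used-dictionary-id", "").strip() not in available_ids:
        return "retry"  # dictionary is gone, body cannot be decoded
    return "decode"


def without_sdchx(request_headers):
    """Copy of the GET request headers with all signs of SDCHx support removed."""
    cleaned = {}
    for name, value in request_headers.items():
        lower = name.lower()
        if lower.startswith("sdchx-"):
            continue
        if lower == "accept-encoding":
            kept = [c.strip() for c in value.split(",")
                    if c.strip() and c.strip().lower() != "sdchx"]
            value = ", ".join(kept)
        cleaned[name] = value
    return cleaned
```

On 'retry' the client would abandon the current response (RST_STREAM on HTTP/2) and resend the GET with without_sdchx() applied.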
 
-- 
Bacek
 

Vasily Chekalkin

Jun 30, 2016, 7:13:46 PM
to Ben Maurer, net-dev
 
 
01.07.2016, 08:49, "Ben Maurer" <ben.m...@gmail.com>:
 
 
On Thu, Jun 30, 2016 at 3:40 PM, Vasily Chekalkin <ba...@yandex-team.ru> wrote:
Something I see being a bit tricky here is using related-pattern. A big use case for us at FB is diffing versions of static resources. For example, we might have a resource like https://www.facebook.com/rsrc.php/v2imYJ4/yR/l/en_US/T-z4odqlBJu.js. The next week that URL might change completely when we update some files. It may be tricky for sites with a scheme like this to ensure that their path structure leads to patterns that advertise a reasonable number of dictionaries. It also might be valuable to allow a
I can't offer any good solution for that type of url structure apart from using "www.facebook.com/" as related-pattern and select proper dictionary on server. We have similar problem at Yandex for some extent.
 
FWIW this is why I think SW is a strength -- it lets the site implement any algorithm it chooses. For example, maybe we send a X-SDCHx-Tag with every JS/CSS file and then have a dictionary that says "file X is likely to match things with tags A or B".
 
Erm. Can you explain more? I don't really understand how SDCHx-Tag can apply.
 
server to use two dictionaries in a response (say you have a large dictionary that changes infrequently and a smaller more frequently updating dictionary). 
 
 
Interestingly we had discussions about "combined dictionaries" internally. I'm slightly oppose of this idea. Mainly because it can complicate spec, client code, server side, etc. It also can increase memory usage on both clients and server which can be a road block of rolling it out. However! We do have extensibility built-in in protocol and use of combined dictionaries can look like this:
 
GET /some/semi/changing/resource HTTP/1.1
Accept-Encoding: br, sdchx
SDCHx-Features: encoding=vcdiff; combined-dictionaries;
SDCHx-Avail-Dictionary: <id1>, <id2>, <id3>
 
 
HTTP 242 Delta Encoded
Content-Encoding: sdchx, br
SDCHx-Algo: vcdiff
SDCHx-Used-Dictionary-Id: <id1>
SDCHx-Used-Dictionary-Id: <id3>
 
 
In this exchange client notifies server about "combined dictionary" support. Server detects it and chooses to combine dictionaries <id1> and <id3> for encoding. Client build combined dictionary based on "SDCHx-Used-Dictionary-Id" list and uses it for decoding.
 
This will allow deploy this feature on subset of clients. E.g. with experiment-per-platform. Or client can decide to use it based on workload (and available resources e.g. memory).
 
We'd probably use this for the use case of JS/CSS (for example if we decided to combine two packages together we might want to do a diff from both of them). 
 
 
This should work also.
 
One thing I wonder is if it would be worth trying to implement SDCHx using a service worker. There's a lot of complexity to this concept -- cross origin access, pattern matching, allowing a website to remotely delete things from the cache. The main advantage of a native implementation seems like the native implementation of VCDIFF, but vcdiff decoding is fairly efficient. Even if it can't be done in raw JS specing out a VCDIFF api seems much more contained than specing out SDCH. This would allow sophisticated sites like Yandex, Google and Facebook to experiment with the right API for SDCH. Eventually the client/server interface could be standardized (advanced users might always roll their own). 
 
-b
 
 
Unfortunately I know almost nothing about Service Workers capabilities. But if it's possible to implement sdchx using Service Workers I'm all in. Quick search found https://github.com/plotnikoff/vcdiff.js vcdiff encoding/decoding library in pure javascript. It's not complete (e.g. there is no Adler checksum) but looks like a good start.
 
In theory it should be totally possible to implement sdch in SW. It gives you enough control over the cache, etc to implement it. 
 
 
Good. I'm going to implement a server-side "prototype" of sdchx over the next couple of weeks. Most likely it will be based on https://github.com/yandex/sdch_module.
 
-- 
Bacek
 

Ben Maurer

Jun 30, 2016, 7:31:04 PM
to Vasily Chekalkin, net-dev
 
FWIW this is why I think SW is a strength -- it lets the site implement any algorithm it chooses. For example, maybe we send a X-SDCHx-Tag with every JS/CSS file and then have a dictionary that says "file X is likely to match things with tags A or B".
 
Erm. Can you explain more? I don't really understand how SDCHx-Tag can apply.

Imagine something like this:

GET /rsrc/asdfasdf.js

HTTP 200 OK
SDCHx-Server-Id: abc
SDCHx-Tag: tag1

On another document:

<script src="/rsrc/lkjlkjlkj.js" sdchx-tag="tag1, tag2" />

This could signal "/rsrc/lkjlkjlkj.js is likely a good match for any file with tag1" signaling to send abc as an avail-dict. This way the server is free to help match request with relevant dictionaries without having to change the path of the JS. But doing this requires some way to get data from the DOM to the networking stack.  Right now the only way to do that is via changing the url (and busting the cache). For example, if we allowed the specification of fetch parameters on script tags, etc this could be a way to signal the desired tags.
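
A small Python sketch of the matching step this implies: given the tags requested for a resource and the tags recorded from earlier SDCHx-Tag responses, pick the dictionaries to advertise. The data layout and the function name are hypothetical; "any shared tag" as the match rule is an assumption.

```python
def dictionaries_for_tags(wanted_tags, tagged):
    """Return the server ids to advertise for a request.

    wanted_tags: the comma-separated sdchx-tag value, e.g. "tag1, tag2".
    tagged: server id -> set of tags seen in earlier SDCHx-Tag headers.
    A dictionary matches when it shares at least one tag (assumed rule).
    """
    wanted = {t.strip() for t in wanted_tags.split(",") if t.strip()}
    return sorted(i for i, tags in tagged.items() if tags & wanted)
```

With the exchange above, dictionaries_for_tags("tag1, tag2", {"abc": {"tag1"}}) returns ["abc"], so the request for /rsrc/lkjlkjlkj.js would list abc as an available dictionary.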

-b


Vasily Chekalkin

Jun 30, 2016, 8:14:03 PM
to Ben Maurer, net-dev
 
 
01.07.2016, 09:31, "Ben Maurer" <ben.m...@gmail.com>:
I really like this idea. We can get rid of "Related Pattern" and just use CORS + tags to select dictionaries. Moreover, without Related Pattern we can have something like this:
 
1. cdn.example.com/js.dict with SDCHx-Tag "js".
2. cdn.example.com/css.dict with tag "css"
5. baz.example.com/baz/baz.css with tag "css"
 
In this case browser will choose js.dict for requests 3 and 4, css.dict for 5 and "combined dictionary" (if supported) or just list of 2 dictionaries for 6.
 
-- 
Bacek
 

Ben Maurer

Jun 30, 2016, 8:35:11 PM
to Vasily Chekalkin, net-dev
Related pattern might still be needed for things like HTML where you can't easily specify a tag. But tags provide more flexibility when the path might not be helpful in figuring out the exact content match.

Vasily Chekalkin

Jun 30, 2016, 10:12:11 PM6/30/16
to Ben Maurer, net-dev
 
 
01.07.2016, 10:41, "Ben Maurer" <ben.m...@gmail.com>:
Related pattern might still be needed for things like HTML where you can't easily specify a tag. But tags provide more flexibility when the path might not be helpful in figuring out the exact content match.
 
Yes, but with the sdchx approach we give control of "dictionary applicability" to "content owners", not "dictionary creators". For example, if we have cdn.example.com/html.dict which was used by foo.example.com, and later bar.example.com decides to use the same dictionary, all that needs to happen is for bar.example.com to emit an SDCHx-Get-Dictionary header. To avoid extensive use by unrelated parties, cdn.example.com can check the Referer header if needed (or use any other "hot linking" blocking technique).
 
Anyway, from my point of view "related path" is mostly useful for reducing the size of the GET request from the browser and shouldn't affect how the overall protocol works.
 
-- 
Bacek
 

Randy Smith

Jul 7, 2016, 2:42:53 PM7/7/16
to Vasily Chekalkin, net-dev, SD...@googlegroups.com
I'm excited about this work, and would be happy to see a new spec developed remedying some of the flaws of the current spec.  There's a fair amount in here that I don't feel competent to comment on (like RFC 3229; that's a very old spec that I haven't heard of being implemented, which makes me worried it's got problems--what's its current status?), but a couple of points/questions:

* This should be shared with the Google group for SDCH spec discussions (cc'd :-}).  

* I'm no expert on CORS, but it looks to me as if there's an ambiguity in how Access-Control-Allow-Origin is being used (in Response 1) as to whether it applies to the response body (unencoded and being returned) or the dictionary listed in the Get-Dictionary header.  I'd naively expect it to apply to the response body; it seems weird to hijack it for the dictionary header.  Maybe it should be in the Response 2 headers?

* From an implementation POV I think Chromium would be uncomfortable relying on an unsupported open source library (yes, we're currently doing it, but it's a part of why there's been less than full excitement coming from this direction about SDCH).  If a security bug comes up, there's no guarantee that it'll be noticed or patched quickly.  So it's an implementation issue, but the lack of current support for openvcdiff is a problem in that space.

* If we're seriously thinking of using arbitrary resources as dictionaries, I think we should work out how a couple of examples would most naturally be implemented and make sure we're not specing ourselves into an implementation corner.  I'm specifically concerned about a) encoding a URL response with the contents of the previous URL response, and b) a proliferation of data structures with memory costs in the client (or server :-}) because of a semi-arbitrary use of resources as dictionaries.  Both client and server have full control over what they advertise, so it wouldn't be a problem building spec-compliant clients & servers, but it might be a disappointment if a server implemented this in order to support some particular usage model, and clients avoided it like the plague because of implementation complexity.

Having said all that, thanks very much for taking this on!

-- Randy


On Thu, Jun 30, 2016 at 12:55 AM, Vasily Chekalkin <ba...@yandex-team.ru> wrote:

-- 
Bacek

--
You received this message because you are subscribed to the Google Groups "net-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to net-dev+u...@chromium.org.
To post to this group, send email to net...@chromium.org.

Vasily Chekalkin

Jul 7, 2016, 6:29:41 PM7/7/16
to Randy Smith, net-dev, sd...@googlegroups.com
Hello.
 
08.07.2016, 04:42, "Randy Smith" <rds...@chromium.org>:
I'm excited about this work, and would be happy to see a new spec developed remedying some of the flaws of the current spec.  There's a fair amount in here that I don't feel competent to comment on (like RFC 3229; that's a very old spec that I haven't heard of being implemented, which makes me worried it's got problems--what's its current status?), but a couple of points/questions:
 
 
I never saw RFC 3229 implemented in real life. Probably because it's way too complex for the small task of updating a single resource.
 
* This should be shared with the Google group for SDCH spec discussions (cc'd :-}).  
 
* I'm no expert on CORS, but it looks to me as if there's an ambiguity in how Access-Control-Allow-Origin is being used (in Response 1) as to whether it applies to the response body (unencoded and being returned) or the dictionary listed in the Get-Dictionary header.  I'd naively expect it to apply to the response body; it seems weird to hijack it for the dictionary header.  Maybe it should be in the Response 2 headers?
 
 
I'm not proud of using the ACAO header for this either. But my current approach is that the original resource owner decides which dictionary to use, instead of SDCH's "Host" Dictionary header. Another approach could be something like this: if a response from foo.example.com emits a Get-Dictionary header, we mark this dictionary as applicable for foo.example.com only. No wildcards, no domain matches, nothing. If bar.example.com emits the same Get-Dictionary, we mark the same Dictionary as applicable for bar.example.com too. With this approach we can share a Dictionary across multiple hosts without stepping on other hosts in the same domain.
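To make the per-host idea concrete, here is a minimal Python sketch of the bookkeeping a client might do. `DictionaryScopes` and its methods are hypothetical names, not an existing API, and the sketch assumes exact host matching only, as described above.

```python
from collections import defaultdict
from urllib.parse import urlsplit


class DictionaryScopes:
    """Tracks which exact hosts have opted in to each dictionary (hypothetical)."""

    def __init__(self):
        # dictionary URL -> set of hosts that emitted Get-Dictionary for it
        self._hosts = defaultdict(set)

    def on_get_dictionary(self, response_url, dictionary_url):
        """Record that the host serving response_url advertised dictionary_url."""
        host = urlsplit(response_url).hostname
        if host:
            self._hosts[dictionary_url].add(host)

    def applicable(self, request_url):
        """Return dictionaries usable for request_url: exact host match, no wildcards."""
        host = urlsplit(request_url).hostname
        return sorted(d for d, hosts in self._hosts.items() if host in hosts)


scopes = DictionaryScopes()
scopes.on_get_dictionary("https://foo.example.com/a.html",
                         "https://cdn.example.com/html.dict")
```

After this call the dictionary applies to foo.example.com only. bar.example.com gets it once it emits the same Get-Dictionary itself, and other hosts in the same domain are never affected.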
 
* From an implementation POV I think Chromium would be uncomfortable relying on an unsupported open source library (yes, we're currently doing it, but it's a part of why there's been less than full excitement coming from this direction about SDCH).  If a security bug comes up, there's no guarantee that it'll be noticed or patched quickly.  So it's an implementation issue, but the lack of current support for openvcdiff is a problem in that space.
 
 
Yes, this can be a problem. But the same applies to other open source libraries. E.g. zlib was last updated in 2013, and the previous update was in 2012. OTOH there is the xdelta3 library, which implements vcdiff encoding, and Chromium could possibly switch to it in case of major problems with openvcdiff.
 
* If we're seriously thinking of using arbitrary resources as dictionaries, I think we should work out how a couple of examples would most naturally be implemented and make sure we're not specing ourselves into an implementation corner.  I'm specifically concerned about a) encoding a URL response with the contents of the previous URL response, and b) a proliferation of data structures with memory costs in the client (or server :-}) because of a semi-arbitrary use of resources as dictionaries.  Both client and server have full control over what they advertise, so it wouldn't be a problem building spec-compliant clients & servers, but it might be a disappointment if a server implemented this in order to support some particular usage model, and clients avoided it like the plague because of implementation complexity.
 
Yes, agreed. And this is the reason why I'm working on sample server and client implementations to test those scenarios. I hope I'll have them in the next couple of weeks.
-- 
Bacek
 

John Lenz

Jul 8, 2016, 11:29:03 AM7/8/16
to SD...@googlegroups.com, Vasily Chekalkin, net-dev
* In past discussions, the primary thing that I wanted to avoid was a distinct dictionary download, that any resource could be declared as a "possible future dictionary".   The current SDCH spec made this impossible.  

* I'd be happier if dictionaries were like other resources and a "Remove Dictionary" did not exist.

* The original implementation of SDCH in Chrome is very specialized to its original use case (hard limit on dictionary sizes, dictionaries were loaded early into memory and pinned there, etc).  I wasn't clear on what, if any, changes would reasonably be accepted and if there was any future in SDCH at all.

--
You received this message because you are subscribed to the Google Groups "SDCH" group.
To unsubscribe from this group and stop receiving emails from it, send an email to SDCH+uns...@googlegroups.com.
To post to this group, send email to SD...@googlegroups.com.
Visit this group at https://groups.google.com/group/SDCH.
For more options, visit https://groups.google.com/d/optout.

Vasily Chekalkin

Jul 8, 2016, 5:43:23 PM7/8/16
to John Lenz, sd...@googlegroups.com, net-dev
Hello.
 
09.07.2016, 01:29, "John Lenz" <conca...@gmail.com>:
* In past discussions, the primary thing that I wanted to avoid was a distinct dictionary download, that any resource could be declared as a "possible future dictionary".   The current SDCH spec made this impossible.  
 
Yes, I incorporated it into SDCHx in "Example 2: Dynamic dictionaries". OTOH, we should be able to pre-compute a static dictionary with some generic content which can be used for multiple resources. Such a static dictionary won't be referenced by the original resource, and this is where Get-Dictionary comes into play. A usage scenario could be this:
 
1. Site has multiple small JS files (which is the way to go in the HTTP/2 world).
2. Some of those JS files change more often than others.
3. Site owner can compile single js.dict with all common parts.
 
 
* I'd be happier if dictionaries were like other resources and a "Remove Dictionary" did not exist.
 
Remove Dictionary is an optimization strategy which can be marked as optional. The main reason for its existence is "Dynamic Dictionaries". If the server for some reason lost information about a previously defined Dynamic Dictionary (crash, reboot, memory pressure, whatever), there is no reason for the Client to keep that Dictionary. So the Server will supply a hint to remove it.
 
* The original implementation of SDCH in Chrome is very specialized to its original use case (hard limit on dictionary sizes, dictionaries were loaded early into memory and pinned there, etc).  I wasn't clear on what, if any, changes would reasonably be accepted and if there was any future in SDCH at all.
 
 
The SDCH Dictionary implementation is also highly coupled with the HTTP cache in Chromium. But in the future we can have separate Dictionary storage with on-demand Dictionary loading, etc. And at least with SDCHx it will be possible to fix the "meta-refresh hack" because we can detect the absence of dictionaries early.
-- 
Bacek
 

Matt Menke

Jul 11, 2016, 1:51:16 PM7/11/16
to Vasily Chekalkin, John Lenz, sd...@googlegroups.com, net-dev
One issue I haven't seen raised:  CORS is a web platform feature (And so implemented way above net/, and only defined relative to HTML/DOM objects, and their relationships to each other), while SDCH is an HTTP-layer feature (And so implemented mostly within net/).  It seems strange to bring duplicated CORS-like behavior down into the network stack, which has no notion of same-site and cross-site requests.

Ryan Sleevi

Jul 11, 2016, 2:01:42 PM7/11/16
to Matt Menke, Vasily Chekalkin, John Lenz, sd...@googlegroups.com, net-dev
On Mon, Jul 11, 2016 at 10:51 AM, Matt Menke <mme...@chromium.org> wrote:
One issue I haven't seen raised:  CORS is a web platform feature (And so implemented way above net/, and only defined relative to HTML/DOM objects, and their relationships to each other), while SDCH is an HTTP-layer feature (And so implemented mostly within net/).  It seems strange to bring duplicated CORS-like behavior down into the network stack, which has no notion of same-site and cross-site requests.

I'm not sure the position you're taking here, Matt.

Are you saying that SDCHx should not be explained in terms of Fetch semantics (because Fetch is above //net)? Or that we shouldn't implement a duplicate pseudo-Fetch scheme?

I wholeheartedly agree that we should not duplicate a pseudo-Fetch scheme, if that is the position. Further, if we cannot explain SDCHx in terms of Fetch, then it creates a system where a Service Worker necessarily could not polyfill any SDCH semantics, because a Service Worker will be confined to Fetch semantics, and they're necessarily less-capable than "raw" requests - on the basis of security.

I think there's also a general zeitgeist that we should not be introducing new Web request mechanisms that cannot be explained, in some way, via Fetch, due to the implications in reasoning about the security model and interaction model with the rest of the Web platform.

This highlights a tension between "Is SDCHx a network-level feature (like HTTP) or a web platform feature (like, say, network error logging)". I'm wholly sympathetic to the argument that there are and will be non-browser implementations of SDCHx, and so Fetch is not relevant to them. However, I think it's a fairly hard MUST that any browser-based implementation of SDCHx can be expressed or implemented in terms of Fetch, and so if the spec goes a non-Fetch route, then it MUST have sufficient hooks to be implemented via Fetch. 

Matt Menke

Jul 11, 2016, 2:39:23 PM7/11/16
to Ryan Sleevi, Vasily Chekalkin, John Lenz, sd...@googlegroups.com, net-dev
On Mon, Jul 11, 2016 at 2:01 PM, Ryan Sleevi <rsl...@chromium.org> wrote:

On Mon, Jul 11, 2016 at 10:51 AM, Matt Menke <mme...@chromium.org> wrote:
One issue I haven't seen raised:  CORS is a web platform feature (And so implemented way above net/, and only defined relative to HTML/DOM objects, and their relationships to each other), while SDCH is an HTTP-layer feature (And so implemented mostly within net/).  It seems strange to bring duplicated CORS-like behavior down into the network stack, which has no notion of same-site and cross-site requests.

I'm not sure the position you're taking here, Matt.

Are you saying that SDCHx should not be explained in terms of Fetch semantics (because Fetch is above //net)? Or that we shouldn't implement a duplicate pseudo-Fetch scheme?

I wholeheartedly agree that we should not duplicate a pseudo-Fetch scheme, if that is the position. Further, if we cannot explain SDCHx in terms of Fetch, then it creates a system where a Service Worker necessarily could not polyfill any SDCH semantics, because a Service Worker will be confined to Fetch semantics, and they're necessarily less-capable than "raw" requests - on the basis of security.

My concern is that Content-Encoding/decoding is part of HTTP semantics, and now we're trying to bring non-HTTP behavior to it.  What does this mean in the case of Cronet, for instance, which just issues HTTP requests divorced from any HTML/Javascript/DOM?  In some cases, hosting web platform behavior, or subsets of it, up into HTTP can make some sense (Like Referrer-Policy, for redirects), but in others, like this one, it's more confusing - what is a CORS request, what isn't?  net/ doesn't even have any concept of it.  It does have some rather disturbing logic about magically using another socket pool when it sets PRIVACY_MODE_ENABLED to true, but it's a mistake to think this has any relationship to CORS - requests that send cookies but do not set them, for instance, enables this bit.  Building on top of something that's already broken just leads to more problems down the line.

I think there's also a general zeitgeist that we should not be introducing new Web request mechanisms that cannot be explained, in some way, via Fetch, due to the implications in reasoning about the security model and interaction model with the rest of the Web platform.

It can't be explained via Fetch, for the simple reason that what the browser does in the case of "Content-Encoding: sdch, gzip" is not well defined, in the case the browser supports gzip, but not sdch (Or at least behavior is not defined to the extent of my knowledge. Skimming over the Fetch spec, I'm having trouble finding anything about decoding of the body).  That would also lead to quite a few problems with trying to implement this via Service Worker.

This highlights a tension between "Is SDCHx a network-level feature (like HTTP) or a web platform feature (like, say, network error logging)". I'm wholly sympathetic to the argument that there are and will be non-browser implementations of SDCHx, and so Fetch is not relevant to them. However, I think it's a fairly hard MUST that any browser-based implementation of SDCHx can be expressed or implemented in terms of Fetch, and so if the spec goes a non-Fetch route, then it MUST have sufficient hooks to be implemented via Fetch. 


Ryan Sleevi

Jul 11, 2016, 3:57:04 PM7/11/16
to Matt Menke, Ryan Sleevi, Vasily Chekalkin, John Lenz, sd...@googlegroups.com, net-dev
On Mon, Jul 11, 2016 at 11:39 AM, Matt Menke <mme...@chromium.org> wrote:

My concern is that Content-Encoding/decoding is part of HTTP semantics, and now we're trying to bring non-HTTP behavior to it.

I don't think I'd agree that we're binding non-HTTP behaviour.

I see there being two parts to SDCHx - the processing of when a dictionary is available, and obtaining a dictionary when advertised. Those two are separable parts.

For example, it's entirely possible to implement SDCH today w/o fetching the dictionaries, if an application (such as Cronet) were to ship "prebuilt" dictionaries. A browser would, presumably, not do this, but there's nothing preventing non-browser clients from optimizing like this, particularly in situations where the dictionary may not change often. It's indistinguishable from "preloading" the HTTP cache, if that helps to think of it.

I was simply suggesting that the SDCHx processing model (that is, things like the actual decoding) be separate from the "Go get new dictionaries". The "Go get new dictionaries" necessarily MUST be compatible with the Web Platform model - I cannot see us shipping a new feature in a browser that doesn't abide by the basics of request security - we've seen too many problems from that (as in the case of plugins, for example).

  What does this mean in the case of Cronet, for instance, which just issues HTTP requests divorced from any HTML/Javascript/DOM? 

I thought I already addressed this in my reply.
 
In some cases, hosting web platform behavior, or subsets of it, up into HTTP can make some sense (Like Referrer-Policy, for redirects), but in others, like this one, it's more confusing - what is a CORS request, what isn't?  net/ doesn't even have any concept of it.  It does have some rather disturbing logic about magically using another socket pool when it sets PRIVACY_MODE_ENABLED to true, but it's a mistake to think this has any relationship to CORS - requests that send cookies but do not set them, for instance, enables this bit.  Building on top of something that's already broken just leads to more problems down the line.

I'm not sure what you're considering the thing "that's already broken", but I feel this is a bit of a non sequitur. Perhaps I've misunderstood your concern.
 
I think there's also a general zeitgeist that we should not be introducing new Web request mechanisms that cannot be explained, in some way, via Fetch, due to the implications in reasoning about the security model and interaction model with the rest of the Web platform.

It can't be explained via Fetch, for the simple reason that what the browser does in the case of "Content-Encoding: sdch, gzip" is not well defined, in the case the browser supports gzip, but not sdch (Or at least behavior is not defined to the extent of my knowledge. Skimming over the Fetch spec, I'm having trouble finding anything about decoding of the body).  That would also lead to quite a few problems with trying to implement this via Service Worker.

I would disagree that it *cannot* be defined.

If we accept the browser supports gzip, but not SDCH, then it wouldn't have sent it in Accept-Encoding, and so such a situation would be as meaningless as "Content-Encoding: sleevi-rocks, gzip"
There's no question that a Service Worker (or Fetch client) can't modify Accept-Encoding to *add* SDCH - Accept-Encoding is a forbidden header ( https://fetch.spec.whatwg.org/#forbidden-header-name ) for precisely this reason.
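As a small illustration of that point, a client can compare the codings in a response's Content-Encoding against what it advertised in Accept-Encoding. The Python below is only a sketch with made-up helper names. It is deliberately simplified: it drops q-values and gives no special treatment to "identity", "*" or "q=0".

```python
def parse_codings(header_value):
    """Split an Accept-Encoding or Content-Encoding value into lowercase coding names.

    Parameters such as ";q=0.8" are dropped (simplifying assumption).
    """
    codings = []
    for part in header_value.split(","):
        name = part.split(";", 1)[0].strip().lower()
        if name:
            codings.append(name)
    return codings


def unexpected_codings(accept_encoding, content_encoding):
    """Return response codings that the client never advertised."""
    advertised = set(parse_codings(accept_encoding))
    return [c for c in parse_codings(content_encoding) if c not in advertised]


# A client that supports gzip but not sdch never advertised sdch,
# so "sdch" is reported as unexpected.
bad = unexpected_codings("gzip, deflate, br", "sdch, gzip")
```

A non-empty result means the server applied a coding the client did not ask for, which is the "sleevi-rocks, gzip" situation: the client has no defined way to decode the body.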

I think you've confused what I meant by "Explained in terms of Fetch". I'm specifically speaking about any sub-resource fetches (specifically: dictionary fetches) that occur, being implemented in terms of Fetch, much like rel="preload" is a hint to load a resource, and ensuring that any scope of dictionary use abide by the SOP.

Matt Menke

Jul 11, 2016, 4:30:30 PM7/11/16
to Ryan Sleevi, Vasily Chekalkin, John Lenz, sd...@googlegroups.com, net-dev
On Mon, Jul 11, 2016 at 3:56 PM, Ryan Sleevi <rsl...@chromium.org> wrote:


On Mon, Jul 11, 2016 at 11:39 AM, Matt Menke <mme...@chromium.org> wrote:

My concern is that Content-Encoding/decoding is part of HTTP semantics, and now we're trying to bring non-HTTP behavior to it.

I don't think I'd agree that we're binding non-HTTP behaviour.

I see there being two parts to SDCHx - the processing of when a dictionary is available, and obtaining a dictionary when advertised. Those two are separable parts.

For example, it's entirely possible to implement SDCH today w/o fetching the dictionaries, if an application (such as Cronet) were to ship "prebuilt" dictionaries. A browser would, presumably, not do this, but there's nothing preventing non-browser clients from optimizing like this, particularly in situations where the dictionary may not change often. It's indistinguishable from "preloading" the HTTP cache, if that helps to think of it.

I was simply suggesting that the SDCHx processing model (that is, things like the actual decoding) be separate from the "Go get new dictionaries". The "Go get new dictionaries" necessarily MUST be compatible with the Web Platform model - I cannot see us shipping a new feature in a browser that doesn't abide by the basics of request security - we've seen too many problems from that (as in the case of plugins, for example).

It's worth noting that cross-origin dictionaries aren't allowed (For some definition of origin that may or may not currently match the normal definition - if it doesn't, it should be fixed), so to download a dictionary, we must have downloaded a resource from that site (Going through all CORS logic) that advertised it.  Before we use the dictionary, another request must be issued...  Which also must go through CORS, before we can actually use the dictionary (Since CORS is applied before the request).

Currently, only the dictionary itself doesn't use CORS, but only requests that pass CORS logic can download/use it.  It seems like the closest analog to the way one site could try to use another's dictionary would be a cross-site iframe referencing a same-site resource (i.e. https://foo.com has an https://bar.com iframe, and that iframe uses another resource on https://foo.com).
 
  What does this mean in the case of Cronet, for instance, which just issues HTTP requests divorced from any HTML/Javascript/DOM? 

I thought I already addressed this in my reply.

You touched on the issue's existence, but didn't seem to actually address it.
 
In some cases, hosting web platform behavior, or subsets of it, up into HTTP can make some sense (Like Referrer-Policy, for redirects), but in others, like this one, it's more confusing - what is a CORS request, what isn't?  net/ doesn't even have any concept of it.  It does have some rather disturbing logic about magically using another socket pool when it sets PRIVACY_MODE_ENABLED to true, but it's a mistake to think this has any relationship to CORS - requests that send cookies but do not set them, for instance, enables this bit.  Building on top of something that's already broken just leads to more problems down the line.

I'm not sure what you're considering the thing "that's already broken", but I feel this is a bit of a non sequitur. Perhaps I've misunderstood your concern.

It's perhaps more of an implementation concern, but I don't believe we should be implementing anything that depends on whether a request is CORS or not in the browser process unless/until we actually have a concept of CORS there that is consistent with the concept in the renderer process.  PRIVACY_MODE in no way fulfills this requirement, since it's based on the union of several flags being set, instead of their intersection.
  
I think there's also a general zeitgeist that we should not be introducing new Web request mechanisms that cannot be explained, in some way, via Fetch, due to the implications in reasoning about the security model and interaction model with the rest of the Web platform.

It can't be explained via Fetch, for the simple reason that what the browser does in the case of "Content-Encoding: sdch, gzip" is not well defined, in the case the browser supports gzip, but not sdch (Or at least behavior is not defined to the extent of my knowledge. Skimming over the Fetch spec, I'm having trouble finding anything about decoding of the body).  That would also lead to quite a few problems with trying to implement this via Service Worker.

I would disagree that it *cannot* be defined.

If we accept the browser supports gzip, but not SDCH, then it wouldn't have sent it in Accept-Encoding, and so such a situation would be as meaningless as "Content-Encoding: sleevi-rocks, gzip"
There's no question that a Service Worker (or Fetch client) can't modify Accept-Encoding to *add* SDCH - Accept-Encoding is a forbidden header ( https://fetch.spec.whatwg.org/#forbidden-header-name ) for precisely this reason.

I think you've confused what I meant by "Explained in terms of Fetch". I'm specifically speaking about any sub-resource fetches (specifically: dictionary fetches) that occur, being implemented in terms of Fetch, much like rel="preload" is a hint to load a resource, and ensuring that any scope of dictionary use abide by the SOP.

I think I'm still confused by what you meant by "explained in terms of Fetch" (And "SDCHx can be expressed or implemented in terms of Fetch" - I think we're agreed implementation isn't possible, per Fetch spec).

Also, does your "SOP" include sending dictionary requests through ServiceWorker / AppCache?

Ryan Sleevi

Jul 11, 2016, 5:42:38 PM7/11/16
to Matt Menke, Ryan Sleevi, Vasily Chekalkin, John Lenz, sd...@googlegroups.com, net-dev
On Mon, Jul 11, 2016 at 1:30 PM, Matt Menke <mme...@chromium.org> wrote:

It's worth noting that cross-origin dictionaries aren't allowed (For some definition of origin that may or may not currently match the normal definition - if it doesn't, it should be fixed),

To be explicit: It doesn't. It uses cookie's model of 'origin', which is bad, and should feel bad. Further, there was the suggestion earlier on this thread of expanding that definition, something I consider a MUST NOT.
 
Currently, only the dictionary itself doesn't use CORS, but only requests that pass CORS logic can download/use it. 

That's not true, because of the above.
 
You touched on the issue's existence, but didn't seem to actually address it.

Apologies. To be explicit, I'm suggesting that if a hypothetical SDCHx spec wanted to permit Fetch to be integrated, then the method of accommodating that is to be extremely careful about prescribing the means of obtaining the dictionary resource, such that, as implemented in Web Browsers, a Web Browser could apply a Fetch-level filter.

Put into different terminology, I'm suggesting it mirrors the way in which we've been trying to break the existing SDCH circular dependencies. The //net layer would report that a dictionary has been advertised, it would filter up to the higher layer to make a policy decision about whether to allow that dictionary to be fetched, and then go down.

If dictionaries were truly SOP (same scheme, same host, same port) - that is, no cookies nonsense, no document.domain nonsense, a true SOP - then AIUI, the need for CORS is obviated because it's by definition no longer cross-origin.
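For illustration, a strict scheme/host/port comparison could be sketched in Python as below. The function names and the default-port table are assumptions for this example only. The `may_fetch_dictionary` hook stands in for the "higher layer policy decision" mentioned above and does not correspond to any real Chromium interface.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes this sketch cares about (assumption).
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) tuple for a URL, filling in default ports."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(a, b):
    """True only when scheme, host and port all match; no cookie-style domain matching."""
    return origin(a) == origin(b)


def may_fetch_dictionary(advertising_url, dictionary_url):
    """Hypothetical policy hook: allow a dictionary fetch only for same-origin URLs."""
    return same_origin(advertising_url, dictionary_url)


ok = may_fetch_dictionary("https://example.com/a", "https://example.com:443/d.dict")
```

Under this rule a dictionary on cdn.example.com would not be usable by foo.example.com, which is the trade-off against the shared-CDN dictionaries discussed earlier in the thread.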

But even if the need for *CORS* is obviated, I think it's still unwise/unacceptable to have the dictionary *load* go through a separate networking stack than other subresource requests. This is what I mean by Fetch integration. An advertised SDCH dictionary should not be fetched any differently than a rel="preload" / rel="prefetch" / img src="" resource - utilizing the same behaviours, flags, controls, and interaction models.
 
It's perhaps more of an implementation concern, but I don't believe we should be implementing anything that depends on whether a request is CORS or not in the browser process unless/until we actually have a concept of CORS there that is consistent with the concept in the renderer process.  PRIVACY_MODE in no way fulfills this requirement, since it's based on the union of several flags being set, instead of their intersection.

Apologies if it was not clearer, but "explaining in terms of Fetch" is more than just "use CORS".
 
I think I'm still confused by what you meant by "explained in terms of Fetch" (And "SDCHx can be expressed or implemented in terms of Fetch" - I think we're agreed implementation isn't possible, per Fetch spec). 

Also, does your "SOP" include sending dictionary requests through ServiceWorker / AppCache?

Yes and no. No, because that's not SOP (scheme/host/port). Yes, because that's Fetch - interacting with the same networking stack. 

Fetch is, roughly, the ResourceLoader layer... ish.

But I also want to be clear, since it sounds like you're in the "ServiceWorker isn't good enough for SDCH" position - the suggestion of ServiceWorker/Fetch earlier, from Ben and echo'd by myself, does not require munging about with Content-Encoding/Accept-Encoding to be able to polyfill. It's about using a Service Worker to intercept the Fetch requests and apply the necessary transforms. Such a model intrinsically limits to same-origin, due to the opaque-dataness of cross-origin requests, but that's a Good Thing(tm), because SDCH shouldn't be cross-origin.

That said, we're probably at a length in the thread where there's enough miscommunication that it may be worth restarting with first principles, so we don't get too lost into the weeds and can figure out where the fundamental miscommunications are.

Matt Menke

Jul 11, 2016, 5:49:05 PM7/11/16
to Ryan Sleevi, Vasily Chekalkin, John Lenz, sd...@googlegroups.com, net-dev
On Mon, Jul 11, 2016 at 5:41 PM, Ryan Sleevi <rsl...@chromium.org> wrote:


On Mon, Jul 11, 2016 at 1:30 PM, Matt Menke <mme...@chromium.org> wrote:

It's worth noting that cross-origin dictionaries aren't allowed (For some definition of origin that may or may not currently match the normal definition - if it doesn't, it should be fixed),

To be explicit: It doesn't. It uses cookie's model of 'origin', which is bad, and should feel bad. Further, there was the suggestion earlier on this thread of expanding that definition, something I consider a MUST NOT.
 
Currently, only the dictionary itself doesn't use CORS, but only requests that pass CORS logic can download/use it. 

That's not true, because of the above.
 
You touched on the issue's existence, but didn't seem to actually address it.

Apologies. To be explicit, I'm suggesting that if a hypothetical SDCHx spec wanted to permit Fetch to be integrated, then the method of accommodating that is to be extremely careful about prescribing the means of obtaining the dictionary resource, such that, as implemented in Web Browsers, a Web Browser could apply a Fetch-level filter.

Put into different terminology, I"m suggesting it mirrors the way in which we've been trying to break the existing SDCH circular dependencies. The //net layer would report that a dictionary has been advertised, it would filter up to the higher layer to make a policy decision about whether to allow that dictionary to be fetched, and then go down.

If dictionaries were truly SOP (same scheme, same host, same port) - that is, no cookies nonsense, no document.domain nonsense, a true SOP - then AIUI, the need for CORS is obviated because it's by definition no longer cross-origin.
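
As a rough illustration, the strict SOP comparison described here (scheme, host, and port, nothing else) could be sketched as follows. The function names and the default-port table are assumptions made for this example; they do not come from the SDCH spec or from Chromium.

```python
from urllib.parse import urlsplit

# Assumption for this sketch: only http/https matter, with their usual default ports.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) tuple for a URL, filling in default ports."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def is_same_origin(resource_url, dictionary_url):
    """True only if scheme, host, and port all match exactly (no cookie-style domain matching)."""
    return origin(resource_url) == origin(dictionary_url)
```

Under this check, https://example.com/foo.html and https://cdn.example.com/js.dict are different origins, so a dictionary on the CDN host would be rejected without any CORS logic being involved.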

But even if the need for *CORS* is obviated, I think it's still unwise/unacceptable to have the dictionary *load* go through a separate networking stack than other subresource requests. This is what I mean by Fetch integration. An advertised SDCH dictionary should not be fetched any differently than a rel="preload" / rel="prefetch" / img src="" resource - utilizing the same behaviours, flags, controls, and interaction models.
 
It's perhaps more of an implementation concern, but I don't believe we should be implementing anything that depends on whether a request is CORS or not in the browser process unless/until we actually have a concept of CORS there that is consistent with the concept in the renderer process.  PRIVACY_MODE in no way fulfills this requirement, since it's based on the union of several flags being set, instead of their intersection.

Apologies if it was not clearer, but "explaining in terms of Fetch" is more than just "use CORS".
 
I think I'm still confused by what you meant by "explained in terms of Fetch" (And "SDCHx can be expressed or implemented in terms of Fetch" - I think we're agreed implementation isn't possible, per Fetch spec). 

Also, does your "SOP" include sending dictionary requests through ServiceWorker / AppCache?

Yes and no. No, because that's not SOP (scheme/host/port). Yes, because that's Fetch - interacting with the same networking stack. 

Fetch is, roughly, the ResourceLoader layer... ish.

You doubtless mean third_party/WebKit/Source/core/fetch/ResourceLoader.cpp, which I'm completely unfamiliar with, and not content/browser/loader/resource_loader.cc, which I own (Duplicate class names are very unfortunate here, just pointing out the difference for those following along at home).

But I also want to be clear, since it sounds like you're in the "ServiceWorker isn't good enough for SDCH" position - the suggestion of ServiceWorker/Fetch earlier, from Ben and echoed by myself, does not require munging about with Content-Encoding/Accept-Encoding to be able to polyfill. It's about using a Service Worker to intercept the Fetch requests and apply the necessary transforms. Such a model is intrinsically limited to same-origin, due to the opaque-dataness of cross-origin requests, but that's a Good Thing(tm), because SDCH shouldn't be cross-origin.

That said, we're probably at a length in the thread where there's enough miscommunication that it may be worth restarting from first principles, so we don't get too lost in the weeds and can figure out where the fundamental miscommunications are.

+1 to this.

Vasily Chekalkin

Jul 12, 2016, 5:04:42 AM
to Matt Menke, John Lenz, sd...@googlegroups.com, net-dev
 
 
12.07.2016, 03:51, "Matt Menke" <mme...@chromium.org>:
One issue I haven't seen raised: CORS is a web platform feature (and so implemented way above net/, and only defined relative to HTML/DOM objects, and their relationships to each other), while SDCH is an HTTP-layer feature (and so implemented mostly within net/). It seems strange to bring duplicated CORS-like behavior down into the network stack, which has no notion of same-site and cross-site requests.
 
Erm. Not necessarily. Let's replace the "SDCHx-Get-Dictionary" header with the RFC 5988 "Link" header. Something like
 
Link: <https://cdn.example.com/js.dict>; rel="sdchx-dictionary"
 
In this case processing will be exactly the same as with any other related resource. //net won't handle it at all (and shouldn't). Some code at a higher level can examine it, the CORS headers, etc., and issue a Fetch for that dictionary if needed.
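
A minimal sketch of how such higher-level code might pull dictionary URLs out of a Link header value. This is a simplified parser written for illustration, not a full RFC 5988 implementation: it assumes parameter values contain no commas, and the function name `find_dictionary_links` is made up for the example.

```python
import re

# One link-value: "<target>" followed by its parameters, up to the next comma.
_LINK_VALUE = re.compile(r'\s*<([^>]*)>([^,]*)')
# One parameter: ;name="quoted value" or ;name=token
_PARAM = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def find_dictionary_links(header_value, rel="sdchx-dictionary"):
    """Return the targets of all link-values whose rel parameter includes `rel`."""
    urls = []
    for match in _LINK_VALUE.finditer(header_value):
        target, params = match.group(1), match.group(2)
        for name, quoted, bare in _PARAM.findall(params):
            value = quoted or bare
            # rel may hold several space-separated relation types.
            if name.lower() == "rel" and rel in value.lower().split():
                urls.append(target)
                break
    return urls


header = '</style.css>; rel="preload"; as="style", <https://cdn.example.com/js.dict>; rel="sdchx-dictionary"'
print(find_dictionary_links(header))  # ['https://cdn.example.com/js.dict']
```

The function only extracts candidate URLs; whether to actually fetch one is left to the caller, which matches the idea that //net does not make that decision.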
 
 
-- 
Bacek
 

Vasily Chekalkin

Jul 12, 2016, 5:11:31 AM
to rsl...@chromium.org, Matt Menke, John Lenz, sd...@googlegroups.com, net-dev
 
 
12.07.2016, 07:42, "Ryan Sleevi" <rsl...@chromium.org>:
 
You touched on the issue's existence, but didn't seem to actually address it.
 
Apologies. To be explicit, I'm suggesting that if a hypothetical SDCHx spec wanted to permit Fetch to be integrated, then the method of accommodating that is to be extremely careful about prescribing the means of obtaining the dictionary resource, such that, as implemented in Web Browsers, a Web Browser could apply a Fetch-level filter.
 
Put into different terminology, I'm suggesting it mirrors the way in which we've been trying to break the existing SDCH circular dependencies. The //net layer would report that a dictionary has been advertised, it would filter up to the higher layer to make a policy decision about whether to allow that dictionary to be fetched, and then go down.
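
The layering described above might look roughly like the sketch below. `NetLayer`, `Loader`, and `scheme_and_authority_match` are hypothetical stand-ins invented for this example and do not correspond to real Chromium classes; the policy shown is deliberately crude (it compares scheme and authority strings as written).

```python
from urllib.parse import urlsplit


class NetLayer:
    """Stand-in for //net: reports advertised dictionaries, never fetches them itself."""

    def __init__(self, on_dictionary_advertised):
        self._notify = on_dictionary_advertised

    def handle_response(self, request_url, advertised_dictionary_url):
        if advertised_dictionary_url is not None:
            # Filter up: hand the decision to the higher layer.
            self._notify(request_url, advertised_dictionary_url)


class Loader:
    """Stand-in for the higher layer that owns the fetch policy."""

    def __init__(self, policy):
        self._policy = policy
        self.fetched = []

    def on_dictionary_advertised(self, request_url, dictionary_url):
        if self._policy(request_url, dictionary_url):
            self.fetch(dictionary_url)

    def fetch(self, url):
        # A real loader would go through the same path as any other subresource;
        # here the URL is only recorded.
        self.fetched.append(url)


def scheme_and_authority_match(request_url, dictionary_url):
    """Crude example policy: same scheme and same authority string."""
    a, b = urlsplit(request_url), urlsplit(dictionary_url)
    return (a.scheme, a.netloc) == (b.scheme, b.netloc)


loader = Loader(scheme_and_authority_match)
net = NetLayer(loader.on_dictionary_advertised)
net.handle_response("https://example.com/foo.html", "https://example.com/js.dict")
net.handle_response("https://example.com/foo.html", "https://other.example/js.dict")
print(loader.fetched)  # ['https://example.com/js.dict']
```

The point of the split is that the lower layer has no opinion about which dictionaries are acceptable; swapping the policy function changes behaviour without touching `NetLayer`.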
 
If dictionaries were truly SOP (same scheme, same host, same port) - that is, no cookies nonsense, no document.domain nonsense, a true SOP - then AIUI, the need for CORS is obviated because it's by definition no longer cross-origin.
 
But even if the need for *CORS* is obviated, I think it's still unwise/unacceptable to have the dictionary *load* go through a separate networking stack than other subresource requests. This is what I mean by Fetch integration. An advertised SDCH dictionary should not be fetched any differently than a rel="preload" / rel="prefetch" / img src="" resource - utilizing the same behaviours, flags, controls, and interaction models.
 
 
 
+1. Let's express dictionary fetching via Link: <url>; rel="sdchx-dictionary". It should simplify life a lot.
 
-- 
Bacek
 