Re: SDCHx RFC proposal

Randy Smith

Jul 7, 2016, 2:42:53 PM
to Vasily Chekalkin, net-dev, SD...@googlegroups.com
I'm excited about this work, and would be happy to see a new spec developed remedying some of the flaws of the current spec.  There's a fair amount in here that I don't feel competent to comment on (like RFC 3229; that's a very old spec that I haven't heard of being implemented, which makes me worried it's got problems--what's its current status?), but a couple of points/questions:

* This should be shared with the Google group for SDCH spec discussions (cc'd :-}).  

* I'm no expert on CORS, but it looks to me as if there's an ambiguity in how Access-Control-Allow-Origin is being used (in Response 1) as to whether it applies to the response body (unencoded and being returned) or the dictionary listed in the Get-Dictionary header.  I'd naively expect it to apply to the response body; it seems weird to hijack it for the dictionary header.  Maybe it should be in the Response 2 headers?

* From an implementation POV I think Chromium would be uncomfortable relying on an unsupported open source library (yes, we're currently doing it, but it's a part of why there's been less than full excitement coming from this direction about SDCH).  If a security bug comes up, there's no guarantee that it'll be noticed or patched quickly.  So it's an implementation issue, but the lack of current support for openvcdiff is a problem in that space.

* If we're seriously thinking of using arbitrary resources as dictionaries, I think we should work out how a couple of examples would most naturally be implemented and make sure we're not specing ourselves into an implementation corner.  I'm specifically concerned about a) encoding a URL response with the contents of the previous URL response, and b) a proliferation of data structures with memory costs in the client (or server :-}) because of a semi-arbitrary use of resources as dictionaries.  Both client and server have full control over what they advertise, so it wouldn't be a problem building spec-compliant clients & servers, but it might be a disappointment if a server implemented this in order to support some particular usage model, and clients avoided it like the plague because of implementation complexity.

Having said all that, thanks very much for taking this on!

-- Randy


On Thu, Jun 30, 2016 at 12:55 AM, Vasily Chekalkin <ba...@yandex-team.ru> wrote:
Hello.

After the recent (fsvo) discussion about the state of SDCH and properly speccing it through the IETF [1], we (as in the Yandex net-dev team) decided to give it a go.

But instead of just fixing small things in the current SDCH spec [2] (which is a properly copyrighted version of [3]), like "X-SDCH: 0" in the spec vs "X-SDCH-Encoded: 0" in the actual implementation, we decided to improve it and fix some major pains. The proposal also includes ideas from the "SDCH2" [4] work.

Here is an outline of the proposal; any feedback will be greatly appreciated.

1. The dictionary format is changed to be just a blob. This simplifies dynamic construction of dictionaries, including using an older version of a resource to encode a newer version.
2. All metadata about dictionaries/encoding/caching is passed via standard HTTP headers plus a handful of new ones. This helps solve the long-standing problem of decoding content when the dictionary is not available, because we can check for the dictionary after fetching the HTTP headers from cache, without fetching the body.
3. (Rough idea, but it's worth a shot.) Incorporate the RFC 3229 [5] approach of using a different return code, instead of 200, for encoded content. This should mitigate some crazy proxies which can screw up responses.
4. Use CORS Access-Control-Allow-Origin [6] rules for matching dictionary scope to applicable URLs.
5. Do not use the "X-SDCH-Encoded" header at all, because of item 3 and our own experience of deploying SDCH on a large cluster of different sites under a single domain.


Here are a couple of examples of how HTTP sessions could look:

Example 1: Simple precomputed Dictionary

Request 1:

GET /foo.html HTTP/1.1
Host: example.com
Accept-Encoding: gzip, deflate, br, sdchx
SDCHx-Features: <semicolon separated list of supported features>

This is a request from a client with cold caches and empty dictionary storage. The "SDCHx-Features" header is there for extensibility. By default it will contain "encoding=vcdiff" to tell the server that vcdiff encoding is supported. In the future we can add more content-type-specific codecs like courgette, etc.

Response 1:

HTTP/1.1 200 Ok
Set-Cookie: <sdch_tracking_id>
SDCHx-Get-Dictionary: https://cdn.example.com/common.dict
Access-Control-Allow-Origin: cdn.example.com

For precomputed dictionaries the flow is mostly the same as in the current SDCH implementation: the server issues an "SDCHx-Get-Dictionary" header with the URL of a static dictionary. The two other important headers are "Access-Control-Allow-Origin", to allow using a dictionary from "cdn.example.com", and the tracking cookie. The tracking cookie isn't SDCH-specific; it is just some unique identifier of the client. It's not relevant for the static dictionary case, but it can be used in the next examples.

Request 2:

GET /common.dict HTTP/1.1
Host: cdn.example.com
Accept-Encoding: gzip, deflate, br, sdchx
SDCHx-Features: <semicolon separated list of supported features>
Cookie: <sdch_tracking_id=id>

Response 2:

HTTP/1.1 200 Ok
Content-Encoding: br
Content-Length: <length>
SDCHx-Server-Id: <sha256 checksum of Dictionary content>
SDCHx-Related-Pattern: <pattern>
SDCHx-Algo: <algorithm>
Cache-Control: max-age=<max-age>

<dictionary blob>

Description of the headers:

SDCHx-Server-Id serves a few purposes:
  a) Identify the dictionary.
  b) Verify the dictionary against MITM proxies which corrupt responses, e.g. KIS.
  c) Generate the client_id, which is the lower 48 bits of the server_id.
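
A minimal sketch of how the two ids might be derived and checked. The proposal does not say how the ids are serialized, so hex encoding is an assumption here, as is reading "lower 48 bits" as the last 6 bytes of the digest; the function names are hypothetical.

```python
import hashlib


def server_id(dictionary: bytes) -> str:
    """SHA-256 of the dictionary blob. Hex encoding is an assumption."""
    return hashlib.sha256(dictionary).hexdigest()


def client_id(server_id_hex: str) -> str:
    """Lower 48 bits of the server id, taken here as the last 6 bytes
    of the digest (12 hex characters). This reading is an assumption."""
    return server_id_hex[-12:]


def verify_dictionary(body: bytes, advertised_server_id: str) -> bool:
    """True if the received body matches the advertised SDCHx-Server-Id,
    i.e. no intermediary altered the dictionary in transit."""
    return server_id(body) == advertised_server_id.lower()
```

A client would call verify_dictionary() on the body of Response 2 before storing it, and advertise client_id() of each stored dictionary afterwards.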

SDCHx-Related-Pattern is used to determine the set of URLs this dictionary is applicable to. It must be validated using CORS semantics for the Access-Control-Allow-Origin from the original response.

SDCHx-Algo: the algorithm to be used with this dictionary. The default is vcdiff, but it can be extended in the future for content-type-specific algos.

Cache-Control: max-age determines the Dictionary lifetime.

The dictionary is just a blob of data without any particular format.


Request 3:

GET /bar.html HTTP/1.1
Host: example.com
Accept-Encoding: gzip, deflate, br, sdchx
Cookie: <sdch_tracking_id=id>
SDCHx-Features: <semicolon separated list of supported features>
SDCHx-Avail-Dictionaries: <comma separated list of client dictionary ids>


Response 3:

HTTP/1.1 242 Delta encoded
Content-Encoding: sdchx, br
SDCHx-Used-Dictionary-Id: <server_id>
SDCHx-Algo: vcdiff

<vcdiff encoded response>

SDCHx-Used-Dictionary-Id and SDCHx-Algo are used to decode the response. They can also be used to decode cached content (or reject it if the dictionary is no longer available).
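
A rough sketch of the client-side check this enables, using only the response headers, so it works equally for a fresh response and for one read back from cache. The function name and the in-memory store (a dict keyed by dictionary id) are hypothetical, header names are assumed to arrive in the spelling used in the examples above, and the actual vcdiff decoding step is left out.

```python
from typing import Dict, Mapping, Optional, Tuple


def pick_dictionary(
    headers: Mapping[str, str],
    store: Dict[str, bytes],
) -> Optional[Tuple[bytes, str]]:
    """Return (dictionary, algorithm) for an SDCHx-encoded response,
    None for a response that is not SDCHx-encoded, and raise LookupError
    when the dictionary named by the response is no longer available."""
    codings = [c.strip().lower()
               for c in headers.get("Content-Encoding", "").split(",")]
    if "sdchx" not in codings:
        return None  # plain response, nothing to look up
    dict_id = headers.get("SDCHx-Used-Dictionary-Id")
    if dict_id is None or dict_id not in store:
        # The body cannot be decoded; a cached entry should be rejected.
        raise LookupError("dictionary %r is not available" % (dict_id,))
    algo = headers.get("SDCHx-Algo", "vcdiff")  # vcdiff is the stated default
    return store[dict_id], algo
```

Because the decision needs no body bytes, a cache can drop an undecodable entry before reading it.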



Example 2: Dynamic Dictionary (SDCH2/RFC 3229 replacement)

Because the dictionary is just an opaque blob of data, we can easily use the response-as-dictionary approach.

Request 1:

GET /foo.html HTTP/1.1
Host: example.com
Accept-Encoding: gzip, deflate, br, sdchx
SDCHx-Features: <semicolon separated list of supported features>

Response 1:

HTTP/1.1 200 Ok
Content-Type: text/html
Content-Encoding: br
Content-Length: <length>
Set-Cookie: <sdch_tracking_id>
SDCHx-Server-Id: <sha256 checksum of Dictionary content>
SDCHx-Related-Pattern: <pattern>
SDCHx-Algo: <algorithm>
Cache-Control: max-age=<max-age>


In this case the server tells the client to use this response as a dictionary just by providing the same set of headers as for a precomputed dictionary. Calculating SDCHx-Server-Id requires the server to process the whole response body, which can be a memory-heavy operation. But with HTTP/2 we can use a trailing headers block and send SDCHx-Server-Id in it without delaying the whole response.

Subsequent requests/responses are exactly the same as in the first example.
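
To illustrate the trailer idea: the hash can be updated chunk by chunk while the body is streamed, so the server never buffers the whole response. This is only a sketch; the generator name and the plain dict standing in for the HTTP/2 trailing headers block are hypothetical, and it assumes the id is the hex SHA-256 of the unencoded body.

```python
import hashlib
from typing import Dict, Iterable, Iterator


def stream_with_server_id(
    chunks: Iterable[bytes],
    trailers: Dict[str, str],
) -> Iterator[bytes]:
    """Pass body chunks through unchanged while hashing them; once the
    last chunk has been yielded, publish the digest as a trailer."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)  # incremental, constant memory
        yield chunk
    # Reached only after the body is fully consumed.
    trailers["SDCHx-Server-Id"] = digest.hexdigest()
```

The trailer value is filled in only after the consumer has drained the generator, which matches when a trailing headers block would be sent.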



Semi-formal headers description:

Client headers

Accept-Encoding
Must contain sdchx to notify the server about SDCHx support.

SDCHx-Features
Semicolon-separated list of features the client supports.
encoding=<> -- comma-separated list of encodings supported by the client. By default the list consists of the single element vcdiff.
In future versions the list of features can be extended.
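
A small sketch of how a server might parse this header. The exact grammar is not pinned down above, so tolerance for extra whitespace, case-insensitive feature names, and treating a bare feature name as an empty list are all assumptions; parse_features is a hypothetical name.

```python
from typing import Dict, List


def parse_features(header: str) -> Dict[str, List[str]]:
    """Parse an SDCHx-Features value into {feature: [values]}.
    Falls back to encoding=vcdiff when no encoding is listed."""
    features: Dict[str, List[str]] = {}
    for item in header.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        features[name.strip().lower()] = [
            v.strip() for v in value.split(",") if v.strip()
        ]
    features.setdefault("encoding", ["vcdiff"])  # stated default
    return features
```

For example, "encoding=vcdiff,courgette" yields two encodings, while an empty header yields just the vcdiff default.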


SDCHx-Avail-Dictionaries
Comma separated list of dictionary client ids.

Cookie
To properly support Dynamic Dictionaries the server needs some unique tracking of the client. We can use the standard cookie mechanism for this. The name of the cookie is not specified.

Server headers

SDCHx-Server-Id
The server id is calculated as the sha256 of the Dictionary content. It also serves as the base for the client id (lower 48 bits) and for verification.

SDCHx-Related-Pattern
 host/path to which this Dictionary is applicable. When the host part is empty, the dictionary is applicable to:
 all hosts from which the SDCHx-Get-Dictionary response header was issued;
 the original host, in the case of a Dynamic Dictionary.

SDCHx-Algo
  Specifies either:
  the algorithm this Dictionary can be used with (default is vcdiff), or
  the algorithm used to encode the response (default is vcdiff).

SDCHx-Used-Dictionary-Id
Dictionary server id used for encoding.

SDCHx-Remove-Dictionary
 a) Comma-separated list of Dictionaries which the client should remove. Must be a subset of the SDCHx-Avail-Dictionaries header.
 b) or "*" to remove all dictionaries for the origin domain.
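
One possible client-side handling of this header, as a sketch only. It assumes the store is a per-origin dict keyed by client id, and that ids the client did not advertise in SDCHx-Avail-Dictionaries are silently ignored (the proposal only says the list must be a subset, not what to do otherwise). The function name is hypothetical.

```python
from typing import Dict, List, Set


def apply_remove_dictionary(
    header_value: str,
    advertised: Set[str],
    store: Dict[str, bytes],
) -> List[str]:
    """Remove dictionaries named by SDCHx-Remove-Dictionary from the
    per-origin store and return the ids that were actually removed."""
    value = header_value.strip()
    if value == "*":
        removed = sorted(store)
        store.clear()  # drop everything held for this origin
        return removed
    removed = []
    for dict_id in (v.strip() for v in value.split(",")):
        # Only honour ids the client itself advertised in the request.
        if dict_id and dict_id in advertised and dict_id in store:
            del store[dict_id]
            removed.append(dict_id)
    return removed
```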

Cache-Control
max-age=<seconds> determines the lifetime of the Dictionary. After this the client can re-validate it using standard HTTP caching semantics. Default is <N> seconds.
private - to avoid miscaching of delta-encoded results on intermediate proxies, the server SHOULD set “private” in the Cache-Control header for SDCHx-encoded responses.
no-transform - to prevent MitM proxies from doing their stupid thingy.




[1] https://codereview.chromium.org/1876683002/
[2] https://docs.google.com/viewer?a=v&pid=forums&srcid=MDIwOTgxNDMwMTgyMjkzMTI2ODcBMDQ2MzU5NDU2MDA0MTg5NDE1MTkBTDZmaENoSG9BZ0FKATAuMQEBdjI&authuser=0
[3] http://lists.w3.org/Archives/Public/ietf-http-wg/2008JulSep/att-0441/Shared_Dictionary_Compression_over_HTTP.pdf
[4] https://docs.google.com/document/d/1UhN7-ZpDqR3vq9OcZ12LQsfhPaNmJNkCM7KRcPLy2VY/edit#
[5] https://tools.ietf.org/html/rfc3229
[6] https://www.w3.org/TR/cors/#access-control-allow-origin-response-header

-- 
Bacek

--
You received this message because you are subscribed to the Google Groups "net-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to net-dev+u...@chromium.org.
To post to this group, send email to net...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/net-dev/99471467262546%40webcorp01e.yandex-team.ru.

John Lenz

Jul 8, 2016, 11:29:03 AM
to SD...@googlegroups.com, Vasily Chekalkin, net-dev
* In past discussions, the primary thing that I wanted to avoid was a distinct dictionary download; I wanted any resource to be declarable as a "possible future dictionary". The current SDCH spec made this impossible.

* I'd be happier if dictionaries were like other resources and a "Remove Dictionary" did not exist.

* The original implementation of SDCH in Chrome is very specialized to its original use case (hard limit on dictionary sizes, dictionaries were loaded early into memory and pinned there, etc).  I wasn't clear on what changes, if any, would reasonably be accepted, and whether there was any future in SDCH at all.

--
You received this message because you are subscribed to the Google Groups "SDCH" group.
To unsubscribe from this group and stop receiving emails from it, send an email to SDCH+uns...@googlegroups.com.
To post to this group, send email to SD...@googlegroups.com.
Visit this group at https://groups.google.com/group/SDCH.
For more options, visit https://groups.google.com/d/optout.
