An extension to help SDCH servers opt out of encoding some content

Jim R

May 3, 2011, 8:05:59 PM
to SD...@googlegroups.com
Below is a proposal for a "helpful" extension for folks that would like to use SDCH compression, but only want to use it *some* of the time, or on some of the paths on their site, or on some of the content.

If you already grok the problem and just want to see the proposed resolution, jump to the end of this post.

PROBLEM and MOTIVATION

One complaint I've heard about the proposed SDCH protocol is that there is no nice way (in the meta-data) to restrict, at a fine granularity, which URLs might be compressed relative to a given dictionary.  The meta-data does allow a path to be selected, but any URL that has the path prefix is supposed to be potentially compressible.  Although the specification does not require servers to compress content on an applicable path (i.e., they can ignore an advertised dictionary), an implementation issue exposed in Chromium makes "ignoring" an Avail-Dictionary advertisement problematic.
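For reference, that path restriction lives in the dictionary's own header block.  A dictionary whose headers look roughly like the following (the domain and path values here are purely illustrative) marks *every* URL under /news/ on example.com as potentially compressible:

domain: example.com
path: /news/

There is no place in that metadata to express a finer exception, such as "everything under /news/ except /news/video/".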

The implementation issue is derived from real-world networks, where proxies (and AV software) can impact the protocol.

Sometimes the "handling by a proxy" is not that invasive. For example, some proxies (or AV software) removes *all* text in the accept-encoding header of a request.  That does block SDCH compression (and gzip compression!?!), but it doesn't do incredible harm.  It slows the user down, wastes bandwidth, but at least content is not destroyed.  In fact, since the accept-encoding strings never reach the server, the server never responds with a suggestion to get an SDCH dictionary.  

Sometimes the "handling by a proxy" as mildly invasive.  For example some proxies (or AV software) allow the content-encoding to remain intact, allow a dictionary to be fetched and advertised, and don't (try) to wreak havoc until the SDCH encoded content arrives.  The mildly invasive proxies can rip out the "content-encoding: sdch,gzip" and (for instance) replace it with a mere "content-encoding: gzip."  <sigh>.  This is not that much trouble for Google Chromium to handle, as it just remembers that a plausibly-SDCH encodable request was made, so it tentatively tries to SDCH decode the content, and then gracefully passes through (to a mere gzip decoding) if that doesn't work.  The good news is that SDCH encoded content is handled perfectly in that case.

Slightly more malice has been seen when the content, already sdch+gzip encoded, is re-gzip encoded <gulp> a second time by a proxy.  Our conjecture was that a poorly written proxy didn't "recognize" the combined encoding, discarded the encoding string, and tried to toss in a "better" (re)encoding.  Here again, Chromium gracefully handles this content modification.

Unfortunately, the most malicious proxy that we've documented so far simply removes the content, and replaces it with an interstitial warning page, with text akin to "unknown content arrived from this site.  Don't go there again" <ugh!>.  That damage to the content is problematic because the arriving page "looks" like a valid HTML page.  To handle that most extreme form of malice, Chromium detects the fact that a request was made whose response *should* have been SDCH encoded, but an encoded response was not provided.  In this very exceptional case, where such extreme malice was shown, Chromium proceeds to re-issue the request, without advertising SDCH in the accept-encoding, and the site is blacklisted (with an exponential back-off).  Simply put, Chromium assumes that *if* a server provided an applicable dictionary, then that server will use it consistently, and any deviation is a symptom of a poorly written intermediary.
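Very roughly, that blacklisting behaves like the sketch below (again, Python used only for illustration; the back-off constants are made up and are not Chromium's real values):

import time

class SdchBlacklist:
    # Illustrative only: tracks hosts whose SDCH responses came back mangled.
    def __init__(self, base_seconds=60):
        self.base_seconds = base_seconds
        self.strikes = {}        # host -> number of observed failures
        self.blocked_until = {}  # host -> time before which SDCH stays off

    def record_failure(self, host):
        # Called when a response that *should* have been SDCH encoded was
        # not; the request is then re-issued without "sdch" in the
        # accept-encoding header.
        strikes = self.strikes.get(host, 0) + 1
        self.strikes[host] = strikes
        # Each failure doubles the blackout window (exponential back-off).
        backoff = self.base_seconds * (2 ** (strikes - 1))
        self.blocked_until[host] = time.time() + backoff

    def sdch_allowed(self, host):
        return time.time() >= self.blocked_until.get(host, 0.0)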

The rub comes when a site really wants to "not SDCH encode" a specific piece of content.  The meta-information only allows for a coarse specification of the applicable paths.  Any time a site fails to encode content with an applicable dictionary advertised, Chromium is forced (by the problem in the last paragraph) to black-list the site for SDCH encoding.

The question raised is then: How does a site SDCH encode some, but not all, content on a single path, without triggering the black-listing?  The proposed SDCH spec does not allow the "desired" granularity. :-/


PROPOSED RESOLUTION

A suggestion was made on a Chromium bug report that a server could indicate (in a response header) that it had deliberately chosen to *not* SDCH encode content, despite being given an "applicable" dictionary.  Note that it is generally not necessary for any server to make this indication.  Also note that it would be especially difficult for a dumb proxy to damage this element.  Poorly written proxies tend to either remove, damage, or discard unrecognized headers, but don't tend to add them.  If a proxy removes such a "hint" header, then the status quo shown above will prevail (the protocol will work as described above).  If, as is reasonably probable given that the proxy allowed the dictionary suggestion header to pass unscathed, the proxy passes along the "hint," then the browser would not need to be concerned when non-SDCH responses arrived (with a header asserting this was the server's intent).

Toward that end, I'm considering adding support in Chromium for the HTTP response header:

X-Sdch-Encode: 0

This header can be added to content that is not SDCH encoded, but is on a path that *should* otherwise have been SDCH encoded, based on the meta-data in the available dictionary.

That optional response header can then be used by a server to opt out of SDCH encoding specific content.
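For concreteness, a request/response pair using the opt-out would look roughly like the following (the URL, dictionary hash, and surrounding headers are purely illustrative):

GET /news/special-page.html HTTP/1.1
Host: example.com
Accept-Encoding: gzip,deflate,sdch
Avail-Dictionary: abcd1234

HTTP/1.1 200 OK
Content-Encoding: gzip
X-Sdch-Encode: 0

On seeing "X-Sdch-Encode: 0", the browser would accept the plain gzip response as intentional, rather than as evidence of a broken intermediary, and would not black-list the site.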

Comments?

Thanks,

Jim

Yoav

May 4, 2011, 4:46:20 AM
to SDCH
Sounds great (obviously, since I filed the chromium bug report :) )
Do you have an estimate regarding when this will end up in the dev
channel?

Yoav

Jim Roskind

May 4, 2011, 12:58:06 PM
to SDCH
Generally, DEV channel releases of Google Chrome have historically
gone out at a rate of about one a week.

I have a proposed change list pending a review:

http://codereview.chromium.org/6909033/

It is a pretty tiny change, and unless we have further discussion of
the feature, it should land shortly.

...and of course you can track progress via the bug (which will show
if/when I land):

http://code.google.com/p/chromium/issues/detail?id=78489

If you want to use a version sooner, I'd suggest signing up for the
CANARY channel (Windows only currently, I think). That channel has
historically shipped about 5-7 releases a week (usually once a day
weekdays, and sometimes on weekends), so most tip-of-tree patches
shipped within about a day.  ...and of course, you could locally apply
the patch for a private Chromium build.

Jim