Re: [blink-dev] Re: Intent to Implement and Ship: Adding new MHTML headers to support sharing of MHTML pages


Jian Li

Oct 10, 2017, 8:46:23 PM
to blink-api-ow...@chromium.org
Can any other api owners review this? Thanks.

On Tue, Oct 10, 2017 at 3:00 PM, Chris Harrelson <chri...@chromium.org> wrote:
LGTM1 to ship.

On Tue, Oct 10, 2017 at 11:09 AM, Jian Li <jia...@chromium.org> wrote:
I forgot to update this chromestatus doc to remove the Snapshot-Version header. I just updated it. Thanks.

On Mon, Oct 9, 2017 at 6:33 PM, Chris Harrelson <chri...@chromium.org> wrote:
Hi,

Thanks for doing that, looks good.

My other comments still remain though - I'm not totally sure what modifications to today's MHTML implementation are represented in this intent, and in particular what headers are new or changed. Is it literally what is at this link? i.e.,

Two new headers, Snapshot-Version and Snapshot-Content-Location will be added.

The existing Subject header will be updated to support encoding of non-printable ASCII characters.

Snapshot-Version is not referenced in the document describing this feature.

Thanks,
Chris

On Mon, Oct 9, 2017 at 1:15 PM, Jian Li <jia...@chromium.org> wrote:


On Thu, Sep 7, 2017 at 11:53 AM, Chris Harrelson <chri...@chromium.org> wrote:


On Wed, Sep 6, 2017 at 4:45 PM, Jian Li <jia...@chromium.org> wrote:
Based on all the feedback received, we have decided to:
  • Drop X- prefix for new headers we plan to add.
These two headers?

Snapshot-Version
Snapshot-Content-Location

I looked at the latest version of the doc, and it doesn't go into any detail on where the number in Snapshot-Version comes from or how it's updated. 
  • We will not add new header for supporting non-printable ASCII title. Instead, Subject will be transformed to support this per RFC 2047 spec.
  • Snapshot-Content-Location header will be added for main resource URL.
 The document has been updated to reflect all these changes. We will figure out where to publish the whole document which lists all specific MHTML behaviors in Chromium.
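For readers unfamiliar with RFC 2047, here is a minimal sketch of what transforming the Subject header could look like, using Python's standard `email.header` module. The helper names `encode_subject` and `decode_subject` are hypothetical, and the choice to leave printable-ASCII titles untouched is an assumption for illustration, not something the thread confirms about Chromium's implementation.

```python
from email.header import Header, decode_header, make_header


def encode_subject(title: str) -> str:
    """Return a Subject value, using RFC 2047 encoded-words only when needed."""
    # Assumption: titles that are already printable ASCII are written as-is.
    if title.isascii() and title.isprintable():
        return title
    # Header.encode() emits =?charset?encoding?text?= encoded-words.
    return Header(title, charset="utf-8").encode()


def decode_subject(value: str) -> str:
    """Turn a (possibly RFC 2047 encoded) Subject value back into text."""
    return str(make_header(decode_header(value)))


encoded = encode_subject("Café 日本語")
print(encoded)                  # an ASCII-only encoded-word
print(decode_subject(encoded))  # the original title
```

A client that does not understand RFC 2047 would still see a pure-ASCII Subject value, just not a readable one.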

Thanks for the detailed document investigating and explaining behavior.

I would like to block this Intent on publishing this information with one of the standards bodies in github - WICG or WhatWG perhaps. That way the usual editing, comment and bug flows for specs can proceed for this one. There has also been a lot of good discussion on this thread about details, which is best captured in the github bug process.

Chris, could you please revisit this? We've already posted this at https://discourse.wicg.io/t/mhtml-generation-and-loading-as-implemented-in-chrome/2387.
 

Also, has there been any engagement outside of Google, and with other browsers in particular? I think the Intent is missing that section. Are they interested in advancing MHTML?

Chris
 

On Wed, Sep 6, 2017 at 4:10 PM, Dmitry Titov <dim...@chromium.org> wrote:
Indeed, MHTML only has an ietf.org spec that covers just the file format. There is no processing model or any other doc that would specify the UA behavior when generating/loading MHTML. This was probably not a big deal so far considering how low the usage of MHTML is and that most of this usage is by IE and Chrome.

Once completed, we plan to publish our list of specific behaviors, which is not a processing model by any standard, but just a list of implementation choices, so that preliminary interop (or even a discussion) is facilitated. At the same time, we hope to eventually replace MHTML with the new packaging format that is being worked on. However, as with any standard, the future of packaging is not very well staked out yet, so it's not landing in Chrome soon. So perhaps some implementation notes for MHTML can be useful in the interim.

Dmitry

On Wed, Sep 6, 2017 at 3:47 PM Łukasz Anforowicz <luk...@chromium.org> wrote:
On Wednesday, September 6, 2017 at 4:18:47 AM UTC-7, Philip Jägenstedt wrote:
This is a tricky situation, since MHTML isn't web exposed in the usual way, but still we should expect that any useful+successful changes to the format would eventually be supported in all implementations.

I wholeheartedly agree with both of these: 1) I agree that MHTML is weird from standardization perspective and that 2) I hope that Chrome's MHTML implementation will be interoperable in the long-term with other implementations.

One thing that I want to point out is that since a few releases ago Chrome started showing mhtml delivered over http/https (previously it would only show mhtml opened via file:/// url).  I think that means that MHTML *is* web exposed (i.e. similarly to how HTML is web exposed).

Part of the problem here is that there is no currently maintained spec for MHTML (right?) but we need something by which others could implement the same thing without reverse engineering. If https://tools.ietf.org/html/rfc2557 is the closest thing to a spec but it leaves out many details, then an expedient option would be to write a spec that references RFC 2557 and adds the necessary changes to serializer, parser and processing model. Delta specs and monkey patching are rightly frowned upon, but it still seems better than nothing.

RE: processing model.  dimich@ points out that RFC2557 only describes the MHTML format, but doesn't talk about things like 1) sandboxing or 2) stripping/ignoring some html attributes and/or javascript or 3) omnibox behavior [i.e. which url to show - the file:///-url VS the original http-url]. 
 
Jian, Łukasz, what's your take on this?

Yeah - having a real spec would be nice, but I think that for now we can just try documenting Chromium's implementation choices (including the custom X-Snapshot-* headers) and publishing them "somewhere" (github?  chromium.org?)

P.S. Looks like there are zero tests related to MHTML in web-platform-tests, and they would have to be manual tests. A bug blocking this bug and explaining why it's not testable would suffice here.

I've opened https://crbug.com/762547 to track this. 

On Tue, Sep 5, 2017 at 5:20 PM Łukasz Anforowicz <luk...@chromium.org> wrote:
<+palmer@ for the discussion about safety/security of parsing MHTML metadata in the browser process>

On Friday, September 1, 2017 at 3:09:29 PM UTC-7, Dmitry Titov wrote:
Thanks Łukasz,

some answers:

On Fri, Sep 1, 2017 at 10:27 AM Łukasz Anforowicz <luk...@chromium.org> wrote:
On Thursday, August 31, 2017 at 5:46:03 PM UTC-7, Jian Li wrote:

Summary

Adds custom headers, X-Snapshot-Title and X-Snapshot-Content-Location, to saved MHTML pages in order to support sharing of MHTML pages.


I assume that in the long term we want other MHTML clients to be able to consume the data generated by Chrome.  Therefore - how do we ensure that the new headers are eventually standardized?  FWIW, I see a MHTML specification in https://tools.ietf.org/html/rfc2557 (but I don't know if there are other specs).

Eventually, if the saved/offline static pages are useful and successful, we'd need to go even further ahead and likely propose a separate file format (this one, which is in slow-moving development now). These small additions to MHTML allow us to start offering user-visible features that may provide the necessary justification for actually replacing the MHTML format with the new one. Adding headers seems to have very low compatibility risk, and wrt other browsers, we are looking at not breaking them.

So, there is no plan to standardize those additions to MHTML. Instead, the new (standard) packaging format will eventually replace MHTML altogether.
 
 


Motivation

This allows the receiving party of the shared MHTML file to easily find out and show the basic info about the page. The existing MHTML headers are either not suitable to use (due to not supporting non-ASCII characters) or require additional parsing of the multipart body.


Interoperability and Compatibility Risk


Low. New custom headers are being added, which should be skipped by clients that do not recognize them.


We considered using the existing MHTML headers but they can't be used because:

1) The Subject header does not work for titles containing non-printable ASCII characters. Current versions of Chrome and IE choose to output pure printable ASCII characters with all non-printable ASCII characters replaced by "?". Chrome and IE do not use this header when importing and loading MHTML pages, but some other MHTML clients do.
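To make the loss concrete, here is a small sketch of the "replace with ?" behavior described above. The function name `to_printable_ascii` is hypothetical, and the exact character range (0x20 to 0x7E) is an assumption; the thread does not spell out the precise rule Chrome or IE apply.

```python
def to_printable_ascii(title: str) -> str:
    """Replace every character outside printable ASCII (0x20-0x7E) with '?'."""
    return "".join(ch if " " <= ch <= "~" else "?" for ch in title)


print(to_printable_ascii("Café menu"))    # Caf? menu
print(to_printable_ascii("日本語 news"))   # ??? news
```

A title written entirely in a non-Latin script collapses to a run of question marks, which is why the Subject header alone cannot carry the page title for sharing.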


From the description above it seems that other browsers ignore *both* the standard Subject header and the new X-Snapshot-Title header (when reading a MHTML file).  Therefore, from the perspective of interoperability with other browsers it seems safe to have Chrome start using quoted-printable in the Subject header (rather than doing this in the new X-Snapshot-Title header as done in r496885).

Are there other concerns with using the Subject header?  You mentioned that other MHTML clients consume the Subject header - what is the behavior of these clients when quoted-printable encoding is used in the Subject header?  If the other clients are broken when seeing quoted-printable encoding, doesn't this indicate a bug in these other clients?
 

2) The Content-Location header is located in the 1st multipart section, which makes the parsing and extraction of this header far more complicated than simply parsing the top headers.


If we want to consume MHTML files produced by other MHTML clients (e.g. IE or Edge) then we would still have to fall back to exposing the original/standard Content-Location header from the 1st multipart section, right?

While Chrome will still be able to open the other MHTML files, Chrome won't import the metadata during such opening. The metadata (Subject, original location etc) are used to streamline sharing and improve the UI presenting offline pages in Chrome. Other MHTML files (which are rare in general on mobile) just won't be 'imported' into Chrome, and thus won't be shown in Download Home and other places in Chrome that offer "local offline content" to the user.

I am not sure if I understand what you mean by "just won't be 'imported' into Chrome".  What mechanism will be used to allow sharing/importing into Chrome of 1) MHTML files saved by Chrome, but 2) *not* MHTML files saved by IE or other MHTML clients?  Will Chrome use a sharing channel that is closed to other apps (so - e.g. a user won't be able to share into Chrome attachments coming from GMail - attachments possibly coming from other MHTML clients)?  Will Chrome try to analyse contents of an MHTML file to decide if the MHTML file was generated by Chrome VS other MHTML client?

Having the metadata at the beginning of MHTML as response headers allows simple import of metadata on the browser side (no need to load MHTML into renderer to parse).
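As an illustration of reading metadata from the top-level headers only, here is a minimal sketch using Python's standard `email` package. The sample file, the function name `read_snapshot_metadata`, and the example URL are all hypothetical. The header is spelled `Snapshot-Content-Location` here, following the later decision in this thread to drop the X- prefix.

```python
from email.header import decode_header, make_header
from email.parser import HeaderParser

# Hypothetical MHTML snippet; only the top-level headers matter here.
SAMPLE = (
    "Snapshot-Content-Location: https://example.com/article\r\n"
    "Subject: =?utf-8?Q?Caf=C3=A9_menu?=\r\n"
    "MIME-Version: 1.0\r\n"
    'Content-Type: multipart/related; type="text/html"; boundary="----Boundary"\r\n'
    "\r\n"
    "------Boundary\r\n"
    "Content-Type: text/html\r\n"
    "Content-Location: https://example.com/article\r\n"
    "\r\n"
    "<html></html>\r\n"
    "------Boundary--\r\n"
)


def read_snapshot_metadata(mhtml_text: str) -> dict:
    """Read the URL and title from the top-level headers only."""
    # HeaderParser stops interpreting at the first blank line; the
    # multipart body is kept as an opaque string and never parsed.
    msg = HeaderParser().parsestr(mhtml_text)
    raw_subject = msg.get("Subject")
    title = None
    if raw_subject is not None:
        # Undo RFC 2047 encoded-words, if any.
        title = str(make_header(decode_header(raw_subject)))
    return {"url": msg.get("Snapshot-Content-Location"), "title": title}


print(read_snapshot_metadata(SAMPLE))
```

Nothing after the first blank line is interpreted, which is the property that makes this kind of import cheap to do outside the renderer.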

There is indeed a delta in code complexity required to A) parse the new X-Snapshot-Content-Location header VS B) parse the standard Content-Location header of the 1st MHTML part.  OTOH, I am not sure if the delta makes a significant difference for 1) the decision whether it is safe/secure to do the parsing in the browser or 2) the ongoing cost of code maintenance.

Ultimately, the final decision will be made in a security review, but I think that if we allow parsing of quoted-printable encoded Subject header (or X-Snapshot-Title header), then it should be okay to also treat the "delta" as safe.

I think the maintenance cost is manageable because 1) the "delta" seems nicely unit-testable and 2) the "delta" is not that big - parsing of Subject + X-Snapshot-Content-Location headers will require A) parsing of headers [i.e. reading a line + extracting key/value], B) recognizing "title" and "X-Snapshot-Content-Location" headers, C) skipping irrelevant headers / lines, D) decoding quoted-printable encoding for the "title" header.  The delta for instead parsing the standard Content-Location from the first part is marked with asterisks: A) parsing headers, B) recognizing "title" and *"Content-Type"* and *"Content-Location"* headers, *B2) extracting boundary marker from the Content-Type header*, C) skipping irrelevant headers / lines *until after a boundary marker*, D) decoding quoted-printable encoding for the "title" header.
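The extra steps for the standard Content-Location route (extract the boundary, skip to the first part, read that part's headers) can be sketched roughly as follows. This is an illustration under simplifying assumptions, not Chromium's parser: the helper names and the sample file are hypothetical, and the boundary regex handles only the common quoted and unquoted forms.

```python
import re

# Hypothetical MHTML snippet with a folded Content-Type header.
SAMPLE = (
    "Subject: Example page\r\n"
    "MIME-Version: 1.0\r\n"
    "Content-Type: multipart/related;\r\n"
    '\ttype="text/html";\r\n'
    '\tboundary="----Boundary"\r\n'
    "\r\n"
    "------Boundary\r\n"
    "Content-Type: text/html\r\n"
    "Content-Location: https://example.com/article\r\n"
    "\r\n"
    "<html></html>\r\n"
    "------Boundary--\r\n"
)


def parse_headers(lines):
    """A) Unfold continuation lines and split each header into name/value."""
    unfolded = []
    for line in lines:
        if line[:1] in (" ", "\t") and unfolded:
            unfolded[-1] += " " + line.strip()
        else:
            unfolded.append(line)
    headers = {}
    for line in unfolded:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def first_part_content_location(mhtml_text):
    lines = mhtml_text.splitlines()
    if "" not in lines:
        return None
    blank = lines.index("")
    top = parse_headers(lines[:blank])
    # B2) Extract the boundary marker from the top-level Content-Type.
    match = re.search(r'boundary="?([^";]+)"?', top.get("content-type", ""), re.I)
    if not match:
        return None
    delimiter = "--" + match.group(1)
    # C) Skip everything until after the first boundary line.
    rest = lines[blank + 1:]
    if delimiter not in rest:
        return None
    part = rest[rest.index(delimiter) + 1:]
    # The first part's headers end at the next blank line.
    end = part.index("") if "" in part else len(part)
    return parse_headers(part[:end]).get("content-location")


print(first_part_content_location(SAMPLE))
```

Compared with reading a single top-level header, the additional logic is the boundary extraction and the skip-to-first-part step, both of which are straightforward to unit test in isolation.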

 
 

Ongoing technical constraints

None


Will this feature be supported on all six Blink platforms (Windows, Mac, Linux, Chrome OS, Android, and Android WebView)?

Yes.


Entry on the feature dashboard


--
You received this message because you are subscribed to the Google Groups "blink-dev" group.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/blink-dev/8c2d1623-da7a-4dcb-a8f2-b92cfb43d4fc%40chromium.org.


Dimitri Glazkov

Oct 10, 2017, 8:48:32 PM
to Jian Li, blink-api-ow...@chromium.org
LGTM2

LGTM1 to ship.


--
You received this message because you are subscribed to the Google Groups "blink-api-owners-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to blink-api-owners-d...@chromium.org.
To post to this group, send email to blink-api-ow...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/blink-api-owners-discuss/CAOnL0jvS60nUZoAdtEEm%2BwDYihP0ko_AY_dZxxCtwCV6CbtMMQ%40mail.gmail.com.

Philip Jägenstedt

Oct 11, 2017, 5:42:08 AM
to Dimitri Glazkov, Jian Li, blink-api-ow...@chromium.org
LGTM3

I have filed https://bugs.chromium.org/p/chromium/issues/detail?id=773621 about potentially writing shared tests for MHTML support, but given that this is an addition and no shared MHTML tests exist currently, that's not blocking anything.

Philip Jägenstedt

Oct 16, 2017, 4:24:35 AM
to Dimitri Glazkov, Jian Li, blink-api-ow...@chromium.org
Oops, this wasn't on blink-dev. I'll follow up on the real thread.