Intent to Implement and Ship: Adding new MHTML headers to support sharing of MHTML pages


Jian Li

Aug 31, 2017, 8:46:03 PM
to blink-dev

Contact emails

jia...@chromium.org


Spec

https://docs.google.com/document/d/1FvmYUC0S0BkdkR7wZsg0hLdKc_qjGnGahBwwa0CdnHE/edit#heading=h.s65c3k7eanp


Summary

Adds custom headers, X-Snapshot-Title and X-Snapshot-Content-Location, to saved MHTML pages in order to support sharing of MHTML pages.


Motivation

This allows the receiving party of the shared MHTML file to easily find out and show the basic info about the page. The existing MHTML headers are either not suitable to use (due to not supporting non-ASCII characters) or require additional parsing into multipart body.


Interoperability and Compatibility Risk


Low. New custom headers are being added, which should be skipped by the clients who could not recognize them.


We considered using the existing MHTML headers, but they can't be used because:

1) The Subject header does not work for titles containing characters outside the printable ASCII range. Current versions of Chrome and IE output only printable ASCII characters, replacing every other character with "?". Chrome and IE do not use this header when importing and loading MHTML pages, but some other MHTML clients do.

2) The Content-Location header is located in the 1st multipart section, which makes parsing and extracting this header far more complicated than simply parsing the top headers.
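
As a rough illustration of point 1, the sketch below mimics the described behavior of replacing every character outside printable ASCII with "?". The function name `sanitize_subject` is hypothetical and this is not Chrome's actual code; it only shows why the original title cannot be recovered from such a Subject value.

```python
def sanitize_subject(title: str) -> str:
    """Replace every character outside printable ASCII (0x20-0x7E) with '?'.

    Hypothetical helper for illustration only, not Chrome's implementation.
    """
    return "".join(ch if " " <= ch <= "~" else "?" for ch in title)


if __name__ == "__main__":
    # "Café 東京" written with escapes so the source stays ASCII.
    print(sanitize_subject("Caf\u00e9 \u6771\u4eac"))  # prints: Caf? ??
```

Every replaced character maps to the same "?", so a receiver has no way to reconstruct the page title from this header alone.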


Ongoing technical constraints

None


Will this feature be supported on all six Blink platforms (Windows, Mac, Linux, Chrome OS, Android, and Android WebView)?

Yes.


Entry on the feature dashboard


Łukasz Anforowicz

Sep 1, 2017, 1:27:37 PM
to blink-dev
On Thursday, August 31, 2017 at 5:46:03 PM UTC-7, Jian Li wrote:

Summary

Adds custom headers, X-Snapshot-Title and X-Snapshot-Content-Location, to saved MHTML pages in order to support sharing of MHTML pages.


I assume that in the long term we want other MHTML clients to be able to consume the data generated by Chrome.  Therefore - how do we ensure that the new headers are eventually standardized?  FWIW, I see a MHTML specification in https://tools.ietf.org/html/rfc2557 (but I don't know if there are other specs).
 


Motivation

This allows the receiving party of the shared MHTML file to easily find out and show the basic info about the page. The existing MHTML headers are either not suitable to use (due to not supporting non-ASCII characters) or require additional parsing into multipart body.


Interoperability and Compatibility Risk


Low. New custom headers are being added, which should be skipped by the clients who could not recognize them.


We considered using the existing MHTML headers, but they can't be used because:

1) The Subject header does not work for titles containing characters outside the printable ASCII range. Current versions of Chrome and IE output only printable ASCII characters, replacing every other character with "?". Chrome and IE do not use this header when importing and loading MHTML pages, but some other MHTML clients do.


From the description above it seems that other browsers ignore *both* the standard Subject header and the new X-Snapshot-Title header (when reading a MHTML file).  Therefore, from the perspective of interoperability with other browsers it seems safe to have Chrome start using quoted-printable in the Subject header (rather than doing this in the new X-Snapshot-Title header as done in r496885).

Are there other concerns with using the Subject header?  You mentioned that other MHTML clients consume the Subject header - what is the behavior of these clients when quoted-printable encoding is used in the Subject header?  If the other clients are broken when seeing quoted-printable encoding, doesn't this indicate a bug in these other clients?
 

2) The Content-Location header is located in the 1st multipart section, which makes parsing and extracting this header far more complicated than simply parsing the top headers.


If we want to consume MHTML files produced by other MHTML clients (e.g. IE or Edge) then we would still have to fall back to exposing the original/standard Content-Location header from the 1st multi part section, right?

Jian Li

Sep 1, 2017, 5:54:52 PM
to Łukasz Anforowicz, blink-dev
On Fri, Sep 1, 2017 at 10:27 AM, Łukasz Anforowicz <luk...@chromium.org> wrote:
On Thursday, August 31, 2017 at 5:46:03 PM UTC-7, Jian Li wrote:

Summary

Adds custom headers, X-Snapshot-Title and X-Snapshot-Content-Location, to saved MHTML pages in order to support sharing of MHTML pages.


I assume that in the long term we want other MHTML clients to be able to consume the data generated by Chrome.  Therefore - how do we ensure that the new headers are eventually standardized?  FWIW, I see a MHTML specification in https://tools.ietf.org/html/rfc2557 (but I don't know if there are other specs).

Yes, this is the only spec. This spec also includes some other specs as extensions. It seems what we want for these custom headers can be found in other standardized headers (see my reply below) though no example was listed and no vendors have implemented support based on these.
 
 


Motivation

This allows the receiving party of the shared MHTML file to easily find out and show the basic info about the page. The existing MHTML headers are either not suitable to use (due to not supporting non-ASCII characters) or require additional parsing into multipart body.


Interoperability and Compatibility Risk


Low. New custom headers are being added, which should be skipped by the clients who could not recognize them.


We considered using the existing MHTML headers, but they can't be used because:

1) The Subject header does not work for titles containing characters outside the printable ASCII range. Current versions of Chrome and IE output only printable ASCII characters, replacing every other character with "?". Chrome and IE do not use this header when importing and loading MHTML pages, but some other MHTML clients do.


From the description above it seems that other browsers ignore *both* the standard Subject header and the new X-Snapshot-Title header (when reading a MHTML file).  Therefore, from the perspective of interoperability with other browsers it seems safe to have Chrome start using quoted-printable in the Subject header (rather than doing this in the new X-Snapshot-Title header as done in r496885).

Are there other concerns with using the Subject header?  You mentioned that other MHTML clients consume the Subject header - what is the behavior of these clients when quoted-printable encoding is used in the Subject header?  If the other clients are broken when seeing quoted-printable encoding, doesn't this indicate a bug in these other clients?

Most third-party MHTML viewer clients can parse the encoded text in the Subject header.

The MHTML spec RFC 2557 indeed lists RFC 2047 in its references. So it seems to be legitimate to put the encoded text in Subject header. I am going to update the proposal to switch to using Subject.
 
 

2) The Content-Location header is located in the 1st multipart section, which makes parsing and extracting this header far more complicated than simply parsing the top headers.


If we want to consume MHTML files produced by other MHTML clients (e.g. IE or Edge) then we would still have to fall back to exposing the original/standard Content-Location header from the 1st multi part section, right?

The Content-Location header in the 1st multipart section is always read and used when the MHTML page is loaded. This is the case for current and future versions of Chrome.

We plan to add a Content-Location header to the MHTML headers (also called message headers) so that we don't need to go into the 1st multipart section when we only need to pull out the necessary metadata without doing a full load.

Originally I thought Content-Location was not standardized as a supported header in the MHTML headers (no example was found). But on a closer look at RFC 2557, there is the following sentence:
A single Content-Location header field is allowed in any message or
   content heading, in addition to a Content-ID header (as specified in
   [MIME1]) and, in Message headings, a Message-ID (as specified in
   [RFC822]).

So I am going to abandon the proposal to add X-Snapshot-Content-Location. Instead, I will add a Content-Location header directly in MHTML headers.




--
You received this message because you are subscribed to the Google Groups "blink-dev" group.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/blink-dev/8c2d1623-da7a-4dcb-a8f2-b92cfb43d4fc%40chromium.org.

Dmitry Titov

Sep 1, 2017, 6:09:29 PM
to Łukasz Anforowicz, blink-dev
Thanks Łukasz,

some answers:

On Fri, Sep 1, 2017 at 10:27 AM Łukasz Anforowicz <luk...@chromium.org> wrote:
On Thursday, August 31, 2017 at 5:46:03 PM UTC-7, Jian Li wrote:

Summary

Adds custom headers, X-Snapshot-Title and X-Snapshot-Content-Location, to saved MHTML pages in order to support sharing of MHTML pages.


I assume that in the long term we want other MHTML clients to be able to consume the data generated by Chrome.  Therefore - how do we ensure that the new headers are eventually standardized?  FWIW, I see a MHTML specification in https://tools.ietf.org/html/rfc2557 (but I don't know if there are other specs).

Eventually, if the saved/offline static pages are useful and successful, we'd need to go even further and likely propose a separate file format (this one, which is in slow-moving development now). These small additions to MHTML allow us to start offering user-visible features that may provide the necessary justification for actually replacing the MHTML format with the new one. Adding headers seems to have very low compatibility risk, and with respect to other browsers, we are looking at not breaking them.

So, there is no plan to standardize those additions to MHTML. Instead, the new (standard) packaging format will eventually replace MHTML altogether.
 
 


Motivation

This allows the receiving party of the shared MHTML file to easily find out and show the basic info about the page. The existing MHTML headers are either not suitable to use (due to not supporting non-ASCII characters) or require additional parsing into multipart body.


Interoperability and Compatibility Risk


Low. New custom headers are being added, which should be skipped by the clients who could not recognize them.


We considered using the existing MHTML headers, but they can't be used because:

1) The Subject header does not work for titles containing characters outside the printable ASCII range. Current versions of Chrome and IE output only printable ASCII characters, replacing every other character with "?". Chrome and IE do not use this header when importing and loading MHTML pages, but some other MHTML clients do.


From the description above it seems that other browsers ignore *both* the standard Subject header and the new X-Snapshot-Title header (when reading a MHTML file).  Therefore, from the perspective of interoperability with other browsers it seems safe to have Chrome start using quoted-printable in the Subject header (rather than doing this in the new X-Snapshot-Title header as done in r496885).

Are there other concerns with using the Subject header?  You mentioned that other MHTML clients consume the Subject header - what is the behavior of these clients when quoted-printable encoding is used in the Subject header?  If the other clients are broken when seeing quoted-printable encoding, doesn't this indicate a bug in these other clients?
 

2) The Content-Location header is located in the 1st multipart section, which makes parsing and extracting this header far more complicated than simply parsing the top headers.


If we want to consume MHTML files produced by other MHTML clients (e.g. IE or Edge) then we would still have to fall back to exposing the original/standard Content-Location header from the 1st multi part section, right?

While Chrome will still be able to open other MHTML files, Chrome won't import the metadata when opening them. The metadata (Subject, original location, etc.) is used to streamline sharing and improve the UI presenting offline pages in Chrome. Other MHTML files (which are rare in general on mobile) just won't be 'imported' into Chrome, and thus won't be shown in Download Home and other places in Chrome that offer "local offline content" to the user.

Having the metadata at the beginning of MHTML as response headers allows simple import of metadata on the browser side (no need to load MHTML into renderer to parse).
 



Jeffrey Yasskin

Sep 1, 2017, 6:23:15 PM
to Jian Li, blink-dev
Could you put the spec on Github/in the WICG before shipping? I know we're hoping to replace this with Web Packaging in the long run, but it'd be good to let folks match our MHTML usage in the meantime.

Could you extend the spec to say what happens when the top-level Content-Location doesn't match the main resource?

Thanks,
Jeffrey


Dmitry Titov

Sep 1, 2017, 6:33:47 PM
to Jian Li, Łukasz Anforowicz, blink-dev
The way I read the section describing URIs of roots vs aggregates makes me think the Content-Location at the content headers is not the same as Content-Location of the root resource. If we use it as a duplicate of the root resource URI, it might violate the spec. Although most MHTML readers probably ignore the content header, it still may cause increased compatibility risk. If your reading matches mine, it is probably better to use a custom header for that.
 




Jian Li

Sep 1, 2017, 6:37:39 PM
to Dmitry Titov, Łukasz Anforowicz, blink-dev
Thanks Dmitry for pointing this out. It seems that we do have such a concern about using the Content-Location header in the message part. So I will keep the proposal to introduce the new custom header X-Snapshot-Content-Location.

Dominic Cooney

Sep 4, 2017, 3:25:42 AM
to Jian Li, Dmitry Titov, Łukasz Anforowicz, blink-dev
Does this doc mention X-Snapshot-Title? It seems to instead talk about an encoding in the Subject header. How is X-Snapshot-Title encoded?
 

Łukasz Anforowicz

Sep 5, 2017, 10:50:41 AM
to blink-dev, jia...@chromium.org, dim...@chromium.org, luk...@chromium.org
Based on one of the previous messages above, I think that we are modifying the proposal so that: 1) there won't be a new X-Snapshot-Title header and 2) the Subject header will be encoded using quoted-printable encoding (since the MHTML spec RFC 2557 indeed lists RFC 2047 in its references).

Łukasz Anforowicz

Sep 5, 2017, 11:19:58 AM
to blink-dev, luk...@chromium.org, Chris Palmer
<+palmer@ for the discussion about safety/security of parsing MHTML metadata in the browser process>
I am not sure if I understand what you mean by "just won't be 'imported' into Chrome".  What mechanism will be used to allow sharing/importing into Chrome of 1) MHTML files saved by Chrome, but 2) *not* MHTML files saved by IE or other MHTML clients?  Will Chrome use a sharing channel that is closed to other apps (so - e.g. a user won't be able to share into Chrome attachments coming from GMail - attachments possibly coming from other MHTML clients)?  Will Chrome try to analyse contents of an MHTML file to decide if the MHTML file was generated by Chrome VS other MHTML client?

Having the metadata at the beginning of MHTML as response headers allows simple import of metadata on the browser side (no need to load MHTML into renderer to parse).

There is indeed a delta in code complexity required to A) parse the new X-Snapshot-Content-Location header VS B) parse the standard Content-Location header of the 1st MHTML part.  OTOH, I am not sure if the delta makes a significant difference for 1) the decision whether it is safe/secure to do the parsing in the browser or 2) the ongoing cost of code maintenance.

Ultimately, the final decision will be made in a security review, but I think that if we allow parsing of quoted-printable encoded Subject header (or X-Snapshot-Title header), then it should be okay to also treat the "delta" as safe.

I think the maintenance cost is manageable because 1) the "delta" seems nicely unit-testable and 2) the "delta" is not that big. Parsing of the Subject + X-Snapshot-Content-Location headers will require A) parsing of headers [i.e. reading a line + extracting key/value], B) recognizing the "title" and "X-Snapshot-Content-Location" headers, C) skipping irrelevant headers / lines, D) decoding quoted-printable encoding for the "title" header.  Parsing the standard Content-Location from the first part instead changes steps B and C and adds step B2: A) parsing headers, B) recognizing the "title", "Content-Type" and "Content-Location" headers, B2) extracting the boundary marker from the Content-Type header, C) skipping irrelevant headers / lines until after a boundary marker, D) decoding quoted-printable encoding for the "title" header.
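
To make the two variants concrete, here is a minimal sketch using Python's standard `email` package rather than Chromium's //net code. The sample document, the function names, and the choice of the Subject header as the "title" are assumptions for illustration. The first function reads only the top-level header block; the second has to locate the multipart boundary and descend into the first part.

```python
from email import message_from_string
from email.header import decode_header, make_header
from email.parser import HeaderParser

# Hypothetical, heavily trimmed MHTML document used only for this sketch.
SAMPLE = (
    "Subject: =?utf-8?Q?Caf=C3=A9?=\n"
    "X-Snapshot-Content-Location: http://example.com/page\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/related; type="text/html"; boundary="----b1"\n'
    "\n"
    "------b1\n"
    "Content-Type: text/html\n"
    "Content-Location: http://example.com/page\n"
    "\n"
    "<html></html>\n"
    "------b1--\n"
)


def decode_title(raw_subject):
    """Decode RFC 2047 encoded-words in a Subject value."""
    return str(make_header(decode_header(raw_subject)))


def metadata_from_top_headers(text):
    """Variant A: look only at the top-level header block."""
    headers = HeaderParser().parsestr(text)  # the body is left unparsed
    return decode_title(headers["Subject"]), headers["X-Snapshot-Content-Location"]


def metadata_from_first_part(text):
    """Variant B: parse the multipart body to reach the first part's headers."""
    message = message_from_string(text)  # splits the body on the boundary
    first_part = message.get_payload()[0]
    return decode_title(message["Subject"]), first_part["Content-Location"]


if __name__ == "__main__":
    print(metadata_from_top_headers(SAMPLE))
    print(metadata_from_first_part(SAMPLE))
```

Both variants return the same pair for this sample; the difference is that variant B depends on boundary extraction and body splitting, which is the extra step B2 described above.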

Philip Jägenstedt

Sep 6, 2017, 7:18:47 AM
to Łukasz Anforowicz, blink-dev, Chris Palmer
This is a tricky situation, since MHTML isn't web exposed in the usual way, but still we should expect that any useful+successful changes to the format would eventually be supported in all implementations.

Part of the problem here is that there is no currently maintained spec for MHTML (right?) but we need something by which others could implement the same thing without reverse engineering. If https://tools.ietf.org/html/rfc2557 is the closest thing to a spec but it leaves out many details, then an expedient option would be to write a spec that references RFC 2557 and adds the necessary changes to the serializer, parser and processing model. Delta specs and monkey patching are rightly frowned upon, but they still seem better than nothing.

Jian, Łukasz, what's your take on this?

P.S. Looks like there are zero tests related to MHTML in web-platform-tests, and they would have to be manual tests. A bug blocking this bug and explaining why it's not testable would suffice here.

Chris Palmer

Sep 6, 2017, 4:46:40 PM
to Philip Jägenstedt, Łukasz Anforowicz, blink-dev
It's not awesome to parse data from untrustworthy sources in privileged processes.

If you can specify a precise grammar for the header names and values, and specify precise behavior on parse failure (in particular, not trying to repair the input or guess what the author 'must have meant'), and if we can use existing well-tested parsers (e.g. `strtod` and `GURL(std::string)`, with negative unit tests) on them, then it could be OK.
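
A minimal sketch of what "precise grammar, fail on error, no repair" could look like, written in Python instead of Chromium's C++. The grammar (printable-ASCII field name, printable-ASCII value), the restriction to absolute http(s) URLs, and the names `ParseError` and `parse_snapshot_location` are all assumptions made for this example; `urlsplit` stands in for an existing well-tested URL parser such as `GURL`.

```python
import re
from urllib.parse import urlsplit

# Field name: printable ASCII except ':'; value: printable ASCII, space or tab.
HEADER_LINE = re.compile(r"([!-9;-~]+):[ \t]*([ -~\t]*)")


class ParseError(ValueError):
    """Raised on any input that does not match the grammar exactly."""


def parse_snapshot_location(line: str) -> str:
    """Return the URL from one X-Snapshot-Content-Location header line.

    Rejects anything unexpected instead of trying to repair it.
    """
    match = HEADER_LINE.fullmatch(line)
    if match is None:
        raise ParseError("malformed header line")
    name, value = match.group(1), match.group(2).strip()
    if name.lower() != "x-snapshot-content-location":
        raise ParseError("unexpected header name")
    parts = urlsplit(value)  # delegate URL splitting to the standard library
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ParseError("value is not an absolute http(s) URL")
    return value


if __name__ == "__main__":
    print(parse_snapshot_location("X-Snapshot-Content-Location: http://example.com/a"))
    for bad in ("X-Snapshot-Content-Location example.com", "Subject: http://example.com"):
        try:
            parse_snapshot_location(bad)
        except ParseError as error:
            print("rejected:", error)
```

The negative cases matter as much as the positive one: a missing colon, a relative URL, or a non-ASCII byte is rejected outright rather than normalized.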

Łukasz Anforowicz

Sep 6, 2017, 6:35:21 PM
to blink-dev, foo...@google.com, luk...@chromium.org, pal...@chromium.org
jianli@ pointed out to me that //net already has code for parsing headers and it even seems to include support for quoted-printable encoding.
  • This means that the complexity delta (for using the standard Content-Location from the 1st mhtml part) is much bigger than I initially estimated: the quoted-printable support is already taken care of today, while the browser-side code for extracting the first mhtml part (well, its headers) would still have to be written from scratch (i.e. I see that all the browser-side callers of net::HttpUtil::ParseContentType pass a null pointer as the |boundary| argument)
  • This also means that there won't be any increase in browser's attack surface (since parsers/decoders in //net layer already have to deal with untrusted data)
  • So - this means that a new X-Snapshot-Content-Location is probably preferable (over trying to parse the standard Content-Location in the 1st mhtml part)

Łukasz Anforowicz

Sep 6, 2017, 6:47:46 PM
to blink-dev, luk...@chromium.org, pal...@chromium.org, foo...@google.com
On Wednesday, September 6, 2017 at 4:18:47 AM UTC-7, Philip Jägenstedt wrote:
This is a tricky situation, since MHTML isn't web exposed in the usual way, but still we should expect that any useful+successful changes to the format would eventually be supported in all implementations.

I wholeheartedly agree with both of these: 1) I agree that MHTML is weird from standardization perspective and that 2) I hope that Chrome's MHTML implementation will be interoperable in the long-term with other implementations.

One thing that I want to point out is that since a few releases ago Chrome started showing mhtml delivered over http/https (previously it would only show mhtml opened via file:/// url).  I think that means that MHTML *is* web exposed (i.e. similarly to how HTML is web exposed).

Part of the problem here is that there is no currently maintained spec for MHTML (right?) but we need something by which others could implement the same thing without reverse engineering. If https://tools.ietf.org/html/rfc2557 is the closest thing to a spec but it leaves out many details, then an expedient option would be to write a spec that references RFC 2557 and adds the necessary changes to the serializer, parser and processing model. Delta specs and monkey patching are rightly frowned upon, but they still seem better than nothing.

RE: processing model.  dimich@ points out that RFC2557 only describes the MHTML format, but doesn't talk about things like 1) sandboxing or 2) stripping/ignoring some html attributes and/or javascript or 3) omnibox behavior [i.e. which url to show - the file:///-url VS the original http-url]. 
 
Jian, Łukasz, what's your take on this?

Yeah - having a real spec would be nice, but I think that for now we can just try documenting Chromium's implementation choices (including the custom X-Snapshot-* headers) and publishing them "somewhere" (github?  chromium.org?)

P.S. Looks like there are zero tests related to MHTML in web-platform-tests, and they would have to be manual tests. A bug blocking this bug and explaining why it's not testable would suffice here.

I've opened https://crbug.com/762547 to track this. 

Dmitry Titov

Sep 6, 2017, 7:10:28 PM
to Łukasz Anforowicz, blink-dev, pal...@chromium.org, foo...@google.com
Indeed, MHTML only has the ietf.org spec, which covers just the file format. There is no processing model or any other doc that would specify the UA behavior when generating/loading MHTML. This was probably not a big deal so far, considering how low the usage of MHTML is and that most of this usage is by IE and Chrome.

Once completed, we plan to publish our list of specific behaviors, which is not a processing model by any standard, but just a list of implementation choices, so that preliminary interop (or even a discussion) is facilitated. At the same time, we hope to eventually replace MHTML with the new packaging format that is being worked on. However, as with any standard, the future of packaging is not very well staked out yet, so it's not landing in Chrome soon. So perhaps some implementation notes for MHTML can be useful in the interim.

Dmitry

Jian Li

Sep 6, 2017, 7:46:04 PM
to Dmitry Titov, Łukasz Anforowicz, blink-dev, pal...@chromium.org, foo...@google.com
Based on all the feedback received, we have decided to:
  • Drop the X- prefix for the new headers we plan to add.
  • Not add a new header for supporting titles with characters outside printable ASCII. Instead, Subject will be encoded to support this per the RFC 2047 spec.
  • Add a Snapshot-Content-Location header for the main resource URL.
The document has been updated to reflect all these changes. We will figure out where to publish the whole document, which lists all specific MHTML behaviors in Chromium.
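
For readers who want to see what the resulting top-level headers might look like, here is a small sketch built on Python's `email.header` module. It is not Chromium's serializer: the helper names are hypothetical, the choice of UTF-8 with the "Q" (quoted-printable style) encoding is an assumption based on the discussion above, and folding of very long titles across lines is not handled.

```python
from email.charset import QP, Charset
from email.header import Header, decode_header, make_header


def build_top_headers(title: str, main_url: str) -> str:
    """Return Subject and Snapshot-Content-Location header lines (sketch only)."""
    charset = Charset("utf-8")
    charset.header_encoding = QP  # ask for RFC 2047 "Q" encoded-words
    subject = Header(title, charset).encode()
    return f"Subject: {subject}\r\nSnapshot-Content-Location: {main_url}\r\n"


def read_title(encoded_subject: str) -> str:
    """Decode an RFC 2047 Subject value back into the original title."""
    return str(make_header(decode_header(encoded_subject)))


if __name__ == "__main__":
    headers = build_top_headers("Caf\u00e9 \u6771\u4eac", "http://example.com/page")
    print(headers)
    subject_value = headers.split("\r\n")[0][len("Subject: "):]
    print(read_title(subject_value))
```

The encoded Subject stays pure ASCII on the wire, and a reader that understands RFC 2047 gets the full title back, unlike the "?" substitution used before.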

Chris Harrelson

Sep 7, 2017, 2:53:39 PM
to Jian Li, Dmitry Titov, Łukasz Anforowicz, blink-dev, pal...@chromium.org, Philip Jägenstedt
On Wed, Sep 6, 2017 at 4:45 PM, Jian Li <jia...@chromium.org> wrote:
Based on all the feedback received, we have decided to:
  • Drop X- prefix for new headers we plan to add.
These two headers?

Snapshot-Version
Snapshot-Content-Location

I looked at the latest version of the doc, and it doesn't go into any detail on where the number in Snapshot-Version comes from or how it's updated. 
  • We will not add new header for supporting non-printable ASCII title. Instead, Subject will be transformed to support this per RFC 2047 spec.
  • Snapshot-Content-Location header will be added for main resource URL.
 The document has been updated to reflect all these changes. We will figure out where to publish the whole document which lists all specific MHTML behaviors in Chromium.

Thanks for the detailed document investigating and explaining behavior.

I would like to block this Intent on publishing this information with one of the standards bodies in github - WICG or WhatWG perhaps. That way the usual editing, comment and bug flows for specs can proceed for this one. There has also been a lot of good discussion on this thread about details, which is best captured in the github bug process.

Also, has there been any engagement outside of Google, and with other browsers in particular? I think the Intent is missing that section. Are they interested in advancing MHTML?

Chris
 

Jian Li

Oct 4, 2017, 7:34:40 PM
to Dmitry Titov, Łukasz Anforowicz, blink-dev, pal...@chromium.org, Philip Jägenstedt
Thanks for all your great feedback. I've updated the doc to resolve all the raised issues, with great help from Jeffrey.

I am going to post it to https://discourse.wicg.io as suggested.


Jian Li

Oct 4, 2017, 7:45:42 PM
to Dmitry Titov, Łukasz Anforowicz, blink-dev, pal...@chromium.org, Philip Jägenstedt

Jian Li

Oct 9, 2017, 4:15:38 PM
to Chris Harrelson, Dmitry Titov, Łukasz Anforowicz, blink-dev, pal...@chromium.org, Philip Jägenstedt
On Thu, Sep 7, 2017 at 11:53 AM, Chris Harrelson <chri...@chromium.org> wrote:


On Wed, Sep 6, 2017 at 4:45 PM, Jian Li <jia...@chromium.org> wrote:
Based on all the feedback received, we have decided to:
  • Drop X- prefix for new headers we plan to add.
These two headers?

Snapshot-Version
Snapshot-Content-Location

I looked at the latest version of the doc, and it doesn't go into any detail on where the number in Snapshot-Version comes from or how it's updated. 
  • We will not add new header for supporting non-printable ASCII title. Instead, Subject will be transformed to support this per RFC 2047 spec.
  • Snapshot-Content-Location header will be added for main resource URL.
 The document has been updated to reflect all these changes. We will figure out where to publish the whole document which lists all specific MHTML behaviors in Chromium.

Thanks for the detailed document investigating and explaining behavior.

I would like to block this Intent on publishing this information with one of the standards bodies in github - WICG or WhatWG perhaps. That way the usual editing, comment and bug flows for specs can proceed for this one. There has also been a lot of good discussion on this thread about details, which is best captured in the github bug process.

Chris, could you please revisit this? We've already posted this at https://discourse.wicg.io/t/mhtml-generation-and-loading-as-implemented-in-chrome/2387.

Chris Harrelson

Oct 9, 2017, 9:33:55 PM
to Jian Li, Dmitry Titov, Łukasz Anforowicz, blink-dev, pal...@chromium.org, Philip Jägenstedt
Hi,

Thanks for doing that, looks good.

My other comments still remain though - I'm not totally sure what modifications to today's MHTML implementation are represented in this intent, and in particular what headers are new or changed. Is it literally what is at this link? i.e.,

Two new headers, Snapshot-Version and Snapshot-Content-Location will be added.

The existing Subject header will be updated to support encoding of non-printable ASCII characters.

Snapshot-Version is not referenced in the document describing this feature.

Thanks,
Chris

Jian Li

Oct 10, 2017, 2:09:38 PM
to Chris Harrelson, Dmitry Titov, Łukasz Anforowicz, blink-dev, pal...@chromium.org, Philip Jägenstedt
I forgot to update this chromestatus doc to remove the Snapshot-Version header. I just updated it. Thanks.

Chris Harrelson

Oct 10, 2017, 6:01:06 PM
to Jian Li, Dmitry Titov, Łukasz Anforowicz, blink-dev, pal...@chromium.org, Philip Jägenstedt

Philip Jägenstedt

Oct 16, 2017, 4:26:57 AM
to Chris Harrelson, Jian Li, Dmitry Titov, Łukasz Anforowicz, blink-dev, pal...@chromium.org
This thread was pinged on blink-api-owners-discuss and neither Dimitri nor I noticed the list had changed. Dimitri said LGTM2 and I said:

LGTM3

I have filed https://bugs.chromium.org/p/chromium/issues/detail?id=773621 about potentially writing shared tests for MHTML support, but given that this is an addition and no shared MHTML tests exist currently, that's not blocking anything.


sieme...@gmail.com

Dec 11, 2017, 9:44:45 PM
to blink-dev
Hello, I'm a developer who has used the MHTML format for my own purposes for years, and I have something to say.
I think you can just use the regular Content-Location header in the heading of a multipart/related.

Opera on the Presto engine adds the Content-Location header in the MHTML header. Many years ago (more than ten for sure) only IE and Opera could make MHTML files, and that was the industry standard. So I think MHTML readers can handle a Content-Location header in the main header.

If a Content-Location header field is used
   in the heading of a multipart/related, this Content-Location SHOULD
   apply to the whole aggregate, not to its root part.
I think it can mean, for example, that we can have Content-Location: http://example.com/page/subpage, but the root header can be http://example.com/SPAengine.html if we deal with some kind of Single-Page Application that uses the HTML5 History API. (Yes, I know that Blink doesn't allow JavaScript for MHTML pages, but other browsers do.)
And a simple example: Content-Location in the MHTML header is http://example.com/article.html#paragraph5 but the root header is just http://example.com/article.html.
By the way, other browsers (I've checked IE 11 and Opera 12) strip the URI #anchor from the root Content-Location, which seems logical (Opera also strips it from the header in the top headers, which is not so logical), but Blink doesn't do that.
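
The article.html example can be sketched with Python's standard library as follows. The sample document and function names are made up for illustration, and the sketch assumes the root part is simply the first part (it ignores the multipart/related "start" parameter).

```python
from email import message_from_string
from urllib.parse import urldefrag

# Hypothetical MHTML file: the aggregate location carries a fragment,
# the root part's location does not.
SAMPLE = (
    "MIME-Version: 1.0\n"
    "Content-Location: http://example.com/article.html#paragraph5\n"
    'Content-Type: multipart/related; type="text/html"; boundary="----b1"\n'
    "\n"
    "------b1\n"
    "Content-Type: text/html\n"
    "Content-Location: http://example.com/article.html\n"
    "\n"
    "<html></html>\n"
    "------b1--\n"
)


def content_locations(mhtml_text):
    """Return (aggregate, root) Content-Location values."""
    message = message_from_string(mhtml_text)
    root_part = message.get_payload()[0]  # assumption: first part is the root
    return message["Content-Location"], root_part["Content-Location"]


def same_document(aggregate, root):
    """Compare two locations while ignoring any #fragment."""
    return urldefrag(aggregate).url == urldefrag(root).url


if __name__ == "__main__":
    aggregate, root = content_locations(SAMPLE)
    print(aggregate, root, same_document(aggregate, root))
```

In the Single-Page Application case from the previous paragraph the two URLs would differ even after the fragment is removed, so a reader cannot assume the aggregate location always names the root resource.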

Sorry for my bad English.


On Friday, September 1, 2017 at 3:46:03 AM UTC+3, Jian Li wrote:

Chris Harrelson

Dec 21, 2017, 9:09:16 PM
to sieme...@gmail.com, blink-dev
Hi,

Thank you for the feedback. Would you mind repeating it here and continuing discussion in that forum? That way the comments and insights are not lost.

Regards,
Chris


Jian Li

Dec 22, 2017, 4:42:36 PM
to Chris Harrelson, sieme...@gmail.com, blink-dev