Contact emails
Spec
Summary
Adds custom headers, X-Snapshot-Title and X-Snapshot-Content-Location, to saved MHTML pages in order to support sharing of MHTML pages.
Motivation
This allows the receiving party of the shared MHTML file to easily find and show basic information about the page. The existing MHTML headers are either unsuitable (because they do not support non-ASCII characters) or require additional parsing into the multipart body.
Interoperability and Compatibility Risk
Low. New custom headers are being added, which should be ignored by clients that do not recognize them.
We considered using the existing MHTML headers, but they cannot be used because:
1) The Subject header does not work for titles containing characters outside printable ASCII. Current versions of Chrome and IE write only printable ASCII characters and replace every other character with "?". Chrome and IE do not use this header when importing and loading MHTML pages, but some other MHTML clients do.
2) The Content-Location header is located in the first multipart section, which makes parsing and extracting it far more complicated than simply parsing the top-level headers.
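As a rough illustration of the limitation in 1), the sketch below mimics the described replacement of every character outside printable ASCII with "?". The function name `to_printable_ascii` and the sample title are made up for this example; this is not Chrome's or IE's actual code, only an approximation of the behavior described above.

```python
def to_printable_ascii(title: str) -> str:
    """Replace every character outside printable ASCII (0x20-0x7E) with '?'."""
    return "".join(ch if 0x20 <= ord(ch) <= 0x7E else "?" for ch in title)


# Hypothetical page title containing non-ASCII characters.
original = "Caf\u00e9 \u6771\u4eac"
subject = to_printable_ascii(original)

print(subject)  # Caf? ??
```

The original title cannot be recovered from the sanitized value, which is why a receiver cannot rely on such a Subject header to display the page title.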
Ongoing technical constraints
None
Will this feature be supported on all six Blink platforms (Windows, Mac, Linux, Chrome OS, Android, and Android WebView)?
Yes.
Entry on the feature dashboard
On Thursday, August 31, 2017 at 5:46:03 PM UTC-7, Jian Li wrote:
Summary
Adds custom headers, X-Snapshot-Title and X-Snapshot-Content-Location, to saved MHTML pages in order to support sharing of MHTML pages.
I assume that in the long term we want other MHTML clients to be able to consume the data generated by Chrome. Therefore, how do we ensure that the new headers are eventually standardized? FWIW, I see an MHTML specification at https://tools.ietf.org/html/rfc2557 (but I don't know if there are other specs).
Motivation
This allows the receiving party of the shared MHTML file to easily find out and show the basic info about the page. The existing MHTML headers are either not suitable to use (due to not supporting non-ASCII characters) or require additional parsing into multipart body.
Interoperability and Compatibility Risk
Low. New custom headers are being added, which should be skipped by the clients who could not recognize them.
We considered using the existing MHTML headers, but they cannot be used because:
1) The Subject header does not work for titles containing characters outside printable ASCII. Current versions of Chrome and IE write only printable ASCII characters and replace every other character with "?". Chrome and IE do not use this header when importing and loading MHTML pages, but some other MHTML clients do.
From the description above it seems that other browsers ignore *both* the standard Subject header and the new X-Snapshot-Title header (when reading an MHTML file). Therefore, from the perspective of interoperability with other browsers, it seems safe to have Chrome start using quoted-printable in the Subject header (rather than doing this in the new X-Snapshot-Title header as done in r496885).
Are there other concerns with using the Subject header? You mentioned that other MHTML clients consume the Subject header. What is the behavior of these clients when quoted-printable encoding is used in the Subject header? If the other clients break when they see quoted-printable encoding, doesn't this indicate a bug in these other clients?
2) The Content-Location header is located in the first multipart section, which makes parsing and extracting it far more complicated than simply parsing the top-level headers.
If we want to consume MHTML files produced by other MHTML clients (e.g. IE or Edge), then we would still have to fall back to exposing the original/standard Content-Location header from the first multipart section, right?
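The fallback asked about here could look roughly like the sketch below. It is only an illustration using Python's standard `email` package, not Chromium's implementation; the function name `main_resource_url`, the sample documents and the boundary string are invented for the example, and it assumes the root part is the first part of the multipart/related body.

```python
from email import message_from_string
from typing import Optional


def make_sample(top_level_location: Optional[str]) -> str:
    """Build a tiny, hypothetical MHTML document for illustration."""
    extra = ""
    if top_level_location is not None:
        extra = "X-Snapshot-Content-Location: " + top_level_location + "\n"
    return (
        "From: <Saved by Blink>\n"
        + extra
        + "MIME-Version: 1.0\n"
        'Content-Type: multipart/related; type="text/html"; boundary="BOUNDARY"\n'
        "\n"
        "--BOUNDARY\n"
        "Content-Type: text/html\n"
        "Content-Location: http://example.com/from-first-part\n"
        "\n"
        "<html></html>\n"
        "--BOUNDARY--\n"
    )


def main_resource_url(
    mhtml_text: str, top_header: str = "X-Snapshot-Content-Location"
) -> Optional[str]:
    """Prefer the top-level header; fall back to the first part's Content-Location."""
    msg = message_from_string(mhtml_text)
    url = msg.get(top_header)
    if url:
        return url
    if msg.is_multipart():
        parts = msg.get_payload()
        if parts:
            # Assumption: the root part is the first part of the aggregate.
            return parts[0].get("Content-Location")
    return None


print(main_resource_url(make_sample("http://example.com/from-top-level")))
print(main_resource_url(make_sample(None)))
```

The second call models a file written by a client that does not emit the new header, so the lookup has to parse into the multipart body.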
Ongoing technical constraints
None
Will this feature be supported on all six Blink platforms (Windows, Mac, Linux, Chrome OS, Android, and Android WebView)?
Yes.
Entry on the feature dashboard
--
You received this message because you are subscribed to the Google Groups "blink-dev" group.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/blink-dev/8c2d1623-da7a-4dcb-a8f2-b92cfb43d4fc%40chromium.org.
On Fri, Sep 1, 2017 at 3:33 PM, Dmitry Titov <dim...@chromium.org> wrote:
On Fri, Sep 1, 2017 at 2:54 PM Jian Li <jia...@chromium.org> wrote:
On Fri, Sep 1, 2017 at 10:27 AM, Łukasz Anforowicz <luk...@chromium.org> wrote:
On Thursday, August 31, 2017 at 5:46:03 PM UTC-7, Jian Li wrote:
Having the metadata at the beginning of MHTML as response headers allows simple import of metadata on the browser side (no need to load MHTML into renderer to parse).
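To make the point concrete, here is a minimal sketch of reading only the top-level header block and stopping at the first blank line, so the multipart body is never touched. It uses Python's standard `email.parser.BytesHeaderParser`; the helper name `read_top_headers` and the sample bytes are assumptions for illustration and do not reflect how the browser process actually does this.

```python
import io
from email.parser import BytesHeaderParser

# Hypothetical, heavily shortened MHTML snapshot used only for illustration.
SAMPLE = (
    b"From: <Saved by Blink>\r\n"
    b"X-Snapshot-Content-Location: http://example.com/page\r\n"
    b"Subject: Example\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/related; type="text/html"; boundary="BOUNDARY"\r\n'
    b"\r\n"
    b"--BOUNDARY\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Location: http://example.com/page\r\n"
    b"\r\n"
    b"<html></html>\r\n"
    b"--BOUNDARY--\r\n"
)


def read_top_headers(stream):
    """Consume lines up to the first blank line and parse them as headers."""
    header_lines = []
    for line in stream:
        if line in (b"\r\n", b"\n"):
            break  # End of the top-level header block; the body stays unread.
        header_lines.append(line)
    return BytesHeaderParser().parsebytes(b"".join(header_lines))


stream = io.BytesIO(SAMPLE)
headers = read_top_headers(stream)
print(headers["X-Snapshot-Content-Location"])
print(headers["Subject"])
```

After the call, the stream is positioned at the first boundary line, which shows that none of the parts had to be read or parsed.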
This is a tricky situation, since MHTML isn't web exposed in the usual way, but still we should expect that any useful+successful changes to the format would eventually be supported in all implementations.
Part of the problem here is that there is no currently maintained spec for MHTML (right?), but we need something by which others could implement the same thing without reverse engineering. If https://tools.ietf.org/html/rfc2557 is the closest thing to a spec but leaves out many details, then an expedient option would be to write a spec that references RFC 2557 and adds the necessary changes to the serializer, parser and processing model. Delta specs and monkey patching are rightly frowned upon, but it still seems better than nothing.
Jian, Łukasz, what's your take on this?
P.S. Looks like there are zero tests related to MHTML in web-platform-tests, and they would have to be manual tests. A bug blocking this bug and explaining why it's not testable would suffice here.
Based on all the feedback received, we have decided to:
- Drop the X- prefix for the new headers we plan to add.
- Not add a new header to support titles with characters outside printable ASCII. Instead, Subject will be encoded per RFC 2047.
- Add a Snapshot-Content-Location header for the main resource URL.
The document has been updated to reflect all these changes. We will figure out where to publish the whole document, which lists all MHTML behaviors specific to Chromium.
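For readers unfamiliar with RFC 2047, the following sketch shows what an encoded Subject could look like and that the title survives a round trip. It uses Python's standard `email.header` and `email.charset` modules purely as an illustration; the sample title is invented, and the choice of UTF-8 with the "Q" (quoted-printable style) encoding is an assumption for the example, not a statement about what Chromium emits.

```python
from email.charset import QP, Charset
from email.header import Header, decode_header, make_header

# Hypothetical page title containing non-ASCII characters.
title = "Caf\u00e9 \u6771\u4eac"

# Assumption for this sketch: UTF-8 with the RFC 2047 "Q" encoding.
utf8_q = Charset("utf-8")
utf8_q.header_encoding = QP

# Encode the title as an RFC 2047 encoded-word; the result is pure ASCII.
encoded = Header(title, charset=utf8_q).encode()
print("Subject: " + encoded)

# A reader that understands RFC 2047 recovers the original title.
decoded = str(make_header(decode_header(encoded)))
print(decoded == title)
```

Unlike replacing characters with "?", this keeps the full title while the header itself stays within ASCII.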
On Wed, Sep 6, 2017 at 4:45 PM, Jian Li <jia...@chromium.org> wrote:
Based on all the feedback received, we have decided to:
- Drop the X- prefix for the new headers we plan to add.
These two headers?
Snapshot-Version
Snapshot-Content-Location
I looked at the latest version of the doc, and it doesn't go into any detail on where the number in Snapshot-Version comes from or how it's updated.
- Not add a new header to support titles with characters outside printable ASCII. Instead, Subject will be encoded per RFC 2047.
- Add a Snapshot-Content-Location header for the main resource URL.
The document has been updated to reflect all these changes. We will figure out where to publish the whole document, which lists all MHTML behaviors specific to Chromium.
Thanks for the detailed document investigating and explaining the behavior.
I would like to block this Intent on publishing this information with one of the standards bodies on GitHub, WICG or WHATWG perhaps. That way the usual editing, comment and bug flows for specs can proceed for this one. There has also been a lot of good discussion on this thread about details, which is best captured in the GitHub bug process.
If a Content-Location header field is used in the heading of a multipart/related, this Content-Location SHOULD apply to the whole aggregate, not to its root part.
I think this can mean, for example, that the top-level header is Content-Location: http://example.com/page/subpage while the root part's Content-Location is http://example.com/SPAengine.html, if we are dealing with some kind of single-page application that uses the HTML5 History API. (Yes, I know that Blink doesn't allow JavaScript in MHTML pages, but other browsers do.)