[VM] Incorrectly encoded non-ASCII headers

Yeechang Lee

Mar 4, 2016, 9:52:20 PM
to VM mailing list
(Disclaimer: I am on Emacs 23 and VM 8.1.2.)

I sometimes receive messages in which headers--usually the subject
line--use non-ASCII characters without encoding them as per RFC
2047. (Wikipedia's article-of-the-day mailing list is a frequent
offender.)

I realize the best solution is to have the sender change its ways and
emit standards-adhering messages, but in the meantime, could VM gain
the ability to assume that the body's encoding style in a message also
applies to the headers? An alternative would be to assume that headers
are 8-bit clean unless RFC 2047-style quoting appears. (Can either be
done on our own with some elisp until then, I wonder? I wouldn't
want the message itself modified; just the presentation buffer.)
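
To make the first idea concrete, here is a minimal sketch in Python rather
than elisp, and not anything VM actually does. It decodes a raw header value
with whatever charset the message's Content-Type declares. The function names
are hypothetical, and the Latin-1 fallback is an assumption added only so the
sketch never raises. It does not handle a charset parameter that sits on a
folded continuation line.

```python
import re

# Matches charset=utf-8 as well as charset="utf-8" in a Content-Type line.
CHARSET_RE = re.compile(rb'charset\s*=\s*"?([A-Za-z0-9._-]+)"?', re.IGNORECASE)


def body_charset(raw_headers: bytes, default: str = "us-ascii") -> str:
    """Return the charset declared in the Content-Type header, lowercased."""
    for line in raw_headers.splitlines():
        if line.lower().startswith(b"content-type:"):
            match = CHARSET_RE.search(line)
            if match:
                return match.group(1).decode("ascii").lower()
    return default


def decode_with_body_charset(raw_value: bytes, raw_headers: bytes) -> str:
    """Decode a raw 8-bit header value using the body's declared charset."""
    charset = body_charset(raw_headers)
    try:
        return raw_value.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Assumption: Latin-1 as a last resort, since it accepts any byte.
        return raw_value.decode("latin-1")
```

Only the decoded copy would be shown; the stored message bytes stay
untouched, which matches the wish to change the presentation alone.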

Uday Reddy

Mar 5, 2016, 3:01:29 PM
to Yeechang Lee, VM mailing list
What character set are the headers in?

Uday

Yeechang Lee

Mar 5, 2016, 3:16:54 PM
to VM mailing list
Uday Reddy <usr.vm...@gmail.com> says:
> What character set are the headers in?

From Wikipedia:
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64

From The New York Times:
Content-Type: text/html; charset=utf-8; format=flowed
Mime-version: 1.0

From Open Library:
Content-Disposition: inline
Content-Type: text/plain; charset=UTF-8
MIME-Version: 1.0

From Soap.com (subject incorrectly displayed in summary, but
correctly displayed in presentation buffer. Only one in which Unicode
is not involved.):
MIME-Version: 1.0
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

I can send the messages as attachments off-list if helpful.

PS - On reflection, of the choices I earlier mentioned:

> could VM gain the ability to assume that the body's encoding style
> in a message also applies to the headers? An alternative would be to
> assume that headers are 8-bit clean unless RFC 2047-style quoting
> appears

I suspect that the second is preferable, as amended:

> Assume that human-readable headers are 8-bit clean and in UTF-8
> unless RFC 2047-style quoting appears

Setting aside cases in which the encoding header is incorrect, I'm
pretty sure I've seen messages in which header and body encodings
don't match.
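
As a rough illustration of the amended rule, here is a minimal sketch in
Python rather than elisp, and not VM's own code. It decodes RFC 2047 encoded
words when they appear and otherwise treats the raw header bytes as UTF-8.
The name display_header is hypothetical, and the Latin-1 fallback for bytes
that are not valid UTF-8 is an assumption, not part of the rule as stated.
Headers that mix encoded words with raw 8-bit bytes are not handled cleanly.

```python
import re
from email.header import decode_header, make_header

# An RFC 2047 encoded word looks like =?charset?B?data?= or =?charset?Q?data?=
ENCODED_WORD_RE = re.compile(rb"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")


def display_header(raw_value: bytes) -> str:
    """Return header text for display without modifying the stored message."""
    if ENCODED_WORD_RE.search(raw_value):
        # The sender followed RFC 2047, so use the standard decoder.
        text = raw_value.decode("ascii", errors="replace")
        return str(make_header(decode_header(text)))
    try:
        # No encoded words: assume the header is 8-bit clean UTF-8.
        return raw_value.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: fall back to Latin-1, which accepts any byte.
        return raw_value.decode("latin-1")
```

Because the body's charset is never consulted, this approach is unaffected by
messages whose header and body encodings disagree.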

Perhaps the above should apply to the body as well? (Or does it
already? I have one in my folder of test messages I've received over
the years in which Content-Type: only says "text/plain;", nothing
else, despite Unicode in the body. VM displays the message correctly
in Emacs 23.)

--
geo:37.783333,-122.416667
