MIME header question

Adam H. Kerman

Jan 17, 2022, 2:21:26 PM
I asked in another newsgroup.

Eduardo, in your interpretation of the RFCs is declaring 7 bit on
Content Transfer Encoding in conflict with declaring UTF-8 as the
character set?

Logically, it seems to me that the two headers should be set jointly:
UTF-8 should not be declared when the body has no non-ASCII characters
and the transfer encoding is marked as 7 bit.

pine/alpine have always scanned for the lowest common denominator character
set, regardless of the user's settings. If there are no non-ASCII characters,
the character set is marked US-ASCII and the transfer encoding 7 bit.
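
As a rough illustration of the selection described above (this is not Alpine's actual code), the sketch below labels an ASCII-only body as US-ASCII/7bit and otherwise keeps the configured charset. The function name `pick_labels`, the `configured_charset` parameter, and the choice of 8bit for the non-ASCII case are assumptions made for the example; it also ignores the line-length limit that 7bit imposes.

```python
def pick_labels(text: str, configured_charset: str = "utf-8") -> tuple[str, str]:
    """Return (charset, content_transfer_encoding) labels for a text body."""
    if text.isascii():
        # Nothing outside US-ASCII: use the most widely understood labels,
        # whatever charset the user configured.
        return ("us-ascii", "7bit")
    # Assumption for this sketch: non-ASCII text is sent unencoded as 8bit.
    # A real client might pick quoted-printable or base64 instead.
    return (configured_charset, "8bit")


print(pick_labels("plain text"))
print(pick_labels("café"))
```

The point of the design is that the user's setting only matters once a character outside US-ASCII actually appears in the message.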

I don't know of another client that performs that parsing.

Eduardo Chappa

Jan 17, 2022, 3:15:22 PM
On Mon, 17 Jan 2022, Adam H. Kerman wrote:

> Eduardo, in your interpretation of the RFCs is declaring 7 bit on
> Content Transfer Encoding in conflict with declaring UTF-8 as the
> character set?

Dear Adam,

I do not think there is a conflict here. Let me say it in a different way.
The Content-Transfer-Encoding here just tells you how to process the data.
It could have other values, such as base64 or quoted-printable, so the
value tells you what to do with the data. In the case of 7bit, just
interpret the 7-bit data in the charset, in this case utf-8, which in
effect means US-ASCII. In other words

7bit intersected with utf-8 = us-ascii,

so you could write us-ascii for the charset in this case, or utf-8. It
seems more like a question of style, not of correctness.
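
A small check of the "intersection" idea, offered as a sketch rather than anything taken from a mail client: octets below 0x80 decode to the same text under either label, while any non-ASCII character in UTF-8 uses only octets with the eighth bit set. The helper name `is_pure_ascii` is made up for the example.

```python
def is_pure_ascii(octets: bytes) -> bool:
    """True when no octet has the high (eighth) bit set."""
    return all(octet < 0x80 for octet in octets)


body = b"Plain text without accents.\r\n"

# A 7-bit body decodes identically as us-ascii and as utf-8.
assert is_pure_ascii(body)
assert body.decode("us-ascii") == body.decode("utf-8")

# A non-ASCII character becomes a multi-octet UTF-8 sequence in which
# every octet has the high bit set, so it can never appear in 7-bit data.
encoded = "é".encode("utf-8")
assert len(encoded) == 2
assert not is_pure_ascii(encoded)
```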

Having said that, I prefer to use us-ascii in this case because more
clients are likely to understand us-ascii instead of utf-8. Alpine did not
get utf-8 handling until very late, while many other clients understood
utf-8, so it was better for pine users to receive a message 7bit in
us-ascii than 7-bit in utf-8, because Pine could not handle the latter.

I doubt that there are Pine users still out there (although I can always
be proven wrong) but it is better to be conservative here in my opinion.

--
Eduardo
https://tinyurl.com/yc377wlh (web)
http://repo.or.cz/alpine.git (Git)

Adam H. Kerman

Jan 17, 2022, 4:49:51 PM
Eduardo Chappa <cha...@washington.edu> wrote:
>On Mon, 17 Jan 2022, Adam H. Kerman wrote:

>>Eduardo, in your interpretation of the RFCs is declaring 7 bit on
>>Content Transfer Encoding in conflict with declaring UTF-8 as the
>>character set?

>I do not think there is a conflict here. Let me say it in a different way.
>The Content-Transfer-Encoding here just tells you how to process the data.
>It could have other values, such as base64 or quoted-printable, so the
>value tells you what to do with the data. In the case of 7bit, just
>interpret the 7-bit data in the charset, in this case utf-8, which in
>effect means US-ASCII. In other words

> 7bit intersected with utf-8 = us-ascii,

>so you could write us-ascii for the charset in this case, or utf-8. It
>seems more like a question of style, not of correctness.

Thanks. This is why I asked you. I thought 7 bit was about the
communication channel and not the capabilities of the client and display
on the other end.

If the display interprets MIME headers, does that mean the same 7-bit
character is displayed ignoring the eighth bit or two characters are
displayed in a UTF-8 double byte character? All this time, when my
terminal emulation translation didn't match what was received (I have to
change it manually), I thought I was changing the assumed character set,
not the transfer encoding toggle.

>Having said that, I prefer to use us-ascii in this case because more
>clients are likely to understand us-ascii instead of utf-8. Alpine did not
>get utf-8 handling until very late, while many other clients understood
>utf-8, so it was better for pine users to receive a message 7bit in
>us-ascii than 7-bit in utf-8, because Pine could not handle the latter.

>I doubt that there are Pine users still out there (although I can always
>be proven wrong) but it is better to be conservative here in my opinion.

I certainly agree with you.

Eduardo Chappa

Jan 17, 2022, 6:29:26 PM
On Mon, 17 Jan 2022, Adam H. Kerman wrote:

> Thanks. This is why I asked you. I thought 7 bit was about the
> communication channel and not the capabilities of the client and display
> on the other end.
>
> If the display interprets MIME headers, does that mean the same 7-bit
> character is displayed ignoring the eighth bit or two characters are
> displayed in a UTF-8 double byte character? All this time, when my
> terminal emulation translation didn't match what was received (I have to
> change it manually), I thought I was changing the assumed character set,
> not the transfer encoding toggle.

Dear Adam,

I never used the word display to refer to how the message actually
displays on the screen. The headers tell the client what to do internally.
For example, if the content-transfer-encoding were base64, then this tells
the client to decode the encoded blob. Same with 7bit. It just tells the
client to interpret the 7-bit data it finds in the given charset. This will
become a character on screen later on.
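
The two steps described above (first undo the transfer encoding, then interpret the resulting octets in the declared charset) can be sketched as follows. This is a minimal illustration using Python's standard library, not how Alpine does it; `decode_body` and its parameters are hypothetical names.

```python
import base64
import quopri


def decode_body(raw: bytes, cte: str, charset: str) -> str:
    """Undo the Content-Transfer-Encoding, then decode using the charset."""
    cte = cte.strip().lower()
    if cte == "base64":
        octets = base64.b64decode(raw)
    elif cte == "quoted-printable":
        octets = quopri.decodestring(raw)
    elif cte in ("7bit", "8bit", "binary"):
        # Identity encodings: the octets are already the data.
        octets = raw
    else:
        raise ValueError(f"unsupported transfer encoding: {cte}")
    # Only at this point does the charset label come into play.
    return octets.decode(charset)


print(decode_body(b"aGVsbG8=", "base64", "utf-8"))
print(decode_body(b"hello", "7bit", "utf-8"))
```

Note that the 7bit branch does nothing to the data; the label only promises that no transformation was applied and that the octets are 7-bit clean.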

I have to acknowledge that I do not understand completely what you are
saying. There is no "transfer encoding toggle" in Alpine, nor is there an
"assumed character set", so I am not exactly sure what you are referring
to, but if I understand you correctly, you are asking what happens to
multibyte characters. Unless you make changes to the default configuration
in Alpine, Alpine will send to the terminal utf-8 codes, which the
terminal will display if it is utf-8 capable. Do you have Alpine and your
terminal configured differently?

John Levine

Jan 17, 2022, 6:36:02 PM
It appears that Eduardo Chappa <cha...@washington.edu> said:
>On Mon, 17 Jan 2022, Adam H. Kerman wrote:
>
>> Eduardo, in your interpretation of the RFCs is declaring 7 bit on
>> Content Transfer Encoding in conflict with declaring UTF-8 as the
>> character set?

I'm not Eduardo, but it's clearly not valid. RFC 2045 says

An encoding type of 7BIT requires that the body
is already in a 7bit mail-ready representation.

Needless to say, UTF-8 is not 7bit mail-ready. I can believe that
some mail programs have tried to make sense of this, but it's utterly
ad-hoc and whatever they do with it is wrong. Maybe stuff declared to
be UTF-8 is in fact just ASCII in a particular message, but I wouldn't
count on it.
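
For reference, RFC 2045 section 2.7 defines "7bit data" in terms of the octets themselves: no octet above 127, no NUL, CR and LF only as part of CRLF, and at most 998 octets between CRLF sequences. A sketch of that test is below; the function name `is_7bit` is made up for the example, and the check says nothing about which charset label is attached to the body.

```python
def is_7bit(body: bytes) -> bool:
    """Check a body against the RFC 2045 section 2.7 rules for 7bit data."""
    # No NULs and no octets with the eighth bit set.
    if any(octet == 0 or octet > 127 for octet in body):
        return False
    for line in body.split(b"\r\n"):
        # CR and LF may only occur as part of a CRLF pair.
        if b"\r" in line or b"\n" in line:
            return False
        # At most 998 octets between CRLF line separators.
        if len(line) > 998:
            return False
    return True


print(is_7bit(b"Hello, world.\r\n"))
print(is_7bit("café\r\n".encode("utf-8")))
```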

> I doubt that there are Pine users still out there (although I can always
> be proven wrong) but it is better to be conservative here in my opinion.

Probably not, although there are plenty of us Alpine users.

R's,
John


--
Regards,
John Levine, jo...@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Adam H. Kerman

Jan 17, 2022, 10:52:59 PM
Eduardo Chappa <cha...@washington.edu> wrote:
>On Mon, 17 Jan 2022, Adam H. Kerman wrote:

>>Thanks. This is why I asked you. I thought 7 bit was about the
>>communication channel and not the capabilities of the client and display
>>on the other end.

>>If the display interprets MIME headers, does that mean the same 7-bit
>>character is displayed ignoring the eighth bit or two characters are
>>displayed in a UTF-8 double byte character? All this time, when my
>>terminal emulation translation didn't match what was received (I have to
>>change it manually), I thought I was changing the assumed character set,
>>not the transfer encoding toggle.

> I never used the word display to refer to how the message actually
>displays on the screen. The headers tell the client what to do internally.
>For example, if the content-transfer-encoding were base64, then this tells
>the client to decode the encoded blob. Same with 7bit. It just tells the
>client to interpret the 7-bit data it finds in the given charset. This will
>become a character on screen later on.

> I have to acknowledge that I do not understand completely what you are
>saying. There is no "transfer encoding toggle" in Alpine,

Sorry to be unclear. I just meant that the standard allows a choice of
encoding schemes, as you've been discussing.

>nor is there an "assumed character set",

The user can name a character set in .pinerc. Isn't that for the composer
as well as the display? If there are no non-ASCII characters, the MIME
header declares ASCII no matter how the user set this feature.

I liked the fact that alpine declares a lowest common denominator character
set.

>so I am not exactly sure what you are referring
>to, but if I understand you correctly, you are asking what happens to
>multibyte characters. Unless you make changes to the default configuration
>in Alpine, Alpine will send to the terminal utf-8 codes, which the
>terminal will display if it is utf-8 capable. Do you have Alpine and your
>terminal configured differently?

I usually have to change the translation between ISO-8859-1 and UTF-8
depending on what Usenet article I'm looking at. alpine isn't my
newsreader. Also, in followup, I like to get rid of the nonprinting
characters; translation mismatch can make them visible. I post in ASCII
whenever possible.
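
The kind of mismatch described here can be reproduced in a few lines. This is only an illustration of what happens when octets written in one charset are read in another; `misread` is a hypothetical helper, and a real terminal may render undecodable octets differently from Python's replacement character.

```python
def misread(text: str, actual: str, assumed: str) -> str:
    """Encode text in one charset, then decode it as if it were another."""
    return text.encode(actual).decode(assumed, errors="replace")


# UTF-8 octets viewed as ISO-8859-1: one character turns into two.
print(misread("café", "utf-8", "iso-8859-1"))

# ISO-8859-1 octets viewed as UTF-8: the lone high octet is invalid.
print(misread("café", "iso-8859-1", "utf-8"))

# Pure ASCII survives either way, which is the appeal of posting in ASCII.
print(misread("cafe", "utf-8", "iso-8859-1"))
```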