attachments filename encoding problem

fix

May 31, 2012, 11:47:29 PM
to support-t...@lists.mozilla.org
Hello,

I've run into a problem forwarding messages that include file
attachments with Russian letters in the filename (this probably also
affects other non-English languages). The problem is as follows:

As far as I understand, the message is HTML in UTF-8:

-------- QUOTE FROM THE MESSAGE --------

--=_mixed 003E1A2744257A0F_=
Content-Type: multipart/alternative; boundary="=_alternative 003E1A2744257A0F_="


--=_alternative 003E1A2744257A0F_=
Content-Type: text/plain;
charset=UTF-8
Content-Transfer-Encoding: base64

-------- END QUOTE FROM THE MESSAGE --------

It also contains an attachment whose filename is encoded in KOI8-R:

-------- QUOTE FROM THE MESSAGE --------

--=_alternative 003E1A2744257A0F_=--
--=_mixed 003E1A2744257A0F_=
Content-Type: application/zip; name="=?KOI8-R?B?8NLJ18XUICEuemlw?="
Content-Disposition: attachment; filename="=?KOI8-R?B?8NLJ18XUICEuemlw?="
Content-Transfer-Encoding: base64

-------- END QUOTE FROM THE MESSAGE --------
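
For reference, the encoded-word in the name and filename parameters above can be decoded with Python's standard email.header module. This is only a sketch for inspecting the header value; the variable names are illustrative and not part of the original message.

```python
from email.header import decode_header

# Encoded-word copied from the name/filename parameters quoted above.
encoded = "=?KOI8-R?B?8NLJ18XUICEuemlw?="

# decode_header returns (bytes, charset) pairs; this value yields exactly one.
raw, charset = decode_header(encoded)[0]

# Decode the raw bytes with the charset the encoded-word declares.
filename = raw.decode(charset)

print(charset)   # koi8-r
print(filename)  # Привет !.zip
```

So the original header is well formed, and the name it carries is "Привет !.zip".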

OK, I can see the attachment in the received message. Now I need to
forward the message. When I do, the attachment filename becomes
unreadable, while the text of the message itself stays readable:

-------- QUOTE FROM THE MESSAGE --------

Content-Type: application/zip;
name="=?UTF-8?B?77+9?="
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
filename*=UTF-8''%EF%BF%BD

-------- END QUOTE FROM THE MESSAGE --------
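
A quick way to see what the forwarded headers actually contain is to decode both parameters by hand. The sketch below uses only the Python standard library; the variable names are illustrative, and the extended value is split manually on the assumption that it has the form charset'language'text.

```python
import base64
from urllib.parse import unquote

# Values copied from the forwarded message quoted above.
name_b64 = "77+9"                  # payload of =?UTF-8?B?77+9?=
filename_ext = "UTF-8''%EF%BF%BD"  # extended parameter value of filename*

# The Content-Type name param: base64 payload, declared as UTF-8.
name = base64.b64decode(name_b64).decode("utf-8")

# The Content-Disposition filename* param: charset'language'percent-encoded text.
charset, _language, pct = filename_ext.split("'", 2)
filename = unquote(pct, encoding=charset)

print([hex(ord(c)) for c in name])   # ['0xfffd']
print(name == filename == "\ufffd")  # True
```

Both parameters decode to the single character U+FFFD, so the original name is already gone by the time the headers are written.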

I've put the messages on pastebin, which is perhaps better than quoting
them here. It could also be a bug in TB, but I would like some feedback
from the user list before writing to the developers. Maybe it's not a bug.
Here are the pastebins:
http://pastebin.com/AfFdp3vE original message
http://pastebin.com/NUwQH7b4 resent message (taken from sent folder in TB)
http://pastebin.com/FcrA42yf resent message (received at gmail.com)

Thanks!

Michael A. Puls II

Jun 1, 2012, 2:59:59 AM
to support-t...@lists.mozilla.org
Note that Thunderbird uses RFC2231 rules for encoding the filename param
(including "Parameter Value Continuations" and its "Parameter Value
Character Set and Language Information" rules) in the
Content-Disposition header.

However, for compatibility with clients that don't support RFC2231,
such as Outlook, Outlook Express, Pine and email_to_text_message
services (like num...@vtext.com), Thunderbird uses RFC2047 rules for
encoding the name param in the Content-Type header so that the filename
for the attachment comes out right (IE doesn't support RFC2231 in HTTP
headers either, even though other browsers do). So, if the filename is
just short, simple ASCII, it'll be just filename="ascii". But if it's
long and/or contains non-ASCII characters, it might be encoded as
base64 or quoted-printable, for example, using the methods in RFC2047.

In the config editor, I think there's an option to make Thunderbird
always use RFC2231, but I'm not sure if it's still there.

Opera does the same as Thunderbird pretty much.

Sylpheed has an option (off by default) to use RFC2231 instead of 822/2047.

On 5/31/2012 11:47 PM, fix wrote:
> Content-Type: application/zip;
> name="=?UTF-8?B?77+9?="
> Content-Transfer-Encoding: base64
> Content-Disposition: attachment;
> filename*=UTF-8''%EF%BF%BD

If the attachment's filename is supposed to be "Привет !.zip" (a guess
based on the links you posted), then it should be:

Content-Type: application/zip; name="=?UTF-8?B?0J/RgNC40LLQtdGCICEuemlw?="
Content-Disposition: attachment;
filename*=UTF-8''%D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82%20!.zip

that ends up in the outgoing message.
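
As a rough sketch, the two parameter values above can be reproduced with the Python standard library. The helper names rfc2047_b and rfc2231_ext are made up for this example and are not Thunderbird functions; the sketch also assumes a name short enough to fit in one encoded-word and one unfolded parameter.

```python
import base64
from urllib.parse import quote

filename = "Привет !.zip"  # the guessed original name from the post above


def rfc2047_b(text: str, charset: str = "UTF-8") -> str:
    """Wrap text in a single base64 encoded-word (short names only)."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}?B?{payload}?="


def rfc2231_ext(text: str, charset: str = "UTF-8") -> str:
    """Build an extended value: charset, empty language, percent-encoded text."""
    # "!" is kept literal so the output matches the header shown above.
    return f"{charset}''{quote(text, safe='!', encoding=charset)}"


print(f'Content-Type: application/zip; name="{rfc2047_b(filename)}"')
print("Content-Disposition: attachment;")
print(f" filename*={rfc2231_ext(filename)}")
```

The Content-Type name param gets the RFC2047 form for older clients, and the Content-Disposition filename param gets the RFC2231 form.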

%EF%BF%BD (and its base64 equivalent in the Content-Type name param)
looks like a replacement character, as if Thunderbird had a problem
understanding or converting the Russian text to UTF-8 when
replying/forwarding.

Don't know if that helps any, but...

--
Michael


ishikawa

Jun 13, 2012, 11:22:13 PM
I noticed a filename encoding problem years ago with Japanese
filenames between Thunderbird and a mailer on the Mac.

According to
http://en.wikibooks.org/wiki/Perl_Programming/Unicode_UTF-8
the "replacement character" is described as follows:

This is Unicode's "replacement character" (codepoint U+FFFD), which is used
to indicate when a Unicode parser (such as a browser) was not able to decode
a stream of Unicode encoded data. The problem is likely an encode/decode
problem somewhere in the chain. (U+FFFD encodes to EF BF BD in UTF-8.)
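
To illustrate how a replacement character typically arises, the sketch below decodes the KOI8-R bytes of the filename with the wrong charset. This only demonstrates the general mechanism and is not a claim about what Thunderbird does internally; note that the forwarded header in this thread contained just a single U+FFFD.

```python
# "Привет !.zip" as KOI8-R bytes, the encoding the original header declared.
koi8_bytes = "Привет !.zip".encode("koi8-r")

# Correct: decode with the declared charset.
good = koi8_bytes.decode("koi8-r")

# Wrong: treat the same bytes as UTF-8. The Cyrillic bytes are not valid
# UTF-8 sequences, so the decoder substitutes U+FFFD for them.
bad = koi8_bytes.decode("utf-8", errors="replace")

print(good)                            # Привет !.zip
print("\ufffd" in bad)                 # True
print("\ufffd".encode("utf-8").hex())  # efbfbd
```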

So there is something broken in TB.
The original poster didn't say which version, but
it may be worth trying the latest version to see if the problem persists
and, if so, submitting a Bugzilla report.

