BUG: GMail adds spaces in UTF-8 encoded "Subject:"


simos

Dec 7, 2004, 9:01:27 PM
to Gmail...@googlegroups.com
I have noticed that GMail adds spaces every 10-15 characters in the
"Subject:" line if it contains Utf-8 encoded characters.

Could someone else verify with another language?

Simply send an e-mail to yourself with a subject line in non-Latin
characters (for example, outside ISO-8859-1), such as

Subject: AAAAAAAAAAAAAAAAAAAAAAAAAAAAA (replace "A" with character of
choice).

The result (bug) should be like

Subject: AAAAAAAAAAAA AAAAAAAAAAAAA AAAA

I have reported this to GMail as bug #17701124.
I have verified this bug for Greek.
Could you please try to verify and report back here?

Klaus Alexander Seistrup

Dec 8, 2004, 2:03:38 AM
to gmail...@googlegroups.com
It's a normal "bug" in many mail clients. The reason is probably that
the subject line is QP encoded partially, and perhaps Gmail doesn't
"assemble" the line properly before decoding from Quoted Printable.
What does the subject line look like if you choose "show original"?
Is it QP encoded?

I haven't tried in Gmail, but I've seen the same happen in other
mail clients using QP-encoded iso-8859-1.

Cheers,

// Alex
--
Klaus Alexander Seistrup
Copenhagen · Denmark

simos

Dec 8, 2004, 8:55:46 AM
to Gmail...@googlegroups.com
The subject field, when viewing the raw e-mail content, looks like:

Subject: =?UTF-8?B?zrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6x?=
 =?UTF-8?B?zrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6xzrE=?=
 =?UTF-8?B?zrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6x?=
 =?UTF-8?B?zrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6xzrE=?=

Notice that the second and subsequent lines do not start in column
one; each begins with a space character. That is expected: the leading
space marks them as continuation lines of the Subject header, so they
are not parsed as new header lines. It appears that GMail does not
discard those leading spaces when decoding but keeps them as part of
the subject.
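
To make the difference concrete, here is a minimal sketch in Python using the first two lines of the header quoted above. The function names `decode_naive` and `decode_rfc2047` are made up for illustration, and `decode_naive` is only a guess at the kind of mistake involved, not GMail's actual code. RFC 2047 says whitespace between two adjacent encoded-words is to be ignored; the standard library's `email.header` module follows that rule.

```python
import base64
import re
from email.header import decode_header, make_header

# First two lines of the raw Subject value quoted above, with the
# continuation line folded as CRLF followed by a space.
folded = (
    "=?UTF-8?B?zrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6x?=\r\n"
    " =?UTF-8?B?zrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6xzrE=?="
)


def decode_naive(value):
    """Hypothetical buggy decoder: removes the line breaks, then decodes
    each encoded-word in place, so the folding space stays in the text."""
    unfolded = value.replace("\r\n", "")
    return re.sub(
        r"=\?UTF-8\?B\?([^?]*)\?=",
        lambda m: base64.b64decode(m.group(1)).decode("utf-8"),
        unfolded,
    )


def decode_rfc2047(value):
    """Decode with the standard library, which drops whitespace that
    only separates two adjacent encoded-words."""
    return str(make_header(decode_header(value)))


print(repr(decode_naive(folded)))    # contains a stray space
print(repr(decode_rfc2047(folded)))  # one unbroken run of characters
```

The only difference between the two results is the space that came from the folded line, which matches the symptom described in this thread.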

simos

Dec 8, 2004, 9:19:25 AM
to Gmail...@googlegroups.com
I have tried receiving the same e-mail with Evolution (Linux) and it
appeared ok.
Ximian Evolution does not have the QP bug that GMail currently has.

Some testing. The double quotes are mine.

A. The original subject is
"Subject:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

B. That's how the Subject appears on GMail:
"Subject: aaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaa aaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaa"

C. If you send an e-mail from GMail to an Evolution user (non-GMail
account, no UTF processing), it appears ok. The raw Subject is:
"Subject: =?UTF-8?B?zrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6xzrE=?=
=?UTF-8?B?zrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6xzrHOsQ==?=
=?UTF-8?B?zrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6xzrHOsQ==?=
=?UTF-8?B?zrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6xzrHOsQ==?="

D. GMail processes the subject line when storing the mail in a way that
damages it. It appears to decode (wrongly) and then encode again.


"Subject: =?UTF-8?B?zrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6x?=
=?UTF-8?B?zrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6xzrE=?=
=?UTF-8?B?zrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6x?=
=?UTF-8?B?zrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6xzrE=?="

E. If you send such a wrongly decoded and then re-encoded e-mail to a
non-Gmail account (for example, go to "Sent mail" and forward it), it
looks like:
Subject: "=?UTF-8?B?zrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6xIM6xzrHOsSA=?=
=?UTF-8?B?zrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6xzrHOsSDOsc6x?=
=?UTF-8?B?IM6xzrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6xzrEgzrE=?=
=?UTF-8?B?IM6xzrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6x?="
It appears that GMail adds even more spaces in this version (inconclusive).
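
One way to check this is to Base64-decode the payloads of the forwarded header and count the literal space bytes (0x20) inside them. The sketch below does that in Python; the payload strings are copied from case E above, and the helper name `count_spaces` is made up for illustration.

```python
import base64

# Base64 payloads copied from the forwarded Subject header in case E.
payloads = [
    "zrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6xIM6xzrHOsSA=",
    "zrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6xzrHOsSDOsc6x",
    "IM6xzrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6xzrEgzrE=",
    "IM6xzrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6xzrHOsc6x",
]


def count_spaces(payload):
    """Count literal space bytes inside one Base64 payload. In UTF-8 the
    byte 0x20 never occurs inside a multi-byte character, so counting
    raw bytes is safe."""
    return base64.b64decode(payload).count(b" ")


for p in payloads:
    raw = base64.b64decode(p)
    # errors="replace" keeps the loop going even if a payload were to
    # end in the middle of a multi-byte character.
    print(repr(raw.decode("utf-8", errors="replace")), count_spaces(p))
```

Any space reported here is part of the encoded text itself, not folding whitespace between encoded-words, so a receiving client that decodes correctly will still display it.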

Klaus Alexander Seistrup

Dec 8, 2004, 9:37:13 AM
to Gmail...@googlegroups.com
Exactly, and it will happen to all subject lines with QP- or
Base64-encoded chars using multiple lines.
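
As a rough illustration of why long non-ASCII subjects end up on multiple lines in the first place, the Python sketch below encodes a 60-character Greek subject with the standard library's `email.header.Header`. The sample subject is an assumption chosen to resemble the test in this thread. The encoder has to split the text into several encoded-words on folded lines, and a decoder that follows RFC 2047 gets the original text back without extra spaces.

```python
from email.header import Header, decode_header, make_header

# Sample subject (an assumption): 60 Greek alphas, similar to the test above.
subject = "α" * 60

# Encode as a Subject header; long values are split into several
# encoded-words, each continuation line starting with a space.
encoded = Header(subject, charset="utf-8", header_name="Subject").encode()
print(encoded)

# Decoding with whitespace between encoded-words ignored restores the text.
roundtrip = str(make_header(decode_header(encoded)))
print(roundtrip == subject)
```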

simos

Dec 10, 2004, 10:11:36 AM
to Gmail...@googlegroups.com
Just got an e-mail from the GMail team that they are working on this.
Yeah!

Tomi Häsä

Dec 11, 2004, 9:59:22 AM
to Gmail...@googlegroups.com
simos wrote:
>
> I have noticed that GMail adds spaces every 10-15 characters in the
> "Subject:" line if it contains Utf-8 encoded characters.
>
> Could someone else verify with another language?
>
> Simply send an e-mail to yourself with a subject line in non-Latin
> characters (for example, outside ISO-8859-1), such as
>
> Subject: AAAAAAAAAAAAAAAAAAAAAAAAAAAAA (replace "A" with character of
> choice).
>
> The result (bug) should be like
>
> Subject: AAAAAAAAAAAA AAAAAAAAAAAAA AAAA

I noticed the same kind of behaviour when I posted test messages in
different languages and character encodings through the Google Groups
web interface. For example, the problem didn't occur with ISO-8859-1
and EUC-CN encodings, but it did occur with UTF-8 and GB2312
encodings. See these test postings (as you can see, some of the
subjects contain added spaces, some don't):

http://groups.google.co.uk/groups?num=100&hl=en&lr=&scoring=r&as_drrb=q&q=test+author%3Atom...@gmail.com

This is OK (Finnish/ISO-8859-1):

http://groups.google.co.uk/groups?selm=ci7553%24bna%40odah37.prod.google.com&output=gplain

This isn't OK (Finnish/UTF-8):

http://groups.google.co.uk/groups?selm=297507af.0409032139.40867775%40posting.google.com&oe=UTF-8&output=gplain

This is OK (Chinese/EUC-CN):

http://groups.google.co.uk/groups?q=test+author:tomi.hasa%40gmail.com&hl=en&lr=&scoring=r&selm=297507af.0409050820.6cc60473%40posting.google.com&rnum=11

This isn't OK (Chinese/GB2312):

http://groups.google.co.uk/groups?q=test+author:tomi.hasa%40gmail.com&hl=en&lr=&scoring=r&selm=297507af.0409052328.4c57a619%40posting.google.com&rnum=8
