Non-ascii characters

Henrik Frisk

Jun 3, 2015, 6:36:02 AM
to mu-di...@googlegroups.com
Hi all,

Now and then I have trouble displaying non-ASCII characters correctly. For example, "för igår" comes out as "för igÃ¥r".

How can I make mu4e/emacs decode this correctly?

/Henrik

Joost Kremers

Jun 4, 2015, 3:44:06 AM
to mu-di...@googlegroups.com
What's happening here is that the email is UTF-8 encoded, while Emacs /
mu4e decodes it as Latin-1. Since Emacs and mu4e tend to deal well with
encodings, I suspect the problem is that the relevant emails aren't
announcing their encoding correctly. An email that's encoded as UTF-8
should have a header announcing this, e.g.:

,----
| Content-Type: text/plain; charset=UTF-8
`----

Check the raw message (type `.` in the *mu4e-view* buffer, `q` to get
back to the normal display) and see if there is a Content-Type header
and what it says.
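
To see the mechanism in isolation, here is a minimal Python sketch (an illustration only, not something mu4e or Emacs runs; the function names `misdecode` and `repair` are made up for this example). It encodes a string as UTF-8, decodes the resulting bytes as Latin-1, and then reverses the mistake:

```python
def misdecode(text: str, actual: str = "utf-8", assumed: str = "latin-1") -> str:
    """Encode text with its real charset, then decode the bytes with the wrong one."""
    return text.encode(actual).decode(assumed)


def repair(garbled: str, actual: str = "utf-8", assumed: str = "latin-1") -> str:
    """Undo misdecode(): recover the original bytes, then decode them correctly."""
    return garbled.encode(assumed).decode(actual)


garbled = misdecode("igår")   # "igÃ¥r", the garbled form from the first message
restored = repair(garbled)    # "igår"
```

The letter "å" is two bytes in UTF-8 (0xC3 0xA5), and Latin-1 reads each byte as its own character, which is where the "Ã¥" pair comes from.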


--
Joost Kremers
Life has its moments

Henrik Frisk

Jun 29, 2015, 4:57:11 PM
to mu-di...@googlegroups.com

There are two content-type headers in one of these problematic mails:

Content-Type: multipart/alternative; boundary="Apple-Mail=_7FC14FEE-F99A-4FD7-939B-5FA4BDC4F1B8"

and

Content-Type: text/plain;
    charset=windows-1252


Joost Kremers

Jun 30, 2015, 5:24:38 AM
to mu-di...@googlegroups.com

On Mon, Jun 29 2015, Henrik Frisk <fri...@gmail.com> wrote:
> There are two content-type headers in one of these problematic mails:
>
> Content-Type: multipart/alternative;
> boundary="Apple-Mail=_7FC14FEE-F99A-4FD7-939B-5FA4BDC4F1B8"
>
> and
>
> Content-Type: text/plain;
> charset=windows-1252

That's the culprit right there. If the email is utf8-encoded, which it
obviously is (or at least the part that you are seeing is), then this
line is simply wrong. Whoever is sending you these emails should fix (or
upgrade) their mail app.
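
If you want to check a batch of saved messages for this kind of mislabeling, a rough Python sketch along these lines might help. It uses only the standard `email` module; the function name `find_mislabeled_parts` is hypothetical, and the test is a heuristic: a text part is flagged when it contains non-ASCII bytes, declares something other than UTF-8 (or nothing at all), and yet decodes cleanly as UTF-8.

```python
from email import message_from_bytes


def find_mislabeled_parts(raw_message: bytes):
    """Return (content_type, declared_charset) for text parts that look like UTF-8
    but are declared as something else. Heuristic only."""
    msg = message_from_bytes(raw_message)
    suspects = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers and attachments
        declared = part.get_content_charset()  # lowercased, or None if absent
        # decode=True undoes base64 / quoted-printable and returns raw bytes
        payload = part.get_payload(decode=True) or b""
        has_non_ascii = any(byte > 0x7F for byte in payload)
        if not has_non_ascii or declared in ("utf-8", "utf8"):
            continue  # pure ASCII or correctly labeled: nothing to report
        try:
            payload.decode("utf-8")
        except UnicodeDecodeError:
            continue  # not valid UTF-8, so the declared charset may well be right
        suspects.append((part.get_content_type(), declared))
    return suspects
```

Feeding it the raw bytes of a message file returns an empty list when nothing looks off. Short windows-1252 texts can occasionally happen to be valid UTF-8 too, so treat a hit as a hint rather than proof.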

Henrik Frisk

Jun 30, 2015, 4:02:36 PM
to mu-di...@googlegroups.com
Right, I can see that is the problem in some cases. Thing is, it's happening a little too often. Also, it seems to me that messages are sometimes displayed correctly and other times not, but I'm not sure about that part. I can't reproduce it right now.

I switched from mh-e in April and I didn't have the problem with mh-e. I will dig up more examples and see if I can find a pattern.

/Henrik

Olaf Meeuwissen

Jun 30, 2015, 7:39:19 PM
to mu-di...@googlegroups.com
I've had my share of non-ASCII (iso-2022-jp) trouble and found it
helpful to extract the text/plain MIME part and feed it to iconv to
convert it from whatever charset was given to, say, UTF-8. It will
"choke" on the first encoding violation it sees.

You can use reformime(1) to extract the MIME part.
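
If reformime and iconv are not at hand, roughly the same check can be sketched in Python with the standard `email` module. This is only an approximation of the workflow above: the function name `first_charset_violation` is made up for this example, and Python's codec tables are not guaranteed to match iconv's for every charset. It strictly decodes the first text/plain part with its declared charset and reports where decoding first fails.

```python
from email import message_from_bytes


def first_charset_violation(raw_message: bytes):
    """Strictly decode the first text/plain part with its declared charset.

    Returns None if decoding succeeds (or there is no text/plain part),
    otherwise a tuple (charset, byte_offset, reason).
    """
    msg = message_from_bytes(raw_message)
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        # A missing charset parameter defaults to us-ascii for text parts
        charset = part.get_content_charset() or "us-ascii"
        # decode=True undoes base64 / quoted-printable and returns raw bytes
        payload = part.get_payload(decode=True) or b""
        try:
            payload.decode(charset, errors="strict")
        except UnicodeDecodeError as err:
            return (charset, err.start, err.reason)
        except LookupError:
            return (charset, None, "charset unknown to Python")
        return None
    return None
```

Like iconv, this only catches bytes that are invalid in the declared charset. Text that decodes without error but to the wrong characters passes silently.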

In my case it turned out that the mail sender's software thinks it's
okay to use private extensions to the declared charset. So it was
really iso-2022-jp plus MS extensions (sometimes known as iso-2022-jp-ms
and, to the best of my knowledge, not supported by Emacs). These mails
still display as raw ASCII in mu4e, but after switching to raw display
(with '.') they turn into readable Japanese, except for characters that
fall in the extensions (mostly just things like ①).

# The above is really a lot like the Shift_JIS situation where both MS
# and the Mac have added some extensions. The MS flavour is so common
# that the W3C even recommends that browsers treat Shift_JIS as the MS
# flavour. Go figure.
# See http://www.w3.org/TR/encoding/#shift_jis

Hope this helps,
--
Olaf Meeuwissen, LPIC-2 FLOSS Engineer -- AVASYS CORPORATION
FSF Associate Member #1962 Help support software freedom
http://www.fsf.org/jf?referrer=1962