Removing libmime functionality

Joshua Cranmer

Jan 15, 2013, 12:38:38 AM

libmime, as you may or may not know, is a very old module. Comments
indicate that its original genesis is at least early 1996, and it has no
references to RFC 2045 but a smattering to RFC 1521 and 1341, even older
variations of MIME. Time has moved on in the 17 years since its genesis,
and I think it's worth considering removing functionality and features
that appear to be almost nonexistent in the modern email world. In my
rewrite of libmime in JS, I am not planning on providing support for
these features at all, and I am willing to countenance wholesale removal
in the current C implementation.

The candidate features are:
* multipart/appledouble - The last bug filed here was filed at the
beginning of 2011, where it was noted that we appear to be mangling
the output data, but no progress has been observed on the bug, and it's
mentioned that few apps even bother with it. I do recall seeing
appledouble crop up in earlier code archaeology as being a potential
security hole.
* x-sun-attachment - I see no open bugs mentioning this. And the name
does not even follow any semblance of MIME rules. I suspect this has
been unused for a long time.
* BinHex support - There's an open bug on this not working correctly on
Mac (though it apparently works on Linux), filed in 2006, with very little
indication that anyone still cares about it not working.
* text/enriched and text/richtext - The RFCs here actually go so far as
to say that they are a temporary solution. We don't generate these
emails (there is, unsurprisingly, an open bug here, filed in 2006, about
doing so), and we don't even parse them properly: we ignore features
(the bug to implement these was WONTFIX'd in 2008), and one of the
translations is to an HTML tag that hasn't been handled by Gecko...
since 2002 (with no one apparently noticing this issue).

I also have lower-level technical features whose utility I am dubious of:
* The forceCharset parameter to RFC 2047 decoding (which overrides the
charset declared in the =? ?= tokens).
* Split and Header display parameters to nsIMimeStreamConverter
* text/xml and text/plain MIME emitters
* rot13, headers= [note: this is not header=] magic MIME URL parameters
* attempts to parse pre-libmime part numbers
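
To make the first item concrete, here is a minimal sketch of RFC 2047
decoding with a charset override. It is written in Python purely for
illustration and is not the libmime code; the name force_charset is a
stand-in for the forceCharset parameter, and the helper itself is
hypothetical.

```python
import base64
import quopri
import re

# One RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")


def decode_rfc2047(value, force_charset=None):
    """Decode the encoded-words in a header value.

    force_charset is a hypothetical stand-in for libmime's forceCharset:
    when given, it replaces the charset named inside each token.
    """

    def decode_word(match):
        charset, encoding, text = match.groups()
        if encoding.lower() == "b":
            raw = base64.b64decode(text)
        else:
            # header=True makes "_" decode to a space, as the Q encoding requires.
            raw = quopri.decodestring(text.encode("ascii"), header=True)
        return raw.decode(force_charset or charset, errors="replace")

    # Whitespace between two adjacent encoded-words is not part of the text.
    value = re.sub(r"(\?=)\s+(=\?)", r"\1\2", value)
    return ENCODED_WORD.sub(decode_word, value)
```

With an override, the same bytes can come out as different text, which
is why the parameter looks risky: decoding "=?ISO-8859-1?Q?caf=E9?="
normally respects the declared ISO-8859-1, while forcing another
single-byte charset silently reinterprets the 0xE9 byte.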

If you disagree with removing support for these features, please speak
up. Also speak up if you have any other questions, comments, inquiries,
or concerns.

Axel Hecht

Jan 15, 2013, 3:01:47 AM

My concern would be the ability of reading archived data.

What happens after this change if you hit an email that uses these
features? Worse still, could there be emails relying on bugs in our impl?

Axel

Arivald

Jan 15, 2013, 4:01:36 AM

On 2013-01-15 06:38, Joshua Cranmer wrote:

> libmime, as you may or may not know, is a very old module. Comments
> indicate that its original genesis is at least early 1996, and it has no
> references to RFC 2045 but a smattering to RFC 1521 and 1341, even older
> variations of MIME. Time has moved in the 17 years since its genesis,
> and I think it's worth considering removing functionality and features
> that appear to be almost nonexistent in the modern email world. In my
> rewrite of libmime in JS, I am not planning on providing support for
> these features at all, and I am willing to countenance wholesale removal
> in the current C implementation.

Why JS? Emails with attachments can be large; are you sure JS
performance is good enough?

I think it would be better to write it in C++ and move most of the heavy
processing off the main thread. As I remember, all JS is executed on the
main thread, and there are already too many problems with that (the
infamous TB "pauses"...)

It would also be good to make it "pluggable", that is, to allow adding
support for any MIME type through some interfaces. This would let anyone
add support for any old or new MIME feature. Ideally, plug-ins could be
written in JS.

--
Arivald

Jonathan Protzenko

Jan 15, 2013, 4:44:14 AM

to Joshua Cranmer, dev-apps-t...@lists.mozilla.org

First of all, congratulations on your heroic efforts for getting rid of
libmime. Having hacked on it significantly, I think I can truly
appreciate your work :).

A great deal of complexity stems from the fact that libmime has
provisions for old, buggy email clients; I remember reading some
comments about special-casing for one special version of Navigator that
used to send malformed emails... so yay for removing support for old
emails!

A few questions.
- What is the transition plan? One thing we could do is, whenever gloda
indexes a message, have it decoded both by your library and the original
libmime, and see whether the two disagree. That would be a good test for
your library, and it would only affect the jsmimeemitter, not the
regular message display component.
- Have you talked this over with Patrick Brunschwig (Enigmail author)?
There are some people out there who definitely need to be able to plug
in your infrastructure to provide support for extra mime parts.
- Does your new parser create a MimeMessage as in
mailnews/db/gloda/modules/mimemsg.js?
- If so, do you have plans for creating a MimeMessage → HTML renderer?

Thanks for all your efforts!

jonathan

> _______________________________________________
> dev-apps-thunderbird mailing list
> dev-apps-t...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-apps-thunderbird

Joshua Cranmer

Jan 15, 2013, 11:00:10 AM

On 1/15/2013 3:44 AM, Jonathan Protzenko wrote:
> - What is the transition plan? One thing we could do is, whenever gloda
> indexes a message, have it decoded both by your library and the original
> libmime, and see whether the two disagree. That would be a good test for
> your library, and it would only affect the jsmimeemitter, not the
> regular message display component.

There are approximately 5 distinct entry points to libmime:
- nsIMimeHeaders (an XPCOM wrapper around the MimeHeaders, er, struct)
- nsIMimeConverter (effectively an exposed RFC 2047 encoding/decoding
library, with other noscript methods I have a patch to replace with a
more natural C++ API)
- nsIMsgHeaderParser (effectively a parser for To: and related headers)
- the stream converter
- Gloda

The first three interfaces I plan to unconditionally replace with my
implementation, and I already have WIPs for two of them. It turns out
that we actually have pretty decent coverage of these interfaces in
tests, so passing our test suite on these interfaces makes me relatively
confident in my implementation. For gloda and the stream converter, my
plan is to provide a transition by providing alternate implementations
that can be controlled with a preference. My original goal was to have a
prototype in the tree by the time we ship the TB 24 branch, but it looks
like I will slip that schedule.
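
The preference-controlled transition, combined with Jonathan's idea of
decoding with both implementations and comparing, could look roughly
like the sketch below. It is Python for illustration only; the
preference name and all function names are hypothetical, and the two
parsers are passed in as plain callables.

```python
# Hypothetical preference name, used only for this illustration.
PREF_NAME = "mail.mime.use_new_parser"


def select_parser(prefs, legacy, rewritten):
    """Pick the implementation that the preference asks for."""
    return rewritten if prefs.get(PREF_NAME, False) else legacy


def parse_with_shadow_check(raw, prefs, legacy, rewritten, report):
    """Parse with the selected implementation and cross-check the other.

    The caller always gets the selected implementation's result; any
    disagreement or crash in the other one is only passed to report().
    """
    active = select_parser(prefs, legacy, rewritten)
    shadow = legacy if active is rewritten else rewritten
    result = active(raw)
    try:
        other = shadow(raw)
    except Exception as exc:  # the shadow parser must never break the caller
        report({"raw": raw, "error": repr(exc)})
        return result
    if other != result:
        report({"raw": raw, "active": result, "shadow": other})
    return result
```

Running the shadow comparison only from the indexing path would keep it
away from regular message display while still exercising the new code on
real mail.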

There are also other places where people should be using libmime but
aren't because, well, you can't, and as a result code up their own
ad-hoc parsers; my plan is to switch these as what I have works. The
cases I can think of off the top of my head are the fakeservers,
nsMsgBodySearch, and nsMsgDBFolder::GetMessageTextFromStream.

> - Have you talked this over with Patrick Brunschwig (Enigmail author)?
> There are some people out there who definitely need to be able to plug
> in your infrastructure to provide support for extra mime parts.

No, not yet--I have not yet prototyped this stage, but the needs of
Enigmail are one of the factors that drive my design decisions.

> - Does your new parser create a MimeMessage as in
> mailnews/db/gloda/modules/mimemsg.js?

No. Gloda's model of a mime message is similar to my own, but it is not
quite sufficient for my needs, and I need to investigate how gloda deals
with some of libmime's magic better [1], particularly for the case of
uuencode and yenc message bodies. Generating mimemsg.js from my own mime
parser takes fewer than 100 lines of code at present.

> - If so, do you have plans for creating a MimeMessage → HTML renderer?

My plans are to cleanly divide the MIME parser into three separate stages:
1. MIME structure parser
2. MIME structure -> body and attachments view
3. Actually displaying a message in the UI

(although keeping 2 and 3 separate is proving harder than anticipated)
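
As a rough illustration of that three-stage split, here is a sketch
using Python's standard email package in place of the JS parser. The
MessageView class and the three function names are made up for this
example and say nothing about the actual design.

```python
from dataclasses import dataclass, field
from email import message_from_string, policy
from html import escape


@dataclass
class MessageView:
    """Stage 2 output: what a reader sees, independent of MIME nesting."""
    body: str = ""
    attachments: list = field(default_factory=list)


def parse_structure(raw):
    """Stage 1: turn raw text into a tree of MIME parts."""
    return message_from_string(raw, policy=policy.default)


def build_view(msg):
    """Stage 2: reduce the part tree to a body plus attachment names."""
    view = MessageView()
    body_part = msg.get_body(preferencelist=("plain",))
    if body_part is not None:
        view.body = body_part.get_content()
    view.attachments = [
        part.get_filename() or "(unnamed)" for part in msg.iter_attachments()
    ]
    return view


def render(view):
    """Stage 3: produce display markup; knows nothing about MIME."""
    items = "".join("<li>%s</li>" % escape(name) for name in view.attachments)
    return "<pre>%s</pre><ul>%s</ul>" % (escape(view.body), items)
```

Each stage consumes only the previous stage's output, so stages 1 and 2
could be tested without any UI at all.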

[1] One key example here is that gloda lets libmime convert some parts
to HTML but not others--text/enriched being the dominant example
here--which is actually what prompted my featurectomy proposal in the
first place.

Joshua Cranmer

Jan 15, 2013, 11:40:25 AM

On 1/15/2013 2:01 AM, Axel Hecht wrote:
> My concern would be the ability of reading archived data.
>
> What happens after this change if you hit an email that uses these
> features? Worse still, could there be emails relying on bugs in our impl?

text/enriched would degrade to text/plain (so you would see the
formatting tags), but I suspect that whatever little use it now sees is
mostly paired with a text/plain or text/html in multipart/alternative,
which we would prefer over it. I'm not as familiar with the other
formats, but multipart/appledouble would have a spurious second
attachment for the resource fork. We don't appear to even decode binhex
anymore (it may have been partially removed some time ago), so people
may not even notice a loss there compared to current versions. In the
case of x-sun-attachment, we would probably just show something that
would be akin to looking at MIME in a non-MIME-compliant email client.
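
The text/enriched fallback described above can be sketched as follows.
This is illustrative Python, not the real selection logic; the SUPPORTED
table and both function names are assumptions made for the example.

```python
# Types this hypothetical renderer handles natively; higher rank is preferred.
SUPPORTED = {"text/plain": 1, "text/html": 2}


def effective_type(content_type):
    """Map a declared type to the type it would actually be shown as."""
    declared = content_type.lower()
    if declared in SUPPORTED:
        return declared
    if declared.startswith("text/"):
        # Unknown text subtypes such as text/enriched degrade to text/plain.
        return "text/plain"
    return None


def pick_alternative(parts):
    """Choose one (content_type, body) pair from a multipart/alternative.

    Natively supported parts beat degraded ones; among equals, the later
    part wins, following the usual "last is richest" ordering.
    """
    best, best_rank = None, None
    for content_type, body in parts:
        shown_as = effective_type(content_type)
        if shown_as is None:
            continue
        rank = (content_type.lower() in SUPPORTED, SUPPORTED[shown_as])
        if best_rank is None or rank >= best_rank:
            best, best_rank = (shown_as, body), rank
    return best
```

A message carrying only text/enriched would come back as text/plain with
its formatting tags visible, while one that also carries text/plain or
text/html would never show the enriched part.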

Joshua Cranmer

Jan 15, 2013, 11:41:20 AM

There are several reasons to prefer JS:
1. This allows for better future-proofing of our code with respect to
changes in Gecko. Web-compatible APIs are much less likely to change
under us than internal XPCOM APIs, and also more likely to see
performance improvements.
2. Writing JS that can run with content privileges allows us to share
this code with Gaia.
3. JS provides much more flexible APIs for several components, in
particular easier string processing and easy-to-use hashtables.
4. It is actually easier to use multiple threads in JS than it is in
C++, since C++ tempts you to use main-thread-only XPCOM interfaces. For
example, libmime presently reads about 50 preferences in several
different places, which cannot be done off the main thread [1].
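
One way around the preference problem in point 4 is to read everything
once on the main thread and hand the worker an immutable snapshot. The
sketch below shows the idea in Python; the preference name, the
read_pref callback and the function names are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from types import MappingProxyType


def snapshot_prefs(read_pref, names):
    """Main thread only: read each preference once into a read-only mapping."""
    return MappingProxyType({name: read_pref(name) for name in names})


def collect_headers(raw, prefs):
    """Worker code: uses nothing but its arguments, so it is thread-safe."""
    headers = []
    for line in raw.splitlines():
        if not line:  # a blank line ends the header block
            break
        headers.append(line)
    return headers[: prefs["demo.max_header_lines"]]


def run_off_main_thread(raw, prefs):
    """Hand the work and the snapshot to a worker and wait for the result."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(collect_headers, raw, prefs).result()
```

The worker never calls back into the preference service, so the
main-thread-only restriction stops mattering for the parsing step.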

I have given a great deal of thought in how to design the JS MIME
parser, and key goals do include extensibility and minimizing
unnecessary translation work.

[1] Libmime is not the reason we must do everything on the main thread.
That reason is actually the database, which is a very inherently
single-threaded implementation and whose use is very pervasive in the
backend. That said, this implementation may be enough to let us do
various indexing tasks off the main thread.

Jonathan Kamens

Jan 18, 2013, 11:34:32 AM

Would love to see us get rid of libmime.

Is the text/plain emitter used when a message has only HTML
and the user sets View | Message Body As | Plain Text? If so,
how will that behave if the text/plain emitter is gone?

Joshua Cranmer

Jan 19, 2013, 12:47:08 PM

That part of the original post was more directed at developers and addon
authors, so the terminology may be confusing. The emitters live in the
offshoot directory mime/emitters/src and are the bridge between the
internal libmime functionality and the actual output mechanisms. Message
display (and most libmime functionality, in fact) uses the HTML emitter;
View | Message Body As | Plain Text affects which converter gets used to
translate a text/plain part.