I'm working on replacing uconv with encoding_rs:
https://bugzilla.mozilla.org/show_bug.cgi?id=encoding_rs
encoding_rs is a newly-written implementation of the Encoding Standard
that is more correct than uconv, performs better (if compiled with
nightly Rust), supports both UTF-16 and UTF-8 as the in-memory Unicode
representation (without data table duplication) and supports both Rust
and C++ callers.
Since mailnews components don't appear to be written in Rust, the C++
case is the one relevant to mailnews. The current state of the API can
be seen in
https://hg.mozilla.org/try/file/0dfc31834877/intl/Encoding.h . (It's
best to scroll to mozilla::Encoding and read it first before reading
mozilla::Decoder and mozilla::Encoder.)
The change isn't imminent, because the new code is still orange on
try, is unreviewed and requires nightly Rust in order not to regress
performance. (The last point is the main source of schedule
uncertainty.) Still, I think it would make sense for Thunderbird
developers to assess the impact on mailnews code sooner rather than
later, so that the change doesn't come as a surprise.
Particular things of note are:
* The set of converters will no longer be extensible via XPCOM.
Therefore, mailnews will no longer be able to register UTF-7 via XPCOM
and expect to be able to instantiate a UTF-7 decoder using
mozilla::dom::EncodingUtils::DecoderForEncoding(). Instead, UTF-7
handling will need to happen as a one-off special case outside the
converter framework.
* nsIUnicodeDecoder and nsIUnicodeEncoder will be replaced by
mozilla::Decoder and mozilla::Encoder, which have a similar but subtly
different API. The API changes address design flaws in the previous
API.
* When holding something that designates an encoding, it will be
preferable to hold const mozilla::Encoding* instead of holding an
nsACString containing the name of the encoding. (All instances of
mozilla::Encoding are static, so there's no need to refcount the
pointer.) If you need to designate a particular encoding at compile
time, there are constants of the form UTF_8_ENCODING for referring to
the encodings directly (as opposed to having to resolve a name or a
label at run time).
* When you have the entire input in nsAString or nsACString,
mozilla::Encoding provides non-streaming conversion methods that hide
the complexity of using mozilla::Decoder and mozilla::Encoder (and
also avoid having to malloc the converter by doing a stack allocation
in Rust instead).
* Since conversion to and from UTF-8 is supported, there is no need to
pivot through UTF-16 when converting from an arbitrary encoding to
UTF-8. It appears that mailnews currently has code that wants to
convert stuff to UTF-8 and is forced to pivot through UTF-16.
Replacing this code with direct conversions to UTF-8 should give
mailnews a nice performance boost.
* For consistency with other browsers, the ISO-2022-JP decoder
doesn't support ISO-2022-JP-2. While new email is unlikely to arrive
as ISO-2022-JP-2, such messages may exist in archived mailboxes,
having been sent by Apple Mail before it switched to always sending
UTF-8.
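To make the point about holding const mozilla::Encoding* concrete, here is a rough sketch of the "hold a pointer, not a name" pattern. Everything in it is hypothetical: the Encoding struct, ForLabel(), WINDOWS_1252_ENCODING and MessagePart are made-up stand-ins loosely modeled on the description of mozilla::Encoding, not the actual declarations in Encoding.h. The label table covers only a few of the Encoding Standard's labels.

```cpp
#include <string>
#include <string_view>

// Hypothetical stand-in for mozilla::Encoding (NOT the real Gecko class).
// Every instance has static storage duration, so callers can hold a raw
// const pointer without refcounting and compare encodings by address.
struct Encoding {
  const char* name;
};

namespace {
const Encoding kUtf8{"UTF-8"};
const Encoding kWindows1252{"windows-1252"};
}  // namespace

// Hypothetical compile-time constants following the UTF_8_ENCODING pattern.
const Encoding* const UTF_8_ENCODING = &kUtf8;
const Encoding* const WINDOWS_1252_ENCODING = &kWindows1252;

// Resolve a label at run time, Encoding Standard style: trim ASCII
// whitespace, then compare ASCII case-insensitively. Only a handful of
// labels are listed here; unknown labels yield nullptr.
const Encoding* ForLabel(std::string_view label) {
  const std::string_view ws = "\t\n\f\r ";
  const auto first = label.find_first_not_of(ws);
  if (first == std::string_view::npos) return nullptr;
  const auto last = label.find_last_not_of(ws);
  std::string key(label.substr(first, last - first + 1));
  for (char& ch : key) {
    if (ch >= 'A' && ch <= 'Z') ch = static_cast<char>(ch - 'A' + 'a');
  }
  if (key == "utf-8" || key == "utf8") return UTF_8_ENCODING;
  if (key == "windows-1252" || key == "latin1" || key == "iso-8859-1") {
    return WINDOWS_1252_ENCODING;
  }
  return nullptr;
}

// Hypothetical mailnews-style holder: stores the pointer, not a copy of
// the encoding name in a string.
struct MessagePart {
  const Encoding* charset = UTF_8_ENCODING;
};

bool IsUtf8(const MessagePart& part) {
  // Address comparison replaces a case-insensitive string comparison.
  return part.charset == UTF_8_ENCODING;
}
```

Because every instance is static, comparing pointers is enough to test for a particular encoding, and copying a MessagePart copies one pointer instead of a string. Labels only need to be resolved once, at the boundary where a charset name comes in from a message header.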
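The difference between pivoting through UTF-16 and converting directly to UTF-8 can be sketched with a single-byte encoding. The hypothetical DecodeWindows1252ToUtf8() below (not an encoding_rs function) writes UTF-8 straight into the output instead of first filling a UTF-16 buffer and then re-encoding that buffer. The 0x80..0x9F table follows the windows-1252 index in the Encoding Standard; every other byte maps to the code point with the same value.

```cpp
#include <string>
#include <string_view>

// Append one BMP code point as UTF-8. All windows-1252 code points are in
// the BMP, so at most three bytes are needed.
static void AppendUtf8(std::string& out, char32_t c) {
  if (c < 0x80) {
    out.push_back(static_cast<char>(c));
  } else if (c < 0x800) {
    out.push_back(static_cast<char>(0xC0 | (c >> 6)));
    out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
  } else {
    out.push_back(static_cast<char>(0xE0 | (c >> 12)));
    out.push_back(static_cast<char>(0x80 | ((c >> 6) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
  }
}

// windows-1252 mappings for bytes 0x80..0x9F (Encoding Standard index).
static const char16_t kWindows1252High[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,
};

// Hypothetical direct decoder: bytes in, UTF-8 out, no UTF-16 pivot.
std::string DecodeWindows1252ToUtf8(std::string_view bytes) {
  std::string out;
  out.reserve(bytes.size());  // ASCII-heavy input needs about this much
  for (unsigned char b : bytes) {
    if (b < 0x80) {
      out.push_back(static_cast<char>(b));  // ASCII is copied as-is
      continue;
    }
    const char32_t c = (b < 0xA0) ? kWindows1252High[b - 0x80] : b;
    AppendUtf8(out, c);
  }
  return out;
}
```

encoding_rs itself is much more optimized than this loop; the sketch only shows which step disappears when the decoder can target UTF-8 directly: there is no intermediate UTF-16 allocation and no second pass to encode it.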
--
Henri Sivonen
hsiv...@hsivonen.fi
https://hsivonen.fi/