Future of out-of-tree spell checkers?

Henri Sivonen

Mar 22, 2017, 5:18:58 AM
to dev-platform
Without XPCOM extensions, what's the story for out-of-tree spell checkers?

Finnish spell checking in Firefox (and Thunderbird) has so far been
accomplished using the mozvoikko extension, which implements
mozISpellCheckingEngine in JS and connects to the libvoikko[1] back
end via jsctypes. (Even though hunspell was initially developed for
Hungarian and, therefore, was initially hoped to be suitable for
Finnish, it turned out to be inadequate for dealing with Finnish.)

Previously, libvoikko was GPL-only, but it seems that most code in the
newest version can be alternatively used under MPL 1.1. (I don't know
why one would want to compile in the GPL-only stuff. Maybe for
compatibility with legacy-format Northern Sami or Greenlandic
dictionaries?)

Considering that mozvoikko already requires libvoikko to be present on
the system by other means and libvoikko now supports a non-GPL
configuration, could we put C++ glue code in-tree and dlopen libvoikko
if found?

[1] http://voikko.puimula.org/
--
Henri Sivonen
hsiv...@hsivonen.fi
https://hsivonen.fi/

Henri Sivonen

Mar 22, 2017, 6:51:48 AM
to dev-platform
On Wed, Mar 22, 2017 at 11:18 AM, Henri Sivonen <hsiv...@hsivonen.fi> wrote:
> Without XPCOM extensions, what's the story for out-of-tree spell checkers?
>
> Finnish spell checking in Firefox (and Thunderbird) has so far been
> accomplished using the mozvoikko extension, which implements
> mozISpellCheckingEngine in JS and connects to the libvoikko[1] back
> end via jsctypes.

Further searching strongly suggests that there exist just three
implementors of mozISpellCheckingEngine:
1) The in-tree wrapper for Mozilla's fork of Hunspell.
2) The mozvoikko extension that provides Finnish spell checking using
libvoikko.
3) The Kukkuniiaat extension that provides Greenlandic spell checking
using libvoikko.

To me, this is a strong indication that we should add a C++ adapter
for (dlopened) libvoikko in-tree and deCOMtaminate
mozISpellCheckingEngine while at it.

(FWIW, the desktop browser market share of Firefox in both Finland and
Greenland is above the average for Europe. It would be sad to mess
that up by just letting this stuff break.)

Nicolas B. Pierron

Mar 22, 2017, 9:52:40 AM
to
On 03/22/2017 09:18 AM, Henri Sivonen wrote:
> Without XPCOM extensions, what's the story for out-of-tree spell checkers?
>
> […], which implements
> mozISpellCheckingEngine in JS and connects to the libvoikko[1] back
> end via jsctypes. […]

Would compiling libvoikko to WebAssembly remove the need for jsctypes and XPCOM?

--
Nicolas B. Pierron

Henri Sivonen

Mar 22, 2017, 10:11:13 AM
to dev-platform
It would remove the need for jsctypes, but how would a WebAssembly
program in a Web Extension get to act as a spell checking engine once
extensions can no longer implement XPCOM interfaces
(mozISpellCheckingEngine in this case)?

Jorge Villalobos

Mar 22, 2017, 10:39:54 AM
to
Note there is a bug on file to implement a spell-checker API for
WebExtensions: https://bugzilla.mozilla.org/show_bug.cgi?id=1343551

The API request was approved but is low priority.

Jorge

Axel Hecht

Mar 22, 2017, 10:45:17 AM
to
Am 22.03.17 um 15:39 schrieb Jorge Villalobos:
Note that the bug seems to be about using an API like
mozISpellCheckingEngine from web extensions.

It doesn't seem to be about providing an implementation of it via a web
extension.

Axel

Henri Sivonen

Mar 22, 2017, 11:09:35 AM
to dev-platform
Indeed.

Considering that there seems to be only one out-of-tree library that
gets glued into a mozISpellCheckingEngine provider (libvoikko), it
seems to me that it would be misplaced effort if Mozilla designed a
Web Extension API for providing a spell checker and then asked the
Voikko developers to figure out how to compile the code into
WebAssembly and how to package the wasm and all the data files as a
Web Extension.

dlopening libvoikko, if installed, and having thin C++ glue code
in-tree seems much simpler, except maybe for sandboxing. What are the
sandboxing implications of dlopening a shared library that will want
to load its data files?

Julian Hector

Mar 22, 2017, 11:45:06 AM
to Henri Sivonen, dev-platform
Hey Henri, Freddy pointed me to the sandboxing part of the question, here
is my impression.

In general, anything that additional out-of-tree code could do while running
inside the sandbox could also be accomplished by a compromised child process.

However, in the case of dlopen() it is important to know that our sandbox is
not active immediately upon process start; rather, it is activated when the
child receives an IPC message. So if the dlopen happens before the sandbox
IPC message is received, the loaded code can do dangerous things (e.g. by
declaring a constructor function in the library that is executed upon
loading).
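
To illustrate the constructor concern, here is a minimal sketch, assuming GCC
or Clang (the `constructor` attribute is a compiler extension, not standard
C++). All names are hypothetical. It shows that code in a library can run at
load time without the host ever calling an exported function:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical log that records the order in which code runs.
std::vector<std::string>& LoadLog() {
  // Function-local static: constructed on first use, so it is safe to
  // call from a load-time constructor.
  static std::vector<std::string> log;
  return log;
}

// GCC/Clang extension: the dynamic loader runs this function when the
// object containing it is mapped. For a shared library that means during
// dlopen(); for code linked into the executable it means before main().
// No exported symbol has to be called for it to execute.
__attribute__((constructor)) static void RunsAtLoadTime() {
  LoadLog().push_back("constructor ran");
}

// Stand-in for an exported entry point that the host calls much later.
bool ExportedCheckWord(const std::string& word) {
  assert(!word.empty());
  LoadLog().push_back("exported function called");
  return true;
}
```

If such a constructor lived in a dlopened library, it would run with whatever
privileges the process holds at the moment of the dlopen() call, which is why
the ordering relative to sandbox activation matters.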

Second, if the library is loaded into the parent, it could expose a new
way for the child to communicate with the parent, which would expose more
attack surface.

And of course the code injected into the child can be vulnerable and expose
more attack surface for the child which could make it easier to be
compromised.

I hope that answers your question.

Julian



Jeff Muizelaar

Mar 22, 2017, 11:50:52 AM
to Henri Sivonen, dev-platform
On Wed, Mar 22, 2017 at 11:08 AM, Henri Sivonen <hsiv...@hsivonen.fi> wrote:
>
> dlopening libvoikko, if installed, and having thin C++ glue code
> in-tree seems much simpler, except maybe for sandboxing. What are the
> sandboxing implications of dlopening a shared library that will want
> to load its data files?

My understanding is that the spell checker mostly lives in the Chrome
process so it seems sandboxing won't be a problem.

-Jeff

Ehsan Akhgari

Mar 23, 2017, 8:39:09 PM
to Jeff Muizelaar, Henri Sivonen, Kris Maglione, dev-platform, Jörg Knobloch, William McCloskey
On Wed, Mar 22, 2017 at 11:50 AM, Jeff Muizelaar <jmuiz...@mozilla.com>
wrote:
That is mostly correct. The spell checker *completely* lives in the parent
process and is completely unaffected by sandboxing.

But that's actually a problem. My understanding is that WebExtensions
won't be allowed to load code in the parent process. Bill, Kris, is that
correct? If yes, we should work with the maintainers of the Finnish and
Greenlandic dictionaries on adding custom support for loading their code...

(CCing Jorg as this also affects Thunderbird users in those languages.)

--
Ehsan

Henri Sivonen

Mar 24, 2017, 4:20:40 AM
to Ehsan Akhgari, Jeff Muizelaar, Kris Maglione, dev-platform, Jörg Knobloch, William McCloskey
On Fri, Mar 24, 2017 at 2:38 AM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
> On Wed, Mar 22, 2017 at 11:50 AM, Jeff Muizelaar <jmuiz...@mozilla.com>
> wrote:
>>
>> On Wed, Mar 22, 2017 at 11:08 AM, Henri Sivonen <hsiv...@hsivonen.fi>
>> wrote:
>> >
>> > dlopening libvoikko, if installed, and having thin C++ glue code
>> > in-tree seems much simpler, except maybe for sandboxing. What are the
>> > sandboxing implications of dlopening a shared library that will want
>> > to load its data files?
>>
>> My understanding is that the spell checker mostly lives in the Chrome
>> process so it seems sandboxing won't be a problem.
>
>
> That is mostly correct. The spell checker *completely* lives in the parent
> process and is completely unaffected by sandboxing.
>
> But that's actually a problem. My understanding is that WebExtensions won't
> be allowed to load code in the parent process. Bill, Kris, is that correct?
> If yes, we should work with the maintainers of the Finnish and Greenlandic
> dictionaries on adding custom support for loading their code...

But given that the only out-of-tree spell checkers use libvoikko
(according to a Google Web search excluding mozilla.org, wading through
all the results, and a search of the JS of all AMO-hosted extensions),
why involve Web Extensions at all? Why wouldn't we dlopen
libvoikko and put a thin C++ adapter between libvoikko's C API and our
internal C++ interface in-tree? That would be significantly simpler
than involving Web Extensions.
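
A minimal sketch of what such a thin adapter could look like, assuming a POSIX
platform. The class name `VoikkoAdapter` is hypothetical, and the three
libvoikko signatures (`voikkoInit`, `voikkoSpellCstr`, `voikkoTerminate`) and
the "1 means correctly spelled" return value are written from memory of
voikko.h, so they should be checked against the installed header. This is not
the actual Gecko interface, only an illustration of "dlopen if found, else do
nothing":

```cpp
#include <dlfcn.h>

#include <cassert>
#include <memory>
#include <string>

// Opaque handle; voikko.h forward-declares a struct of this name.
struct VoikkoHandle;

// Assumed signatures of the libvoikko C entry points used below.
using VoikkoInitFn = VoikkoHandle* (*)(const char** error,
                                       const char* langcode,
                                       const char* path);
using VoikkoSpellCstrFn = int (*)(VoikkoHandle* handle, const char* word);
using VoikkoTerminateFn = void (*)(VoikkoHandle* handle);

// Hypothetical adapter: the only piece that would need to live in-tree.
class VoikkoAdapter {
 public:
  // Returns nullptr when the library is not installed, lacks an expected
  // symbol, or has no dictionary for `lang`, so callers can fall back.
  static std::unique_ptr<VoikkoAdapter> Create(const std::string& soname,
                                               const std::string& lang) {
    assert(!soname.empty() && !lang.empty());
    void* lib = dlopen(soname.c_str(), RTLD_NOW | RTLD_LOCAL);
    if (!lib) {
      return nullptr;
    }
    auto init = reinterpret_cast<VoikkoInitFn>(dlsym(lib, "voikkoInit"));
    auto spell =
        reinterpret_cast<VoikkoSpellCstrFn>(dlsym(lib, "voikkoSpellCstr"));
    auto terminate =
        reinterpret_cast<VoikkoTerminateFn>(dlsym(lib, "voikkoTerminate"));
    if (!init || !spell || !terminate) {
      dlclose(lib);
      return nullptr;
    }
    const char* error = nullptr;
    // A null path is assumed to select libvoikko's default dictionary
    // search locations.
    VoikkoHandle* handle = init(&error, lang.c_str(), nullptr);
    if (!handle) {
      dlclose(lib);
      return nullptr;
    }
    return std::unique_ptr<VoikkoAdapter>(
        new VoikkoAdapter(lib, handle, spell, terminate));
  }

  ~VoikkoAdapter() {
    mTerminate(mHandle);
    dlclose(mLib);
  }

  VoikkoAdapter(const VoikkoAdapter&) = delete;
  VoikkoAdapter& operator=(const VoikkoAdapter&) = delete;

  // `word` must be UTF-8; 1 is the assumed value of VOIKKO_SPELL_OK.
  bool Check(const std::string& word) const {
    return mSpell(mHandle, word.c_str()) == 1;
  }

 private:
  VoikkoAdapter(void* lib, VoikkoHandle* handle, VoikkoSpellCstrFn spell,
                VoikkoTerminateFn terminate)
      : mLib(lib), mHandle(handle), mSpell(spell), mTerminate(terminate) {}

  void* mLib;
  VoikkoHandle* mHandle;
  VoikkoSpellCstrFn mSpell;
  VoikkoTerminateFn mTerminate;
};
```

A caller would try something like `VoikkoAdapter::Create("libvoikko.so.1", "fi")`
(the soname is an assumption about Linux packaging) and simply skip Voikko
support when the result is null. On older glibc versions the program needs to
be linked with `-ldl`.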

Ehsan Akhgari

Mar 24, 2017, 9:21:07 AM
to Henri Sivonen, Jeff Muizelaar, Kris Maglione, dev-platform, Jörg Knobloch, William McCloskey
On 2017-03-24 4:20 AM, Henri Sivonen wrote:
> On Fri, Mar 24, 2017 at 2:38 AM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
>> On Wed, Mar 22, 2017 at 11:50 AM, Jeff Muizelaar <jmuiz...@mozilla.com>
>> wrote:
>>>
>>> On Wed, Mar 22, 2017 at 11:08 AM, Henri Sivonen <hsiv...@hsivonen.fi>
>>> wrote:
>>>>
>>>> dlopening libvoikko, if installed, and having thin C++ glue code
>>>> in-tree seems much simpler, except maybe for sandboxing. What are the
>>>> sandboxing implications of dlopening a shared library that will want
>>>> to load its data files?
>>>
>>> My understanding is that the spell checker mostly lives in the Chrome
>>> process so it seems sandboxing won't be a problem.
>>
>>
>> That is mostly correct. The spell checker *completely* lives in the parent
>> process and is completely unaffected by sandboxing.
>>
>> But that's actually a problem. My understanding is that WebExtensions won't
>> be allowed to load code in the parent process. Bill, Kris, is that correct?
>> If yes, we should work with the maintainers of the Finnish and Greenlandic
>> dictionaries on adding custom support for loading their code...
>
> But when (according to doing a Google Web search excluding mozilla.org
> and wading through all the results and by searching the JS for all
> AMO-hosted extensions) the only out-of-tree spell checkers use
> libvoikko, why involve Web Extensions at all? Why wouldn't we dlopen
> libvoikko and put a thin C++ adapter between libvoikko's C API and our
> internal C++ interface in-tree? That would be significantly simpler
> than involving Web extensions.

Is that different than what I suggested above in some way that I'm
missing? I think it's better to engage the developers of those
libraries first and ask them how they would like us to proceed. At any
rate, something has to change on their side, since after Firefox 57
Firefox would presumably just ignore their XPI file or something. The
actual implementation mechanism would probably end up being the
dlopening that you're suggesting, but if we're going to sign up for
doing that, we had better have at least a communication channel with the
authors of those libraries in case, for example, we need to change
something in our interface some day.

Bill McCloskey

Mar 24, 2017, 2:45:12 PM
to Ehsan Akhgari, Henri Sivonen, Kris Maglione, dev-platform, Jörg Knobloch, Jeff Muizelaar
If we do end up going with the dlopen plan, let's make sure that we enforce
some kind of code signing. We're finally almost rid of all the untrusted
binary code that we used to load (NPAPI, binary XPCOM, ctypes). It would be
a shame to open up a new path.

-Bill


Ehsan Akhgari

Mar 25, 2017, 2:49:10 PM
to bi...@mozilla.com, Henri Sivonen, Kris Maglione, dev-platform, Jörg Knobloch, Jeff Muizelaar
On 2017-03-24 2:45 PM, Bill McCloskey wrote:
> If we do end up going with the dlopen plan, let's make sure that we
> enforce some kind of code signing. We're finally almost rid of all the
> untrusted binary code that we used to load (NPAPI, binary XPCOM,
> ctypes). It would be a shame to open up a new path.

Yeah, I agree.

Another option would be talking to the maintainers of libvoikko and
seeing if they would be open to maintaining the Mozilla bindings, in
which case I think we should even consider doing something like what we
do to download the OpenH264 binary at runtime when we need to. We could
even build and sign it in our infrastructure ourselves if we imported it
into the tree; with Taskcluster this is possible today with a super
simple shell script (well, at least the building side of it!). We
basically need someone to sign up for maintaining it, since I doubt that
MoCo will prioritize supporting a whole new spell checking backend for 2
new languages, but I think we can do a lot to help.


Henri Sivonen

Mar 27, 2017, 3:31:07 AM
to Ehsan Akhgari, Jeff Muizelaar, Kris Maglione, dev-platform, Jörg Knobloch, William McCloskey
On Fri, Mar 24, 2017 at 3:20 PM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
> On 2017-03-24 4:20 AM, Henri Sivonen wrote:
>> On Fri, Mar 24, 2017 at 2:38 AM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
>>> On Wed, Mar 22, 2017 at 11:50 AM, Jeff Muizelaar <jmuiz...@mozilla.com>
>>> wrote:
>>>>
>>>> On Wed, Mar 22, 2017 at 11:08 AM, Henri Sivonen <hsiv...@hsivonen.fi>
>>>> wrote:
>>>>>
>>>>> dlopening libvoikko, if installed, and having thin C++ glue code
>>>>> in-tree seems much simpler, except maybe for sandboxing. What are the
>>>>> sandboxing implications of dlopening a shared library that will want
>>>>> to load its data files?
>>>>
>>>> My understanding is that the spell checker mostly lives in the Chrome
>>>> process so it seems sandboxing won't be a problem.
>>>
>>>
>>> That is mostly correct. The spell checker *completely* lives in the parent
>>> process and is completely unaffected by sandboxing.
>>>
>>> But that's actually a problem. My understanding is that WebExtensions won't
>>> be allowed to load code in the parent process. Bill, Kris, is that correct?
>>> If yes, we should work with the maintainers of the Finnish and Greenlandic
>>> dictionaries on adding custom support for loading their code...
>>
>> But when (according to doing a Google Web search excluding mozilla.org
>> and wading through all the results and by searching the JS for all
>> AMO-hosted extensions) the only out-of-tree spell checkers use
>> libvoikko, why involve Web Extensions at all? Why wouldn't we dlopen
>> libvoikko and put a thin C++ adapter between libvoikko's C API and our
>> internal C++ interface in-tree? That would be significantly simpler
>> than involving Web extensions.
>
> Is that different than what I suggested above in some way that I'm
> missing?

I thought you meant that Web Extensions were your primary choice if
they could load code into the parent process.

> I think it's better to engage the developers of those
> libraries first and ask them how they would like us to proceed.

I wanted to get an understanding of what we'd be OK with before
contacting Harri Pitkänen (libvoikko developer) or Timo Jyrinki
(libvoikko and mozvoikko maintainer for Debian and Ubuntu), because I
don't want to cause them to write code only to find that a Mozilla
decision renders the code useless.

On Fri, Mar 24, 2017 at 8:45 PM, Bill McCloskey <wmccl...@mozilla.com> wrote:
> If we do end up going with the dlopen plan, let's make sure that we enforce
> some kind of code signing. We're finally almost rid of all the untrusted
> binary code that we used to load (NPAPI, binary XPCOM, ctypes). It would be
> a shame to open up a new path.

What threat do you intend to defend against?

On Linux, we should think of libvoikko as an optional system library.
(If you install Ubuntu choosing English as the system language at
install time, libvoikko is not installed by default. If you install
Ubuntu choosing Finnish as the system language at install time,
libvoikko is installed by default. In any case, you can get it from
the distro repo.) We already dlopen() PulseAudio as a system library
that we don't verify. In the Crash Reporter, we dlopen() libcurl and
some GNOME stuff. I expect that someone operating with the user's
privileges can cause arbitrary unverified code to be mapped into our
address space via LD_PRELOAD or via system libraries that we link
against unconditionally.

As for Windows, since a spell checker doesn't add Web-exposed
functionality, we wouldn't have the risk that we had with NPAPI (or,
technically, with arbitrary add-ons) that a site could entice users to
run a random setup.exe in order to see some additional Web content.
The libvoikko API is pretty narrow, so I wouldn't expect it to enable
more anti-virus mischief than what can be done by hooking stuff into
the Windows system DLLs that we need to use.

The main problems I see are:
1) Right now the libvoikko distribution point is without https.
(Fixable with Let's Encrypt.)
2) We couldn't trigger a libvoikko autoupdate on Windows/Mac if there
was a crasher bug in the library. (I'd expect the distros to take care
of pushing an update in the Linux case. It's the same situation with
e.g. PulseAudio and Gtk anyway.)

On Sat, Mar 25, 2017 at 8:48 PM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
> Another option would be talking to the maintainers of libvoikko and
> seeing if they would be open to maintaining the Mozilla bindings, in
> which case I think we should even consider doing something like what we
> do to download the OpenH264 binary at runtime when we need to. We could
> even build and sign it in the infrastructure ourselves if we imported it
> into the tree, with task cluster this is possible today with a super
> simple shell script (well, at least the building side of it!).

If we are willing to do the engineering for that, that would be great!
(Of course, just putting libvoikko into libxul would be simpler, but
would cost an added 250 KB in libxul size for everyone who doesn't
need libvoikko.)

The situation with the Tracking Protection data suggests that we are
OK with GPLed data files downloaded at run time, even if not with GPLed code.
Have I inferred the licensing position correctly? That is, if we
distributed libvoikko under MPL 1.1, would we be OK with also
distributing and autodownloading GPLed dictionary files?

> We
> basically need someone to sign up for maintaining it, since I doubt that
> MoCo will prioritize supporting a whole new spell checking backend for 2
> new languages, but I think we can do a lot to help.

I wouldn't expect MoCo to sign up to maintain libvoikko itself, but
the glue code needed between mozISpellCheckingEngine and libvoikko is
*extremely* thin:
https://github.com/voikko/mozvoikko/blob/master/components/MozVoikko2.js
(most of that code is jsctypes and XPCOM boilerplate; the code that
would need to be rewritten in C++ is almost nothing except trivial
forwarding to C.)

Henri Sivonen

Mar 28, 2017, 6:03:57 AM
to Ehsan Akhgari, dev-platform, Kris Maglione, William McCloskey, Jörg Knobloch, Jeff Muizelaar
On Sat, Mar 25, 2017 at 8:48 PM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
> Another option would be talking to the maintainers of libvoikko and
> seeing if they would be open to maintaining the Mozilla bindings,

I started a fact-finding thread on the libvoikko list:
http://lists.puimula.org/pipermail/libvoikko/2017-March/000896.html

(Not about anyone writing any code yet.)

Ehsan Akhgari

Apr 15, 2017, 1:06:24 PM
to Henri Sivonen, Jeff Muizelaar, Kris Maglione, dev-platform, Jörg Knobloch, William McCloskey
On 2017-03-27 3:30 AM, Henri Sivonen wrote:
>>> But when (according to doing a Google Web search excluding mozilla.org
>>> and wading through all the results and by searching the JS for all
>>> AMO-hosted extensions) the only out-of-tree spell checkers use
>>> libvoikko, why involve Web Extensions at all? Why wouldn't we dlopen
>>> libvoikko and put a thin C++ adapter between libvoikko's C API and our
>>> internal C++ interface in-tree? That would be significantly simpler
>>> than involving Web extensions.
>>
>> Is that different than what I suggested above in some way that I'm
>> missing?
>
> I thought you meant that Web Extensions were your primary choice if
> they could load code into the parent process.

No, that's not what I meant.

> The main problems I see are:
> 1) Right now the libvoikko distribution point is without https.
> (Fixable with Let's Encrypt.)
> 2) We couldn't trigger a libvoikko autoupdate on Windows/Mac if there
> was a crasher bug in the library. (I'd expect the distros to take care
> of pushing an update in the Linux case. It's the same situation with
> e.g. PulseAudio and Gtk anyway.)

It is also untrusted and unsigned code and can cause security and
instability issues. We have done a lot of work to remove all such code
from our parent process. I don't think it's useful to make an analogy
between this code and things like gtk.

> On Sat, Mar 25, 2017 at 8:48 PM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
>> Another option would be talking to the maintainers of libvoikko and
>> seeing if they would be open to maintaining the Mozilla bindings, in
>> which case I think we should even consider doing something like what we
>> do to download the OpenH264 binary at runtime when we need to. We could
>> even build and sign it in the infrastructure ourselves if we imported it
>> into the tree, with task cluster this is possible today with a super
>> simple shell script (well, at least the building side of it!).
>
> If we are willing to do the engineering for that, that would be great!
> (Of course, just putting libvoikko into libxul would be simpler, but
> would cost an added 250 KB in libxul size for everyone who doesn't
> need libvoikko.)

That's not an option. 250KB for essentially dead code for most of our
users is too much.

> The situation with the Tracking Protection data suggests that we are
> OK with GPLed run-time downloaded data files even though not code.
> Have I inferred the licensing position correctly? That is, if we
> distributed libvoikko under MPL 1.1, would we be OK with also
> distributing and autodownloading GPLed dictionary files?
>
>> We
>> basically need someone to sign up for maintaining it, since I doubt that
>> MoCo will prioritize supporting a whole new spell checking backend for 2
>> new languages, but I think we can do a lot to help.
>
> I wouldn't expect MoCo to sign up to maintain libvoikko itself, but
> the glue code needed between mozISpellCheckingEngine and libvoikko is
> *extremely* thin:
> https://github.com/voikko/mozvoikko/blob/master/components/MozVoikko2.js
> (most of that code is jsctypes and XPCOM boilerplate; the code that
> would need to be rewritten in C++ is almost nothing except trivial
> forwarding to C.)

The mozISpellCheckingEngine interface should be considered very unstable
and subject to change in the very near future (possibly Firefox 55). In
fact in bug 1303749 we may break it by moving the spell checking to
happen in a background thread, and then there may be no scriptable
interface to it any more.

It may still be possible for them to provide a native library to us that
we can load on the background thread and call into but it may require
code changes on their side as well as our side to get that to work properly.
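
As a rough sketch of the "load on a background thread and call into it" idea,
the following uses only the C++ standard library. The class
`BackgroundSpellChecker` and its callback type are hypothetical and unrelated
to the real Gecko interfaces or to bug 1303749; the point is only that all
calls into a native checker can be funnelled through one dedicated worker
thread while callers receive a future:

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Runs every spell-check call on a single worker thread, so a native
// library that is not thread-safe is only ever touched from that thread.
class BackgroundSpellChecker {
 public:
  using CheckFn = std::function<bool(const std::string&)>;

  explicit BackgroundSpellChecker(CheckFn check)
      : mCheck(std::move(check)), mWorker([this] { Run(); }) {
    assert(mCheck != nullptr);
  }

  ~BackgroundSpellChecker() {
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mShutdown = true;
    }
    mCondVar.notify_one();
    mWorker.join();  // pending tasks are drained before the thread exits
  }

  // Queues a check and returns immediately; the result arrives later.
  std::future<bool> CheckAsync(std::string word) {
    std::packaged_task<bool()> task(
        [this, w = std::move(word)] { return mCheck(w); });
    std::future<bool> result = task.get_future();
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mTasks.push(std::move(task));
    }
    mCondVar.notify_one();
    return result;
  }

 private:
  void Run() {
    for (;;) {
      std::packaged_task<bool()> task;
      {
        std::unique_lock<std::mutex> lock(mMutex);
        mCondVar.wait(lock, [this] { return mShutdown || !mTasks.empty(); });
        if (mTasks.empty()) {
          return;  // shutdown requested and nothing left to run
        }
        task = std::move(mTasks.front());
        mTasks.pop();
      }
      task();  // run outside the lock so callers can keep queueing
    }
  }

  CheckFn mCheck;
  std::mutex mMutex;
  std::condition_variable mCondVar;
  std::queue<std::packaged_task<bool()>> mTasks;
  bool mShutdown = false;
  std::thread mWorker;  // declared last so the other members exist first
};
```

Note that a worker thread still shares the process's address space, so this
arrangement keeps the main thread responsive but does not contain a crash in
the native code. Build with `-pthread` where the toolchain requires it.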

Henri Sivonen

Apr 18, 2017, 2:39:04 AM
to dev-platform
On Sat, Apr 15, 2017 at 8:06 PM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
> On 2017-03-27 3:30 AM, Henri Sivonen wrote:
>> 2) We couldn't trigger a libvoikko autoupdate on Windows/Mac if there
>> was a crasher bug in the library. (I'd expect the distros to take care
>> of pushing an update in the Linux case. It's the same situation with
>> e.g. PulseAudio and Gtk anyway.)
>
> It is also untrusted and unsigned code and can cause security and
> instability issues. We have done a lot of work to remove all such code
> from our parent process. I don't think it's useful to make an analogy
> between this code and things like gtk.

I get it that libvoikko and gtk may (I haven't checked) have a
different code quality level and, therefore, involve different parent
process crash or exploitability risk. However, on e.g. Ubuntu and
Debian the trust and signedness status is indeed the same as for gtk:
both gtk and libvoikko are distro-provided code that is signed for
delivery but signatures aren't checked when executing the code (i.e.
the trust model of the OS doesn't treat root-owned libraries under
/usr as adversarial in general) and the distro is responsible for
pushing updates in case of critical bugs.
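
To make that trust model concrete, here is a small POSIX-only sketch with a
hypothetical helper name. It is not code signing and not something Firefox is
known to do; it only expresses "treat a library as system-provided if it is a
root-owned regular file that other users cannot modify", and it deliberately
ignores parent directory permissions and symlink races:

```cpp
#include <sys/stat.h>

#include <cassert>
#include <string>

// Returns true only if `path` names a regular file that is owned by root
// and is not writable by group or others.
bool LooksLikeSystemLibrary(const std::string& path) {
  assert(!path.empty());
  struct stat info;
  if (stat(path.c_str(), &info) != 0) {
    return false;  // missing or inaccessible file
  }
  const bool regularFile = S_ISREG(info.st_mode);
  const bool ownedByRoot = info.st_uid == 0;
  const bool writableByOthers = (info.st_mode & (S_IWGRP | S_IWOTH)) != 0;
  return regularFile && ownedByRoot && !writableByOthers;
}
```

A check like this says nothing about who built the binary, which is the gap
that signing would close on platforms without a distro in the loop.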

It would help me understand the issues if you could expand on your
trust and signing concerns.

>> On Sat, Mar 25, 2017 at 8:48 PM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
>>> Another option would be talking to the maintainers of libvoikko and
>>> seeing if they would be open to maintaining the Mozilla bindings, in
>>> which case I think we should even consider doing something like what we
>>> do to download the OpenH264 binary at runtime when we need to. We could
>>> even build and sign it in the infrastructure ourselves if we imported it
>>> into the tree, with task cluster this is possible today with a super
>>> simple shell script (well, at least the building side of it!).
>>
>> If we are willing to do the engineering for that, that would be great!
>> (Of course, just putting libvoikko into libxul would be simpler, but
>> would cost an added 250 KB in libxul size for everyone who doesn't
>> need libvoikko.)
>
> That's not an option. 250KB for essentially dead code for most of our
> users is too much.

Annoyingly, chances are that no one will be willing to say ahead of
time how many kilobytes would be acceptable. :-/

As for how many users this would benefit, there's a big difference
between the immediate and the potential. The immediate is: very few
relative to the entire Firefox user population. There exist
dictionaries with clear licensing for Finnish, Northern Sami, Southern
Sami and Lule Sami and a dictionary with unclear (at least to me)
licensing for Greenlandic. The spell checking engine has broader
applicability, though. Maybe if we made it available with the same
ease as Hunspell, it would become worthwhile to get dictionaries made
for other languages that are too complex for Hunspell. Or maybe some
languages that are unsatisfactorily supported by Hunspell would
migrate, leading to better UX for users whose language seems to
be covered by Hunspell but isn't actually handled well by it.
Hard to say.

> It may still be possible for them to provide a native library to us that
> we can load on the background thread and call into but it may require
> code changes on their side as well as our side to get that to work properly.

In a background thread in the chrome process? I.e. not isolated in a
way that would protect against the spell checker crashing the chrome
process?

Ehsan Akhgari

Apr 18, 2017, 9:44:00 PM
to Henri Sivonen, dev-platform
On 2017-04-18 2:38 AM, Henri Sivonen wrote:
> On Sat, Apr 15, 2017 at 8:06 PM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
>> On 2017-03-27 3:30 AM, Henri Sivonen wrote:
>>> 2) We couldn't trigger a libvoikko autoupdate on Windows/Mac if there
>>> was a crasher bug in the library. (I'd expect the distros to take care
>>> of pushing an update in the Linux case. It's the same situation with
>>> e.g. PulseAudio and Gtk anyway.)
>>
>> It is also untrusted and unsigned code and can cause security and
>> instability issues. We have done a lot of work to remove all such code
>> from our parent process. I don't think it's useful to make an analogy
>> between this code and things like gtk.
>
> I get it that libvoikko and gtk may (I haven't checked) have a
> different code quality level and, therefore, involve different parent
> process crash or exploitability risk. However, on e.g. Ubuntu and
> Debian the trust and signedness status is indeed the same as for gtk:
> both gtk and libvoikko are distro-provided code that is signed for
> delivery but signatures aren't checked when executing the code (i.e.
> the trust model of the OS doesn't treat root-owned libraries under
> /usr as adversarial in general) and the distro is responsible for
> pushing updates in case of critical bugs.

Sure, but why do you keep bringing up these two distros? What about
Windows, where presumably most Finnish- and Greenlandic-speaking users
will be? :-)

> It would help me understand the issues if you could expand on your
> trust and signing concerns.

The security issues should be obvious. I don't trust the C++ code that
I write and by extension I don't trust the C++ code that anybody else
writes.

The stability issues: If you go to
https://crash-stats.mozilla.com/topcrashers/?product=Firefox&version=52.0.2&days=7
right now, you will see top crashers caused by untrusted binary code
that we don't control doing bad things (I spotted #11,
js::Proxy::construct, based on a cursory look right now). We have years
of concrete hard evidence in the form of hundreds of crash bug reports.
What's even worse about this particular case is that due to the smaller
size of the user base, the chances of issues like crashes rising to a
level where they become visible on our radar are slim. So the concrete
risk would be that loading this code in the parent process causes a
startup crash that flies under the radar and costs us all users in those
locales.

>>> On Sat, Mar 25, 2017 at 8:48 PM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
>>>> Another option would be talking to the maintainers of libvoikko and
>>>> seeing if they would be open to maintaining the Mozilla bindings, in
>>>> which case I think we should even consider doing something like what we
>>>> do to download the OpenH264 binary at runtime when we need to. We could
>>>> even build and sign it in the infrastructure ourselves if we imported it
>>>> into the tree, with task cluster this is possible today with a super
>>>> simple shell script (well, at least the building side of it!).
>>>
>>> If we are willing to do the engineering for that, that would be great!
>>> (Of course, just putting libvoikko into libxul would be simpler, but
>>> would cost an added 250 KB in libxul size for everyone who doesn't
>>> need libvoikko.)
>>
>> That's not an option. 250KB for essentially dead code for most of our
>> users is too much.
>
> Annoyingly, chances are that no one will be willing to say ahead of
> time how many kilobytes would be acceptable. :-/

Yup. :-( If you ask the Fennec folks, they'll rightly say 0 though.

> As for how many users this would benefit, there's a big difference
> between the immediate and the potential. The immediate is: very few
> relative to the entire Firefox user population. There exist
> dictionaries with clear licensing for Finnish, Northern Sami, Southern
> Sami and Lule Sami and a dictionary with unclear (at least to me)
> licensing for Greenlandic. The spell checking engine has broader
> applicability, though. Maybe if we made it available with the same
> ease as Hunspell, it would make it worthwhile for other languages that
> are too complex for Hunspell to get dictionaries made or maybe some
> languages that are unsatisfactorily supported by Hunspell would
> migrate leading to better UX for users whose language already seems to
> be covered by Hunspell but isn't actually handled well by Hunspell.
> Hard to say.

Don't get me wrong, I do think we definitely should not "drop" support
for spell checking for those users which means we need to figure out a
way to keep this working. We just need to figure out a plan that works
well both for Gecko and mozvoikko. So please take all of my objections
so far as purely technical, not at all objections to the idea of
supporting mozvoikko. :-) (In fact, I'd usually take this work on
myself given that I'm the closest person to the spell checker, for
better or worse, but unfortunately these days due to the Quantum Flow
stuff I really can't add anything to my plate...)

>> It may still be possible for them to provide a native library to us that
>> we can load on the background thread and call into but it may require
>> code changes on their side as well as our side to get that to work properly.
>
> In a background thread in the chrome process? I.e. not isolated in a
> way that would protect against the spell checker crashing the chrome
> process?

In a background thread in the chrome process, for now. But I'm very
interested to see if one day we can move hunspell into its own little
process based on this work to be isolated from any potential security
bugs in it. One advantage of Gecko being in control of loading this
library explicitly would be that we could make this all work seamlessly
for the library, as in, when it moves from the parent process to the
future spell checker process, the library should be blind to any changes!

Of course this is assuming the library is OK with being called on a
background thread (it will still only ever be accessed on a single thread
so it won't need to be thread safe, it just doesn't need to be tied to
any main thread assumptions, but I would be surprised for a spell
checker engine to have any such assumptions... Hunspell definitely
couldn't care less. :-)
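
A minimal sketch of the threading model described above, assuming an engine that is not thread safe but also has no main-thread assumptions. FakeEngine and SpellCheckWorker are hypothetical names invented for illustration; nothing here is Gecko or libvoikko API. Every call into the engine runs on one dedicated background thread, and callers on other threads get a Future back.

```python
import queue
import threading
from concurrent.futures import Future


class FakeEngine:
    """Hypothetical stand-in for a non-thread-safe native spell checker."""

    def __init__(self, words):
        self._words = set(words)
        self.owner_threads = set()

    def check(self, word):
        # Record which thread touched the engine so the invariant is observable.
        self.owner_threads.add(threading.get_ident())
        return word.lower() in self._words


class SpellCheckWorker:
    """Owns the engine and runs every call on one dedicated background thread."""

    def __init__(self, engine):
        self._engine = engine
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self._requests.get()
            if item is None:  # shutdown sentinel
                break
            word, future = item
            try:
                future.set_result(self._engine.check(word))
            except Exception as exc:  # report engine failures to the caller
                future.set_exception(exc)

    def check(self, word):
        """Queue a request from any thread; returns a Future with the verdict."""
        future = Future()
        self._requests.put((word, future))
        return future

    def close(self):
        self._requests.put(None)
        self._thread.join()
```

Because callers only ever see the queue and the Future, the same interface could later be backed by a separate process without the callers changing, which is the property the message above is after.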

Henri Sivonen

Apr 25, 2017, 8:41:53 AM4/25/17
to Ehsan Akhgari, dev-platform
On Wed, Apr 19, 2017 at 4:43 AM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
> On 2017-04-18 2:38 AM, Henri Sivonen wrote:
>> On Sat, Apr 15, 2017 at 8:06 PM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
>>> On 2017-03-27 3:30 AM, Henri Sivonen wrote:
>>>> 2) We couldn't trigger a libvoikko autoupdate on Windows/Mac if there
>>>> was a crasher bug in the library. (I'd expect the distros to take care
>>>> of pushing an update in the Linux case. It's the same situation with
>>>> e.g. PulseAudio and Gtk anyway.)
>>>
>>> It is also untrusted and unsigned code and can cause security and
>>> instability issues. We have done a lot of work to remove all such code
>>> from our parent process. I don't think it's useful to make an analogy
>>> between this code and things like gtk.
>>
>> I get it that libvoikko and gtk may (I haven't checked) have a
>> different code quality level and, therefore, involve different parent
>> process crash or exploitability risk. However, on e.g. Ubuntu and
>> Debian the trust and signedness status is indeed the same as for gtk:
>> both gtk and libvoikko are distro-provided code that is signed for
>> delivery but signatures aren't checked when executing the code (i.e.
>> the trust model of the OS doesn't treat root-owned libraries under
>> /usr as adversarial in general) and the distro is responsible for
>> pushing updates in case of critical bugs.
>
> Sure, but why do you keep bringing up these two distros? What about
> Windows, where presumably most of Finnish and Greenlandic speaking users
> will be? :-)

I made the gtk/pulse comparison in the Linux context only.

>> It would help me understand the issues if you could expand on your
>> trust and signing concerns.
>
> The security issues should be obvious. I don't trust the C++ code that
> I write and by extension I don't trust the C++ code that anybody else
> writes.

I see. I thought about "trusted" in the usual sense. I.e. code is
"trusted" if it has been given the necessary privileges to mess
everything up.

> The stability issues: If you go to
> https://crash-stats.mozilla.com/topcrashers/?product=Firefox&version=52.0.2&days=7
> right now, you will see top crashers caused by untrusted binary code
> that we don't control doing bad things (I spotted #11,
> js::Proxy::construct based on a cursory look right now). We have years
> of concrete hard evidence in terms of 100s of crash bug reports. What's
> even worse about this particular case is that due to the smaller size of
> the user base, the chances of issues like crashes rising to the point
> that they become visible on our radar are slim. So the concrete risk
> would be the possibility of loading this code in the parent process
> causing a startup crash that flies under the radar and costs us all
> users in those locales.

It's unclear to me if you are arguing that Mozilla shouldn't
distribute libvoikko, because it might have a crasher bug that we
might not detect despite having the ability to push updates, or if you
are arguing that we shouldn't load libvoikko that's present on the
user's system via non-Mozilla distribution mechanism, because it might
have a crasher bug that we could neither detect nor push a fix for.

Either way, I still don't see how code signing would address this
concern. Running spell checking in a separate process would.

What problem did you mean to address by code signing?

>>>> On Sat, Mar 25, 2017 at 8:48 PM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
>>>>> Another option would be talking to the maintainers of libvoikko and
>>>>> seeing if they would be open to maintaining the Mozilla bindings, in
>>>>> which case I think we should even consider doing something like what we
>>>>> do to download the OpenH264 binary at runtime when we need to. We could
>>>>> even build and sign it in the infrastructure ourselves if we imported it
>>>>> into the tree, with task cluster this is possible today with a super
>>>>> simple shell script (well, at least the building side of it!).
>>>>
>>>> If we are willing to do the engineering for that, that would be great!
>>>> (Of course, just putting libvoikko into libxul would be simpler, but
>>>> would cost an added 250 KB in libxul size for everyone who doesn't
>>>> need libvoikko.)
>>>
>>> That's not an option. 250KB for essentially dead code for most of our
>>> users is too much.
>>
>> Annoyingly, chances are that no one will be willing to say ahead of
>> time how many kilobytes would be acceptable. :-/
>
> Yup. :-( If you ask the Fennec folks, they'll rightly say 0 though.

Fortunately, this is not a Fennec-relevant issue, since spell checking
is part of the input method on Android.

It's unclear to me what your position is if on one hand you entertain
the possibility of them providing "a native library to us" for us to
load in the chrome process (i.e. can crash everything) and on the
other hand you raise concerns about stability issues (especially ones
that wouldn't be global and could be missed when observing global
crash stats).

Bill McCloskey

Apr 25, 2017, 2:03:07 PM4/25/17
to Henri Sivonen, Ehsan Akhgari, dev-platform
On Tue, Apr 25, 2017 at 5:41 AM, Henri Sivonen <hsiv...@hsivonen.fi> wrote:

> What problem did you mean to address by code signing?


The reason I suggested code signing is because loading libvoikko would
provide an easy way for people to inject code into Firefox. For a while
we've been trying to make it difficult for semi-legit-but-not-quite-malware
parties to load crappy code into Firefox (I'm thinking of crappy antivirus
software, adware, etc.). Removing binary XPCOM components and NPAPI
support, and requiring add-on signing, are all facets of this. If we simply
load and run code from any file named voikko.dll on the user's computer,
then we've opened up another door. It's a less powerful door since we
probably (I hope) wouldn't give them access to XPCOM. But they could still
open windows that look like they came from Firefox and I imagine there's
other bad stuff I haven't thought of.

People often object to this argument by saying that, without libvoikko,
these bad actors could just replace libxul or something. But I think in
practice it would be harder for them to pull that off, both technically and
socially. From a technical perspective, it's harder to replace core parts
of Firefox while still leaving it in a working state, especially if the
updater is still allowed to run. And socially, I think it makes their
software look a lot more like malware if they replace parts of Firefox
rather than simply install a new DLL that we then load.

Overall, though, I agree with Ehsan that this discussion isn't very
worthwhile unless we know what the voikko people want to do.

-Bill

Henri Sivonen

Apr 26, 2017, 7:03:20 AM4/26/17
to William McCloskey, Ehsan Akhgari, dev-platform
This concern applies to Windows but not to Linux, right? What about Mac?

To address that concern, the local system itself would have to be
treated as semi-hostile and the signature would have to be checked at
library load time as opposed to the usual library install time. Do we
have pre-existing code for that?

AFAIK, in the case of OpenH264 we check a hash at library install
time, but when we subsequently load the library, we don't check a hash
or signature. In the case of OpenH264, the library gets loaded into a
sandbox, which probably addresses the concern of a replacement
OpenH264 with dodgy additional code being able to open windows that
look like they came from Firefox.
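
A minimal sketch of what checking a hash at load time (rather than only at install time) could look like. The function names are hypothetical, and ctypes.CDLL stands in for whatever loader the host application actually uses; this is not how OpenH264 is handled in Firefox today.

```python
import ctypes
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Hash the file in chunks so large libraries need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_if_hash_matches(path, expected_sha256, loader=ctypes.CDLL):
    """Return the loaded library, or None if the file's hash is unexpected."""
    actual = sha256_of_file(path)
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        return None
    # Caveat: the file could be swapped between hashing and loading (a
    # time-of-check/time-of-use gap); this sketch does not address that.
    return loader(path)
```

Note that a check like this only speaks to provenance. It does nothing about a legitimate library crashing the process that loads it.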

Assuming that we don't already have code for validating library
provenance at library load time, wouldn't it make more sense to put
effort into reusing the facilities for spawning a GMP process to spawn
a low-privilege spell checking process than to try to validate the
provenance of already-installed code in a way that still doesn't
address the crash impact concern in the case of the code being
legitimate?
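
To illustrate the crash-isolation point, here is a minimal sketch of a parent that talks to a spell checking child process over pipes and survives the child dying. It is not the GMP machinery: CHILD_SOURCE, IsolatedChecker and the line-based protocol are all hypothetical, the child simulates a native crash by exiting abruptly on the word "CRASH", and no sandboxing or privilege dropping is shown.

```python
import subprocess
import sys

# Hypothetical child program: answers one word per line, and simulates a
# native crash (abrupt exit without a reply) on the word "CRASH".
CHILD_SOURCE = r"""
import os, sys
KNOWN = {"kissa", "koira"}
for line in sys.stdin:
    word = line.strip()
    if word == "CRASH":
        os._exit(139)
    print("1" if word.lower() in KNOWN else "0", flush=True)
"""


class IsolatedChecker:
    def __init__(self):
        self._proc = None
        self.restarts = 0
        self._spawn()

    def _spawn(self):
        self._proc = subprocess.Popen(
            [sys.executable, "-c", CHILD_SOURCE],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
        )

    def _reap(self):
        # Close our ends of the pipes and collect the child's exit status.
        for stream in (self._proc.stdin, self._proc.stdout):
            try:
                stream.close()
            except OSError:
                pass
        self._proc.wait()

    def check(self, word):
        """Return True/False, or None if the child died on this request."""
        try:
            self._proc.stdin.write(word + "\n")
            self._proc.stdin.flush()
            reply = self._proc.stdout.readline()
        except OSError:
            reply = ""
        if reply == "":
            # The child is gone; the parent survives and starts a new one.
            self._reap()
            self.restarts += 1
            self._spawn()
            return None
        return reply.strip() == "1"

    def close(self):
        self._reap()
```

The cost of a crash here is one unanswered request and a respawn, instead of the whole browser going down.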

> Overall, though, I agree with Ehsan that this discussion isn't very
> worthwhile unless we know what the voikko people want to do.

It seems to me that this thread raises enough concerns on our side
that it doesn't make sense to ask a third party what they want to do
before we have an idea what we'd be OK with.

Suppose they'd say they'd want to include libvoikko in Firefox like Hunspell?
We'd have binary size and crash impact concerns.

Suppose they'd say they'd want to make libvoikko download on-demand
using Mozilla infra like OpenH264?
We'd have concerns of finding release engineers and front end
engineers with time to set it up, the crash impact concern and the
concern of another party dropping malware in libvoikko's place.

Suppose they'd say they'd want to install libvoikko somewhere on the
user's library path and have us dlopen() it?
We'd have concerns about crash impact, being able to remedy crashes,
directing people to install non-Mozilla software (though @firefox on
Twitter regularly does) and other parties dropping malware in
libvoikko's place.

Suppose they'd say they'd want to ship a back end for system spell
checking frameworks and have us use the system spell checking API[1]?
We'd have concerns of Windows 7 not being covered, directing people to
install non-Mozilla software and crash impact at least in the Linux
case (AFAICT, Enchant doesn't provide process isolation from the back
end; I *think* Apple's solution does; not sure about Windows 8+) and
having to write 3 system-specific spell checking integrations.

Suppose they'd say they'd want to ship it as Web Assembly in a Web Extension?
We'd have concern about allocating engineering time to enable a Web
Extension to act as a spell checking provider, when there's only one
extension that'd foreseeably use it.
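
For the dlopen() option among the alternatives above, a minimal sketch of "load it if found, otherwise fall back", using Python's ctypes as a stand-in for the C++ glue. The library name "voikko" passed to the lookup and the ("native", "builtin") labels are assumptions made for illustration, and the sketch deliberately calls no functions in the loaded library.

```python
import ctypes
import ctypes.util


def try_load_library(name):
    """Return a ctypes handle for the named shared library, or None."""
    path = ctypes.util.find_library(name)
    if path is None:
        return None
    try:
        return ctypes.CDLL(path)
    except OSError:
        # Found but not loadable (wrong architecture, missing dependency, ...).
        return None


def pick_engine(library_name="voikko"):
    """Choose a back end: the optional library if present, else the default."""
    handle = try_load_library(library_name)
    return ("native", handle) if handle is not None else ("builtin", None)
```

This is exactly the pattern that raises the concerns listed for that option: whatever file the lookup finds gets loaded into the calling process, with no provenance check and no crash isolation.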

- -

[1] Enchant on Linux (currently hard-codes the assumption that Voikko
is Finnish only, so at least for the time being (until a potential
future version of Enchant without that hard-coding makes its way
through the distros) would throw Greenlandic, Sami and hope of other
languages under the bus).
https://bugzilla.mozilla.org/show_bug.cgi?id=422399

ISpellChecker on Windows 8+. The Windows built-in Finnish spell
checker is competent, so there'd be no functional need for Voikko for
Finnish. Would still need a non-Microsoft (i.e. libvoikko +
hfst-ospell) spell checking provider for Greenlandic, Sami and
potential other languages.
https://bugzilla.mozilla.org/show_bug.cgi?id=741746

NSSpellChecker on Mac. The system spell checker for Finnish is bad
compared to Voikko and the Windows built-in spell checker at least as
of El Capitan. (My Mac doesn't run more recent macOS, so I can't
immediately check if newer versions of macOS contain a competent
Finnish spell checker.) However, a libvoikko-based back end is
installable separately: http://verteksi.net/lab/osxspell/ . (Seems to
be hard-coded to Finnish only at the moment.)
https://bugzilla.mozilla.org/show_bug.cgi?id=86886

(System frameworks have the added benefit of sharing the user's
personal dictionary across apps instead of Firefox being special and
requiring the maintenance of a Firefox-specific personal dictionary.)

Ehsan Akhgari

Apr 26, 2017, 2:49:53 PM4/26/17
to Henri Sivonen, William McCloskey, dev-platform
On 04/26/2017 07:02 AM, Henri Sivonen wrote:
> On Tue, Apr 25, 2017 at 9:02 PM, Bill McCloskey <wmccl...@mozilla.com> wrote:
>> On Tue, Apr 25, 2017 at 5:41 AM, Henri Sivonen <hsiv...@hsivonen.fi> wrote:
>>> What problem did you mean to address by code signing?
>> The reason I suggested code signing is because loading libvoikko would
>> provide an easy way for people to inject code into Firefox.

Yes, this is precisely what I'm worried about as well.
>> For a while
>> we've been trying to make it difficult for semi-legit-but-not-quite-malware
>> parties to load crappy code into Firefox (I'm thinking of crappy antivirus
>> software, adware, etc.). Removing binary XPCOM components and NPAPI support,
>> and requiring add-on signing, are all facets of this. If we simply load and
>> run code from any file named voikko.dll on the user's computer, then we've
>> opened up another door. It's a less powerful door since we probably (I hope)
>> wouldn't give them access to XPCOM. But they could still open windows that
>> look like they came from Firefox and I imagine there's other bad stuff I
>> haven't thought of.
>>
>> People often object to this argument by saying that, without libvoikko,
>> these bad actors could just replace libxul or something. But I think in
>> practice it would be harder for them to pull that off, both technically and
>> socially. From a technical perspective, it's harder to replace core parts of
>> Firefox while still leaving it in a working state, especially if the updater
>> is still allowed to run. And socially, I think it makes their software look
>> a lot more like malware if they replace parts of Firefox rather than simply
>> install a new DLL that we then load.
> This concern applies to Windows but not to Linux, right? What about Mac?
FTR my main concern is about Windows here. But that being said I think
we can probably do something similar for Linux and Mac (but if we don't
have the time or resources to address those first/now, that's probably
fine.)
> To address that concern, the local system itself would have to be
> treated as semi-hostile and the signature would have to be checked at
> library load time as opposed to the usual library install time. Do we
> have pre-existing code for that?
We should treat the local system as *hostile*. Because that's what it
is in the real world at least for our Windows users.

I was hoping that we can use the code that we use to sign and verify our
mar files for the updater here, see for example this header which we use
for signature verification
<http://searchfox.org/mozilla-central/source/modules/libmar/verify/cryptox.h>.
I'm suggesting to use this code as a *basis* for this work, so there
will be some new code to be written for sure.

The advantage of this code is that it's pretty self-contained, so for
example we can use it to create a small command line utility to give the
voikko folks to use for signing, etc.
> AFAIK, in the case of OpenH264 we check a hash at library install
> time, but when we subsequently load the library, we don't check a hash
> or signature. In the case of OpenH264, the library gets loaded into a
> sandbox, which probably addresses the concern of a replacement
> OpenH264 with dodgy additional code being able to open windows that
> look like they came from Firefox.
>
> Assuming that we don't already have code for validating library
> provenance at library load time, wouldn't it make more sense to put
> effort into reusing the facilities for spawning a GMP process to spawn
> a low-privilege spell checking process than to try to validate the
> provenance of already-installed code in a way that still doesn't
> address the crash impact concern in the case of the code being
> legitimate?
>
>> Overall, though, I agree with Ehsan that this discussion isn't very
>> worthwhile unless we know what the voikko people want to do.
> It seems to me that this thread raises enough concerns on our side
> that it doesn't make sense to ask a third party what they want to do
> before we have an idea what we'd be OK with.
>
> Suppose they'd say they'd want to include libvoikko in Firefox like Hunspell?
> We'd have binary size and crash impact concerns.
To make the concerns here more concrete, the core of the issue is that
our non-English builds are merely a repack of the en-US builds, so
currently it's not possible to ship extra code to those users. That
being said, it _may_ be an option to use the locale repackaging step as
a vehicle for delivering the library binary that the voikko project
provides us if we end up going with that option. We should check with the
l10n team how easy/possible packaging this would be inside the locale as
a resource...
> Suppose they'd say they'd want to make libvoikko download on-demand
> using Mozilla infra like OpenH264?
> We'd have concerns of finding release engineers and front end
> engineers with time to set it up, the crash impact concern and the
> concern of another party dropping malware in libvoikko's place.
FTR, it may actually be useful to look into this... I sort of assumed
this isn't an option because I don't know how reusable the infra that we
have for OpenH264 is, but if it is easily reusable without a lot of
work, this may be a good option too.
> Suppose they'd say they'd want to install libvoikko somewhere on the
> user's library path and have us dlopen() it?
> We'd have concerns about crash impact, being able to remedy crashes,
> directing people to install non-Mozilla software (though @firefox on
> Twitter regularly does) and other parties dropping malware in
> libvoikko's place.
>
> Suppose they'd say they'd want to ship a back end for system spell
> checking frameworks and have us use the system spell checking API[1]?
> We'd have concerns of Windows 7 not being covered, directing people to
> install non-Mozilla software and crash impact at least in the Linux
> case (AFAICT, Enchant doesn't provide process isolation from the back
> end; I *think* Apple's solution does; not sure about Windows 8+) and
> having to write 3 system-specific spell checking integrations.
Does the above lay out a good alternative?

They'd install libvoikko somewhere on the user's system and they sign
it somehow and we verify that signature and load it upon successful
verification and call a known entry point in the loaded library.
> Suppose they'd say they'd want to ship it as Web Assembly in a Web Extension?
> We'd have concern about allocating engineering time to enable a Web
> Extension to act as a spell checking provider, when there's only one
> extension that'd foreseeably use it.
As far as I'm concerned, anything that involves us having to run JS for
this is overkill and puts us in a tough spot in the future, so I'd very
much like to avoid that at all costs if possible. Porting this to
WebAssembly is my least favorite option of all!

Cheers,
Ehsan

Henri Sivonen

Apr 27, 2017, 3:39:00 AM4/27/17
to Ehsan Akhgari, dev-platform, William McCloskey
As noted previously about how we load other libs on Linux, I think it
doesn't make sense to do load-time signature checking on Linux.

>> To address that concern, the local system itself would have to be
>> treated as semi-hostile and the signature would have to be checked at
>> library load time as opposed to the usual library install time. Do we
>> have pre-existing code for that?
>
> We should treat the local system as *hostile*. Because that's what it is in
> the real world at least for our Windows users.
>
> I was hoping that we can use the code that we use to sign and verify our mar
> files for the updater here, see for example this header which we use for
> signature verification
> <http://searchfox.org/mozilla-central/source/modules/libmar/verify/cryptox.h>.
> I'm suggesting to use this code as a *basis* for this work, so there will be
> some new code to be written for sure.
>
> The advantage of this code is that it's pretty self-contained, so for
> example we can use it to create a small command line utility to give the
> voikko folks to use for signing, etc.

So this would be a special Mozilla-specific code signing scheme and
not Authenticode for Windows.

This seems to imply that there exists no non-zero code size addition
that we'd take for a non-Web-facing feature that applies to a small
number of locales. (For Web-facing features, there is precedent for
non-zero-size code that applies e.g. to the Thai script only.)

>> Suppose they'd say they'd want to make libvoikko download on-demand
>> using Mozilla infra like OpenH264?
>> We'd have concerns of finding release engineers and front end
>> engineers with time to set it up, the crash impact concern and the
>> concern of another party dropping malware in libvoikko's place.
>
> FTR, it may actually be useful to look into this... I sort of assumed this
> isn't an option because I don't know how reusable the infra that we have for
> OpenH264 is, but if it is easily reusable without a lot of work, this may be
> a good option too.

Experience from the CDM case suggests that adding another downloadable
to Balrog isn't a big deal. On the client side, adding another
Firefox-managed plug-in to the add-on manager UI requires some
front-end developer effort to add another hard-coded thing. I don't
know about the reusability of the infra that's used to build and stage
OpenH264 itself, since that infra wasn't applicable in the CDM case.

>> Suppose they'd say they'd want to install libvoikko somewhere on the
>> user's library path and have us dlopen() it?
>> We'd have concerns about crash impact, being able to remedy crashes,
>> directing people to install non-Mozilla software (though @firefox on
>> Twitter regularly does) and other parties dropping malware in
>> libvoikko's place.
>>
>> Suppose they'd say they'd want to ship a back end for system spell
>> checking frameworks and have us use the system spell checking API[1]?
>> We'd have concerns of Windows 7 not being covered, directing people to
>> install non-Mozilla software and crash impact at least in the Linux
>> case (AFAICT, Enchant doesn't provide process isolation from the back
>> end; I *think* Apple's solution does; not sure about Windows 8+) and
>> having to write 3 system-specific spell checking integrations.
>
> Does the above lay out a good alternative?

I take it that the above refers to what you said above and not to what
you quoted me saying immediately above about system frameworks.

> They'd install libvoikko somewhere on the user's system and they sign it
> somehow and we verify that signature and load it upon successful
> verification and call a known entry point in the loaded library.

How would your concern about locale-specific crashes be addressed in
this scenario?

Ehsan Akhgari

May 7, 2017, 1:25:26 PM5/7/17
to Henri Sivonen, dev-platform, William McCloskey
(Sorry for the continued laggy responses here...)
I don't see why we would do something different on Linux than we would
on any other OS. That seems like more work than treating Linux the same as
other OSes, and I doubt it would make sense to put more effort into
special casing Linux in any way. To make my perspective very clear, I
think it would make sense for Mozilla to have a single code path that
treats loading libvoikko in the exact same way on all three desktop OSes
if at all possible. And if we need special code paths, having that on
Windows makes more sense than having that on Linux.

I would appreciate it if we could keep the rest of the discussion focused on
cross platform issues please. Even if we end up loading unsigned
binaries for this on Linux I honestly won't lose much sleep over it
given that in practice distros package our builds and they do what they
want in that process anyway.

>>> To address that concern, the local system itself would have to be
>>> treated as semi-hostile and the signature would have to be checked at
>>> library load time as opposed to the usual library install time. Do we
>>> have pre-existing code for that?
>> We should treat the local system as *hostile*. Because that's what it is in
>> the real world at least for our Windows users.
>>
>> I was hoping that we can use the code that we use to sign and verify our mar
>> files for the updater here, see for example this header which we use for
>> signature verification
>> <http://searchfox.org/mozilla-central/source/modules/libmar/verify/cryptox.h>.
>> I'm suggesting to use this code as a *basis* for this work, so there will be
>> some new code to be written for sure.
>>
>> The advantage of this code is that it's pretty self-contained, so for
>> example we can use it to create a small command line utility to give the
>> voikko folks to use for signing, etc.
> So this would be a special Mozilla-specific code signing scheme and
> not Authenticode for Windows.
That is correct. I doubt Authenticode buys us anything?

There exists no such policy in general terms AFAIK, so I don't think
we should be looking for one to base our decision on here. As
I'm sure you know, these things are always decided on a case by case
basis (and sometimes not really decided, when someone just lands code
without thinking about the size implications. :-/)

Here, the comparison between these two cases is actually not all that
hard to analyze. It is totally reasonable to expect someone using an
en-US build to go to a web page with some Thai script text and expect to
not see garbage text. It is not at all reasonable for someone using an
en-US build without installing a Finnish dictionary to expect Firefox to
spell check Finnish text (at least until a future date where Firefox
doesn't force one to install that dictionary in the first place.)
Therefore the second class of users can't expect all en-US build users
to be penalized with the additional code size required to support spell
checking for that language.

(That being said, I *personally* hate how we discriminate against all
languages except the default UI locale one like this, I'd much rather if
Firefox would do more to support your languages for you, but we live in
the current world...)
>>> Suppose they'd say they'd want to make libvoikko download on-demand
>>> using Mozilla infra like OpenH264?
>>> We'd have concerns of finding release engineers and front end
>>> engineers with time to set it up, the crash impact concern and the
>>> concern of another party dropping malware in libvoikko's place.
>> FTR, it may actually be useful to look into this... I sort of assumed this
>> isn't an option because I don't know how reusable the infra that we have for
>> OpenH264 is, but if it is easily reusable without a lot of work, this may be
>> a good option too.
> Experience from the CDM case suggests that adding another downloadable
> to Balrog isn't a big deal. On the client side, adding another
> Firefox-managed plug-in to the add-on manager UI requires some
> front-end developer effort to add another hard-coded thing. I don't
> know about the reusability of the infra that's used to build and stage
> OpenH264 itself, since that infra wasn't applicable in the CDM case.
If we want to use this infra, someone needs to find out these answers.
:-) I sadly don't really have the bandwidth to do that myself these
days... Do you, Henri?

>
>>> Suppose they'd say they'd want to install libvoikko somewhere on the
>>> user's library path and have us dlopen() it?
>>> We'd have concerns about crash impact, being able to remedy crashes,
>>> directing people to install non-Mozilla software (though @firefox on
>>> Twitter regularly does) and other parties dropping malware in
>>> libvoikko's place.
>>>
>>> Suppose they'd say they'd want to ship a back end for system spell
>>> checking frameworks and have us use the system spell checking API[1]?
>>> We'd have concerns of Windows 7 not being covered, directing people to
>>> install non-Mozilla software and crash impact at least in the Linux
>>> case (AFAICT, Enchant doesn't provide process isolation from the back
>>> end; I *think* Apple's solution does; not sure about Windows 8+) and
>>> having to write 3 system-specific spell checking integrations.
>> Does the above lay out a good alternative?
> I take it that the above refers to what you said above and not to what
> you quoted me saying immediately above about system frameworks.

Yes. :-)
>> They'd install libvoikko somewhere on the user's system and they sign it
>> somehow and we verify that signature and load it upon successful
>> verification and call a known entry point in the loaded library.
> How would your concern about locale-specific crashes be addressed in
> this scenario?
By opening up a path for us to run libvoikko outside of firefox.exe. I
don't think any proposal that doesn't provide support for this really
addresses that concern (with all due respect to that project, but it's
still a pile of C++ code and all...)