Maintaining the en-US dictionary that ships with Mozilla products

Jörg Knobloch

Dec 28, 2015, 2:32:01 PM

to dev-platform

Recently I was browsing some bugs in "Core::Spelling checker" and much
to my surprise found four bugs where people complained about wrong or
missing words in the en-US dictionary. There were two bugs where people
complained about words in the German and the French dictionaries.

The German and French bugs were finally closed as "wontfix" and
"invalid" and referred back to the respective dictionary maintainers.
For French there is a very good approach: The French dictionaries are
maintained via this site: http://www.dicollecte.org/ and imported for
distribution with the French version of Firefox. The situation for
German is not as good, but there is a maintainer whose work is then
turned into an add-on (in fact, sadly, two competing ones).

I was extremely surprised that Mozilla maintains a version of the en-US
dictionary, and you can see the movements here:
https://hg.mozilla.org/mozilla-central/log/tip/extensions/spellcheck/locales/en-US/hunspell/en-US.dic

Basically Ekanan Ketunuti does merges from upstream providers (SCOWL)
but also adds words individually and Ehsan reviews each change.

I think this situation is less than ideal. Firstly, I don't think we
should spend time on individual additions, and secondly, this process
creates quite some unwanted variations (to avoid using the word "mess").
For example the en-US dictionary add-on available at AMO contains many
accented words loaned from other languages, like "Bogotá" or "cliché"
(both with Wikipedia entries), which the Mozilla dictionary is missing.
Also, subtle differences are created, for example, the add-on dictionary
has "(in/un)feasible" and "(un)feasibly", whereas the Mozilla version
only had "(un)feasible" and "feasibly" (no prefix). A bug is necessary
to correct this. Thirdly, the add-on dictionary contains 13% more words
than the Mozilla maintained dictionary, and I think in dictionaries,
bigger is better. For example, the Mozilla dictionary only knows
"zucchini", whereas the add-on dictionary also knows "Zulu" and other
words starting with "zu". I'd hate to think that we'd need to create
7265 bugs to add all the missing words.

Is there a better way to do this? I think this is tedious business and
Mozilla should get out of it.

Jorg K.

Aryeh Gregor

Dec 28, 2015, 2:48:28 PM

to Jörg Knobloch, dev-platform

On Mon, Dec 28, 2015 at 9:31 PM, Jörg Knobloch <jo...@jorgk.com> wrote:
> Thirdly, the add-on dictionary contains 13% more words than the Mozilla maintained dictionary, and I think in dictionaries, bigger is better.

This is not always true for spelling correction, because there may be
common words that can be misspelled the same as uncommon words. E.g.,
the word "fro" is perfectly good English ("to and fro"), but in real
user input it's probably a misspelling of "for", and it's not obvious
that it's good to have in a spellchecking dictionary. Likewise,
uncommon variant spellings are best omitted. Uncommon words that are
not likely to be misspellings of any more common words are good to
have.
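
To illustrate the point, here is a minimal Python sketch. The word lists and function names are made up for this example and are not taken from hunspell or Firefox. It shows how adding the rare word "fro" makes a checker silently accept a transposition typo of "for" that it would otherwise flag.

```python
def is_adjacent_transposition(a, b):
    """True if b is a with exactly two neighbouring letters swapped."""
    if len(a) != len(b) or a == b:
        return False
    diffs = [i for i in range(len(a)) if a[i] != b[i]]
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1
        and a[diffs[0]] == b[diffs[1]]
        and a[diffs[1]] == b[diffs[0]]
    )


def flagged(text, words):
    """Return the tokens of text that are not in the word list."""
    return [w for w in text.split() if w.lower() not in words]


# Hypothetical word lists, for illustration only.
small = {"thanks", "for", "the", "help"}
large = small | {"fro"}

typo = "thanks fro the help"
print(flagged(typo, small))  # the typo is caught
print(flagged(typo, large))  # the typo slips through
print(is_adjacent_transposition("fro", "for"))
```

The larger list is only "better" for words that are unlikely to be typos of something more common.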

I don't have any strong feelings about the main point of your post.

Jim Mathies

Dec 28, 2015, 5:12:00 PM

We could research using native spell checking APIs if the platform supports them. For example, Windows added spell checking APIs in Windows 8.

https://msdn.microsoft.com/en-us/library/windows/desktop/hh869748%28v=vs.85%29.aspx

Jörg Knobloch

Dec 28, 2015, 5:33:37 PM

to dev-pl...@lists.mozilla.org

On 28/12/2015 20:31, Jörg Knobloch wrote:
> For example, the Mozilla dictionary only knows "zucchini", whereas the
> add-on dictionary also knows "Zulu" and other words starting with
> "zu". I'd hate to think that we'd need to create 7265 bugs to add all
> the missing words.

OK, I was wrong, this is more confusing. The respective files are not
sorted, so Zulu is in fact in the Mozilla version. The difference in
numbers is not only due to more words but also to different affix rules.

However, after sorting I was able to spot a few words the Mozilla
version is lacking:
residuary reproachfulness relict reformism enforceability makefile lycopod
(I'm using an en-GB dictionary while writing this post, and all the words
listed above are actually in that dictionary, with the exception of
lycopod which should be lycopodium).
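
A comparison like this can be sketched in a few lines of Python. This is only a rough sketch: it assumes the usual hunspell .dic layout (an optional count on the first line, then one `stem/FLAGS` entry per line), the sample data is invented, and the function names are hypothetical. It compares stems only, so it does not notice differences in affix flags such as feasible/U versus feasible/UI.

```python
def stems(dic_text):
    """Extract the set of stems from the text of a .dic file."""
    lines = dic_text.splitlines()
    # The first line of a .dic file is normally an entry count; skip it.
    if lines and lines[0].strip().isdigit():
        lines = lines[1:]
    return {line.split("/", 1)[0].strip() for line in lines if line.strip()}


def missing_from(ours, theirs):
    """Stems present in theirs but absent from ours, sorted."""
    return sorted(stems(theirs) - stems(ours))


# Invented sample data; real files have tens of thousands of entries.
ours = "3\nzucchini/S\nZulu/SM\nfeasible/U\n"
theirs = "4\nrelict/S\nZulu\nfeasible/UI\nresiduary\n"

print(missing_from(ours, theirs))
```

Because the sets are built before comparing, the order of entries in either file does not matter.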

Looking for en-US.dic files on my system, I can see that for Thunderbird
31 a dictionary that was slightly bigger than the current one was
shipped, and this dictionary included residuary reproachfulness relict
enforceability, so I wonder how that got lost.

I think I've made my point: We're investing time and effort to
ultimately ship an inferior product. Mozilla should leave the
maintenance of the dictionary to a third party.

Jorg K.

Mike Hommey

Dec 28, 2015, 5:46:09 PM

to Jörg Knobloch, dev-pl...@lists.mozilla.org

We're not investing time and effort, that's the core problem... we're
essentially letting it rot, without updating with newer upstream
versions, which is time and effort on its own.

Mike

Jörg Knobloch

Dec 28, 2015, 5:51:38 PM

to dev-platform

On 28/12/2015 23:45, Mike Hommey wrote:
> We're not investing time and effort, that's the core problem... we're
> essentially letting it rot, without updating with newer upstream
> versions, which is time and effort on its own.

Well,
https://hg.mozilla.org/mozilla-central/log/tip/extensions/spellcheck/locales/en-US/hunspell/en-US.dic
says something else.

I'm still looking into how the size of the en-US.dic file shrank from 610KB
to 586KB. Someone invested time and effort to lose content and make
things worse.

Jorg K.

Mike Hommey

Dec 28, 2015, 6:15:50 PM

to Jörg Knobloch, dev-platform

On Mon, Dec 28, 2015 at 11:51:01PM +0100, Jörg Knobloch wrote:
> On 28/12/2015 23:45, Mike Hommey wrote:
> >We're not investing time and effort, that's the core problem... we're
> >essentially letting it rot, without updating with newer upstream
> >versions, which is time and effort on its own.
> Well, https://hg.mozilla.org/mozilla-central/log/tip/extensions/spellcheck/locales/en-US/hunspell/en-US.dic
> says something else.

58 changesets in 2 years, that's a little more than 2 changesets a month,
I wouldn't call that time and effort, when a lot of them are adding one
word.

> I'm still looking how the size of the en-US.dic file shrank from 610KB to
> 586KB. Someone invested time and effort to lose content and make things
> worse.

Because, we have, in fact, imported a new upstream version. The drop in
size comes from bug 1137544, which is an upstream import...

Mike

Jörg Knobloch

Dec 28, 2015, 6:43:20 PM

to dev-pl...@lists.mozilla.org

On 29/12/2015 00:15, Mike Hommey wrote:
> Because, we have, in fact, imported a new upstream version. The drop in
> size comes from bug 1137544, which is an upstream import...

Hmm, something went wrong from here
https://mxr.mozilla.org/mozilla-central/source/extensions/spellcheck/locales/en-US/hunspell/en-US.dic?rev=44969aaf686d
(contains relict residuary) to here:
https://mxr.mozilla.org/mozilla-central/source/extensions/spellcheck/locales/en-US/hunspell/en-US.dic?rev=efdb01d21f4d
(missing relict residuary).

Jorg K.

Philip Chee

Dec 29, 2015, 2:17:31 AM

Or someone accidentally replaced a newer version with an older version.
Or there are two (or more) forks and somebody accidentally updated to a
less maintained fork with a newer timestamp.

Phil

--
Philip Chee <phi...@aleytys.pc.my>, <phili...@gmail.com>
http://flashblock.mozdev.org/ http://xsidebar.mozdev.org
Guard us from the she-wolf and the wolf, and guard us from the thief,
oh Night, and so be good for us to pass.

Philip Chee

Dec 29, 2015, 2:28:26 AM

(In reply to Kevin Atkinson from comment #34)
> I looked up these words in the upstream tool to see why a word is not
> in the dictionary:
> http://app.aspell.net/lookup?dict=en_US&words=relict%0D%0Aresiduary%0D%0Aenforceability%0D%0Aadvisor%0D%0Ainfeasible

> "advisor" is a spelling variant of adviser and per my policy I
> generally only include one variant of a spelling to promote
> consistent spelling in the official dictionary. If Mozilla wants to
> include common variants this can likely be fixed with a little
> effort in the build scripts.

Time to fork! Like the en-GB dictionary on AMO, which has vastly more words
(190,000+) than the official upstream.

Marco A.G.Pinto

Dec 29, 2015, 2:52:23 AM

to dev-pl...@lists.mozilla.org

On 29/12/2015 07:28, Philip Chee wrote:
> Time to fork! Like the en-GB dictionary on AMO has vastly more words
> (190.000+)that the official upstream.
>

Philip, not sure if you are referring to my en_GB fork.

When I grabbed the project around two years ago, the original dictionary
(also the updated one) had 136'404 words.

The version I will upload on Thursday (V2.32) has 153'347 words which
means it has a total of 16'943 new words.

I noticed that the "updated" dictionary has been deleted by its author.

On Thursday I will also update the official project site (Proofing Tool
GUI) but he who has access to this post can already read the "hidden" FAQ:
http://marcoagpinto.cidadevirtual.pt/faq.html

Kind regards,
Marco A.G.Pinto
------------------------

Jörg Knobloch

Dec 29, 2015, 2:54:01 AM

to dev-pl...@lists.mozilla.org

On 29/12/2015 08:28, Philip Chee wrote:
> Time to fork

I disagree. It's time to get it right:

https://bugzilla.mozilla.org/show_bug.cgi?id=1235506

Jorg K.

Jörg Knobloch

Dec 29, 2015, 8:02:11 AM

to dev-pl...@lists.mozilla.org

On 29/12/2015 08:51, Marco A.G.Pinto wrote:
> I noticed that the "updated" dictionary has been deleted by its author.

There are still two British dictionaries:

https://addons.mozilla.org/en-US/firefox/addon/british-english-dictionary/
- Maintainer: Mark Tyndall
https://addons.mozilla.org/en-US/firefox/addon/british-english-dictionary-2/
- Maintainer: Marco A.G.Pinto

Deleted:
https://addons.mozilla.org/en-US/firefox/addon/british-english-dictionary-/

Bug 1235506 should re-establish a decent en-US dictionary.

Jorg K.

Ehsan Akhgari

Dec 29, 2015, 12:44:20 PM

to Jörg Knobloch, dev-platform

On 2015-12-28 2:31 PM, Jörg Knobloch wrote:
> Recently I was browsing some bugs in "Core::Spelling checker" and much
> to my surprise found four bugs where people complained about wrong or
> missing words in the en-US dictionary. There were two bugs where people
> complained about words in the German and the French dictionaries.
>
> The German and French bugs were finally closed as "wontfix" and
> "invalid" and referred back to the respective dictionary maintainers.
> For French there is a very good a approach: The French dictionaries are
> maintained via this site: http://www.dicollecte.org/ and imported for
> distribution with the French version of Firefox. The situation for
> German is not as good, but there is a maintainer whose work is then
> turned into an add-on (in fact, sadly, two competing ones).

As you have discovered, we don't ship any non-en-US dictionaries with
Firefox, so the above is off topic for this mailing list.

> I was extremely surprised that Mozilla maintains a version of the en-US
> dictionary, and you can see the movements here:
> https://hg.mozilla.org/mozilla-central/log/tip/extensions/spellcheck/locales/en-US/hunspell/en-US.dic
>
>
> Basically Ekanan Ketunuti does merges from upstream providers (SCOWL)
> but also adds words individually and Ehsan reviews each change.

That's incorrect. I periodically merge from SCOWL, and Ekanan regularly
submits patches for words missing from SCOWL (and our en-US dictionary.)

> I think this situation is less than ideal. Firstly, I don't think we
> should spend time on individual additions

I disagree! The quality of the dictionary we ship with Firefox matters,
and the process for adding new words seems to be working well.

> and secondly, this process
> creates quite some unwanted variations (to avoid using the word "mess").
> For example the en-US dictionary add-on available at AMO contains many
> accented words loaned from other languages, like "Bogotá" or "cliché"
> (both with Wikipedia entries), which the Mozilla dictionary is missing.

That is not a problem with the word list, it's an issue with the en-US
dictionary being encoded in ISO8859-1.

> Also, subtle differences are created, for example, the add-on dictionary
> has "(in/un)feasible" and "(un)feasibly", whereas the Mozilla version
> only had "(un)feasible" and "feasibly" (no prefix). A bug is necessary
> to correct this.

Not sure what you mean. Of course, a word list can have bugs. Once you
find these issues, you can report bugs, and/or submit patches. (The
same goes for SCOWL, FWIW.)

> Thirdly, the add-on dictionary contains 13% more words
> than the Mozilla maintained dictionary, and I think in dictionaries,
> bigger is better.

I'm not sure what the "add-on dictionary" is. But FWIW you're wrong in
assuming that bigger is better, both for the reason that Aryeh described
and also because the format of hunspell dictionaries is not a simple
list of words, so comparing two dictionaries' sizes gives you no
information about which one contains more words.
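
A toy illustration of why file size and word count diverge, written as a Python sketch. The affix table here is a deliberately simplified assumption: real .aff files attach conditions and stripping rules to each flag, and the function names are hypothetical. One entry with flags can stand for several accepted words, so a shorter file can accept more words than a longer one.

```python
# Simplified, assumed affix table: flag -> (kind, affix).
AFFIXES = {
    "U": ("prefix", "un"),
    "I": ("prefix", "in"),
    "S": ("suffix", "s"),
}


def expand(entry):
    """Expand one 'stem/FLAGS' entry into the set of words it accepts."""
    stem, _, flags = entry.partition("/")
    forms = {stem}
    for flag in flags:
        kind, affix = AFFIXES[flag]
        forms.add(affix + stem if kind == "prefix" else stem + affix)
    return forms


def count_words(entries):
    """Number of distinct words accepted by a list of entries."""
    accepted = set()
    for entry in entries:
        accepted |= expand(entry)
    return len(accepted)


compact = ["feasible/UI", "zucchini/S"]          # 2 lines
flat = ["feasible", "unfeasible", "zucchini"]    # 3 lines

print(len(compact), count_words(compact))
print(len(flat), count_words(flat))
```

Here the two-line list accepts five words while the three-line list accepts three, so neither line count nor byte size is a reliable proxy for coverage.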

> For example, the Mozilla dictionary only knows
> "zucchini", whereas the add-on dictionary also knows "Zulu" and other
> words starting with "zu". I'd hate to think that we'd need to create
> 7265 bugs to add all the missing words.

Filing a single bug for all of those words and attaching them works just
fine.

> Is there a better way to do this? I think this is tedious business and
> Mozilla should get out of it.

As the de facto maintainer of our en-US dictionary, I'm not sure where
you're getting this information from, but your conclusions are
unjustified. The current process seems to be working well, and I think
the summary of your objections is essentially that you have found some
missing words, which is a great thing to file a bug about (please CC
Ekanan.)

I see no reason for Mozilla to stop maintaining and shipping the en-US
dictionary.

Cheers,
Ehsan

Jörg Knobloch

Dec 29, 2015, 3:34:52 PM

to dev-platform

On 29/12/2015 18:23, Ehsan Akhgari wrote:
> I see no reason for Mozilla to stop maintaining and shipping the en-US
> dictionary.

Agreed. But we should take a different approach. I disagree that the
current process is working well since it carries forward legacy errors.

I must admit that my original post was somewhat unfortunate since I
wasn't fully aware of the Mozilla process. It would be great if Mozilla
could just obtain a suitable dictionary from a third party and ship it.
Sadly that's not the case.

The practice is that Mozilla uses the SCOWL/Aspell word list and adds
Mozilla "special" words to it. Details can be found in bug 1235506.

My first point is: We're currently using SCOWL's "small" dictionary from
which recently a bunch of words disappeared. So we get bugs asking for
words to be added, words that were previously included and are also
included in the "large" dictionary that is available.

The second point is that we're not managing Mozilla specific additions
well. There are about 12000 (questionable) proper names that Mozilla
adds and about 1000 extra terms, some of which are grossly wrong. Here is
just a random excerpt:
derail's
derange's
deride's
desalt's
descale's
describe's
deserve's
deskill's
despoil's
detest's
dethrone's
detract's
devalue's
devote's
All these are wrong! You can write: "This remind's me of you" without
that being flagged as a mistake! Most likely they were imported once
upon a time, corrected at the source, but never removed from Mozilla's
version.
All extra content in
https://dxr.mozilla.org/mozilla-central/source/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/5-mozilla-added
should be reviewed and classified. Again, details in bug 1235506.

I am proposing to change the way the Mozilla dictionary is maintained,
to keep manual intervention to a minimum and the quality to a maximum.
I'm glad that Ehsan agrees that the quality is important. Sadly, we're
currently not delivering a quality dictionary.

Just one more remark: The "large" dictionary I'm proposing to use is
ISO8859-1 encoded (like the "small" one) and contains many words with
accents, including all the ones mentioned in the original post. So there
is no problem.

Jorg K.

Ehsan Akhgari

Dec 29, 2015, 3:38:43 PM

to Jim Mathies, dev-pl...@lists.mozilla.org

On 2015-12-28 5:11 PM, Jim Mathies wrote:
> We could research using native spell checking apis if the platform supports them. For example Windows added spell checking apis in Windows 8.

It's not obvious to me why that would be an improvement. The
cross-platform nature of our spell checking engine is definitely a plus.

Ehsan Akhgari

Dec 29, 2015, 3:54:22 PM

to Jörg Knobloch, dev-platform

On 2015-12-29 3:34 PM, Jörg Knobloch wrote:
> On 29/12/2015 18:23, Ehsan Akhgari wrote:
>> I see no reason for Mozilla to stop maintaining and shipping the en-US
>> dictionary.
> Agreed. But we should take a different approach. I disagree that the
> current process is working well since it carries forward legacy errors.
>
> I must admit that my original post was somewhat unfortunate since I
> wasn't fully aware of the Mozilla process. It would be great if Mozilla
> could just obtain a suitable dictionary from a third party and ship it.
> Sadly that's not the case.
>
> The practise is that Mozilla uses the SCOWL/Aspell word list and adds
> Mozilla "special" words to it. Details can be found in bug 1235506.

They are not Mozilla special words. They are words that we want to add
to our spell checking dictionary that don't exist in the upstream SCOWL
word list.

IOW, our en-US dictionary is a super-set of the SCOWL en-US dictionary.

> My first point is: We're currently using SCOWL's "small" dictionary from
> which recently a bunch of words disappeared. So we get bugs asking for
> words to be added, words that were previously included and are also
> included in the "large" dictionary that is available.

AFAIK the SCOWL project recommends against using the large word list for
spell checking. If you find evidence to the contrary, I would like to
know more about that.

(About the words that disappeared, please file a bug and attach the list
of the words.)

I welcome someone going through the list and reviewing the additions.
No matter what process we use for maintaining our word list, that will
be needed and appreciated.

> I am proposing to change the way the Mozilla dictionary is maintained,
> to keep manual intervention to a minimum and the quality to a maximum.
> I'm glad that Ehsan agrees that the quality is important. Sadly, we're
> currently not delivering a quality dictionary.

I'm really lost on what problem in the process you are talking about.
Looks like you have found some issues in the word list, which is great.
But I don't see any of these having anything to do with the process
for updating the word list. If you're suggesting that we should not
maintain any additions on top of SCOWL, that is effectively asking for a
regression to the quality of our word list, and as such is unacceptable.

> Just one more remark: The "large" dictionary I'm proposing to use is
> ISO8859-1 encoded (like the "small" one) and contains many words with
> accents, including all the ones mentioned in the original post. So there
> is no problem.

If you find a way to encode these accented characters properly, we can
add them to the word list that we maintain.

Pascal Chevrel

Dec 29, 2015, 4:50:39 PM

Le 29/12/2015 18:23, Ehsan Akhgari a écrit :
> On 2015-12-28 2:31 PM, Jörg Knobloch wrote:
>> Recently I was browsing some bugs in "Core::Spelling checker" and much
>> to my surprise found four bugs where people complained about wrong or
>> missing words in the en-US dictionary. There were two bugs where people
>> complained about words in the German and the French dictionaries.
>>
>> The German and French bugs were finally closed as "wontfix" and
>> "invalid" and referred back to the respective dictionary maintainers.
>> For French there is a very good a approach: The French dictionaries are
>> maintained via this site: http://www.dicollecte.org/ and imported for
>> distribution with the French version of Firefox. The situation for
>> German is not as good, but there is a maintainer whose work is then
>> turned into an add-on (in fact, sadly, two competing ones).
>
> As you have discovered, we don't ship any non-en-US dictionaries with
> Firefox, so the above is off topic for this mailing list.
>

We do ship dictionaries with Firefox localized builds when their licence
allows it.

Pascal

Mike Hommey

Dec 29, 2015, 5:41:48 PM

to Ehsan Akhgari, dev-platform, Jörg Knobloch

On Tue, Dec 29, 2015 at 12:23:05PM -0500, Ehsan Akhgari wrote:
> On 2015-12-28 2:31 PM, Jörg Knobloch wrote:
> >Recently I was browsing some bugs in "Core::Spelling checker" and much
> >to my surprise found four bugs where people complained about wrong or
> >missing words in the en-US dictionary. There were two bugs where people
> >complained about words in the German and the French dictionaries.
> >
> >The German and French bugs were finally closed as "wontfix" and
> >"invalid" and referred back to the respective dictionary maintainers.
> >For French there is a very good a approach: The French dictionaries are
> >maintained via this site: http://www.dicollecte.org/ and imported for
> >distribution with the French version of Firefox. The situation for
> >German is not as good, but there is a maintainer whose work is then
> >turned into an add-on (in fact, sadly, two competing ones).
>
> As you have discovered, we don't ship any non-en-US dictionaries with
> Firefox, so the above is off topic for this mailing list.
>

> >I was extremely surprised that Mozilla maintains a version of the en-US
> >dictionary, and you can see the movements here:
> >https://hg.mozilla.org/mozilla-central/log/tip/extensions/spellcheck/locales/en-US/hunspell/en-US.dic
> >
> >
> >Basically Ekanan Ketunuti does merges from upstream providers (SCOWL)
> >but also adds words individually and Ehsan reviews each change.
>
> That's incorrect. I periodically merge from SCOWL, and Ekanan regularly
> submits patches for words missing from SCOWL (and our en-US dictionary.)

Note that was true for a very long time. The regular merges from SCOWL
are fairly recent.

Mike

Jörg Knobloch

Dec 29, 2015, 6:55:19 PM

to dev-pl...@lists.mozilla.org

On 29/12/2015 21:54, Ehsan Akhgari wrote:
> They are not Mozilla special words. They are words that we want to add
> to our spell checking dictionary that don't exist in the upstream SCOWL
> word list.

In bug 1235506 I suggest maintaining three lists:
1) Proper names, of which we have about 12,000.
2) Special Mozilla words, like "XUL", of which we have exactly 37.
3) A mixed bag of 1000 extra words, mostly internet-related terms.
There are many errors in those. Many of those words should be
requested upstream and removed from the Mozilla-maintained part
in due course. For example, datasheet
(http://app.aspell.net/lookup?dict=en_US-large&words=datasheet)
is likely to be added one day.

> IOW, our en-US dictionary is a super-set of the SCOWL en-US dictionary.

Yes, minus two exceptions:
https://dxr.mozilla.org/mozilla-central/source/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/5-mozilla-removed
(which IMHO make no sense at all).

> AFAIK the SCOWL project recommends against using the large word list for
> spell checking. If you find evidence to the contrary, I would like to
> know more about that.

I NI'ed Kevin Atkinson, as you had requested. The maintainer of the GB
dictionary states: Bigger is better:
http://marcoagpinto.cidadevirtual.pt/faq.html

> (About the words that disappeared, please file a bug and attach the list
> of the words.)

No. That is the part that makes no sense. These words were in the SCOWL
data before the merge at the end of April 2015. Now they are no longer
in the "small" dictionary. It makes absolutely no sense to administer
that part of the dictionary. We either take the SCOWL data or we don't.
We don't want to be in the business of adding words that SCOWL have
removed. Therefore my suggestion is to use the "large" dataset which has
these words: Example:
http://app.aspell.net/lookup?dict=en_US-large&words=relict%0D%0Aresiduary%0D%0Aenforceability%0D%0Aadvisor%0D%0Ainfeasible%0D%0Aclich%E9%0D%0ABogot%E1%0D%0Ainfeasible%0D%0Aunfeasible
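
For anyone wanting to build such a lookup link for a different set of words, here is a small Python sketch. It only constructs the URL and makes no network request. The helper name is hypothetical, and the assumption that the words are joined with CR LF and percent-encoded as ISO8859-1 is inferred from the link above (note `clich%E9`), not from any documentation of the lookup service.

```python
from urllib.parse import quote


def lookup_url(words, dictionary="en_US-large"):
    """Build a lookup link in the same shape as the one quoted above."""
    joined = "\r\n".join(words)
    # Encode as ISO8859-1 so that "é" becomes %E9 rather than %C3%A9.
    encoded = quote(joined, safe="", encoding="iso-8859-1")
    return (
        "http://app.aspell.net/lookup?dict=" + dictionary
        + "&words=" + encoded
    )


print(lookup_url(["relict", "residuary", "cliché"]))
```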

> I welcome someone going through the list and reviewing the additions. No
> matter what process we use for maintaining our word list, that will be
> needed and appreciated.

Yes.

> I'm really lost on what problem in the process you are talking about.
> Looks like you have found some issues in the word list, which is great.
> But I don't see any of these having anything to do with the process for
> updating the word list.

I'll try again. The Mozilla dictionary consists of two sources: SCOWL
and Mozilla's words, which should be maintained separately. We want to
be in a position to replace the SCOWL data easily. Mozilla should
administer its own additions, not general English terms. A recent
Mozilla addition, Fukushima, should for example be added to the third
list, the mixed bag that we wish were in the SCOWL data but aren't
(http://app.aspell.net/lookup?dict=en_US-large&words=Fukushima).

Another example: If SCOWL decide to change feasible/U to feasible/UI and
then back to feasible/U, Mozilla should not hang on to the /I part as we
currently do. Mozilla should not administer the plain English
dictionary, it should administer its specific well chosen additions.

The faulty process has led to the unfortunate situation we're in. The
current process accumulates all SCOWL errors forever unless someone files a
bug. For example: Somehow "remind's" got into the Mozilla data, and the only
way to get it out with the current process is to file a bug.

If we were only to maintain carefully chosen additions, then
mind/remind/reminds/etc. would not be part of the Mozilla maintained
list. Mozilla would just follow SCOWL on this word/stem.

> If you're suggesting that we should not
> maintain any additions on top of SCOWL, that is effectively asking for a
> regression to the quality of our word list, and as such is unacceptable.

As I said many times in the thread: We should carefully maintain any
Mozilla additions on top of the SCOWL data. We should leave it to SCOWL
to manage the plain English dictionary and only manage the Mozilla
additions (for which I see three classes, see above).

Let me try a comparison: The SCOWL data is a holiday rental place and
Mozilla is the holiday maker. It moves in with thongs, sunscreen and
shorts. It keeps track of its belongings. Of course Mozillians in the
flat take pictures of each other which feature the things which belong
to the flat. After a week Mozilla returns home. It takes its thongs,
sunscreen and shorts with it. It does not hang on to the flat's carpet
or couch. Neither does it take an inventory of the holiday flat. Next
year Mozilla visits the holiday flat again. The owner has changed the
carpet, removed a picture from the hallway but added a statue in the
living room. The holiday snaps will look different to the ones from the
previous year, but Mozilla doesn't have to guarantee that the same items
of furniture appear on the photos.

> If you find a way to encode these accented characters properly, we can
> add them to the word list that we maintain.

They are already in the "large" SCOWL dataset. Adding accented
characters to en-US.dic works today. Try it: Add "naïve" or "résumé" to
the data and make sure the file gets saved as ANSI/ISO8859-1 and not
UTF-8. Then type/paste those words into a text field with spell
checking. Works!
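
The difference between the two encodings can be seen with a short Python sketch (the function name is made up for this example). In ISO8859-1 each accented letter is a single byte, while UTF-8 uses two bytes for it, and UTF-8 bytes read back as ISO8859-1 produce a different, garbled string.

```python
def encode_both(word):
    """Return the ISO8859-1 and UTF-8 byte encodings of a word."""
    return word.encode("iso-8859-1"), word.encode("utf-8")


latin1, utf8 = encode_both("naïve")
print(latin1, len(latin1))
print(utf8, len(utf8))

# What a reader expecting ISO8859-1 sees if the file was saved as UTF-8.
misread = utf8.decode("iso-8859-1")
print(misread == "naïve")
```

This is why the save encoding matters: the same five letters are five bytes in one encoding and six in the other.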

Jorg K.

Ehsan Akhgari

Dec 29, 2015, 7:49:30 PM

to Jörg Knobloch, dev-pl...@lists.mozilla.org

On 2015-12-29 6:54 PM, Jörg Knobloch wrote:
> On 29/12/2015 21:54, Ehsan Akhgari wrote:
>> They are not Mozilla special words. They are words that we want to add
>> to our spell checking dictionary that don't exist in the upstream SCOWL
>> word list.
> In bug 1235506 I suggest to maintain three lists:
> 1) proper names of which we have about 12.000.
> 2) Special Mozilla words, like "XUL" of which we have exactly 37.
> 3) A mixed bag of 1000 extra words, mostly internet related terms.
> There are many errors in those. Many of those words should be
> requested upstream and removed from the Mozilla maintained part
> in due course, example: datasheet:
> http://app.aspell.net/lookup?dict=en_US-large&words=datasheet
> has a likeliness to be added one day.

First things first, let's correct something here. We do _not_ maintain
three word lists. We maintain one list: the list of words that the
Firefox spellchecker accepts. The
extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/5-mozilla-*
files are there purely for historical reasons, and should only be used
in order to triage the diff of our dictionary against the SCOWL upstream.

FWIW,
<https://dxr.mozilla.org/mozilla-central/source/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/README>
explains exactly how these files are used and generated thoroughly.

>> IOW, our en-US dictionary is a super-set of the SCOWL en-US dictionary.
> Yes, minus two exceptions:
> https://dxr.mozilla.org/mozilla-central/source/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/5-mozilla-removed
>
> (which IMHO make no sense at all).

Yes. At the risk of sounding like a broken record, please file a bug?
I think it makes sense to add these two words back. :-)

>> AFAIK the SCOWL project recommends against using the large word list for
>> spell checking. If you find evidence to the contrary, I would like to
>> know more about that.
> I NI'ed Kevin Atkinson, as you had requested. The maintainer of the GB
> dictionary states: Bigger is better:
> http://marcoagpinto.cidadevirtual.pt/faq.html

Cool. FWIW I'm happy to trust Kevin's judgement here, unless the
larger wordlist increases the size of the code we ship significantly.

>> (About the words that disappeared, please file a bug and attach the list
>> of the words.)
> No. That is the part that makes no sense. These words were in the SCOWL
> data before the merge at the end of April 2015. Now there are no longer
> in the "small" dictionary. It makes absolutely no sense to administer
> that part of the dictionary. We either take the SCOWL data or we don't.
> We don't want to be in the business of adding words that SCOWL have
> removed. Therefore my suggestion is to use the "large" dataset which has
> these words: Example:
> http://app.aspell.net/lookup?dict=en_US-large&words=relict%0D%0Aresiduary%0D%0Aenforceability%0D%0Aadvisor%0D%0Ainfeasible%0D%0Aclich%E9%0D%0ABogot%E1%0D%0Ainfeasible%0D%0Aunfeasible

I'm afraid you're misunderstanding what's happened here. We only
maintain one word list, and our process of merging upstream changes is
purely additive. As a result, it doesn't handle the case where a word
disappears from SCOWL.

This is clearly a bug, and should be fixed. We may still decide to keep
individual words that SCOWL drops if we decide that we want the Firefox
spell checker to accept them, but as a general rule we should probably
follow upstream.

I believe this should be relatively simple to fix in make-new-dict.
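
As a rough sketch of the difference, in Python and using plain sets of words: this is not the actual make-new-dict script, whose internals are not shown in this thread, and the function names, the `keep` parameter and the sample words are all assumptions for illustration. A purely additive merge keeps everything ever seen, while a merge that remembers the previous upstream snapshot can drop what upstream dropped and still preserve local additions.

```python
def additive_merge(ours, new_upstream):
    """Purely additive: nothing is ever removed."""
    return ours | new_upstream


def tracking_merge(ours, old_upstream, new_upstream, keep=frozenset()):
    """Follow upstream removals, but preserve local additions.

    keep lists words upstream dropped that we deliberately retain.
    """
    local_additions = ours - old_upstream
    return new_upstream | local_additions | (set(keep) & ours)


# Invented sample data.
old_upstream = {"remind", "remind's", "relict"}
ours = old_upstream | {"XUL"}
new_upstream = {"remind", "datasheet"}

print(sorted(additive_merge(ours, new_upstream)))
print(sorted(tracking_merge(ours, old_upstream, new_upstream)))
print(sorted(tracking_merge(ours, old_upstream, new_upstream, keep={"relict"})))
```

With the additive merge, "remind's" survives forever; with the tracking merge it disappears as soon as upstream removes it, while "XUL" stays because it never came from upstream.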

>> I'm really lost on what problem in the process you are talking about.
>> Looks like you have found some issues in the word list, which is great.
>> But I don't see any of these having anything to do with the process for
>> updating the word list.
> I'll try again. The Mozilla dictionary consists of two sources: SCOWL
> and Mozilla's words, which should be maintained separately. We want to
> be in a position to replace the SCOWL data easily. Mozilla should
> administer its own additions, not general English terms. A recent
> Mozilla addition, Fukushima, should for example be added to the third
> list, the mixed bag that we wish were in the SCOWL data but aren't
> (http://app.aspell.net/lookup?dict=en_US-large&words=Fukushima).

Please see the above. I believe fixing the above bug will make you happy!

> Another example: If SCOWL decide to change feasible/U to feasible/UI and
> then back to feasible/U, Mozilla should not hang on to the /I part as we
> currently do. Mozilla should not administer the plain English
> dictionary, it should administer its specific well chosen additions.

Again, please see the above.
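
The affix-flag case can be checked mechanically. The sketch below assumes the simple `word/FLAGS` entry form with single-character flags (the hunspell default) and ignores escaped slashes and morphological fields; the function names are hypothetical.

```python
def parse_entry(line):
    """Split a .dic entry such as 'feasible/UI' into (word, set of flags)."""
    stripped = line.strip()
    entry = stripped.split()[0] if stripped else ""
    word, _, flags = entry.partition("/")
    return word, set(flags)


def stale_flags(ours, upstream):
    """Return {word: flags we still carry that upstream no longer has}."""
    upstream_flags = dict(parse_entry(line) for line in upstream)
    stale = {}
    for line in ours:
        word, flags = parse_entry(line)
        if word in upstream_flags:
            extra = flags - upstream_flags[word]
            if extra:
                stale[word] = extra
    return stale


# Illustrative entries only.
ours = ["feasible/UI", "feasibly/U"]
upstream = ["feasible/U", "feasibly/U"]
result = stale_flags(ours, upstream)
```

Here `result` reports the leftover `I` flag on "feasible", which is the kind of entry that would otherwise need a bug to clean up.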

> The faulty process has led to the unfortunate situation we're in. The
> current process accumulates all SCOWL errors forever unless someone files a
> bug. For example: Somehow "remind's" got into the Mozilla data, and the only
> way to get it out with the current process is to file a bug.

FWIW please realize that even with the above bug fixed and us not
magically holding zombie SCOWL entries alive, there will still be
examples of embarrassing things that our spell checker gets wrong. The
right thing to do for that is _always_ to file a bug. So, note that
there are two orthogonal issues here.

>> If you're suggesting that we should not
>> maintain any additions on top of SCOWL, that is effectively asking for a
>> regression to the quality of our word list, and as such is unacceptable.
> As I said many times in the thread: We should carefully maintain any
> Mozilla additions on top of the SCOWL data. We should leave it to SCOWL
> to manage the plain English dictionary and only manage the Mozilla
> additions (for which I see three classes, see above).

I disagree. I think we should accept the words that we want, and then
try to upstream them to SCOWL, without holding Firefox back until that
happens. I experimented with this once
<https://github.com/kevina/wordlist/issues/117> but unfortunately I
haven't had the time to go through all of the list. (As a non-native
speaker this task requires me to spend weeks looking things up in
dictionaries!)

> Let me try a comparison: The SCOWL data is a holiday rental place and
> Mozilla is the holiday maker. It moves on with thongs, sunscreen and
> shorts. It keeps track of its belongings. Of course Mozillians in the
> flat take pictures of each other which feature the things which belong
> to the flat. After a week Mozilla returns home. It takes its thongs,
> sunscreen and shorts with it. It does not hang on to the flat's carpet
> or couch. Neither does it take an inventory of the holiday flat. Next
> year Mozilla visits the holiday flat again. The owner has changed the
> carpet, removed a picture from the hallway but added a statue in the
> living room. The holiday snaps will look different to the ones from the
> previous year, but Mozilla doesn't have to guarantee that the same items
> of furniture appear on the photos.

I'm well capable of understanding technical arguments, and I'm not sure
I appreciate this kind of simplification. Let's please stick to
technical terms. :-)

>> If you find a way to encode these accented characters properly, we can
>> add them to the word list that we maintain.
> They are already in the "large" SCOWL dataset. Adding accented
> characters to en-US.dic works today. Try it: Add "naïve" or "résumé" to
> the data and make sure the file gets saved as ANSI/ISO8859-1 and not
> UTF-8. Then type/paste those words into a text field with spell
> checking. Works!

Wonderful! If you have a list of words using these types of characters
that we need to add, please file a bug, and let's do that!
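
The encoding point can be illustrated with a short sketch. It assumes, as the message above states, that the word list is stored as ISO-8859-1; the helper name and the temporary file are hypothetical, and the sketch does not update the entry count on the first line of a real .dic file.

```python
import os
import tempfile


def append_words(dic_path, words, encoding="iso-8859-1"):
    """Append words to a word list, one per line, in the given encoding.

    Raises UnicodeEncodeError if a word cannot be represented in that encoding.
    """
    with open(dic_path, "a", encoding=encoding, newline="\n") as f:
        for word in words:
            f.write(word + "\n")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.dic")  # hypothetical file, not the real en-US.dic
    append_words(path, ["naïve", "résumé"])
    with open(path, "rb") as f:
        raw = f.read()
```

Each accented letter ends up as a single byte (for example 0xEF for "ï"), whereas a UTF-8 save would produce two bytes per accented letter and the words would no longer match.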

Jörg Knobloch

Dec 30, 2015, 4:20:50 AM

to dev-pl...@lists.mozilla.org

On 30/12/2015 01:46, Ehsan Akhgari wrote:
> First things first, let's correct something here. We do _not_ maintain
> three word lists. We maintain one list: the list of words that the
> Firefox spellchecker accepts.

I know I sound like a broken record: I suggested changing the process
and maintaining three lists.

> I'm afraid you're misunderstanding what's happened here. We only
> maintain one word list, and our process of merging upstream changes is
> purely additive. As a result, it doesn't handle the case where a word
> disappears from SCOWL.
>
> This is clearly a bug, and should be fixed.

I came to realise that my argument has a hole. On one hand I'm
complaining that at the beginning of May 2015 words got removed, see:
https://hg.mozilla.org/mozilla-central/diff/bcb133a3cdca/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/orig/en_US.dic
(don't open in Firefox, it will hang, bug 1235321):
-relict
-residuary
-enforceability
(all still included in the "large" dataset).

On the other hand I'm complaining that wrong entries, like "remind's"
are maintained in the Mozilla data. "remind's" is not a valid word in
SCOWL (http://app.aspell.net/lookup?dict=en_US&words=remind%27s), but it
is in Mozilla. So there is a bug in the removal process.

Frankly, I can't understand how the current system could manage SCOWL
removals yet not remove words Mozilla specifically added. How does it
know that a word came from SCOWL and can be removed or it didn't come
from SCOWL and should be maintained? Broken record: Maintaining Mozilla
words differently (three lists) would fix this.

> We may still decide to keep
> individual words that SCOWL drops if we decide that we want the Firefox
> spell checker to accept them, but as a general rule we should probably
> follow upstream.

It is pretty much unmanageable to do this. On every refresh you would
have to add removed words manually. Broken record: Mozilla should not
manage the general English words (apart from some exceptions, see below).

>> We should leave it to SCOWL
>> to manage the plain English dictionary and only manage the Mozilla
>> additions (for which I see three classes, see above).
> I disagree. I think we should accept the words that we want, and then
> try to upstream them to SCOWL, without holding Firefox back until that
> happens. I experimented with this once
> <https://github.com/kevina/wordlist/issues/117> but unfortunately I
> haven't had the time to go through all of the list. (As a non-native
> speaker this task requires me to spend weeks looking things up in
> dictionaries!)

I sound like a broken record but you ignored my proposal: To facilitate
the process of having more words than SCOWL, I proposed to split these
"more words" into three files. The third file would contain "general"
words we request upstream.
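
A minimal sketch of this split, assuming the shipped dictionary is rebuilt from an unmodified SCOWL list plus separately kept addition lists. The list names `list_1` and `list_2` are placeholders (the classes are defined earlier in the thread), and both function names are hypothetical.

```python
def build_dictionary(scowl_words, addition_lists):
    """Combine an unmodified upstream word list with separately kept additions."""
    words = set(scowl_words)
    for extra in addition_lists.values():
        words |= set(extra)
    return sorted(words)


def now_upstream(scowl_words, requested_upstream):
    """Words from the 'requested upstream' list that upstream now includes."""
    return sorted(set(requested_upstream) & set(scowl_words))


# Illustrative data only.
scowl = ["adviser", "feasible"]
additions = {
    "list_1": ["SpiderMonkey"],           # placeholder name
    "list_2": [],                         # placeholder name
    "requested_upstream": ["Fukushima"],  # the "third list" of general words
}
dictionary = build_dictionary(scowl, additions)
accepted = now_upstream(scowl + ["Fukushima"], additions["requested_upstream"])
```

Because the SCOWL part is never edited, a refresh replaces it wholesale, and any word reported by `now_upstream` can simply be deleted from the third list.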

> Wonderful! If you have a list of words using these types of characters
> that we need to add, please file a bug, and let's do that!

No, I won't do that. I filed a bug to use the "large" dictionary, but
you even changed the summary and hijacked it for something else. It
makes no sense to request a heap of words to be added to the Mozilla
dictionary, like "résumé", "née" and so on, which already exist in the
"large" dataset. Broken record: Mozilla doesn't want to be in the
business of managing this. Mozilla should be in the business of managing
Mozilla specific additions, and perhaps a small amount of general words
that get added (third list), which will then be requested upstream.

The current system can't do this, you resist changing it, so I just give
up, since I'm not using the defective en-US spelling anyway.

Jorg K.

Jörg Knobloch

Dec 30, 2015, 12:53:05 PM

to dev-pl...@lists.mozilla.org

On 30/12/2015 10:20, Jörg Knobloch wrote:
> I can't understand how the current system could manage SCOWL removals
> yet not remove words Mozilla specifically added. How does it know that a
> word came from SCOWL and can be removed or it didn't come from SCOWL and
> should be maintained?

Oops, I was wrong again. It knows by comparing the previous SCOWL
version with the new one. I hope I'm excused, since I even managed to
confuse Ehsan ;-)
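
As a rough illustration of that comparison (not the actual script), the origin of a word can be inferred from the previous and the new SCOWL snapshots. The function name and the sample data are assumptions.

```python
def classify_origin(mozilla_words, previous_scowl, new_scowl):
    """Tell Mozilla-only words apart from words that SCOWL has dropped."""
    mozilla = set(mozilla_words)
    previous = set(previous_scowl)
    new = set(new_scowl)
    return {
        # Never in either SCOWL snapshot, so it must be a Mozilla addition.
        "mozilla_only": sorted(mozilla - previous - new),
        # In the previous snapshot but gone from the new one.
        "dropped_by_scowl": sorted((previous - new) & mozilla),
    }


# Illustrative data only.
report = classify_origin(
    mozilla_words=["feasible", "relict", "SpiderMonkey"],
    previous_scowl=["feasible", "relict"],
    new_scowl=["feasible"],
)
```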

> I'm not using the defective en-US spelling anyway.

I'm not using it and it is defective, but I'm happy to help fix it,
and I already made a start with this discussion ;-)

Let's continue the discussion in bug 1235506.

I see the following problems which we need to solve:
1) Guarantee a "rich" dictionary for the users where no "useful"
words get removed without review. IMHO too many "good" words
were removed by the April/May 2015 merge with SCOWL data.
2) Clean up Mozilla additions, which contain many errors.
3) Feed useful Mozilla additions back to SCOWL.

Jorg K.

Jim Mathies

Dec 30, 2015, 2:10:27 PM

to

- no dictionary maintenance overhead for Mozilla
- I'm guessing a better, more robust dictionary in general
- a database that is standardized across multiple applications (including custom dictionary settings) for the same system
- less data in our install.. it might only amount to kilobytes, but when you multiply that by millions of downloads it adds up.

It's not obvious to me what an open source database engine provides for us. Can our current engine support 3rd party data providers? That's really what we want to do here.

Ehsan Akhgari

Dec 30, 2015, 4:39:05 PM

to Jim Mathies, dev-pl...@lists.mozilla.org

On 2015-12-30 2:10 PM, Jim Mathies wrote:
> On Tuesday, December 29, 2015 at 2:38:43 PM UTC-6, Ehsan Akhgari wrote:
>> On 2015-12-28 5:11 PM, Jim Mathies wrote:
>>> We could research using native spell checking apis if the platform supports them. For example Windows added spell checking apis in Windows 8.
>>
>> It's not obvious to me why that would be an improvement. The
>> cross-platform nature of our spell checking engine is definitely a plus.
>
> - no dictionary maintenance overhead for Mozilla

We would still need to do that since not all of the platforms we target
have spell checking support.

> - I'm guessing a better, more robust dictionary in general

I would like to see some data backing that claim. I won't be surprised
if our hunspell based spell checker does better than for example the
Win8+ spell checker, at least for some of the languages we support.

> - a database that is standardized across multiple applications (including custom dictionary settings) for the same system

That is a good point.

> - less data in our install.. it might only amount to kilobytes, but when you multiply that by millions of downloads it adds up.

This will require us to be able to exclude parts of the default
installed package based on the OS version (since we need to support
hunspell for Win7 and below on Windows, for example) which we don't
have, AFAIK.

> It's not obvious to me what an open source database engine provides for us. Can our current engine support 3rd party data providers? That's really what we want to do here.

Our spell checking backend doesn't support non-hunspell based checkers
currently.

What I was referring to in terms of advantages of a cross platform spell
checking backend was having the same experience on all platforms, which
eases things such as dealing with bug reports, and also the ability to
address bug reports, etc.

Marco A.G.Pinto

Jan 2, 2016, 2:29:50 AM

to Jörg Knobloch, dev-pl...@lists.mozilla.org

On 28/12/2015 22:32, Jörg Knobloch wrote:
>
> However, after sorting I was able to spot a few words the Mozilla
> version is lacking:
> residuary reproachfulness relict reformism enforceability makefile
> lycopod
> (I'm using a en-GB dictionary while writing this post, and all the
> words listed above are actually in that dictionary, with the exception
> of lycopod which should be lycopodium).
>
>

Jorg K, I have just added:
10041) lycopod (+plural)

--

Jesper Kristensen

Jan 2, 2016, 4:53:21 AM

to

Den 28-12-2015 kl. 20:31 skrev Jörg Knobloch:
> Thirdly, the add-on dictionary contains 13% more words
> than the Mozilla maintained dictionary,

While bigger may not be better, I don't see why Mozilla should offer an
en-US dictionary for localized Firefox builds that is different than the
en-US dictionary for en-US builds.

The en-US dictionary for localized Firefox was last updated in March
2013 according to
https://addons.mozilla.org/da/firefox/addon/united-states-english-spellche/versions/

But the en-US dictionary for en-US Firefox was updated 61 times since
then according to
https://hg.mozilla.org/mozilla-central/log/tip/extensions/spellcheck/locales/en-US/hunspell/en-US.dic

Jörg Knobloch

Jan 2, 2016, 6:08:32 AM

to dev-pl...@lists.mozilla.org

On 2/01/2016 10:53, Jesper Kristensen wrote:

> The en-US dictionary for localized Firefox was last updated in March
> 2013 according to
> https://addons.mozilla.org/da/firefox/addon/united-states-english-spellche/versions/

It is very unfortunate that this add-on maintained by "jooliaan" is so
badly out of date. I don't know how to contact the author. I suggest
that he synchronise the add-on with the Mozilla maintained en-US
dictionary once this has been improved, see below.

> But the en-US dictionary for en-US Firefox was updated 61 times since
> then according to
> https://hg.mozilla.org/mozilla-central/log/tip/extensions/spellcheck/locales/en-US/hunspell/en-US.dic

I'm working on another update which will remove 352 erroneous entries,
see bug 1235506 comment #37 and #38. We're also considering removing
(some of the) 6000 proper names of doubtful quality in bug 301712.

Furthermore we're planning to improve the dictionary in collaboration
with SCOWL by recovering some/most of the 5670 words removed by bug
1137544 in May 2005 through a refresh from an improved SCOWL dataset,
see bug 1235506 comment #39 and
https://github.com/kevina/wordlist/issues/138. The next update from
upstream will also include common variants (like "advisor").

Jorg K.

Jörg Knobloch

Jan 2, 2016, 6:31:35 AM

to dev-pl...@lists.mozilla.org

On 2/01/2016 12:07, Jörg Knobloch wrote:
> Furthermore we're planning to improve the dictionary in collaboration
> with SCOWL by recovering some/most of the 5670 words removed by bug
> 1137544 in May 2005

There is always a mistake spellcheck doesn't detect. ;-(

Make that May *2015*.

Jorg K.

Pascal Chevrel

Jan 2, 2016, 6:37:13 AM

to

Le 02/01/2016 12:07, Jörg Knobloch a écrit :
> On 2/01/2016 10:53, Jesper Kristensen wrote:
>
>> The en-US dictionary for localized Firefox was last updated in March
>> 2013 according to
>> https://addons.mozilla.org/da/firefox/addon/united-states-english-spellche/versions/
>>
>
> It is very unfortunate that this add-on maintained by "jooliaan" is so
> badly out of date. I don't know how to contact the author. I suggest
> that he synchronise the add-on with the Mozilla maintained en-US
> dictionary once this has been improved, see below.
>

AFAIK, jooliaan (Giuliano Masseroni) is no longer contributing to the
Mozilla project, he was part of the Italian volunteer community.

Pascal

Jörg Knobloch

Jan 3, 2016, 7:06:24 AM

to dev-pl...@lists.mozilla.org

I believe that a "rich" up-to-date US English dictionary should be
provided to Mozilla users. I have therefore published the current large
SCOWL dictionary as an add-on:
https://addons.mozilla.org/en-US/firefox/addon/us-english-dictionary/

I intend to refresh this add-on as new SCOWL versions become available.

Jorg K.

Jesper Kristensen

Jan 3, 2016, 9:19:09 AM

to

Creating a second add-on with a different extension ID will not fix
things, only make them worse. Now users have two en-us dictionaries to
choose from, with no information telling which one is better. All
existing users are stranded on the old version.

And from the description of your new add-on, it seems it is not
identical to the one shipped with en-US Firefox, so users of localized
Firefox still don't have that dictionary available.

Mozilla should officially maintain the en-US dictionary on
https://addons.mozilla.org/en-US/firefox/language-tools/ , like Mozilla
officially maintains the language packs.

Jörg Knobloch

Jan 3, 2016, 11:03:01 AM

to dev-pl...@lists.mozilla.org

On 3/01/2016 15:19, Jesper Kristensen wrote:
> Creating a second add-on with a different extension ID will not fix
> things, only make them worse. Now users have two en-us dictionaries to
> choose from, with no information telling which one is better. All
> existing users are stranded on the old version.

Clearly there is information on which one is "better".
Giuliano's version is from 2013, it gives no information of how it was
derived. My add-on gives exact details of which underlying data is used.

> And from the description of your new add-on, it seems it is not
> identical to the one shipped with en-US Firefox, so users of localized
> Firefox still don't have that dictionary available.

My add-on is different from the en-US dictionary shipped with Mozilla
products, and it clearly states what the differences are.

Users of localised Firefox can search for a dictionary and will find it.

Giuliano's version is also different from the en-US dictionary shipped
with Mozilla products and no one knows what the differences are. It may
represent the Mozilla en-US dictionary from an earlier date.

> Mozilla should officially maintain the en-US dictionary on
> https://addons.mozilla.org/en-US/firefox/language-tools/, like Mozilla
> officially maintains the language packs.

The language packs don't contain dictionaries. Links to dictionaries on
that page lead to add-ons maintained by third parties, who, as in the
case of Giuliano, may not keep their add-on up-to-date. Some languages
offer more than one dictionary which is totally confusing. For example,
German offers three links, two lead to dictionaries using the reformed
orthography. KaiRo explained to me personally why there need to be two.

As the only dictionary maintained by Mozilla, Mozilla's en-US dictionary
is a special case. I agree that Mozilla should make it available
somehow. Perhaps Giuliano's add-on could be "adopted" by Mozilla or
transferred to another willing contributor and kept in sync with the
currently released dictionary. Frankly, it's a 10 minute job every six
weeks to sync with the current Mozilla version.

My add-on uses the "large" SCOWL dataset (SCOWL size 70) and is an
alternative to Mozilla's en-US dictionary. It registers itself as
"en-US-large", so users can even use both at the same time. I created
the add-on because Giuliano's add-on is out-of-date and because
Mozilla's en-US dictionary has a number of problems (many invalid
entries, 6000 doubtful proper names, not rich enough due to 5670 words
removed in May 2015, no accented words). We're working on fixing some of
the problems, but Ehsan decided to use the "normal" SCOWL dataset (SCOWL
size 60) as base data which may still not be "rich" enough for some users.

There seem to be two approaches to spell checking: Some people believe
that "bigger is better", like myself and the maintainer of the (forked)
en-GB dictionary, Marco. Others believe that a restricted dictionary
will not mask spelling errors and is more useful. Ideally both versions
should be offered with a high level of quality, as SCOWL offers these
two sizes. I believe I made the first step with a "large" size while
we're still working on improving the "normal" size.

Jorg K.

Jesper Kristensen

Jan 3, 2016, 1:11:18 PM

to

Den 03-01-2016 kl. 17:02 skrev Jörg Knobloch:
> As the only dictionary maintained by Mozilla, Mozilla's en-US dictionary
> is a special case.

I don't think it is that special. Some Firefox locales other than en-US
ship with built in dictionaries. For those, the add-on could be derived
from the source of the Firefox locale.

I maintain the Danish dictionary add-on on AMO. Whenever upstream
releases a new version, I commit it to the Firefox localization source,
and from there I have a script to generate an identical add-on for AMO:
http://hg.mozilla.org/releases/l10n/mozilla-aurora/da/file/tip/extensions/spellcheck/hunspell/extension.sh

Jörg Knobloch

Jan 3, 2016, 2:59:44 PM

to dev-pl...@lists.mozilla.org

On 3/01/2016 19:11, Jesper Kristensen wrote:
> I don't think it is that special. Some Firefox locales other than en-US
> ship with built in dictionaries. For those, the add-on could be derived
> from the source of the Firefox locale.

It is special since Mozilla maintain the dictionary, they don't just
copy an upstream source:
https://hg.mozilla.org/mozilla-central/log/tip/extensions/spellcheck/locales/en-US/hunspell/en-US.dic

> I maintain the Danish dictionary add-on on AMO. Whenever upstream
> releases a new version, I commit it to the Firefox localization source,
> and from there I have a script to generate an identical add-on for AMO:

I think this is a very good approach. Your script shows that it is
simple to ship a dictionary as an add-on. As I said: Someone should
start updating the "official" en-US add-on on AMO again.

A little more background so that you see that English is much more
complicated than Danish:

When I started this thread, my aim was to stop maintaining the Mozilla
en-US dictionary and use whatever the upstream source, SCOWL in this
case, provides.

This was met with fierce opposition. Mozilla use SCOWL data, but carry
forward additional words: Currently 6000 (doubtful) proper names, 37
Mozilla terms, 337 extra words and also 354 erroneous words which I am
about to remove.

If I had to decide, I'd use the SCOWL data, perhaps add the 37 Mozilla
terms for the geeks, so if they write "SpiderMonkey", they don't get an
error, and be done with it.

Since as an add-on author I can decide, I did exactly that. I took the
SCOWL data and put it into an add-on. The end. I didn't add the Mozilla
terms, simply because most users have never heard of "SpiderMonkey" and
won't use this word. Those who do, can add it to their personal dictionary.

SCOWL provide various "sizes". I used their "large" size, especially
since I know that SCOWL moved many useful and common words from their
"size 60" ("normal") to "size 70" ("large"). I also know that SCOWL
"size 60" doesn't contain common variants, like "advisor" instead of
"adviser". I have proposed to use the "large" size, but that was also
rejected. Ehsan's approach is to keep using the "normal" size, but do a
customised version to include the common variants. There are also
efforts to recover some of the 5670 words lost due to SCOWL-internal
changes, either by getting SCOWL to reclassify them or by adding them
back independently of SCOWL.

In other words, we're getting deeper entangled in a business I think
Mozilla shouldn't be in.

Jorg K.

Jörg Knobloch

Jan 3, 2016, 3:36:46 PM

to dev-pl...@lists.mozilla.org, giuliano....@gmail.com

On 3/01/2016 15:19, Jesper Kristensen wrote:

> Mozilla should officially maintain the en-US dictionary on
> https://addons.mozilla.org/en-US/firefox/language-tools/ , like
> Mozilla officially maintains the language packs.

I've raised https://bugzilla.mozilla.org/show_bug.cgi?id=1236375.
BTW, I found Giuliano Masseroni on BMO ;-)

Jorg K.