Hello,
I am in the process of open-sourcing a cross-variety English
spellchecker project at
http://github.com/divec/en-global .
This "global" spellchecker allows all national variant English
spellings (so "color" and "colour" are both accepted) and has better
global placename and personal name support than region-specific
spellcheckers. It is helpful whenever cross-regional forms should be
supported; e.g. for English learners or international collaboration.
I am packaging it as an XPI extension. Currently I have to do this by
creating multiple identical files, each of which pretends to be a
national variant; i.e.:
en-AU.aff en-AU.dic
en-CA.aff en-CA.dic
...
en-US.aff en-US.dic
This is necessary because Mozilla identifies hunspell dictionary
languages via an xx-YY naming convention where YY is the ISO 3166
country code.
Obviously this is inefficient and misleading (because we're checking
"en", not "en-AU" or "en-CA"). This same issue would affect "global"
(multi-variety) spellcheckers for other languages such as Spanish.
At the moment, I think the best way to fix this would be for Mozilla
to allow BCP 47-style codes for hunspell dictionaries. It would be an
opportunity to support language varieties such as these:
en
nan
sr-Latn
es-419
sl-IT-nedis
az-Arab-x-AZE-derbend
Then "global" spellcheckers would work via BCP 47's fallback rules
(en-US -> en, etc.). At the same time, other varieties, such as Serbian
written in Latin script, would be supported.
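To illustrate the fallback I have in mind, here is a minimal sketch in
Python of the "lookup" scheme from RFC 4647 (part of BCP 47): the
requested tag is truncated from the right, one subtag at a time, until
it matches an installed dictionary. The function name and the list of
installed dictionaries are hypothetical; this is not Mozilla's code,
only an outline of the behaviour under those assumptions.

```python
def lookup(requested, available):
    """Return the best match for `requested` among `available`, or None.

    Hypothetical helper: progressively truncates the requested tag,
    in the style of RFC 4647 "lookup". Matching is case-insensitive.
    """
    by_lower = {tag.lower(): tag for tag in available}
    subtags = requested.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in by_lower:
            return by_lower[candidate]
        subtags.pop()
        # A single-character subtag (such as the private-use "x") must
        # not be left dangling at the end, so drop it as well.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return None


# Assumed set of installed dictionaries, for illustration only.
installed = ["en", "sr-Latn", "es-419", "az-Arab"]

print(lookup("en-US", installed))
print(lookup("sr-Latn-RS", installed))
print(lookup("az-Arab-x-AZE-derbend", installed))
print(lookup("fr-FR", installed))
```

With a single "en" dictionary installed, a request for en-US, en-AU or
en-CA would all resolve to it, so the duplicated files would no longer
be needed.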
Does everyone agree that this would be a good approach?
Thanks,
--
David Chan