Morphological and grammar parser

Hugh Paterson

May 9, 2017, 3:24:07 AM

to flex...@googlegroups.com, writing-sys...@groups.sil.org

Greetings,

I have been reading about digital infrastructure for minority languages [1 & 2]. Among the infrastructure components that must be built before a language can be said to have sufficient digital support "for text based processes like grammar checking" are morphological and grammar parsers. To what extent does working with Hermit Crab or any of the other FLEx parsers bring one a step closer to building a reusable parser that can be embedded in other applications?

If I am not mistaken there has been some work going from FLEx to hunspell for spell checking. But what about grammar checking?

[1] Trosterud, Trond. 2012. A restricted freedom of choice: Linguistic diversity in the digital landscape. Nordlyd (Tromsø University Working Papers on Language and Linguistics) 39.2: 89-104.

[2] Arppe, Antti, Jordan Lachler, Trond Trosterud, Lene Antonsen and Sjur N. Moshagen. 2016. Basic Language Resource Kits for Endangered Languages: A Case Study of Plains Cree. In Soria, Claudia, et al. (eds.), Proceedings of the LREC 2016 Collaboration and Computing for Under-Resourced Languages: Towards an Alliance for Digital Language Diversity (CCURL) Workshop. Portorož, Slovenia, 23 May, 1-8.

all the best,

- Hugh Paterson III

Paul Nelson

May 9, 2017, 7:28:41 AM

to flex...@googlegroups.com, writing-sys...@groups.sil.org

Great articles, Hugh.

Hunspell dictionaries can be made from a word list. We have not put work into the morphological parsing that would let us check casing, so at this point agglutinative languages need every permutation listed in the word list.
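
As a rough illustration of the word-list route, here is a minimal Python sketch that writes a Hunspell .dic/.aff pair with no affix rules at all, which is why every inflected form has to appear in the list. The function name and the file stem are made up for this example; the only format facts relied on are that a .dic file starts with an entry count followed by one word per line, and that the .aff file can declare the encoding with SET.

```python
from pathlib import Path


def write_hunspell_files(words, stem):
    """Write <stem>.dic and <stem>.aff for a plain word list (no affix rules).

    Hypothetical helper for illustration; not part of FLEx or Hunspell.
    """
    # Deduplicate, drop blank entries, and sort so the output is stable.
    entries = sorted({w.strip() for w in words if w.strip()})
    # The first line of a .dic file is the (approximate) number of entries.
    dic_text = "\n".join([str(len(entries)), *entries]) + "\n"
    # Minimal .aff file: it only declares the encoding, with no affix rules.
    aff_text = "SET UTF-8\n"
    Path(f"{stem}.dic").write_text(dic_text, encoding="utf-8")
    Path(f"{stem}.aff").write_text(aff_text, encoding="utf-8")
    return entries
```

Calling write_hunspell_files(["talo", "talossa", "talot"], "xx") would list all three forms of the same stem separately; a morphological description would instead let one entry plus affix rules cover them.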

Grammar checking is definitely on any list of work for the Language Software Development team at this time. There are two key ingredients needed:

1. the data about the grammar necessary to build a grammar checker

2. the people resources to build and maintain the infrastructure to build grammar checkers.

#1 is totally dependent upon the individuals developing each language's data.

#2 Volunteers who know how to do this work are most welcome. :-)

Paul

maxwell

May 9, 2017, 12:00:05 PM

to flex...@googlegroups.com, writing-sys...@groups.sil.org

On 2017-05-09 03:23, Hugh Paterson wrote:
> I have been reading about digital infrastructure for minority languages
> [1 & 2]. One of the infrastructure components to be built before a
> language is alleged to have sufficient digital support "for text based
> processes like grammar checking" is morphological and grammar parsers.
> To what extent does working with hermitcrab or any of the FLEx parsers
> bring one a step closer to building a reusable parser that can be
> embedded in other applications?

I can't speak for the AMPLE parser. Hermit Crab (the other
morphological parser in FLEx) was designed to support grammar
development, with built-in debugging abilities, like being able to see
intermediate stages in derivation. While HC could in principle be used
as a stand-alone morphological parser for grammar checking, it is
probably not ideal; the state-of-the-art for that kind of thing is
finite state transducers (FSTs), such as Stuttgart FST, Xerox XFST, and
FOMA.

It would take some re-engineering to allow the morphological and
phonological grammar produced by FLEx to be used by the FST. That said,
I have already built a converter that takes an XML-based grammar and
turns it into SFST (Stuttgart FST) code; and the XML schema I use is
very similar to the model that was built into FLEx. So it might not
take much to convert a FLEx grammar into SFST code. There are two
stumbling blocks: my time (I'm not an SIL member), and getting FLEx to
output the grammar in an XML format. I'm not familiar with how the
model has been implemented in FLEx, so I don't know how easy the latter
step is. (Exporting the dictionary is also necessary, but from what
I've heard that is already doable.)

If there is indeed a need to enable morphological parsing as part of
infrastructures for minority languages, it might be possible to get NSF
funding for such a project. I'd be happy--delighted!--to work on that.
Someone (or better, several someones) using FLEx for morphologically
"interesting" languages would also need to be on the project, and
perhaps someone from the FLEx team, so we could figure out how to
extract the grammar in suitable form.

This of course does not touch on syntactic parsing. But frankly, I
doubt that anyone's grammar checker really does full syntactic parsing.
Rather, they look at chunks of text (I'm guessing with finite state
technology, or even something simpler) for finding the kinds of grammar
errors that they actually check for. But that's a different issue.
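
To make the "chunks of text" idea concrete, here is a small Python sketch of a pattern-based checker that never builds a syntactic parse. It is only a guess at the general approach, not a description of any real product; the two rules and the name check are invented for this example, and regular expressions stand in for whatever finite state machinery a real checker might use.

```python
import re

# Hypothetical rules for illustration: each pairs a regular expression over a
# short stretch of text with a message. A real checker would have many more.
RULES = [
    (re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE), "repeated word"),
    (re.compile(r"\ba\s+[aeiou]\w*", re.IGNORECASE), "'a' before a vowel"),
]


def check(text):
    """Return (offset, matched text, message) for each rule match, by offset."""
    hits = []
    for pattern, message in RULES:
        for m in pattern.finditer(text):
            hits.append((m.start(), m.group(0), message))
    return sorted(hits)
```

Each rule only looks at a local window, so it can flag "is is" or "a apple" without knowing anything about the structure of the sentence around them.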

Mike Maxwell
University of Maryland

maxwell

May 9, 2017, 12:03:17 PM

to flex...@googlegroups.com

On 2017-05-09 11:59, maxwell wrote:
> I can't speak for the AMPLE parser. Hermit Crab (the other...

This msg bounced from the writing systems mailing list, because I'm not
subscribed to it. If someone thinks it's relevant to that list, perhaps
you could forward it.

Mike Maxwell

Craig

May 31, 2017, 9:05:46 PM

to flex...@googlegroups.com

FLEx 8.3 includes a utility that writes the FLEx model to an XML file.
This file can be used by a stand-alone Hermit Crab exe (which I have been
using as a stand-alone spell checker). You can generate the XML file and
see whether it is useful for your purposes.

The tool is installed into the FieldWorks install directory. It is
called "GenerateHCConfig". You just provide it with the path to the FW
project file (.fwdata) and the path to the HC config file to generate
(.xml).

> GenerateHCConfig.exe -i <project>.fwdata -o <project>HCInput.xml
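
If the export needs to be scripted, the command above could be wrapped roughly as follows in Python. This is a sketch under assumptions: the -i and -o flags are copied from the command line above and have not been checked against the tool itself, the executable is assumed to be on the PATH (or passed in with its full path), and the function names are made up for this example.

```python
import subprocess
from pathlib import Path


def hc_config_command(project, exe="GenerateHCConfig.exe"):
    """Build the argument list for the invocation shown above.

    Hypothetical helper; the flags are taken as given in this thread.
    """
    project = Path(project)
    # Output name follows the <project>HCInput.xml pattern from the post.
    output = project.with_name(project.stem + "HCInput.xml")
    return [exe, "-i", str(project), "-o", str(output)]


def generate_hc_config(project, exe="GenerateHCConfig.exe"):
    """Run the tool and return the path of the XML file it was asked to write."""
    cmd = hc_config_command(project, exe)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(cmd[-1])
```

For a project file named MyLang.fwdata (a placeholder name), hc_config_command builds the same command as above with MyLangHCInput.xml as the output file.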

Craig.
