On 2017-05-09 03:23, Hugh Paterson wrote:
> I have been reading about digital infrastructure for minority languages
> [1 & 2]. One of the infrastructure components to be built before a
> language is alleged to have sufficient digital support "for text based
> processes like grammar checking" is morphological and grammar parsers.
> To what extent does working with hermitcrab or any of the FLEx parsers
> bring one a step closer to building a reusable parser that can be
> embedded in other applications?

I can't speak for the AMPLE parser. Hermit Crab (the other
morphological parser in FLEx) was designed to support grammar
development, with built-in debugging abilities, like being able to see
intermediate stages in a derivation. While HC could in principle be used
as a stand-alone morphological parser for grammar checking, it is
probably not ideal; the state of the art for that kind of thing is
finite state transducers (FSTs), such as Stuttgart FST, Xerox XFST, and
Foma.

It would take some re-engineering to allow the morphological and
phonological grammar produced by FLEx to be used by an FST. That said,
I have already built a converter that takes an XML-based grammar and
turns it into SFST (Stuttgart FST) code; and the XML schema I use is
very similar to the model that was built into FLEx. So it might not
take much to convert a FLEx grammar into SFST code. There are two
stumbling blocks: my time (I'm not an SIL member), and getting FLEx to
output the grammar in an XML format. I'm not familiar with how the
model has been implemented in FLEx, so I don't know how easy the latter
step is. (Exporting the dictionary is also necessary, but from what
I've heard that is already doable.)
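
To make the conversion idea concrete, here is a minimal sketch in Python, not the converter described above. The XML layout (`<grammar>`, `<stem>`, `<suffix>` and their attributes) is invented for illustration and is neither the FLEx model nor the schema mentioned above. The emitted text is assumed to follow SFST-PL conventions (`$NAME$` variables, `{...}:{...}` string pairs, `<tag>` multi-character symbols) and has not been run through the SFST compiler.

```python
import xml.etree.ElementTree as ET

# Hypothetical grammar export; element and attribute names are assumptions.
GRAMMAR_XML = """
<grammar>
  <stems>
    <stem form="walk"/>
    <stem form="talk"/>
  </stems>
  <suffixes>
    <suffix gloss="3sg" form="s"/>
    <suffix gloss="past" form="ed"/>
  </suffixes>
</grammar>
"""

def xml_to_sfst(xml_text):
    """Turn the toy XML grammar into SFST-style source text."""
    root = ET.fromstring(xml_text)
    stems = [s.get("form") for s in root.iter("stem")]
    suffixes = [(s.get("gloss"), s.get("form")) for s in root.iter("suffix")]
    lines = [
        # One disjunction of stem strings.
        "$STEM$ = " + " | ".join(stems),
        # Each suffix pairs an analysis tag with its surface string.
        "$SUFFIX$ = " + " | ".join("{<%s>}:{%s}" % (g, f) for g, f in suffixes),
        # The final expression concatenates stems and suffixes.
        "$STEM$ $SUFFIX$",
    ]
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(xml_to_sfst(GRAMMAR_XML), end="")
```

A real converter would also have to carry over phonological rules, affix ordering, and the exported dictionary, none of which this sketch attempts.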

If there is indeed a need to enable morphological parsing as part of
infrastructures for minority languages, it might be possible to get NSF
funding for such a project. I'd be happy--delighted!--to work on that.

Someone (or better, several someones) using FLEx for morphologically
"interesting" languages would also need to be on the project, and
perhaps someone from the FLEx team, so we could figure out how to
extract the grammar in suitable form.

This of course does not touch on syntactic parsing. But frankly, I
doubt that anyone's grammar checker really does full syntactic parsing.
Rather, they look at chunks of text (I'm guessing with finite state
technology, or even something simpler) to find the kinds of grammar
errors that they actually check for. But that's a different issue.
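
As an illustration of what chunk-level checking could look like, here is a small Python sketch that scans text with regular expressions (which are finite state) instead of parsing sentences. The two rules and the function name `check` are made up for the example; no real grammar checker is claimed to work this way.

```python
import re

# Hypothetical rules: each is a label plus a purely regular pattern
# over a short chunk of text (no syntactic structure involved).
PATTERNS = [
    ("'a' before a vowel-initial word",
     re.compile(r"\ba\s+[aeiou]\w*", re.IGNORECASE)),
    ("'of' after a modal",
     re.compile(r"\b(?:could|should|would)\s+of\b", re.IGNORECASE)),
]

def check(text):
    """Return (label, matched chunk) pairs for every rule that fires."""
    hits = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            hits.append((label, match.group(0)))
    return hits

if __name__ == "__main__":
    for label, chunk in check("She bought a apple and could of stayed."):
        print(label + ": " + chunk)
```

Rules like these only see a window of a few words, which is enough for the errors they target and requires no syntactic parser.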

Mike Maxwell
University of Maryland