the main trade-off is between using phonological rules versus using allomorphs conditioned by environments. See appendix B.3 in the "Intro to Parsing" document for the known limitations of the Hermit Crab parser.
There are several known limitations of the current implementation of the new experimental phonological rule-based parser.
The default parser for FieldWorks Language Explorer has a way for you to see what steps the parser took while parsing a word (see the Try a Word tool). This has not been implemented yet for the new experimental phonological rule-based parser. Our apologies for the fact that this new parser is thus a “black box” with no way to see inside. We simply ran out of time to do all that we would have liked to do in this area. A later version of FieldWorks Language Explorer will include this capability.
While the default FieldWorks Language Explorer parser will try a given affix as many times as its form is found within a single word, the new experimental phonological rule-based parser tries a given form (or affix process) only once per word. This is normally not an issue since it is quite rare for an affix to be repeated several times within a word. There are cases, however, where this is an issue. For example, Coward & Coward (2000) note that in Selaru, “It is possible to reduplicate /nini/, /soso/ and others basically without limit. As many as eight reduplication levels have been encountered in natural text.”
If you run into this limitation, a possible work-around is to add extra allomorphs for the affix involved or to add the form as a distinct lexical entry. You also, of course, have the option of just allowing the new parser to fail to parse such words.
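To make the limitation and the work-around concrete, here is a toy sketch in Python. It is not how either FieldWorks parser is implemented. The prefix "ka", the root "tu" and the helper `peel_prefix` are invented for illustration and are not Selaru data.

```python
def peel_prefix(word, prefix, max_uses=None):
    """Strip `prefix` from the front of `word` up to `max_uses` times.

    max_uses=None imitates a parser that retries the same form as often
    as it matches; max_uses=1 imitates the once-per-word behaviour.
    Returns (number of times stripped, remaining string).
    """
    count = 0
    while word.startswith(prefix) and (max_uses is None or count < max_uses):
        word = word[len(prefix):]
        count += 1
    return count, word


# Hypothetical word: prefix "ka" repeated three times on root "tu".
word = "kakakatu"

print(peel_prefix(word, "ka"))              # (3, 'tu')
print(peel_prefix(word, "ka", max_uses=1))  # (1, 'kakatu')

# Work-around from the text: list the repeated shape as an extra allomorph,
# so that a single use of that allomorph covers the whole sequence.
print(peel_prefix(word, "kakaka", max_uses=1))  # (1, 'tu')
```

With only one use allowed, the remainder "kakatu" is not a known root, so the parse fails unless the longer allomorph is in the lexicon.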
When you define a natural class by listing the segments (as opposed to using phonological features), the new experimental phonological rule-based parser may not treat this natural class exactly as you expect. If you do not have any phonological features defined, then the new parser will treat the class as consisting solely of the segments listed in the class.
If, on the other hand, you have defined phonological features, then the new experimental phonological rule-based parser converts all the segments listed in the natural class into their respective feature sets. It then takes the set intersection of all those features and uses that to determine if a given segment is in that natural class. Normally, this is not an issue. In one case, however, I was trying to deal with the recalcitrant case of the meN- prefix in Bahasa Indonesia (see section B.1.2.1.4), where a following p, t, k, or s deletes. I was not aware of any real natural class that would cover these four segments without also including the other voiceless obstruents that do not delete. So I tried to bypass this by creating a segment-based natural class that included just these four segments. Since I had also defined phonological features, this approach did not work for me. I had to create a special phonological feature whose value was + for these four segments and - for all other segments.
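The intersection behaviour described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the parser's actual code. The feature names, the feature values, the segment "c" standing in for a voiceless obstruent that does not delete, and the extra feature `deletes` are all assumptions made for the example.

```python
# Hypothetical feature table; names and values are illustrative only.
FEATURES = {
    "p": {"voice": "-", "son": "-", "cont": "-"},
    "t": {"voice": "-", "son": "-", "cont": "-"},
    "k": {"voice": "-", "son": "-", "cont": "-"},
    "s": {"voice": "-", "son": "-", "cont": "+"},
    "c": {"voice": "-", "son": "-", "cont": "-"},  # assumed not to delete
    "b": {"voice": "+", "son": "-", "cont": "-"},
}


def shared_features(segments, table):
    """Intersect the (feature, value) pairs of all listed segments."""
    feature_sets = [set(table[seg].items()) for seg in segments]
    return set.intersection(*feature_sets)


def members(class_features, table):
    """Every segment whose features include all of the shared features."""
    return sorted(
        seg for seg, feats in table.items()
        if class_features <= set(feats.items())
    )


listed = ["p", "t", "k", "s"]

# The listed class collapses to [-voice, -son], which also matches "c".
print(members(shared_features(listed, FEATURES), FEATURES))
# ['c', 'k', 'p', 's', 't']

# Work-around from the text: a special feature that is + only for the four.
MARKED = {
    seg: dict(feats, deletes="+" if seg in listed else "-")
    for seg, feats in FEATURES.items()
}
print(members(shared_features(listed, MARKED), MARKED))
# ['k', 'p', 's', 't']
```

The first result shows why a segment-based class can pick up segments you never listed once features are defined. The second shows why the special feature restores the intended class.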
The kind of allomorphy described in section 3.8 is currently not handled by the new experimental phonological rule-based parser.
[2] This new parser is an enhanced and updated version of Mike Maxwell's Hermit Crab parser. See http://www.sil.org/computing/HermitCrab/. We are deeply indebted to Mike for his pioneering work on this parser.
I am using the Hermit Crab parser on Phuien (Puguli), a Gur language from Burkina Faso. In this language, there are also a few orthographic conventions that deviate from the phonological reality. My experience to this point has been that, if I write my rules to reflect the orthography instead of the phonology, they work.
Here's a simple (I think) example. The past tense suffix, which is a single vowel, when appended to a verb root, will assimilate to the features of the "rightmost" root vowel in a number of ways. One of them is nasality, so the suffix vowel becomes nasalized when the root vowel is nasalized. There is an orthographic convention, however, that nasalization (marked by a tilde over the vowel in the orthography) only needs to be indicated on the first vowel of a sequence, this being unambiguous in the language since Phuien does not have vowel sequences that are of mixed nasality (oral+nasal or nasal+oral), but only oral+oral or nasal+nasal.
Thus,
So, I had to make sure that any nasal-assimilation rule that I had for vowels would *not* apply in this particular case, contrary to the phonological reality.
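For anyone who wants to see the mismatch spelled out, here is a rough Python sketch of the difference between the phonological form and the written form. It is not a Hermit Crab rule, and the root and suffix in it are made up for illustration, not real Phuien words. The function names and the five-vowel inventory are also assumptions.

```python
TILDE = "\u0303"          # combining tilde, marks a nasalized vowel
VOWELS = set("aeiou")     # assumed vowel inventory, for the sketch only


def is_vowel(seg):
    return seg[0] in VOWELS


def is_nasal(seg):
    return seg.endswith(TILDE)


def phonological_past(root, suffix_vowel):
    """Append the suffix vowel, copying nasality from the rightmost root vowel."""
    last_vowel = next((s for s in reversed(root) if is_vowel(s)), None)
    if last_vowel is not None and is_nasal(last_vowel):
        suffix_vowel += TILDE
    return root + [suffix_vowel]


def orthographic(segments):
    """Write the tilde only on the first vowel of a vowel sequence."""
    out = []
    prev_was_vowel = False
    for seg in segments:
        if is_vowel(seg) and prev_was_vowel and is_nasal(seg):
            seg = seg[:-1]  # drop the tilde on a non-initial vowel
        out.append(seg)
        prev_was_vowel = is_vowel(seg)
    return "".join(out)


# Made-up root ending in a nasalized vowel, plus a made-up suffix vowel "a".
root = ["t", "a" + TILDE]
spoken = phonological_past(root, "a")
print(spoken == ["t", "a" + TILDE, "a" + TILDE])        # True
print(orthographic(spoken) == "t" + "a" + TILDE + "a")  # True
```

Since the parser sees the written form, a rule that nasalizes the suffix vowel would produce a string that never appears in the text, which is why the rule has to be kept from applying there.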
Kevin Warfel
Thanks, Andy. I'm really glad to hear that Hermit Crab can handle phrases.
the main trade-off is between using phonological rules versus using allomorphs conditioned by environments. See appendix B.3 in the "Intro to Parsing" document for the known limitations of the Hermit Crab parser.
That's helpful. I'm appending that content below for those of us who are using older versions of FW or just want quick access. Section B.3.2 says you can "add extra allomorphs", so apparently Hermit Crab does make use of allomorphs (but not their environments?).
If the two parsers are going to coexist for quite some time, it would be helpful to document a concrete example, explaining how each parser would be set up to handle it.
In the language I'm working on, we have hardly any inflection (just n- m- and p- prefixes), and most derivations are phonologically straightforward. ......
Our teammate's language is basically the same, but its enclitics misbehave a bit:
I'd like to handle both languages with the same parser. Overall, which parser do you think would handle this data better?