I am having several problems getting FLEx to parse correctly. The language I am working with has many different morphemes that share the same surface form, and FLEx often suggests the wrong one. For example, the prefix “ni-” can be the noun class 5 prefix, the direct object marker on verbs, or the first person plural subject marker on verbs. I have made affix templates and have parsed hundreds of words manually, but FLEx still puts the noun class 5 prefix on verbs. What could I be missing in my configuration that makes FLEx want to put a noun prefix on a verb stem?
The prefix “mu-” is even more problematic because it has even more functions. It can be noun class 1, noun class 3, noun class 18, the direct object marker, or the second person plural subject marker on verbs. The word “muthu” is noun class 1 and means “person” or “human being”. I looked at the Sena sample database and have tried to follow its example: I assigned the inflection feature “Bantu noun class 1/2” to the root “thu”. Unfortunately, in Word analysis FLEx proposes one analysis with mu- as noun class 1 and another with mu- as noun class 3. If I have assigned noun class 1/2 agreement to the root, why would the parser even suggest a prefix from another noun class?
I would appreciate any insights or suggestions anyone might have on how to get the inflectional features to work better.
Jeff S.
> If I have assigned noun class 1/2 agreement to the root why would
> the parser even suggest a prefix from another noun class?
Do you have the parser turned on?
If you don't explicitly use the "Start Parser" command (I think it's
in the Parser menu), then you are using the "manual
interlinearizer". That is a "brute force" interlinearizer that
allows you to do anything you want. For a lot of situations, that is
good, especially when someone is at the beginning of learning how the
language works, and they haven't worked out all the constraints yet.
For this kind of "parsing", the only thing FLEx is testing is whether
prefixes come before roots/stems come before suffixes (etc.), and
whether the morphemes you have split something into are in the
lexicon. It can suggest a way to split a word only in the case where
that precise fully inflected word has been manually parsed somewhere
else in your corpus, but it cannot guess morpheme breaks for new
combinations of morphemes, even if it knows all of the morphemes.
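
To make those two checks concrete, here is a rough toy model in Python. It is only a sketch of the behavior described above, not how FLEx is actually implemented; the mini-lexicon, the names, and the word "nithu" are all made up for illustration.

```python
# Toy model of the manual interlinearizer described above.
# All names and data here are hypothetical, not FLEx internals.

PREFIX, ROOT, SUFFIX = 0, 1, 2  # numbered so that sorting reflects the legal order

# Mini-lexicon: morpheme form -> morpheme type
LEXICON = {"mu": PREFIX, "ni": PREFIX, "thu": ROOT}

# Fully inflected words already parsed by hand: word -> morpheme breaks
HAND_PARSED = {"muthu": ["mu", "thu"]}


def is_acceptable_split(morphs):
    """Apply the only two checks: lexicon membership and prefix/root/suffix order."""
    if any(m not in LEXICON for m in morphs):
        return False
    kinds = [LEXICON[m] for m in morphs]
    return kinds == sorted(kinds)


def suggest_split(word):
    """Suggest breaks only for a word that has been parsed by hand before."""
    return HAND_PARSED.get(word)
```

In this sketch, suggest_split("muthu") returns the hand-made breaks, while suggest_split("nithu") returns nothing even though both "ni" and "thu" are in the lexicon. Nothing here looks at noun class or part of speech, which is why any prefix with the right shape is accepted.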
All of the grammatical stuff you specify (environments for
allomorphs, restrictions on what can attach to what, positions in
a template, ad hoc rules, various other things) applies only if you have
the parser turned on. When the parser has guessed something, you'll
see a tan background on that word, rather than the blue background
that indicates the manual interlinearizer suggestion.
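
As a rough illustration of the difference for the "mu-" case from the question, here is a toy Python sketch. It assumes (and this is only an assumption for the example) that each prefix entry records the category it attaches to and the noun class it agrees with; the entries, field names, and class values are made up and are not FLEx's data model.

```python
# Toy comparison: candidates with and without grammatical constraints.
# Entries and field names are hypothetical, for illustration only.

MU_PREFIXES = [
    {"gloss": "noun class 1", "category": "noun", "noun_class": "1/2"},
    {"gloss": "noun class 3", "category": "noun", "noun_class": "3/4"},
    {"gloss": "noun class 18", "category": "noun", "noun_class": "18"},
    {"gloss": "direct object marker", "category": "verb", "noun_class": None},
    {"gloss": "2pl subject marker", "category": "verb", "noun_class": None},
]

# The root "thu" with the noun class 1/2 feature assigned, as in the question
THU = {"form": "thu", "category": "noun", "noun_class": "1/2"}


def unconstrained_candidates(prefixes):
    """Parser off: every homophonous prefix is a candidate."""
    return [p["gloss"] for p in prefixes]


def constrained_candidates(prefixes, root):
    """Parser on: keep only prefixes that match the root's category and noun class."""
    return [
        p["gloss"]
        for p in prefixes
        if p["category"] == root["category"]
        and p["noun_class"] == root["noun_class"]
    ]
```

With the constraints ignored, all five readings of "mu-" are offered for "thu". With them applied, only the noun class 1 prefix survives in this toy model.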
One reason having the parser off is the default is that FLEx tends to
run slower when it is on. I also find that it takes a really long
time to load the grammar, but I guess that depends a lot on the
database--some people don't have that problem. (In some versions of
FLEx, the parser got turned off every time you did a refresh, also.)
But for languages with a lot of morphology, the parser can be really
nice. It can also serve as a spell-checker, once you get it set up
really well. (That is, a more effective spell-checker than just
checking against a list of fully inflected words--it's easy to get a
misspelled word into a list like that.)
Sorry if you knew all of that, and that is not what is going on....
-Beth
Some of us do, yes.
> For "maximum number of
> roots" My configuration was set as "(null)". The help file suggests that 1
> or 2 is typical for this parameter. What are the implications of having
> this set as "(null)"?
>
It should then be treated as 1 unless you have defined at least one
compound root. In that case, it is treated as 2.
If you have single words that contain more than 2 root/stem lexical
entries, then you will need to set this number to the maximum number of
root/stem entries a single word can contain.
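
The defaulting rule above can be written out as a small Python sketch. The function and parameter names are made up for illustration and do not correspond to anything in FLEx itself; None stands in for the "(null)" setting.

```python
from typing import Optional


def effective_max_roots(setting: Optional[int], has_compound_defined: bool) -> int:
    """Return the maximum number of roots used, following the rule described above."""
    if setting is not None:
        # An explicit value is used as given.
        return setting
    # "(null)": 2 if at least one compound is defined, otherwise 1.
    return 2 if has_compound_defined else 1
```

So a "(null)" setting behaves like 1 in a database with no compounds defined, and a word with three root/stem entries would need the value set explicitly to 3.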
--Andy