Andreas Joswig wrote:
"One thing confuses me:"
Don't feel bad. We are all confused by these things. They sometimes get so
complicated that I despair of ever knowing what is really going on in the
brain and how we can model it in a dictionary or parser. There have also
been some changes to FLEx recently, so I can't speak with any certainty. (In
order to write this email I had to test certain points in FLEx. <sigh>)
First I will speak to some general principles. Then I will talk about the
difference between the "Variants" section and the "Allomorphs" section. Then
I will give some "how to" instructions about how to handle various kinds of
forms.
There are three general principles that we need to keep in mind:
(1) What the parser needs and what the (published) dictionary needs are
often two different things. But we maintain everything in a single database.
(Result: occasional confusion, but it is far more efficient to maintain a
single database.)
(2) The way we capture information in the database and how it is presented
in a published dictionary are often very different. (Result: occasional
confusion, but it is far better to keep the information in a standardized
database and enable you to publish multiple dictionaries in a variety of
formats.)
(3) FLEx is an attempt to model linguistic behavior in a way that enables
you to investigate, analyze, and document a language's lexicon, morphology,
texts, etc. But obviously no one can completely model linguistic behavior
with a computer program. (Result: occasional limitations and inadequacies,
but FLEx is still an excellent tool for these purposes.)
Next, an explanation of the "Variants" section and the "Allomorphs" section:
The "Variants" section is where you create entries in the database for
variants, such as dialectal and spelling variants (isn't/ain't,
color/colour) and for irregularly inflected forms (were, went, ran). These
entries are generally formatted in a published dictionary as minor entries.
(Note that these entries are full entries in the FLEx database. I call them
"Variant" entries merely because they are for variants and irregularly
inflected forms, and are linked to the entry for the primary form. But as
far as the database is concerned, they are all just "entries".) Once you set
up a "Variant" entry, FLEx can do some nice things with it. For instance it
can use it to create a minor entry in your published dictionary. In addition
the parser will find the variant 'colour' and use the sense information from
the primary entry 'color'. You don't have to duplicate all the senses for
both entries. However if you want, you can also add one or more senses
(gloss, definition, example sentences, etc.) for the variant 'colour' and the
parser will give you the option of using either the gloss(es) for 'color' or
the gloss(es) for 'colour'. It doesn't make much sense to do this for
variants, but it can be very useful for certain kinds of irregularly
inflected forms.
The "Allomorphs" section is where you enter allomorphs of a morpheme. You
can also enter "allomorphs" of a stem if you only want to parse down to the
stem level (e.g. re-gen-er-ate-d is parsed to the root level, regenerate-d
is parsed to the stem level). FLEx does not create an entry for an
allomorph. In order for the parser to deal with "errusthen", it must split
it into morphemes and label each morpheme. Since e-rrus-the-n contains an
allomorph "rrus" of the root "rus", I have to add "rrus" in the "Allomorphs"
section.
So the "Variants" section is used for both the published dictionary and the
parser. "Variant" entries are used to produce minor entries in the published
dictionary and the parser looks in them to find forms. On the other hand the
"Allomorphs" section is only for the parser. I don't know of any published
dictionary that includes allomorphs. (Although someone could include them.)
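To make the contrast concrete, here is a minimal sketch in Python. It is not FLEx's real data model or API. The class name Entry, its fields, and the two helper functions are all made up for illustration, and the gloss for "color" is a placeholder. It only shows that a variant is an entry of its own (so it can become a minor entry), while an allomorph is just a form stored inside an entry (so only the parser sees it).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    """Toy stand-in for a lexical entry; not FLEx's real data model."""
    lexeme_form: str
    glosses: List[str] = field(default_factory=list)
    allomorphs: List[str] = field(default_factory=list)  # forms only, no entry of their own
    variant_of: Optional[str] = None    # lexeme form of the primary entry, if any
    variant_type: Optional[str] = None  # e.g. "British spelling variant"


lexicon = [
    Entry("color", glosses=["color"]),  # placeholder gloss
    Entry("colour", variant_of="color", variant_type="British spelling variant"),
    Entry("rus", allomorphs=["rrus"]),  # gloss left empty on purpose
]


def dictionary_headwords(entries):
    """Forms that can appear as a main or minor entry in a published dictionary."""
    return [e.lexeme_form for e in entries]


def parser_forms(entries):
    """Forms the parser can match: every entry form plus every allomorph."""
    forms = []
    for e in entries:
        forms.append(e.lexeme_form)
        forms.extend(e.allomorphs)
    return forms


print(dictionary_headwords(lexicon))
print(parser_forms(lexicon))
```

Notice that "colour" shows up in both lists, but "rrus" shows up only in the list of forms the parser can match.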
Next, how to handle various kinds of forms:
How to handle an allomorph:
Enter the form of the allomorph in the Allomorph section of the Lexicon
Edit-Entry pane. Specify the conditioning environment in the Environments
field (e.g. / _ i). For instance for the English verb bend/bent you would
enter the primary form 'bend' in the Lexeme Form field and the allomorph
'ben' in the Allomorphs section.
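If it helps to see what a conditioning environment amounts to, here is a rough Python sketch. It assumes a much simplified notation: a slash, an optional literal string on the left, the underscore slot, and an optional literal string on the right. FLEx itself accepts more than this, and the function names parse_environment and allomorph_fits are my own inventions, not anything in FLEx.

```python
def parse_environment(env):
    """Split an environment such as '/ _ t' into (left, right) literal contexts.

    Only literal segments are handled here; this is a simplification,
    not the full environment notation that FLEx accepts.
    """
    body = env.strip()
    if body.startswith("/"):
        body = body[1:]
    left, sep, right = body.partition("_")
    if not sep:
        raise ValueError(f"no '_' slot in environment: {env!r}")
    return left.strip(), right.strip()


def allomorph_fits(env, preceding, following):
    """True if the material around the allomorph satisfies the environment."""
    left, right = parse_environment(env)
    return preceding.endswith(left) and following.startswith(right)


# An allomorph restricted to "/ _ t" is allowed before a suffix starting
# with "t" but not before one starting with anything else.
print(allomorph_fits("/ _ t", preceding="", following="t"))
print(allomorph_fits("/ _ t", preceding="", following="ing"))
```

The point is only that the environment restricts where the parser may use the allomorph; without it the parser would try that allomorph everywhere.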
How to handle a variant:
Enter the form of the variant in the Variants section of the Lexicon
Edit-Entry pane. Specify the type of variant it is in the Variant Type field
(e.g. spelling variant). For instance you would create an entry for primary
form "color". Then in the Variants section you would enter "colour" and
specify that it is a "British spelling variant". FLEx automatically creates
an entry for "colour" and links it to the primary entry "color". If you had
"colour" in your text corpus, the parser would find the form "colour" in the
entry for "colour" but use the sense information (grammatical category and
gloss) from the entry for "color".
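Roughly speaking, the lookup described above works like the following Python sketch. The dictionary layout and the function name gloss_options are assumptions made for illustration only, the category and gloss are placeholders, and none of this is FLEx's actual implementation.

```python
# Hypothetical toy lexicon keyed by entry form; field names are illustrative.
lexicon = {
    "color": {"category": "n", "glosses": ["color"], "variant_of": None},
    "colour": {"category": None, "glosses": [], "variant_of": "color"},
}


def gloss_options(form, lexicon):
    """Return the glosses a parser could offer for a form found in the text."""
    entry = lexicon[form]
    options = list(entry["glosses"])  # the variant's own senses, if it has any
    primary = entry["variant_of"]
    if primary is not None:
        # Senses borrowed from the primary entry the variant is linked to.
        options.extend(lexicon[primary]["glosses"])
    return options


print(gloss_options("colour", lexicon))
```

If you gave "colour" a gloss of its own, it would simply appear in the list alongside the gloss borrowed from "color", and you could pick either one.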
How to handle an irregularly inflected form:
Enter the inflected form in the Variants section of the Lexicon Edit-Entry
pane. Specify the inflectional category in the Variant Type field (e.g. past
tense). For instance for feel/felt you would create a (main) entry for
"feel". Then in the Variants section you would enter "felt" and specify that
it is the "past tense" in the Variant Type field. (You would first have to
add "past tense" to the list of "Variant Types" in the Lists area.) FLEx
automatically creates an entry for "felt" and links it to the primary entry
"feel". Next you need to add an allomorph "fel" in the Allomorphs section
and specify the environment (/ _ t). If you had "felt" in your text corpus,
the parser would find the form "felt" in the entry for "felt" and would
suggest the analysis "feel + pst" (assuming that you gave "pst" as the
abbreviation for the Variant Type "past tense"). The parser would also find
the allomorph "fel" of "feel" and the allomorph "-t" of "-ed", and would
suggest the analysis "feel -ed" with the appropriate glosses for each
morpheme. For the English word "felt" I would pick the second analysis.
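Here is a small Python sketch of why "felt" gets two analyses. It is a toy, not the FLEx parser. The tables ROOTS, SUFFIXES, and VARIANTS and the function analyses are hypothetical, and the environment check is reduced to "the next morpheme must start with t".

```python
# Hypothetical toy lexicon; the field names are illustrative, not FLEx's schema.
ROOTS = {
    # allomorph -> required start of the next morpheme (the environment / _ t)
    "feel": {"allomorphs": {"fel": "t"}},
}
SUFFIXES = {
    "-ed": {"allomorphs": ["ed", "t"]},
}
VARIANTS = {
    "felt": {"variant_of": "feel", "abbreviation": "pst"},
}


def analyses(word):
    """Collect every analysis the toy lexicon licenses for a word."""
    results = []
    # 1. Whole-word lookup among the "Variant" entries.
    if word in VARIANTS:
        v = VARIANTS[word]
        results.append(f"{v['variant_of']} + {v['abbreviation']}")
    # 2. Morpheme by morpheme: a root allomorph followed by a suffix allomorph.
    for root, info in ROOTS.items():
        for allo, next_start in info["allomorphs"].items():
            if not word.startswith(allo):
                continue
            rest = word[len(allo):]
            for suffix, sinfo in SUFFIXES.items():
                if rest in sinfo["allomorphs"] and rest.startswith(next_start):
                    results.append(f"{root} {suffix}")
    return results


print(analyses("felt"))
```

The first analysis comes from the "Variant" entry and the second from the allomorphs. Take away either the variant or the allomorph "fel" and only one analysis is left.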
How to handle suppletion:
Enter the inflected form in the Variants section of the Lexicon Edit-Entry
pane. Specify the inflectional category in the Variant Type field (e.g. past
tense). For instance for go/went you would create a (main) entry for "go".
Then in the Variants section you would enter "went" and specify that it is
the "past tense" in the Variant Type field. FLEx automatically creates an
entry for "went" and links it to the primary entry "go". (I would not add an
allomorph "wen" in the Allomorphs section, but you could. If you did, the
parser would offer two analyses as with "felt" above.) If you had "went" in
your text corpus, the parser would find the form "went" in the entry for
"went" and would suggest the analysis "go + pst". (The parser finds "went"
and sees that it is linked to "go". It finds that the Variant Type is "past
tense" and supplies the abbreviation "pst".)
How to handle a portmanteau morpheme (root + affix):
Enter the form of the portmanteau morpheme in the entry for the root in the
Variants section of the Lexicon Edit-Entry pane. Specify the inflectional
category in the Variant Type field (e.g. past tense). For instance you would
handle "were" under the main entry "be". You would enter "were" in the
Variants section and specify that it is the "past tense" in the Variant Type
field. (I don't know how to indicate that "were" must agree with "you(sg)",
"you(pl)", "we", or "they".) You would not enter "were" as an allomorph
because you cannot break "were" into morphemes and link one allomorph to
"be" and another to "pst". If you had "were" in your text corpus, the parser
would find the form "were" in the entry for "were" and would suggest the
analysis "be + pst". (The parser finds "were" and sees that it is linked to
"be". It finds that the Variant Type is "past tense" and supplies the
abbreviation "pst".) You would do the same for all the other forms of "be"
(was, are, am, is, etc.). The parser uses the sense information from the main
entry for "be". It is best just to set up one primary entry for "be",
describe all the senses there, and create "Variant" entries for all the
other forms.
How to handle a portmanteau morpheme (affix + affix):
Enter the portmanteau morpheme as a main entry. Specify the combined meaning
in the gloss field. (Sorry, but I cannot illustrate this from English.) In
Greek the noun case suffixes are single morphemes (e.g. logo-i
'word-dative'). The plural is formed by suffixing the plural morpheme "-s"
after the case suffixes (e.g. logo-i-s 'word-dative-plural'). However the
genitive plural "-on" is a portmanteau morpheme. It cannot be split into two
morphemes, one meaning 'genitive' and the other meaning 'plural'. So it must
be entered as a main entry and given the gloss 'gen.pl'. (You might need to
set up a separate Affix Template in the Grammar-Category Edit area in order
to handle this.)
So if a language (such as Stephanie's) has a set of stem forms for a
particular verb, you have to ask several questions. (1) Can the parser
correctly identify the stem? If not, do I need to add allomorphs? (2) Can
the user of the published dictionary find the correct entry? If not, do I
need to add a minor entry (for errusthen) that will direct the user to the
main entry (ruomai)? (3) Is this a portmanteau morpheme, that is, does it
combine two morphemes into a single form that cannot reasonably be divided
(e.g. were, was)? If so, I need to create a "Variant" entry for the
portmanteau morpheme and may need to provide grammatical information and
semantic information for it.
This problem is also apparent in English verbs such as choose/chose and
run/ran (known as "vowel replacement" verbs or "ablaut"). A non-native
English speaker would not know where to find the past tense forms "chose"
and "ran". So you must create "Variant" entries for them and format them as
minor entries in the published dictionary. Since we cannot make up rules
that account (synchronically) for these forms, the parser has to look at the
"Variant" entries in order to find them.
[Note on go/went. Technically "wen" is not an allomorph of "go". It is not
derived from it historically and is not related to it by some morphophonemic
rule, that is, it is not related to "go" phonologically and you cannot
specify a phonological environment that governs its use. You can make up an
"ad hoc" rule, but that is a different matter. Historically "went" was the
past tense of "wend", but today "wended" is the past tense of "wend" and
"went" is the past tense of "go". So in cases of suppletion such as this you
have to somehow get the parser to interpret "went" as "go" + "pst". Notice
that it is not "wen" + "-t", or even "wend" + "-ed". This is why I recommend
handling "went" as a "Variant" (i.e. an irregularly inflected form) rather
than an allomorph. Morphology gets really complicated sometimes. Isn't it
fun?]