Verb forms


Heidi James Rosendall

Sep 29, 2008, 9:03:19 AM
to flex...@googlegroups.com

Hi,

I have a FLEx user who has been entering different forms of verbs as “inflectional variants.” She was hoping to be able to search for all entries with the same variant condition to look for similarities, but there doesn’t seem to be any way of doing this!

Rather than trying to figure it out myself and recommending a non-standard practice, I’m posting her question here. Is she recording her verbs the best way or is there a better way? Having done it this way, how should she search for them?

This is a problem I am coming up against in FLEx rather often. Many times I see written that the “proper” way for recording the data is… But most of my folks are NOT recording in FLEx data which has been analyzed but data which is being analyzed. Thus, what might be proper once we know what it is, is not helpful at the beginning or middle stages. What she wants is a way to store the variant forms of her verbs before she has quite figured out what they mean, and a way to gradually categorize them, record information about them, and sort them so that she CAN figure out what they mean. Of course, at the end of the analysis, I expect that most of these variant forms will be described by a phonological, morphological, or grammatical rule, and only truly unpredictable forms will stay as variants. But meanwhile, how does she enter the data in a way which will be most helpful for her?

 

Heidi Rosendall

Language Software and Publications Manager

Wycliffe Nigeria

Heidi_R...@Wycliffe.org


Andy Black

Sep 29, 2008, 6:58:39 PM
to flex...@googlegroups.com
On 9/29/2008 6:03 AM, Heidi James Rosendall wrote:

I have a FLEx user who has been entering different forms of verbs as “inflectional variants.” She was hoping to be able to search for all entries with the same variant condition to look for similarities, but there doesn’t seem to be any way of doing this!


The next version of FLEx has the ability to see and filter the variant conditions in the lexicon browse pane.  So it's coming soon!
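The search Heidi describes amounts to grouping entries by their variant condition. A minimal sketch of that grouping, assuming the variant entries are available as simple records outside FLEx; the keys `form`, `variant_type` and `main_entry` are hypothetical and are not a FLEx export format:

```python
from collections import defaultdict

# Hypothetical records; the keys are illustrative, not a FLEx export schema.
entries = [
    {"form": "broke", "variant_type": "past", "main_entry": "break"},
    {"form": "fed", "variant_type": "past", "main_entry": "feed"},
    {"form": "broken", "variant_type": "past participle", "main_entry": "break"},
]


def group_by_variant_type(records):
    """Map each variant condition to the sorted list of forms recorded under it."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["variant_type"]].append(rec["form"])
    return {cond: sorted(forms) for cond, forms in groups.items()}


groups = group_by_variant_type(entries)
for condition, forms in sorted(groups.items()):
    print(condition, forms)
```

Seeing all forms that share a condition side by side is what makes the similarities (a shared vowel change, a shared suffix) easy to spot.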

--Andy

Ronald Moe

Sep 29, 2008, 7:11:56 PM
to flex...@googlegroups.com

Heidi Rosendall wrote:

I have a FLEx user who has been entering different forms of verbs as “inflectional variants.”

 

If I remember correctly there was a thread on this list about this topic. But maybe it was somewhere else. In any case the problem is that we need a way to elicit and record the paradigms of verbs and nouns and other grammatical categories that can be inflected. FLEx currently does not have a tool to do this. I would recommend that you set up a custom field for each inflected form. If your language is highly inflected you will have to select a few inflected forms that are representative of the entire paradigm. In English you would need a plural field for nouns. For the verbs you need a field for the 3rd singular (breaks), the participle (breaking), the past tense (broke), and the past participle (broken).

Copy the lexeme form into each of these paradigm fields. Use Bulk Replace to add the appropriate affix. Use the check boxes to eliminate exceptions. You may have to enter some of the irregular forms by hand. As you work, you should indicate the inflection classes and/or inflection features that you encounter. You should also create an entry for each affix.

When you are done, you should have a pretty good idea of how the morphology works, including morphophonemic rules and allomorphs. At that point you can make decisions about which irregularly inflected forms need a minor entry in the published dictionary. You would then create an entry for each of these irregular forms, set the Morph Type to ‘Irregularly Inflected Form’, and link them to the primary entry in the Primary Entry Reference field. (Note that irregularly inflected forms are not “Inflectional Variants”. They are not variants of a basic form. So I don’t like calling them variants.)

 

If you just add irregular forms as you encounter them, you won’t be able to deal with them systematically or efficiently. Nor will you be able to develop a wise and well-informed strategy for dealing with them in your published dictionary. Only by dealing with them all at once can you be sure that you have identified all the patterns and exceptions. You can also build all the information into the parser. Doing it this way enables you to sort or filter for particular patterns.
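The logic of the procedure above (predict each paradigm form from the lexeme plus an affix, then set aside whatever does not match) can be sketched outside FLEx. This is only an illustration, not how Bulk Replace works internally; the slot names and suffix table are assumptions, and English spelling rules are ignored:

```python
# Hypothetical suffix table for a few English paradigm slots.
REGULAR_SUFFIX = {"3sg": "s", "past": "ed", "past_participle": "ed"}


def regular_form(lexeme, slot):
    """Build the predicted form by attaching the slot's suffix to the lexeme."""
    return lexeme + REGULAR_SUFFIX[slot]


def find_exceptions(lexeme, attested):
    """Return the slots whose attested form differs from the predicted one."""
    return {
        slot: form
        for slot, form in attested.items()
        if form != regular_form(lexeme, slot)
    }


paradigms = {
    "walk": {"3sg": "walks", "past": "walked", "past_participle": "walked"},
    "break": {"3sg": "breaks", "past": "broke", "past_participle": "broken"},
}

exceptions = {lex: find_exceptions(lex, forms) for lex, forms in paradigms.items()}
print(exceptions)
```

The forms left in `exceptions` are the candidates for hand entry and, later, for minor entries.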

 

Ron Moe

 




Allan Johnson

Oct 1, 2008, 12:45:50 AM
to flex...@googlegroups.com
What Ron describes is a valuable way to organize the lexical data, once you understand things well enough to decide which forms are inflections of what and which forms are better viewed as derivations. A general rule that Ron is following is to let derivations, but not inflected forms, be stored as separate entries. (Though stored as separate entries, these derived forms don't have to be displayed as separate entries). We should have a place in the database to put any inflected form that we encounter, but Ron is saying that the best place would be in a particular field of the stem/root entry rather than giving it an entry of its own. I agree with this I think - but when we're first recording these words, there's a lot we may not know. For some languages, the decision of what to consider inflection and what to consider derivation is not at all clear cut. So however carefully we try to analyze things from the start, we will likely change our mind more than once before finishing the dictionary.

So I also see value in the straightforward way that Heidi's users have been entering their data. Letting each word simply go into its own entry. And then linking these entries as their relationships are determined, by using the "Entry Type" and "Primary Entry Reference" fields.

So something we should think about is, after having linked related forms as "Derivations" or "Inflectional Variants", having analyzed and studied them in this way, and having determined which ones really don't belong in their own independent entries, do we have, or can we develop, a way for FLEx to transform this data into a form more like what Ron is suggesting? This would be another case of stealth-to-wealth methodology, which FLEx already uses in a number of places.

Allan



--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the discussion group "FLEx list". This group is hosted by Google Groups and is open for anyone to browse.
To post to this group, send email to flex...@googlegroups.com
To unsubscribe from this group, send email to flex-list-...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/flex-list?hl=en
-~----------~----~----~----~------~----~------~--~---

Ronald Moe

Oct 1, 2008, 1:50:28 AM
to flex...@googlegroups.com

Allan Johnson wrote:

do we have, or can we develop, a way for FLEx to transform this data into a form more like what Ron is suggesting?

 

Not yet, but this is something we’ve thought about. There is a need for a tool that would make it very easy to merge or split entries. We already have some of this functionality. But we need a way to call up multiple entries that may be related in various ways, for instance a “root” and its complex forms, and decide what entries we need and what information should go under each. What we initially thought should be handled as separate entries might better be handled under a single entry. Likewise, we might decide that a particular sense should be handled as a separate entry. What we initially analyzed as a case of collocation might turn out to be an idiom. What we initially analyzed as derivation might turn out to be inflection and vice versa. It should be easy to fix things up.

 

If we had such a tool, it wouldn’t matter (as much) whether we got it right at first. During the initial stages of our work we could create as many entries as we wanted and link them. Or we could lump lots of data together under a single headword. Later when we understand the language better, we could work through our entries and make well informed decisions. If you are familiar with my five stages of dictionary development, this process belongs to stage 4, but could be done at any point when the user feels a need for it.

 

The way I envisage this tool, it would allow us to bring up one or more entries and drag senses from one entry to another and to reorder the senses within an entry. It would allow us to easily create a second new entry that would be displayed beside the first. The user could then select senses (or other fields) and drag them over into the new entry. Alternatively the user could choose to duplicate an entry and then delete information from each copy until what is left in each belongs to the appropriate entry. This sort of tool could be used to split homonyms and reorder senses. We can currently move a sense up or down, but this is neither efficient nor especially easy.

 

We can already move a sense to another entry. We can also create two windows by duplicating the database. (You do this by clicking Window—New Window.) But this is cumbersome. We need to be able to see all the senses of both entries on screen at one time. So the senses need to be displayed in an abbreviated view, not like what we have in Lexicon Edit—Entry view. We also need a nice handle for each sense that we can easily click and drag with a mouse. We could also design keyboard shortcuts for those who prefer to use the keyboard. The important thing is to be able to massage our entries by quickly and easily splitting entries, merging entries, splitting senses, merging senses, moving senses, moving example sentences, etc.

 

For starters it would be nice to have a version of Lexicon Edit that would permit two Entry views next to each other instead of an Entries (browse) view on the left and an Entry (edit) view on the right. We could use the ‘Show Hidden Fields’ feature to just show the bare minimum of fields. Then we could use the click and drag feature that already exists to drag senses from one entry to the other. I guess the programmers will have to tell us how hard it would be to implement this.

 

Ron Moe

 



Ronald Moe

Oct 1, 2008, 1:59:26 AM
to flex...@googlegroups.com

Allan Johnson wrote:

So I also see value in the straightforward way that Heidi's users have been entering their data. Letting each word simply go into its own entry. And then linking these entries as their relationships are determined, by using the "Entry Type" and "Primary Entry Reference" fields.

 

At one point the FLEx programmers were toying with the idea of combining the wordforms inventory and the lexicon into a single list. The user would link inflected forms to the “primary” entry. We’ve already seen that there are ways to semi-automate this process with the Bulk Edit tools or the parser. This idea would do away with some of the problems that we have in the initial stages of language learning and analysis. Unfortunately it also creates a few other problems. But it is an interesting proposal. It would be a radical approach and would actually more closely model the way the mental lexicon works (maybe). It is worth exploring the possibilities.

 

Ron Moe

 


max...@umiacs.umd.edu

Oct 1, 2008, 9:14:44 AM
to flex...@googlegroups.com, flex...@googlegroups.com
Allan wrote:
> ...A general rule that Ron is following is to let derivations,
> but not inflected forms, be stored as separate entries.

An alternative that ought at least to be discussed (maybe it already has
been) is that irregular/ unpredictable forms should be stored as separate
entries, regardless of whether they are derivational or inflectional (or
compounds). This would differ from the above in two ways:

1) Regular/ predictable derivational forms would not be stored. Examples
in English would include gerunds, which are arguably derived nouns. For
other languages, causatives (which are usually thought of as derived, not
inflected forms) might be regular and therefore not needing to be stored.

2) Irregular inflected forms would be stored as separate entries.

The alternative to (2) is:

> We should have a place in the database to put any inflected form that we
> encounter, but Ron is saying that the best place would be in a
> particular field of the stem/root entry rather than giving it an entry
> of its own.

As Allan writes, one problem with this is:

> ...when we're first recording
> these words, there's a lot we may not know. For some languages, the
> decision of what to consider inflection and what to consider derivation
> is not at all clear cut. So however carefully we try to analyze things
> from the start, we will likely change our mind more than once before
> finishing the dictionary.

I'm not sure how much difference there actually is between storing
irregular inflected forms as separate entries vs. storing them in a field
of their "parent" entry/ lexeme. One diff will be that for most irregular
forms, the irregularity is confined to certain lexemes, and there's a long
tail of what forms might be irregular. For English verbs, the verb 'feed'
has a single irregular form, 'fed' (with the past participle defaulting
to the past tense); 'break' has two irregular forms, 'broke' and 'broken';
and 'be' is irregular in almost every form. You don't want to have fields
for every verb that cover all the forms for which 'be' is irregular;
having separate entries for irregular forms allows you to handle this (a
verb like 'walk' just doesn't have any irregular forms linked to it).

Another possible diff would be accounting for defective paradigms. Verbs
like 'smite', 'dive' and 'stride' in English just don't have past
participles (well, you can look in a dictionary or the KJV, but if English
were an unwritten language you'd be hard pressed to find a speaker who
would use those forms--people generally use circumlocutions to avoid
them). If you use a field for the irregular forms, you could have a
special way to mark the field if there is no form for it (and this has to
be different from the form being regular, and therefore not listed). I'm
not sure how you'd do that with the separate entries method.

You also need to handle the case where there are two (or more) irregular
forms in common use, or one irregular form and one regular form ('burned'
and 'burnt'). This can of course be due to dialectal variation, but it
can be just idiolectal.

Another approach would be to combine the "one field per irregular form"
and the "separate lexical entry" methods, by having the thing in the field
be a pointer to the separate lex entry. (Actually, I would have thought
that the model would have worked this way already. Maybe it does, Ron?)
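The cases raised above (regular slots that are not listed, irregular slots, competing forms, and defective slots) can be told apart in a single structure. A minimal sketch, assuming a hypothetical `Lexeme` class that is not part of the FLEx model: a slot missing from the table is regular, a slot with an empty list is defective, and a slot with one or more forms lists what is actually in use.

```python
from dataclasses import dataclass, field


@dataclass
class Lexeme:
    headword: str
    # slot -> listed forms. Missing slot: regular, nothing stored.
    # Empty list: defective, speakers have no form for this slot.
    listed: dict = field(default_factory=dict)

    def status(self, slot):
        if slot not in self.listed:
            return "regular"
        return "irregular" if self.listed[slot] else "defective"


lexicon = [
    Lexeme("walk"),
    Lexeme("feed", {"past": ["fed"]}),
    Lexeme("break", {"past": ["broke"], "past_participle": ["broken"]}),
    Lexeme("burn", {"past": ["burnt", "burned"]}),  # two forms in common use
    Lexeme("stride", {"past": ["strode"], "past_participle": []}),
]

for lex in lexicon:
    print(lex.headword, lex.status("past"), lex.status("past_participle"))
```

In the "pointer" variant, each string in the list would instead be a reference to a separate lexical entry; the empty list still gives a way to mark a defective slot, which the separate-entries method alone does not.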

Mike Maxwell
CASL/ U MD

Michael Boutin

Oct 1, 2008, 9:32:55 AM
to flex...@googlegroups.com
Allan wrote:
> ...A general rule that Ron is following is to let derivations,
> but not inflected forms, be stored as separate entries.

Mike Maxwell wrote:
An alternative that ought at least to be discussed (maybe it already has
been) is that irregular/ unpredictable forms should be stored as separate
entries, regardless of whether they are derivational or inflectional (or
compounds). This would differ from the above in two ways:

1) Regular/ predictable derivational forms would not be stored. Examples
in English would include gerunds, which are arguably derived nouns. For
other languages, causatives (which are usually thought of as derived, not
inflected forms) might be regular and therefore not needing to be stored.

The problem with this is:
Derivation is usually unpredictable. Even in languages with very productive
morphological causatives, you can't slap a causative verb on any verb.

Michael Boutin

max...@umiacs.umd.edu

Oct 1, 2008, 11:22:23 AM
to flex...@googlegroups.com, flex...@googlegroups.com
Michael Boutin wrote:
> The problem with this is:
> Derivation is usually unpredictable. Even in languages with very
> productive morphological causatives, you can't slap a causative
> verb on any verb.

We could debate these points (see e.g. The Handbook of Morphology pp.
226-227), but regardless, the main point holds: whether you list a form in
the dictionary normally depends more on whether it is formed productively
or not, as opposed to whether it is derivational or inflectional. (This
is somewhat different from the mental lexicon, where there is some
evidence that very common regular forms are also "listed".)

Ronald Moe

Oct 1, 2008, 1:45:09 PM
to flex...@googlegroups.com
Mike Maxwell wrote:
"Another approach would be to combine the "one field per irregular form"
and the "separate lexical entry" methods, by having the thing in the field
be a pointer to the separate lex entry. (Actually, I would have thought
that the model would have worked this way already. Maybe it does, Ron?)"

No, but it should. Actually it appears that there has been a
misunderstanding in this discussion. We can (and generally should) document
inflected forms in the entry for the stem *and* create separate entries for
irregularly inflected forms. There are some complexities here that need
explaining:

In a previous email in this thread I described a discovery procedure in
which you would create a custom field for every paradigm form (or selected
representatives of an extensive paradigm). These inflected forms would be
stored under the stem. The data could then be "harvested" for insights,
rules, irregularly inflected forms, etc. But the data in the custom fields
would not go into the printed dictionary unless the language had
sufficient irregularity or unpredictability that each entry needed to
indicate how the stem should be inflected. For instance most Bantu languages
need to indicate the plural of nouns because it is not always predictable.
Other Bantu languages are very regular and only need to occasionally
indicate the plural. English dictionaries only indicate the plural when it
is irregular (child, pl: children). English verbs are mostly regular, but we
often need to indicate the past and past participle forms:

break (broke, broken) v. To ...

However this method of recording and possibly displaying inflected forms in
the entry for the stem does not solve the problem that users face in trying
to find irregularly inflected forms. A non-native speaker looking for
'broke' would look on page 168 of my American Heritage Dictionary. But the
entry for 'break' is on page 162. So the practice is to create a minor entry
for 'broke' on page 168 that points the user to the entry 'break'. In FLEx
you do this by creating a separate entry in the database for the irregularly
inflected form 'broke'. In the Entry Type field you specify that it is an
'Irregularly Inflected Form' (a.k.a. 'Inflectional Variant'). Then in the
'Primary Entry Reference' field you link it to the stem entry 'break'.

FLEx requires you to create a separate database entry for variants and
irregularly inflected forms if you want to create a minor entry for them in
the published dictionary. (If you don't need a minor entry, then there is no
need to create an entry in the database.) The reason for this is that some
users want to indicate additional information about the variant or
irregularly inflected form. For instance sometimes it is helpful to add a
short definition or gloss. In some cases the user wants to indicate the
pronunciation or include an example sentence. So the FLEx team decided that
the entries for variants and irregularly inflected forms should have all the
regular fields available. In other words, all entries in the database are
full entries. But you can decide how much information to put in the entries
for variants and irregularly inflected forms. You can also decide what
fields to export to the published dictionary. This gives you the freedom to
record as much information as you want to in the database, but to format
these entries as minor entries in the published dictionary with minimal
information (or lots of information if you wish).

There has also been discussion about tools that would make it more efficient
to do all this. I would like to see a tool that makes it easy to investigate
paradigms and then generate (minor) entries for irregularly inflected forms.
The tool could also make it easy to decide which irregular or regular forms
to indicate in the stem entry as in the example 'break' above. These kinds
of tools would make it much easier to use FLEx as a tool for investigating a
language rather than just a tool to record the information after you have it
all figured out. I know that the designers of FLEx wanted it to be a tool
for investigation as well as documentation. So it has a lot of tools and
features that were designed to facilitate investigation. However the
priority had to be on documentation. There is no point in discovering
something if you have no place to document your discovery.

I could go on and talk about how to apply this to derivation, but I'd better
stop.

Ron Moe


Ronald Moe

Oct 1, 2008, 2:06:44 PM
to flex...@googlegroups.com
Michael Boutin wrote:
"The problem with this is..."

Mike Maxwell wrote:
"We could debate these points..."

Rock the boat. Sprain my brain. Michael is right and Mike is right, and we
really do need to debate these points. We generally do not create separate
entries for inflection, which is generally regular, but we do when it is
irregular. We generally create separate entries for derivation, which is
generally irregular, but we don't when it is regular. So what to do? Each of
us has to come up with policy decisions for our dictionary and then apply
the policy to all the wordforms we collect. What is the easiest and most
efficient way to accomplish this? What would be most intuitive and match the
mental lexicon the best? I think I'll take a nap and rest my overtaxed
brain.

Ron Moe


max...@umiacs.umd.edu

Oct 1, 2008, 3:56:36 PM
to flex...@googlegroups.com
Ron Moe wrote:
> Rock the boat. Sprain my brain. Michael is right and Mike is right, and we
> really do need to debate these points. We generally do not create separate
> entries for inflection, which is generally regular, but we do when it is
> irregular. We generally create separate entries for derivation, which is
> generally irregular, but we don't when it is regular.

I guess the main point of my post was that while derivation is frequently
less than productive and/or is often irregular, and while inflection is
(only) sometimes irregular, the fact that there is this partial
correlation misses the point. Rather, we want to create separate entries if
and only if some aspect of morphology is less than fully productive, or is
irregular.

(And we want to create separate entries for idioms, which you can think of
as semantically irregular. Derivational morphology is also sometimes
semantically irregular; inflectional morphology is never semantically
irregular, more or less by definition.)
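Stated as a rule, the criterion above does not mention inflection or derivation at all. A minimal sketch, assuming three hypothetical yes/no judgments that the analyst supplies for each form:

```python
from dataclasses import dataclass


@dataclass
class Form:
    text: str
    productive: bool            # formed by a fully productive process
    formally_regular: bool      # shape is predictable from the rule
    semantically_regular: bool  # meaning is predictable from the parts


def needs_own_entry(form):
    """A form gets its own entry unless it is regular in every respect."""
    return not (
        form.productive and form.formally_regular and form.semantically_regular
    )


walked = Form("walked", True, True, True)
broke = Form("broke", True, False, True)
idiom = Form("kick the bucket", True, True, False)

for f in (walked, broke, idiom):
    print(f.text, needs_own_entry(f))
```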

> What would be most intuitive and match the
> mental lexicon the best?

I'm leery of this, because there is good evidence that the mental lexicon
stores frequent forms regardless of whether they're regular. And I don't
believe a dictionary should store frequent regular forms.

Randy Regnier

Oct 1, 2008, 4:19:12 PM
to flex...@googlegroups.com
>> Mike Maxwell wrote
>>
>> "Another approach would be to combine the "one field per irregular
>> form" and the "separate lexical entry" methods, by having the thing in
>> the field be a pointer to the separate lex entry. (Actually, I would have
>> thought that the model would have worked this way already. Maybe it
>> does, Ron?)"
>
> Ron Moe responded:
>
> No, but it should.

Huh? Flex has had these associations between minor and major entries since
the beginning of time. Mike is right on this point.

Randy Regnier

Ronald Moe

Oct 1, 2008, 5:15:48 PM
to flex...@googlegroups.com
Randy Regnier wrote:
"Flex has had these associations between minor and major entries since
the beginning of time."

Yes, it has. You are correct. But I was suggesting a different feature and
(if I understand him correctly) Mike was commenting on my suggestion. But it
is getting a little confusing because we have the current feature (which you
mention) and my suggestion. I'm very likely the one who is confused. Here
are the two situations:

Situation#1 [this is how FLEx currently works]

\lx break

\lx broke
\tp Irregularly Inflected Form [Entry Type field]
\mn break [Primary Entry Reference field]

FLEx links 'broke' to 'break' via the \mn field. As you noted, FLEx has done
this since the beginning.

Situation#2 [this is my suggestion]

\lx break
\pst broke [this is a custom field to hold the past tense form]

\lx broke

Nothing in FLEx links the \pst broke field to the \lx broke entry. This is
what Mike was talking about (if I understood him correctly). Even though
'broke' is the contents of both fields, FLEx does not link them. I've been
suggesting that FLEx could and should enable us to work this way. I should
be able to fill in paradigm fields and then create (and link) another entry
for any irregularly inflected form that is encountered in the paradigm
fields. A number of people have requested a tool that would enable us to
easily create entries for variants, irregularly inflected forms, and complex
forms while working on the root/stem/primary entry.
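
A minimal sketch of the linking step described in Situation#2, written in Python outside FLEx. It assumes the data is available as plain SFM text and that the set of paradigm markers (here just `pst`) is supplied by the user; `parse_sfm` and `link_paradigm_fields` are hypothetical helper names, not FLEx or Toolbox functions.

```python
def parse_sfm(text):
    """Split SFM text into records; each record is a list of (marker, value) pairs."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # skip blank lines and anything that is not a field
        marker, _, value = line[1:].partition(" ")
        if marker == "lx":
            records.append([])  # every \lx starts a new record
        if records:
            records[-1].append((marker, value.strip()))
    return records


def link_paradigm_fields(records, paradigm_markers):
    """Return (form, marker, stem) triples where a paradigm field equals a headword."""
    headwords = {dict(rec)["lx"] for rec in records}
    links = []
    for rec in records:
        stem = dict(rec)["lx"]
        for marker, value in rec:
            if marker in paradigm_markers and value in headwords:
                links.append((value, marker, stem))
    return links


SAMPLE = r"""
\lx break
\pst broke

\lx broke
"""

records = parse_sfm(SAMPLE)
links = link_paradigm_fields(records, {"pst"})
for form, marker, stem in links:
    print(f"{form}: {marker} of {stem}")
```

The point of the sketch is that the link can only be computed once something declares that `\pst` holds a paradigm form; without that declaration the two occurrences of 'broke' are just matching strings.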

I'm struggling to figure out how we can help the linguist who is just
starting out on his project and is wanting to use FLEx to investigate
inflection (and other features such as derivation). What procedure would
work best? What tools would he need? What functionality would FLEx need to
take the raw data that the user would be entering and build the lexicon and
grammar? What functionality would FLEx need to process texts and build the
lexicon and grammar? The answers to these questions are not at all clear to
me. We already have some great features, but users seem to be saying that
these features assume a greater understanding of the language than they have
at first. They need something more basic.

I remember 20 years ago being confused by verb paradigms in Maguindanao. I
needed to fill out a number of paradigms for various verb classes because
they didn't all follow the same pattern. I had no idea how many patterns
there were or which verbs belonged to which class. I suspected that some of
the differences were semantically based, but I didn't know how. How could
FLEx have helped me?

Ron Moe



Marlin_...@gial.edu

Oct 1, 2008, 7:36:14 PM10/1/08
to flex...@googlegroups.com


Ron Moe in flex...@googlegroups.com wrote on 10/01/2008 04:15:48 PM:
>
> I remember 20 years ago being confused by verb paradigms in Maguindanao. I
> needed to fill out a number of paradigms for various verb classes because
> they didn't all follow the same pattern. I had no idea how many patterns
> there were or which verbs belonged to which class. I suspected that some of
> the differences were semantically based, but I didn't know how. How could
> FLEx have helped me?
>
> Ron Moe

Yes. I too was confused when I went to Tuwali Ifugao, a northern Philippine language.
Early on I often couldn't figure out what the root was, so I couldn't look up its meaning
in the dictionary. The language has so many affixes and morphophonemic changes. Lou Hohulin
said she didn't understand the morphology well until she took a bunch of verbs and
elicited and studied their full paradigms. There were over 100 different affix
combinations, with no verb having all of them. The study also showed semantic
classes. Her grammar classes include both semantics and form.

This could be where a search of the FLEx Words and Texts, and Grammar areas could
come in. It could at least show possible inflections based on the form, that is,
if it is in a text. In my opinion, all forms need to be somehow accessible.
New learners including children won't know much of the more complicated morphology
and grammar.

Additional problems come in with words where the root form doesn't even occur
in the word. In Tuwali, Pangayam means "Where are y'all going?". The verb form
for "go" is e, yet no e even occurs in Pangayam. The morphophonemics obscured it.
Pang- -an is the circumfix for questioning locations "where", -m is the clitic =mu
meaning 2.pl, so the root e goes to ay in this environment. All suppletives
will cause this discrepancy problem between form and meaning.

This is where the internet is so useful. For major languages, Google
has programmed its search tool to sift through the various inflected forms to find
the semantic root of the words people are searching for. Many dictionaries/lexical databases
are coming online for searching, study, and collaboration (like wikis). It sure would
be nice if FLEx and our databases head in a similar direction. We need to think outside
the books. If the grammar tool could be linked to the electronic search tool then we could
find more forms. If we could also search the classified dictionary and put a semantic
domain linked to forms in the search that would be helpful too. The parser might
even become stronger.

Lexical Functional Grammar and Role and Reference Grammar talk about verb classes
based on the semantic arguments (like agent, patient, theme) and the grammatical
relation they are linked to (like subject, object). The verbs also form semantic class
sets based on the complement clauses they allow, the complementizer they allow, and the
tense-aspect-mood (TAM) marking the main verb causes on the subordinated clause verb.
A sentence level grammar parser would be an extremely useful tool, as many have mentioned before.

Marlin

Mike Maxwell

Oct 1, 2008, 6:09:04 AM10/1/08
to flex...@googlegroups.com
Let me take a stab at this. I say "stab" because I'm a bit confused by
your latest posting, Ron.

Ronald Moe wrote:
> ...we have the current feature (which you mention) and my suggestion.
> I'm very likely the one who is confused. Here are the two
> situations:
>
> Situation#1 [this is how FLEx currently works]
>
> \lx break
>
> \lx broke
> \tp Irregularly Inflected Form [Entry Type field]
> \mn break [Primary Entry Reference field]
>
> FLEx links 'broke' to 'break' via the \mn field. As you noted, FLEx
> has done this since the beginning.
>
> Situation#2 [this is my suggestion]
>
> \lx break
> \pst broke [this is a custom field to hold the past tense
> form]
>
> \lx broke
>
> Nothing in FLEx links the \pst broke field to the \lx broke entry.

It's hard to use the analogy with SFMs, because FLEx is not SFM-based.

To the best of my understanding (and Randy can correct me if I'm wrong),
FLEx currently combines the above two ways of linking entries. That's
what I meant when I wrote


> Another approach would be to combine the "one field per
> irregular form" and the "separate lexical entry" methods,
> by having the thing in the field be a pointer to the
> separate lex entry.

--and I asked if that wasn't what the model already did.

Maybe we can clarify this. I'll refer to 'break' as the 'major entry'
and 'broke' as the 'minor entry', although in some respects FLEx doesn't
make that distinction--both would just be entries. In that respect,
situation 1 and 2 don't differ, they both have entries for both 'break'
and 'broke'. Then there is a link between the two entries, with the
link representing the fact that the entry for 'broke' is an inflected
form of 'break'. Crucially, this link can be followed in both
directions; so that from the entry for 'broke' you can find out what its
main entry is (namely, the entry for 'break'); and from the entry for
'break', you can find its listed inflected forms (namely, 'broke', and
presumably another entry 'broken').
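
As a rough illustration of a link that can be followed in both directions, here is a minimal Python sketch. It is not how FLEx stores the link; the `Lexicon` class and its attribute names are assumptions made up for this example.

```python
from collections import defaultdict


class Lexicon:
    """Toy store for links between minor entries and their main entries."""

    def __init__(self):
        self.main_of = {}                   # minor form -> (main form, relation)
        self.minors_of = defaultdict(list)  # main form -> [(minor form, relation)]

    def link(self, minor, main, relation):
        # Record the link once, indexed from both ends.
        self.main_of[minor] = (main, relation)
        self.minors_of[main].append((minor, relation))


lex = Lexicon()
lex.link("broke", "break", "past")
lex.link("broken", "break", "past participle")

print(lex.main_of["broke"])     # from the minor entry to its main entry
print(lex.minors_of["break"])   # from the main entry to its listed forms
```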

If that's the way FLEx works (and Randy indicated that it is), then you
have the best (and maybe the worst :-)) of both situation 1 and 2. The
question is then one of user interface: to what extent does FLEx _look_
like it handles both of these situations?

(That this is a UI issue is suggested by Ron's later comment:
> ...A number of people
> have requested a tool that would enable us to easily create entries
> for variants, irregularly inflected forms, and complex forms while
> working on the root/stem/primary entry.

So I think we're in agreement on this.)

Situation 1 is pretty much a given; it implies that there is some
minimal information in the minor entry, namely that it's spelled
'broke', that it is a past tense, and that it's a form of the verb
(lexeme) 'break'. And that's there in FLEx's UI.

Situation 2 implies some kind of a paradigm display in the major entry.
At a minimum it would allow you to jump to any minor entries for a
particular word. In English, some such paradigms don't need to have
anything (like "walk", which is completely regular), while other verbs
need one ("feed" --> "fed") or two ("break" --> "broke", "broken") or
more ("be") links to forms in their paradigm.

A more sophisticated paradigm tool would automatically fill in forms
that weren't listed because they were regular, using a morphological
generator (like STAMP) or transducer to create the forms on the fly.
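
A minimal Python sketch of that idea: listed irregular forms win, and every other cell is generated on the fly. The suffix rules below are toy English rules standing in for a real morphological generator; they are assumptions for illustration and have nothing to do with how STAMP or the FLEx parser actually works.

```python
# Toy "generator": one regular rule per paradigm cell (illustrative only).
REGULAR_RULES = {
    "3sg": lambda stem: stem + "s",
    "past": lambda stem: stem + "ed",
    "past participle": lambda stem: stem + "ed",
}


def paradigm(stem, listed):
    """Fill every cell, preferring listed (irregular) forms over generated ones."""
    cells = {}
    for cell, rule in REGULAR_RULES.items():
        if cell in listed:
            cells[cell] = (listed[cell], "listed")
        else:
            cells[cell] = (rule(stem), "generated")
    return cells


for stem, listed in [
    ("walk", {}),
    ("break", {"past": "broke", "past participle": "broken"}),
]:
    print(stem)
    for cell, (form, source) in paradigm(stem, listed).items():
        print(f"  {cell}: {form} ({source})")
```

Marking each cell as listed or generated keeps it visible which forms would need a minor entry and which come for free from the rules.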

There are some user interface issues here, such as the fact that some
POSs in some languages have literally thousands of forms. You probably
don't want to routinely display all those forms.

For languages in which there are hundreds or thousands of forms, one way
to avoid the need to display all such forms would be to allow the user
to set up a set of "principal parts" for a given POS--this basically
corresponds to the set of "stem names" in the model, but with the
principal part for each stem name bearing the inflectional affixes
appropriate to some cell of the paradigm. Example: in Spanish, some
verbs have special forms for the 1sg. present indicative and for all
persons of the present subjunctive. The 1sg. present indicative is often
used to represent all these forms. This principal part can normally be
generated from the possibly irregular stem and the regular inflection,
and of course suppletive forms (like 'wept' or 'broke') are simply
listed as described above.

At any rate, the answer is, I think, that FLEx stores listed irregular
forms in separate (minor) lex entries, but they can be reached from a
paradigm-like structure in the major entry.

(There was discussion a year ago about the fact that some Bantu
languages--and I suppose other languages as well--have completely
suppletive plurals of nouns, for virtually all nouns. I think, but am
not sure, that noun plurals are probably the only situation where this
happens--where virtually every lexeme of a given POS has an irregular
form. FLEx does not automatically set up a paradigm cell for plurals of
nouns, that is it does not come with a built-in plural field, but it
does allow you to link to irregular forms, plural or otherwise, from the
"main" lexeme.)

> What functionality would FLEx need to take the raw data that
> the user would be entering and build the lexicon and grammar?

I believe the functionality is there in the sets of morphological
"things" that FLEx provides for. What's lacking is mainly the UI--and
that's understandable, because the UI is far more work than the basic
building blocks.
--
Mike Maxwell
"We signify something too narrow when we say:
Man is a grammatical animal. For although there
is no animal except man with a knowledge of grammar,
yet not every man has a knowledge of grammar."
--Martianus Capella, "The Seven Liberal Arts"

Randy Regnier

Oct 2, 2008, 10:54:06 AM10/2/08
to flex...@googlegroups.com
> Ron Moe wrote:
>
> Situation#2 [this is my suggestion]
>
> \lx break
> \pst broke [this is a custom field to hold the past tense form]
>
> \lx broke
>
> Nothing in FLEx links the \pst broke field to the \lx broke entry.
> This is what Mike was talking about (if I understood him correctly).
> Even though 'broke' is the contents of both fields,
> FLEx does not link them. I've been suggesting that FLEx could
> and should enable us to work this way.

If I understand you correctly, you want the ability to create custom fields
and you want to have some special behavior associated with those new fields.
If you also supply the behavior at the time you create the custom field,
then it can be done. You need to understand however, that you have gone far
beyond the ability of most users, since implementing that behavior
necessarily involves programming, perhaps very exotic programming. Without
that user-supplied custom behavior associated with the user-defined custom
field, the software has no hope (zip, zero, zilch, nada) of being able to
guess that you wanted to connect 'broke' in the new '\pst' field with
'broke' in the '\lx' field.

Randy Regnier

max...@umiacs.umd.edu

Oct 2, 2008, 11:15:59 AM10/2/08
to flex...@googlegroups.com, flex...@googlegroups.com
Ron Moe wrote:
>>
>> Situation#2 [this is my suggestion]
>>
>> \lx break
>> \pst broke [this is a custom field to hold the past tense form]
>>
>> \lx broke
>>
>> Nothing in FLEx links the \pst broke field to the \lx broke entry.
>> This is what Mike was talking about (if I understood him correctly).
>> Even though 'broke' is the contents of both fields,
>> FLEx does not link them. I've been suggesting that FLEx could
>> and should enable us to work this way.

To which Randy Regnier replied:


> If I understand you correctly, you want the ability to create custom
> fields and you want to have some special behavior associated
> with those new fields.

Personally, I don't think any new "field" is required. There is already a
way to see the links to listed inflected forms from a (major) entry,
right? That gives you part of the functionality Ron is talking about, and
probably all the functionality most people need.

The cases where some additional functionality (but not, I think, a custom
field) may be needed are:

1) Languages where, for *every* lexeme of a given POS, there is an
irregular/ suppletive form. The only situation that I'm aware of where
this arises is in languages for which the plural of a noun is always
suppletive. It's possible that some languages have special verbal forms
for every verb, too--maybe the imperfective aspect is always suppletive,
or some such. You might want a way to ensure that for every noun lexeme,
there is such a link to a suppletive form; and one way of ensuring this
would be to have a "field" that would show up for all and only nouns.

2) Languages where, for *every* lexeme of a given POS, there are multiple
"principal parts." This is probably a special case of (1), and calls for
some way to ensure that all the 'stem names' defined for the POS appear.

3) The situation where you want to show the paradigm of every word (or
more likely, the paradigm of every word of a particular POS). This calls
for a paradigm interface, and probably some way for the morphological
generator/ transducer to automagically fill in the paradigm with the
expected regular forms.
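
Cases (1) and (2) both amount to a completeness check: for every lexeme of a given POS, is each required link present? A minimal Python sketch follows, assuming entries are available as plain dictionaries; the layout, the `missing_links` name, and the sample entries are invented for illustration and do not reflect the FLEx model.

```python
def missing_links(entries, pos, required):
    """Map each headword of the given POS to the required relations it lacks."""
    report = {}
    for entry in entries:
        if entry["pos"] != pos:
            continue
        lacking = [rel for rel in required if rel not in entry["links"]]
        if lacking:
            report[entry["lx"]] = lacking
    return report


ENTRIES = [
    {"lx": "child", "pos": "n", "links": {"plural": "children"}},
    {"lx": "mouse", "pos": "n", "links": {}},
    {"lx": "break", "pos": "v", "links": {"past": "broke"}},
]

# Case (1): every noun should link to a listed plural.
print(missing_links(ENTRIES, "n", ["plural"]))
# Case (2): every verb should have a form for each required principal part.
print(missing_links(ENTRIES, "v", ["past", "past participle"]))
```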

David J Weber

Oct 2, 2008, 10:23:48 AM10/2/08
to flex...@googlegroups.com
From David Weber

Ron wrote:

> However this method of recording and possibly displaying inflected forms in
> the entry for the stem does not solve the problem that users face in trying
> to find irregularly inflected forms. A non-native speaker looking for
> 'broke' would look on page 168 of my American Heritage Dictionary. But the
> entry for 'break' is on page 162. So the practice is to create a minor entry
> for 'broke' on page 168 that points the user to the entry 'break'. In FLEx
> you do this by creating a separate entry in the database for the irregularly
> inflected form 'broke'. In the Entry Type field you specify that it is an
> 'Irregularly Inflected Form' (a.k.a. 'Inflectional Variant'). Then in the
> 'Primary Entry Reference' field you link it to the stem entry 'break'.
>
> FLEx requires you to create a separate database entry for variants and
> irregularly inflected forms if you want to create a minor entry for them in
> the published dictionary.

Perhaps this requirement is unfortunate. If a **form** is
irregular, I would prefer to store it in the "main entry" (so store
"broke" under "break"). Then, at the point of rendering, I would
generate a cross reference, something like:
BROKE See BREAK.

If the **meaning** is irregular, as with many derived forms, then a
separate database entry is --perhaps-- more justified, although my
preference would be to have it stored with the main entry.

For the Huallaga Quechua dictionary (using a standard format file in
the last decade) I stored the variant forms of the lemma along with
the lemma, and render them following the lemma. I also automatically
generated a cross-reference for each variant. (I worked out a way to
suppress the cross-reference if the variant was too close to the main
entry, to avoid something like "BROKE See BREAK" directly following
the BREAK entry, but this required some human judgement.)
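
A minimal Python sketch of generating cross-references at rendering time, with a crude stand-in for the "too close" test. The `min_gap` threshold (distance in the sorted headword list) is an assumption for illustration; the procedure described above needed human judgement, which a fixed number cannot replace.

```python
def cross_references(entries, min_gap=1):
    """entries maps each lemma to its variants; return the rendered 'See' lines."""
    # All headwords that would appear in the printed dictionary, in sort order.
    headwords = sorted(set(entries) | {v for vs in entries.values() for v in vs})
    position = {word: i for i, word in enumerate(headwords)}
    lines = []
    for lemma, variants in entries.items():
        for variant in variants:
            # Suppress the cross-reference when the variant would sit right
            # next to its main entry anyway.
            if abs(position[variant] - position[lemma]) > min_gap:
                lines.append(f"{variant.upper()} See {lemma.upper()}.")
    return sorted(lines)


ENTRIES = {"break": ["broke"], "go": ["went"], "swim": [], "take": []}
for line in cross_references(ENTRIES):
    print(line)
```

With this small sample, 'broke' sorts directly after 'break' and gets no cross-reference, while 'went' is far from 'go' and does.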

Also, unlike FLEX, I stored all derived forms under the lexical entry
of the lexical item from which it is derived, so "break-causative-"
under the entry for "break". For the publication, I elected to
render these sub-entries and sub-sub-entries under the main entry,
indenting them.

However, for the San Martin Quechua dictionary, which was stored in
the same way, the committee wanted a "flat" dictionary. It was a
simple matter to convert the (sub-)...sub-entries into main entries
as part of the rendering process.
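
The two renderings can be sketched from one nested store. This is a minimal Python illustration, not the code used for either Quechua dictionary; the `Entry` class, the English placeholder forms, and the glosses are all invented for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    form: str
    gloss: str
    subentries: List["Entry"] = field(default_factory=list)


def render_nested(entry, depth=0):
    """Render sub-entries indented under the entry they are derived from."""
    lines = ["  " * depth + f"{entry.form}: {entry.gloss}"]
    for sub in entry.subentries:
        lines.extend(render_nested(sub, depth + 1))
    return lines


def render_flat(entries):
    """Promote every (sub-)sub-entry to a main entry and sort alphabetically."""
    flat = []

    def walk(entry):
        flat.append(entry)
        for sub in entry.subentries:
            walk(sub)

    for entry in entries:
        walk(entry)
    return [f"{e.form}: {e.gloss}" for e in sorted(flat, key=lambda e: e.form)]


root = Entry("break", "to break", [
    Entry("break-CAUS", "to cause to break", [
        Entry("break-CAUS-PASS", "to be caused to break"),
    ]),
])

print("\n".join(render_nested(root)))
print("\n".join(render_flat([root])))
```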

I guess what I'm trying to say is that we should not let how we
**render** information weigh too heavily on how we **store** it.


Ronald Moe

Oct 3, 2008, 7:07:57 PM10/3/08
to flex...@googlegroups.com
David Weber wrote:
"Perhaps this requirement is unfortunate. If a **form** is
irregular, I would prefer to store it in the "main entry" (so store
"broke" under "break")."

(This issue keeps coming up. So the FLEx team is planning on re-evaluating how the program handles variants and irregularly inflected forms. In the meantime...)

Actually variant forms and irregularly inflected forms are "stored" (in a way) in the primary entry in the Variant Forms field. Likewise complex forms are stored in the primary entry in the Complex Forms field. However these are "virtual" fields and all they contain is the headword of the other entry. There is actually no data in these fields and you cannot type in them. Instead FLEx is merely displaying the headword of the entries for variants and complex forms that you have linked to the primary entry. (You can right-click on one of these headwords and jump to the other entry.) But David has a valid point. FLEx stores allomorphs in the primary entry. It stores pronunciation variants in the primary entry. So why doesn't it store other kinds of variants in the primary entry? Because users (like us!!!) requested the ability to enter various kinds of information about variants. Note all the information in the following "minor entry" for 'went'.

go (pst: went) v. To move... [ety. Middle English: gon, Old English: gan, PIE: ghe]

went (pst of go) [wĕnt] v. (A word used as the past tense of 'go', but based on a different root.) John went home, but I'm going to the store first. [ety. originally the past tense of wend, PIE: wendh]

So "storing" a variant in the primary entry would be impossible without seriously complicating the structure of an entry. The programmers had to choose between requiring the user to create a separate entry for a variant or duplicating all the standard fields for the variant inside the primary entry, essentially creating an entry within an entry. Pay your money, take your choice. You either have one kind of complication or the other.

I must confess that I, too, would prefer to deal with variants and irregularly inflected forms in the main entry. I feel that variants and irregularly inflected forms are "part of" a lexeme. Creating a separate entry "feels" like I am treating the form as a separate lexeme. On the other hand I disagree with David that derived forms (that is, their entries) should be stored with the root. Derived forms are separate lexemes and when they are combined into a single entry I "feel" like I am treating them as the same lexeme. In many cases roots are not easily identified by native speakers and are not a good basis for organizing the information in a dictionary. Instead I prefer how FLEx does it--each lexeme goes into a separate entry, but the user can choose to format the dictionary as a stem dictionary or a root dictionary. If he chooses the "root dictionary" option, he can see all the complex forms that have been linked to the root in the dictionary preview pane in the top right corner of the Lexicon Edit view.

Having said all this, I must add that David has hit on one of the key issues in lexicography when he says, "we should not let how we **render** information weigh too heavily on how we **store** it." The problem is that we must standardize the way we store information if we want a powerful program that can act on the information. Gary Simons has pointed out that only when we standardize data does the data become "actionable" (meaning that the software can act on the data). FLEx standardizes the way we store information, but offers multiple ways to render the information either by choosing from a variety of views in FLEx, or by choosing from a variety of export options, or by choosing from a variety of print options. For more on this topic see section 2.2.3 "Data Versus Presentation" in my paper on lexicography available in the FLEx Help files (Help--Resources--Introduction to Lexicography).

There was a previous thread on this list about an alternative way to add variants and irregularly inflected forms. This alternative would involve adding a dialog box to the UI in which the user would type the form of the variant and specify how it is related to the primary entry (dialectal variant, spelling variant, irregular past tense, etc.). The program would then create a separate entry for the variant. The user would access this dialog box from the primary entry. So the user would "feel" like he was adding the variant to the primary entry. But all this option is really doing is providing a new option for creating the variant entry. If the user ever wanted to add more information to the "minor entry" for the variant, he would have to go to the entry for the variant and do it there. So we are right back where we started.

Ron Moe


Ronald Moe

Oct 3, 2008, 7:46:52 PM10/3/08
to flex...@googlegroups.com
Randy Regnier wrote:
"If I understand you correctly..."

Sorry, I wasn't clear enough. I'm not a computer programmer and I don't want
to be able to program functionality into a custom field. I couldn't even if
I wanted to. Custom fields are permitted in FLEx (and other programs) in
order to provide a way for the user to capture data that the program
designers did not foresee a need for. But functionality can only be
programmed for built-in fields.

In my recent postings I've been trying to address the needs of users who
don't yet understand the morphology of their language. One technique for
investigating morphology is to fill out paradigms. *Currently* the only way
such a user can add paradigm forms to the lexical database is to set up
custom fields--one field for each paradigm form. This is entirely
unacceptable to me. I recommend that users do this because it's all we have.
But it is not enough. So I've been arguing for something more. I've been
trying to imagine a tool that could be used to *discover* and *document* the
paradigms of words. Once the user has filled out the paradigm (using the
tool that has not yet been designed), then the user should be able to *also*
generate minor entries for irregularly inflected forms. The user cannot
*currently* do this from custom fields. There is no functionality available
for custom fields because FieldWorks doesn't know what the custom field
contains. Custom fields in FLEx are always text fields (as opposed to list
fields). So all FieldWorks can do is allow us to use the Bulk Edit tools to
edit them. That's not functionality, it's editability.

There is a very good reason why FLEx contains a set of built-in fields.
Toolbox has no built-in lexical fields and therefore has absolutely no
functionality related to fields. (It sort of has built-in fields for
interlinearizing and has some functionality there.) In contrast FLEx has
lots of functionality related to fields because the programmers know with
absolute certainty what data is contained in each built-in field. Of course
the user can misuse a field by putting the wrong kind of data in it. But
that is a different problem. As long as the user follows the instructions
for each field, FLEx can do powerful things with the data. This is the
blessing (and curse) of standardization. Standardizing our data means we
have to follow instructions. Although this might seem tyrannical, it is
actually incredibly freeing. Because now I can do fantastic things that were
impossible with unconstrained programs like Toolbox.

The problem we have right now in FLEx with paradigm forms is that there are
no built in fields to handle them. So there is no standardization and there
are no powerful tools that we can use to facilitate the development of this
aspect of our dictionaries. All we have is custom fields. We are no better
off than with Toolbox. In fact we are worse off because at least MDF had a
few paradigm fields.

We need the programmers to give us some built in paradigm fields and to
design some tools with powerful functionality that will enable us to (1)
efficiently fill in paradigms, (2) discover the morphological patterns of
our language, (3) apply our newfound insights to the parser and grammar
sketch, (4) handle the irregularly inflected forms that we discover as we
fill in the paradigms.

Look again at my Situation#2:

\lx break
\pst broke [this is a custom field to hold the past tense form]

\lx broke

There is no built-in \pst field in FLEx. I set this up as a custom field. It
can't be used to create a minor entry such as:

broke, see break

Why? Because it is a custom field and FLEx doesn't know what to do with
custom fields. There is no link between "\pst broke" and "\lx broke". Why?
Because FLEx knows what is in the Lexeme Form (\lx) field, but it has no
idea what is in the custom Past tense (\pst) field. The programmers can't
provide functionality for your custom fields because they don't know what
custom fields you are going to create.

I don't know what the ideal system would be for investigating and
documenting paradigms. I've thought a lot about it, but the FLEx team has
not yet held any extensive discussions on the topic. Generally what happens
is that users request some functionality. The FLEx team collects the ideas
that have been suggested, holds discussions with some experts, then works
out the best solution they can. As a user I'm suggesting some new
functionality.

Ron Moe


Ronald Moe

Oct 3, 2008, 10:43:43 PM10/3/08
to flex...@googlegroups.com
Mike Maxwell wrote:
"Personally, I don't think any new "field" is required."

Yes, we do need a new field. MDF has a set of fields to handle paradigm
forms. (The following are taken from the MDF description of each field.)

\pd Paradigm. Used for specifying the noun or verb class, gender, or other
paradigm set that the lexeme or headword is associated with. These classes
are generally given labels or numbers to differentiate them. Use the Range
Set feature for consistency.

\pdl Paradigm label. Used to label the paradigm form given in the \pdv
field. This is useful for paradigm sets that are incomplete or irregular.
Use a Range Set.

\pdv Paradigm form. Used to give the vernacular paradigm form specified by
the label in the \pdl field. Used mostly for irregular or incomplete
paradigm sets.

\pde Paradigm form gloss (E). Used for glossing the vernacular paradigm form
in English. [also \pdn (national) and \pdr (regional)]

These fields would be used together as in the following Greek example:

\lx pinoo
\ps v
\de To drink.
\pd 2Aor
\pdl aor
\pdv epion
\pde drank
\pdl perf
\pdv pepooka
\pde had drunk

This would print as follows. (I couldn't find any documentation in the MDF
manual, so I'm guessing.)

pinoo v. To drink. ... Prdm: 2Aor, aor: epion 'drank', perf: pepooka 'had
drunk'.
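
A minimal Python sketch of how the \pdl / \pdv / \pde bundles could be grouped and rendered. The output format simply follows the guess above and is not taken from the MDF manual; `paradigm_bundles` and `render_paradigm` are hypothetical names.

```python
def paradigm_bundles(fields):
    """Group (marker, value) pairs into one dict per pdl label."""
    bundles = []
    for marker, value in fields:
        if marker == "pdl":
            bundles.append({"label": value})  # a label opens a new bundle
        elif marker == "pdv" and bundles:
            bundles[-1]["form"] = value
        elif marker == "pde" and bundles:
            bundles[-1]["gloss"] = value
    return bundles


def render_paradigm(fields):
    """Render the paradigm part of an entry as a single 'Prdm:' string."""
    pd = dict(fields).get("pd", "")
    parts = []
    for bundle in paradigm_bundles(fields):
        text = f"{bundle['label']}: {bundle.get('form', '')}"
        if "gloss" in bundle:
            text += f" '{bundle['gloss']}'"
        parts.append(text)
    items = ([pd] if pd else []) + parts
    return "Prdm: " + ", ".join(items) + "."


PINOO = [
    ("lx", "pinoo"), ("ps", "v"), ("de", "To drink."), ("pd", "2Aor"),
    ("pdl", "aor"), ("pdv", "epion"), ("pde", "drank"),
    ("pdl", "perf"), ("pdv", "pepooka"), ("pde", "had drunk"),
]

print(render_paradigm(PINOO))
```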

FLEx already has a system to indicate what MDF calls the "paradigm" in the
\pd field. FLEx captures this in the Inflection Class and Inflection Feature
fields.

So what we need in FLEx is a bundle of three fields that would correspond to
the \pdl \pdv and \pde fields. (I personally don't think we need the \pde
field, but some users might want it, especially those that want to import
MDF databases.) There is one problem with the MDF system. The MDF designers
recognized that it would be more efficient to have a single field that would
combine the \pdl and \pdv fields. In fact they created a number of fields
that incorporated the paradigm label into the field marker:

\sg singular form
\pl plural form
\rd reduplication form
\1s 1st singular form
\2s 2nd singular form
(and \3s \4s \1d \2d \3d \4d \1p \1i \1e \2p \3p \4p)

The problem with these fields is that most languages don't need these fields
(except the \pl field) and many languages need other fields. In fact there
is no way to predict which paradigm forms will need to be documented for a
given language. But it would greatly simplify the system if we could do
something similar to this in FLEx. Otherwise all the paradigm forms would be
combined in the \pdv field and there would be no way to filter out just one
set in browse view. Each paradigm form needs to be in a separate field so
that it can be displayed in a separate column in browse view.

Lexeme Form: sing
Third singular: sings
Participle: singing
Past: sang
Past Participle: sung
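
A minimal Python sketch of why one field per paradigm form matters for a browse view: label/form pairs are pivoted into columns, and a single column can then be filtered. The column list and the `browse_rows` helper are assumptions for illustration, not a FLEx feature.

```python
def browse_rows(entries, columns):
    """Pivot each entry's (label, form) pairs into one cell per column."""
    rows = []
    for lexeme, pairs in entries:
        forms = dict(pairs)
        rows.append([lexeme] + [forms.get(col, "") for col in columns])
    return rows


COLUMNS = ["Third singular", "Participle", "Past", "Past Participle"]
ENTRIES = [
    ("sing", [("Third singular", "sings"), ("Participle", "singing"),
              ("Past", "sang"), ("Past Participle", "sung")]),
    ("feed", [("Past", "fed"), ("Past Participle", "fed")]),
    ("walk", []),  # fully regular: nothing listed
]

rows = browse_rows(ENTRIES, COLUMNS)

# Filter on a single column: keep only entries with a listed Past form.
past_col = 1 + COLUMNS.index("Past")
with_past = [row for row in rows if row[past_col]]

for row in [["Lexeme Form"] + COLUMNS] + with_past:
    print(" | ".join(cell.ljust(15) for cell in row))
```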

I'm not exactly sure how the FLEx programmers would make this work, but I'm
sure they could do it. :)

Ron Moe



David J Weber

Oct 4, 2008, 6:36:43 PM10/4/08
to flex...@googlegroups.com
Ron,

Why not define "entry" as recursive, that is, let an entry have--as a
part--an entry, a "subentry". This "subentry", since it is an entry,
could also have an entry (a "sub-sub-entry"), and so on.

I know little about how FLEx is implemented so I don't know what
technical challenges this would present, but I'm guessing that it
would be possible to (simply?) add "entry" as one sort of "field" in
an entry.

Perhaps this is simplistic but it seems such an obvious way to allow
a dictionary to be structured in an interesting --and perhaps
intuitive-- way, in which derived forms have their natural home
within the entry that corresponds to the lexeme from which they are
derived (with no implication that this is how speakers are wired!).

And, NO, this would not solve all the problems.

Best, --David

> David Weber wrote:
> "Perhaps this requirement is unfortunate. If a **form** is
> irregular, I would prefer to store it in the "main entry" (so store
> "broke" under "break")."
>
> (This issue keeps coming up. So the FLEx team is planning on re-evaluating how the program handles variants and irregularly inflected forms. In the meantime...)
>
> Actually variant forms and irregularly inflected forms are "stored" (in a way) in the primary entry in the Variant Forms field. Likewise complex forms are stored in the primary entry in the Complex Forms field. However these are "virtual" fields and all they contain is the headword of the other entry. There is actually no data in these fields and you cannot type in them. Instead FLEx is merely displaying the headword of the entries for variants and complex forms that you have linked to the primary entry. (You can right-click on one of these headwords and jump to the other entry.) But David has a valid point. FLEx stores allomorphs in the primary entry. It stores pronunciation variants in the primary entry. So why doesn't it store other kinds of variants in the primary entry? Because users (like us!!!) requested the ability to enter various kinds of information about variants. Note all the information in the following "minor entry" for 'went'.
>
> go (pst: went) v. To move... [ety. Middle English: gon, Old English: gan, PIE: ghe]
>
> went (pst of go) [went] v. (A word used as the past tense of 'go', but based on a different root.) John went home, but I'm going to the store first. [ety. originally the past tense of wend, PIE: wendh]

Randy Regnier

Oct 4, 2008, 8:29:33 PM
to flex...@googlegroups.com
David Weber wrote:
>
> Why not define "entry" as recursive, that is, let an entry have--as a
> part--an entry, a "subentry". This "subentry", since it is an entry,
> could also have an entry (a "sub-sub-entry"), and so on.

Entries are recursive in Flex. Furthermore, these dependent entries can
connect to multiple entries, if you wish (i.e., 'blackbird' could be
associated with both 'black' and 'bird').
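
A minimal sketch of the many-to-many linking described above, using plain dictionaries. The structure is an assumption for illustration only and is not how FLEx stores its links.

```python
from collections import defaultdict

# Each dependent entry lists the entries it is built from.
components = {"blackbird": ["black", "bird"]}

def complex_forms_index(components):
    """Invert the links: for each component, list the entries built on it."""
    index = defaultdict(list)
    for complex_form, parts in components.items():
        for part in parts:
            index[part].append(complex_form)
    return dict(index)

index = complex_forms_index(components)
print(index)
```

A strictly nested subentry can live under only one parent, whereas a link can point to several, which is why 'blackbird' can show up under both 'black' and 'bird'.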

Randy Regnier

Mike Maxwell

Oct 4, 2008, 8:58:44 PM
to flex...@googlegroups.com
(Warning: technical details about the guts of FLEx ahead, together with
speculation unfettered by realities of programming)

Ronald Moe wrote:
> Mike Maxwell wrote: "Personally, I don't think any new "field" is
> required."
>
> Yes, we do need a new field. MDF has a set of fields to handle
> paradigm forms...
>
> \pd Paradigm. Used for specifying the noun or verb class, gender, or
> other paradigm set that the lexeme or headword is associated with.
> ...
>
> \pdl Paradigm label. Used to label the paradigm form given in the
> \pdv field...
>
> \pdv Paradigm form. Used to give the vernacular paradigm form
> specified by the label in the \pdl field. Used mostly for irregular
> or incomplete paradigm sets.
>
> \pde Paradigm form gloss (E). Used for glossing the vernacular
> paradigm form in English.
> ...
>
> FLEx already has a system to indicate what MDF calls the "paradigm"
> in the \pd field. FLEx captures this in the Inflection Class and
> Inflection Feature fields.
>
> So what we need in FLEx is a bundle of three fields that would
> correspond to the \pdl \pdv and \pde fields.

The good news is that all this information is, as far as I know, already
captured in FLEx. The bad news is that what is needed for the sort of
work you're describing is a tool that brings it together in the form of
a paradigm, preferably together with a morphological generator (similar
to STAMP, although probably built as a transducer that combines a parser
like AMPLE with a generator like STAMP). And that is bad news
because--I suspect--this is a lot of work.

Specifically, each cell of a lexeme's paradigm can consist of a lex
entry (a minor entry linked to the main lexeme). The \pdv is the form
of such a lex entry, and the \pdl is the glosses of the affixes (I'm
assuming you've done a morphological parse of the word, perhaps by
hand), and the \pde is the gloss of the main entry plus the \pdl.
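
The mapping just described can be sketched as follows. The MinorEntry class and the way glosses are joined (with "." and "-") are assumptions made for illustration; FLEx's own model is not shown here.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MinorEntry:
    form: str                 # the inflected form, e.g. "broke"
    main_gloss: str           # gloss of the main entry it is linked to
    affix_glosses: List[str]  # affix glosses from a (possibly hand-made) parse

def mdf_fields(entry):
    """Derive the three MDF paradigm fields from a linked minor entry."""
    pdl = ".".join(entry.affix_glosses)       # label = glosses of the affixes
    return {
        "pdl": pdl,
        "pdv": entry.form,                    # form of the minor entry
        "pde": f"{entry.main_gloss}-{pdl}",   # main gloss plus the label
    }

fields = mdf_fields(MinorEntry("broke", "break", ["PST"]))
print(fields)
```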

There's actually one additional piece of information that the tool would
need, in order to generate a complete table for the paradigm: the set of
morphosyntactic features for which the given POS inflects. This is also
included in FLEx's model. For POSs where there are more than two such
features, generating a table becomes clumsy, if not impossible, because
a table can only be two dimensional. There are ways to include more
than one feature along a given axis, e.g. by combining feature values
(1Sg, 2Sg, 3Sg, 1Pl, ...). But at some point this becomes too messy,
and you want separate tables (e.g. for present, past and future tenses).
Such options can be suggested by the tool, but the user will need to
choose among them.
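
A minimal sketch of the layout problem, assuming a hypothetical feature inventory of person, number, and tense. It combines two features along the row axis and splits the third into separate tables, which is one of the options a user might choose.

```python
from itertools import product

# Hypothetical features for which a verb inflects.
FEATURES = {
    "person": ["1", "2", "3"],
    "number": ["Sg", "Pl"],
    "tense": ["Prs", "Pst", "Fut"],
}

def combined_labels(features, inner, outer):
    """Combine two features into one axis: 1Sg, 2Sg, 3Sg, 1Pl, ..."""
    return [i + o for o, i in product(features[outer], features[inner])]

rows = combined_labels(FEATURES, inner="person", outer="number")

# One table per tense, each with an empty cell for every combined row label.
tables = {tense: {row: "" for row in rows} for tense in FEATURES["tense"]}

for tense, cells in tables.items():
    print(tense, list(cells))
```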

The tool can start out by populating the table(s) for a particular
lexeme with any stored forms already linked to the given lexeme (and, if
the morphological grammar is up to it, using the generator). As the
user fills in empty cells or corrects generated cells, the tool would
create the additional stored minor entries. And where the generator
gives the correct form, the user can erase the stored entry (if any).
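
A minimal sketch of that workflow. The function naive_generator is a toy stand-in for a real morphological generator, and the labels and data structures are hypothetical.

```python
def naive_generator(lexeme, label):
    """Toy stand-in for a generator: knows two regular English suffixes."""
    suffixes = {"PST": "ed", "3SG": "s"}
    return lexeme + suffixes[label] if label in suffixes else None

def fill_paradigm(lexeme, labels, stored, generate):
    """Prefer stored minor entries; otherwise fall back to the generator."""
    cells = {}
    for label in labels:
        if (lexeme, label) in stored:
            cells[label] = (stored[(lexeme, label)], "stored")
        else:
            form = generate(lexeme, label)
            cells[label] = (form, "generated" if form else "empty")
    return cells

def redundant_entries(lexeme, stored, generate):
    """Stored forms the generator already gets right, so they could be erased."""
    return [label for (lex, label), form in stored.items()
            if lex == lexeme and generate(lex, label) == form]

stored = {("break", "PST"): "broke", ("break", "3SG"): "breaks"}
cells = fill_paradigm("break", ["PST", "3SG", "PTCP"], stored, naive_generator)
print(cells)
print(redundant_entries("break", stored, naive_generator))
```

Here 'broke' has to stay stored because the toy rule would produce 'breaked', while the stored 'breaks' duplicates what the generator produces.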

(I said above that the label of the inflectional form--Ron's \pdl
field--is generated from the morphological parse of the word. That
would be true for minor entries that are already in the dictionary when
you start up the tool. But if the user fills in a particular cell with
a new form, this label can instead come from the labels of the cell in
the table. It can then get stored as part of the parse of that minor
entry.)

Creating a lex entry for every cell in the paradigm is something you
might do when you're initially studying the paradigm, but clearly you
don't want to do that for every word in the dictionary if you can avoid
it, at least not if you have much in the way of inflectional morphology.
And once you have a reliable inflectional morphological grammar, you
should be able to dispense with all the stored regular forms and allow
the morphological generator to create them.

In sum, I believe the necessary fields are there, but a lot of thought
(and programming) would have to go into how to create a useful tool for
the user.

Allan Johnson

Oct 4, 2008, 10:43:36 PM
to flex...@googlegroups.com
Thank you for the good thinking on these questions. It's complicated, and I'm not sure that I've adequately processed all your responses. But in reviewing the thread, I see one thing that may be worth adding to the discussion. Going back to one of your posts, Ron:

Ronald Moe wrote:
> Randy Regnier wrote:
> "Flex has had these associations between minor and major entries since
> the beginning of time."
>
> Yes, it has. You are correct. But I was suggesting a different feature and
> (if I understand him correctly) Mike was commenting on my suggestion. But it
> is getting a little confusing because we have the current feature (which you
> mention) and my suggestion. I'm very likely the one who is confused. Here
> are the two situations:
>
> Situation#1 [this is how FLEx currently works]
>
> \lx break
>
> \lx broke
> \tp Irregularly Inflected Form [Entry Type field]
> \mn break [Primary Entry Reference field]
>
> FLEx links 'broke' to 'break' via the \mn field. As you noted, FLEx has done
> this since the beginning.
>
> Situation#2 [this is my suggestion]
>
> \lx break
> \pst broke [this is a custom field to hold the past tense form]
>
> \lx broke
>

I think what we're looking for here can be accomplished at least to some
degree with current FLEx functionality. Looking at the entry types in
FLEx, three different "types" of "entry types" are available:
- Normal: This includes just the "Main Entry" type
- Complex Form: This includes "Derivations", "Compounds", and several
others
- Variant Form: This includes "Inflectional Variants" and several others

For a given entry, if the entry type selected is a "Complex Form" type,
a "Primary Entry References" field is added into the entry. This
provides the mechanism needed for the 2-way linking, such as the link
from a derived form to its source form and back.

For a given entry, if the entry type selected is a "Variant Form" type,
again a "Primary Entry References" field is added into the entry, to
provide for the 2-way linking between the variant and its more basic
form. In addition, a "Condition" field is added into the entry. One
example of a common use of this field would be to mark the condition as
"plural", when the lexeme is a plural noun form. Then the "Primary Entry
Reference" would point to the singular form of the noun.

For a verbal entry, if we want to add an inflectional paradigm, we could
do the following:
- Add a new Entry Type called "Inflected Form"
- Make this a "Variant Form" type of "Entry Type" (in order to make
the "Condition" field available)
- Add each member of the paradigm as a separate entry, and set its
entry type to "Inflected Form"
- Set its "Primary Entry Reference" field to point to the basic
uninflected form
- Set its "Condition" field to a custom label that represents this
member of the paradigm

Applying this to Ron's example, I get this:

Lexeme Form: break
Entry Type: Main Entry
Variant Forms: broke

Lexeme Form: broke
Entry Type: Inflected Form
Primary Entry References: break
Condition: PST

Lexeme Form: breaks
Entry Type: Inflected Form
Primary Entry References: break
Condition: XXX

Lexeme Form: breaking
Entry Type: Inflected Form
Primary Entry References: break
Condition: YYY

This seems to provide all the functionality shown in both Situation#1
and Situation#2. I've also added two more inflections, in order to see a
fuller paradigm. I've just used the dummy labels XXX and YYY for
"breaks" and "breaking", because I don't know the proper linguistic
labels (both sort of a present tense but not exactly...)

But it's not as easy as it could be to enter this information into FLEx.
It would be good to be able to enter all of this info just from the
"break" entry, not having to navigate around to different entries for
each inflected form. And it would be good to be able to readily view the
members of the paradigm all in one place, in the "break" entry, and
showing the "Condition" label that has been chosen for each member of
the paradigm. Ok, never mind - this part already works fine. In the
formatted view of the break entry, I get the following info on inflected
forms:
(PST broke; XXX breaks; YYY breaking)
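
A minimal sketch of how that summary line follows from the linked entries, using plain dictionaries as a stand-in for the entries above. The keys are hypothetical names, not FLEx field identifiers.

```python
entries = [
    {"form": "break", "type": "Main Entry"},
    {"form": "broke", "type": "Inflected Form", "primary": "break", "condition": "PST"},
    {"form": "breaks", "type": "Inflected Form", "primary": "break", "condition": "XXX"},
    {"form": "breaking", "type": "Inflected Form", "primary": "break", "condition": "YYY"},
]

def inflected_summary(headword, entries):
    """Collect 'Condition form' pairs for every entry linked to the headword."""
    parts = [f'{e["condition"]} {e["form"]}'
             for e in entries if e.get("primary") == headword]
    return "(" + "; ".join(parts) + ")"

print(inflected_summary("break", entries))
```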

But it would still be nice to be able to more readily edit the parts of
the paradigm from the "break" entry, without having to navigate around
to the various entries. Anyway, as Mike said, it seems that we have
reasonable places for the data, but just need to work on the User
Interface for editing and viewing these things.

With this, I think we've gone full circle back to Heidi's original post.
If I understand correctly, this is just what Heidi's users have been
doing in order to handle their data - using the "Condition" field to
mark each type of inflection that they find, and wanting to be able to
sort the data according to these condition fields. And Andy told us that
this capability is coming soon. So maybe this -should- be considered the
proper way to record inflected forms. Just put each one in its own
entry. Choose the most basic form to be the "Primary Entry Reference"
and link the others to it as "Inflected Forms". And then we need to work
on developing the User Interface to enable the members and labels of the
paradigm to be readily edited from a single place (the "break" entry),
and to enable the relevant forms to be more readily entered into this
paradigm for other verbs.
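
The sorting by condition that Heidi's user asked for can be sketched outside FLEx like this, assuming the variants have been exported as (form, primary entry, condition) triples. The data and the PTCP label are hypothetical.

```python
from collections import defaultdict

# Hypothetical export: (variant form, primary entry, condition label).
variants = [
    ("broke", "break", "PST"),
    ("sang", "sing", "PST"),
    ("sung", "sing", "PTCP"),
    ("broken", "break", "PTCP"),
]

def by_condition(variants):
    """Group variants by condition so forms with the same label sit together."""
    groups = defaultdict(list)
    for form, primary, condition in variants:
        groups[condition].append((primary, form))
    return {cond: sorted(pairs) for cond, pairs in sorted(groups.items())}

for condition, pairs in by_condition(variants).items():
    print(condition, pairs)
```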

Allan

Andy Black

Oct 6, 2008, 11:52:41 AM
to flex...@googlegroups.com
On 10/3/2008 4:46 PM, Ronald Moe wrote:
> In my recent postings I've been trying to address the needs of users who
> don't yet understand the morphology of their language. One technique for
> investigating morphology is to fill out paradigms. *Currently* the only way
> such a user can add paradigm forms to the lexical database is to set up
> custom fields--one field for each paradigm form. This is entirely
> unacceptable to me. I recommend that users do this because it's all we have.
> But it is not enough. So I've been arguing for something more. I've been
> trying to imagine a tool that could be used to *discover* and *document* the
> paradigms of words. Once the user has filled out the paradigm (using the
> tool that has not yet been designed),

Well, the WordWorks team a long time ago did some thinking on this.
I've uploaded a small portion of Larry Hayashi's power point
presentation at the Computer Technical Conference held in the year
2000. This portion includes a vision for what such a tool might look
like. It can be found at
http://groups.google.com/group/flex-list/files?hl=en. Please note that
Larry was trying to be careful to communicate that what he was
demonstrating was all "smoke and mirrors" - none of it had been
implemented yet back then. So he included a little joke to try to
communicate that... Run the power point and click through it - the
screens showing the proposed tool actually change as you click.

--Andy

Ronald Moe

Oct 6, 2008, 7:19:22 PM
to flex...@googlegroups.com
David Weber wrote:
"Why not define "entry" as recursive, that is, let an entry have--as a
part--an entry, a "subentry"."

This might be doable from a computational point of view, but since I'm not a
programmer, I don't know what the issues would be from that side of things.
From my (ignorant) point of view I think we are already doing this by
linking entries for complex forms to their roots.

The more critical issue for me is what the UI would look like. The UI can
always hide the way the data is actually stored and present it to the user
in a format that the user can easily understand and interact with. In
Toolbox you can make a "subentry" part of a root entry by putting the
"subentries" at the bottom of the root entry and replacing the \lx of the
subentry with \se. But it makes an entry horribly long when there are a lot
of complex forms being appended to a root. So I don't really like the idea
of appending complex forms to the bottom of a root entry. I can imagine a UI
in FLEx in which the user could right click on a list of complex forms (such
as the Complex Forms field) and a window for a particular complex form would
pop up. But that's not a whole lot better than the current way FLEx works.

One of the bigger problems with trying to come up with a solution to these
kinds of things is that Shoebox chose to put each field on a separate line
and FLEx followed this convention in the Entry pane. Some fields (like the
Definition field) can get long. But others are always going to be short. You
simply don't need a whole line for the grammatical category. I've always
wished for a UI that looked more like the ultimate dictionary entry. For
instance each field could consist of a field label (e.g. Definition) that
the user could abbreviate (e.g. Def) followed by a small box for the data.
All the fields would appear on screen in paragraph format. The data boxes
would automatically expand to fit the length of the data entered. This kind
of UI would radically reduce the space needed to display an entry and would
let us see all the fields in an entry instead of having to
scroll down to get to a field that we want to edit. By condensing the space
needed to display a single entry we could make it possible to see a root
entry and one or more subentries on a single screen without having to
scroll. I've attached a mockup screen shot. I'm sure we could improve on
this and make it a lot less cluttered and more clear.
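
A minimal sketch of the paragraph-format idea in plain text, assuming user-chosen abbreviations for the field labels. The abbreviations and field names are hypothetical, and square brackets stand in for the expanding data boxes.

```python
import textwrap

# Hypothetical user-defined abbreviations for field labels.
ABBREVIATIONS = {
    "Lexeme Form": "Lx",
    "Grammatical Category": "Cat",
    "Definition": "Def",
}

def compact_entry(fields, width=72):
    """Lay out 'Label: [value]' pairs as one wrapped paragraph."""
    chunks = [f"{ABBREVIATIONS.get(label, label)}: [{value}]"
              for label, value in fields.items()]
    return textwrap.fill("  ".join(chunks), width=width)

fields = {
    "Lexeme Form": "go",
    "Grammatical Category": "v",
    "Definition": "To move from one place to another",
}
print(compact_entry(fields))
```

Short fields such as the grammatical category take only a few characters instead of a whole line, so more of the entry fits on screen.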

In this prototype I've provided a way to add paradigm forms with the option
to create a minor entry for irregular forms. I've also provided a way to
capture a variant form with the option to create a minor entry for it. I've
also provided a way to add an entry for a complex form, which would also
automatically link it to the (primary) root entry. Note that this view uses the
"root dictionary" option that presents all complex forms as subentries. In
this view you could edit the entries for the complex forms in the same
window as the root entry. I believe some of these features would answer some
of the complaints and wishes that many of you have expressed.

Ron Moe




Edit view.bmp

Allan Johnson

Oct 6, 2008, 9:27:12 PM
to flex...@googlegroups.com
Ronald Moe wrote:
> David Weber wrote:
> "Why not define "entry" as recursive, that is, let an entry have--as a
> part--an entry, a "subentry"."
>
> This might be doable from a computational point of view, but since I'm not a
> programmer, I don't know what the issues would be from that side of things.
> From my (ignorant) point of view I think we are already doing this by
> linking entries for complex forms to their roots.
>

Yes, you're right Ron. As far as linking entries, we can have any number
of levels of entry nested within entry: this one is a derivation of that
one, which is itself derived from that other one. We can also
configure the dictionary for a root-based view, in which a derived form
is displayed as a subentry of its root. But the display is more limited
than the linking capability. We currently can view only one level of
subentry. Subentries of subentries don't work in the root-based view. So
as you're saying below, again this comes down to a UI issue. As far as
storing and linking of the data, we have lots of flexibility.
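
A minimal sketch of the difference between linking depth and display depth. The derived_from links and the render function are hypothetical; max_depth=1 mimics a view that shows only one level of subentry.

```python
# Hypothetical links: each derived form points to the form it derives from.
derived_from = {
    "breakable": "break",
    "unbreakable": "breakable",
    "breaker": "break",
}

def children(root):
    """Entries linked directly to the given entry."""
    return sorted(form for form, src in derived_from.items() if src == root)

def render(root, max_depth=None, depth=0):
    """Indented root-based listing, optionally cut off at max_depth levels."""
    lines = ["  " * depth + root]
    if max_depth is None or depth < max_depth:
        for child in children(root):
            lines.extend(render(child, max_depth, depth + 1))
    return lines

print("\n".join(render("break", max_depth=1)))  # one level of subentry
print("\n".join(render("break")))               # every linked level
```

The links themselves carry the full nesting; only the display chooses where to stop.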

> The more critical issue for me is what the UI would look like...
