Word grammar failed bug?

Alexandre Arkhipov

Apr 27, 2012, 11:42:50 AM

to flex...@googlegroups.com, andy_...@sil.org

Dear Andy and all,

I've come across a strange thing while giving a FLEx class today.
I've set up a trial project for Russian intended for use with Hermit Crab (I've got phonemes and phonological features in place). But first I tried to showcase the use of Inflection Classes, Inflection Features and Stem Names. Inflection Classes and Stem Names seem to work as expected, but there was a problem with Inflection Features.
We've got two words of the same inflection class ("2nd declension"), which have the same suffixes everywhere except in the nominative/accusative: "luk" ('onion') with a zero suffix vs. "okn-o" ('window') with -o. The aim is to prevent "luk-o" and "okn" from parsing. So I added an Inflection feature "gender", assigning masculine to "luk" and neuter to "okn-", and set the corresponding Required features for the zero suffix and "-o".
Unexpectedly, after that the parser fails to parse either nominative form, correct or incorrect.
But that's not the end of the story! I parse the correct form, "okno", with "Try a word". It reports failure upon resynthesis with a "Word grammar failed" message. But if I follow all the steps in the word grammar, it comes to a successful parse!

Any hints to a solution welcome.

Note 1: The instrumental case form, for which there is no gender preference, parses OK.
Note 2: The default parser works as expected: OK for "okno", failure for "okn".
Note 3: An alternative is to introduce inflection subclasses (2nd masc. vs 2nd neut.), but I suspect it is not possible for all the uses of Inflection features.
Note 4: FLEx 7.2.3, Windows 7 and XP. I can also easily share the backup if needed.

All best,
Sasha

Andy Black

Apr 27, 2012, 11:58:16 AM

to flex...@googlegroups.com

Hi, Sasha.

Please do send me the backup (off this list) so I can look at more things in detail.

There was a bug in the word grammar debugger (the one in Try a Word) which failed to find inflectional feature conflicts. So that might explain why it showed success when the parse failed. I'm puzzled, though, about why Hermit Crab and the default parser are behaving differently for these two words.

Thanks.

--Andy

--
You received this message because you are subscribed to the discussion group "FLEx list". This group is hosted by Google Groups and is open for anyone to browse.
To post to this group, send email to flex...@googlegroups.com
To unsubscribe from this group, send email to flex-list+...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/flex-list

Andy Black

Apr 27, 2012, 1:59:01 PM

to flex...@googlegroups.com

Sasha:

Thanks so much for sending the database.

The reason that the Hermit Crab parser failed to parse words like luk and okno is that Required features is not implemented for Hermit Crab. I'm afraid it silently causes the word grammar (in the parser part) to fail. The XAmple parser does correctly handle Required features so it works.

Having noted this, please also understand that Required features are intended to be used to choose between allomorphs of the same morpheme (i.e. same entry). Since you have split the -o and -0 (zero) forms into two entries already, all you probably need to do is to set the inflection features of them: neuter for -o and masculine for the zero. Then you can get what you want. And, if it is indeed the case that -o is always neuter gender and the zero is always masculine gender, you could choose to reflect this in their respective glosses, too.
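As a rough illustration of the idea only (this is not how FLEx or either parser is implemented), the effect of putting a gender inflection feature on both the stems and the two suffix entries can be sketched as a simple compatibility check. The names `STEMS`, `SUFFIXES`, `compatible` and `licensed_forms` are made up for this sketch, and the zero suffix is written as an empty string.

```python
# Hypothetical sketch: inflection features modelled as plain dicts.
STEMS = {
    "luk": {"gender": "masc"},
    "okn": {"gender": "neut"},
}
SUFFIXES = {
    "": {"gender": "masc"},   # zero suffix
    "o": {"gender": "neut"},  # -o suffix
}

def compatible(a, b):
    """Feature sets are compatible if no shared feature has two different values."""
    return all(b[name] == value for name, value in a.items() if name in b)

def licensed_forms():
    """Return every stem+suffix combination whose features do not conflict."""
    return sorted(
        stem + suffix
        for stem, stem_feats in STEMS.items()
        for suffix, suffix_feats in SUFFIXES.items()
        if compatible(stem_feats, suffix_feats)
    )

print(licensed_forms())
```

Under these assumptions only "luk" and "okno" survive; "luk-o" and bare "okn" are rejected because their gender values clash.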

Hope this helps,

--Andy

Andy Black

Apr 27, 2012, 2:03:43 PM

to flex...@googlegroups.com

I forgot to mention that section B.3.4 "Affix allomorphs conditioned by morpho-syntactic features not implemented yet" in the conceptual intro to parsing document mentions that these required features are not implemented for Hermit Crab. See section 3.8 "Affix Allomorphs Conditioned by Morpho-syntactic Features" for more on Required features.

--Andy

Alexandre Arkhipov

Apr 27, 2012, 2:15:05 PM

to flex...@googlegroups.com

Dear Andy,

Thank you very much (and sorry for much ado...). So I see, my error was to use the Required features field in the affix instead of Inflection features below it. Wow, happily it does work! I think it was the fact that the messages contradicted each other ("parse failed" vs. "parse successful") which confused me; otherwise I would perhaps have reread the manual once more.

Many thanks,
Sasha

Andy Black

Apr 27, 2012, 2:42:20 PM

to flex...@googlegroups.com

On 4/27/2012 11:15 AM, Alexandre Arkhipov wrote:
> ...I think it was the fact that the messages contradicted each other

> ("parse failed" vs. "parse successful") which confused me,

Me, too, to be honest.

I'm glad it is now working for you.

--Andy

Alexandre Arkhipov

Apr 28, 2012, 10:45:12 AM

to flex...@googlegroups.com

Dear Andy and all,

Another couple of HC-related questions if you please.

I believe I remember reading in some FLEx news/release announcements since last summer that Hermit Crab now supports optional rules (i.e. a way to handle free variation). But I cannot find how to make a rule optional, nor can I find any trace of that news anymore, with Google or whatever. Was it only in my dreams?

Also, how can one specify a zero morpheme in the rules? The reason I ask is that I made a final devoicing rule which works well with bare stems with no inflection (e.g. adverbs), and also with suffixes, but not with stems followed by a zero suffix. E.g. "vdruk" (< adv. "vdrug" 'suddenly') parses OK, but "luk" (< "lug" 'meadow') does not, failing during resynthesis.

In another project (Alutor) I've got several phonological rules for HC. One of them is supposed to delete dental consonants in word-initial position before (some) other consonants. So for example the verb root -tkepl- loses t- if there is no prefix, as in the infinitive "keplek" (< tkepl + k; e = schwa). However, the infinitive does not parse. In the Try a word window, the last regex for the rules unapplied does include an optional initial "t?", but all the boxes below which show affix guesses and remnants, start just with k.
Finally I simplified the rule to only have the specified phonemes instead of classes/features ("t > zero / # _ k"), but the result is the same. On the other hand, a similar rule which cuts the word-final vowel works OK. Is there something special about the word-initial rules, or am I overlooking something else?
I'm attaching the output from the Try window; can send the backup if needed, of course.

All best,
Sasha

Kevin Warfel

Apr 29, 2012, 3:39:22 PM

to flex...@googlegroups.com

Dear Sasha,

The implementation of optional rules is a recent addition to the HC capabilities. I think that whichever version includes it has not yet been made available to the public, but when the next beta or stable release comes out, that should be part of it.

I'm not sure about why your rules aren't working the way you're wanting them to, but I'd be happy to have a look if you send me your backups. Please send them to Kevin_...@sil.org and I'll give them high priority tomorrow when I get to the office.

Best wishes,
Kevin

Kevin Warfel

Apr 30, 2012, 10:25:25 AM

to flex...@googlegroups.com

Now that I'm at my work computer, where I have installed the version that supports optional Phonological Rules in conjunction with the Hermit Crab parser, it seems that I may have been wrong in what I wrote about that. I have FW 7.3 installed, so this feature may be a bit further away from release to the public than I indicated in my previous message.

Andy, if you see this, maybe you can confirm or correct what I've written.

Thanks,
Kevin

Andy Black

Apr 30, 2012, 11:43:19 AM

to flex...@googlegroups.com

On 4/30/2012 7:25 AM, Kevin Warfel wrote:
> Now that I'm at my work computer, where I have installed the version that supports optional Phonological Rules in conjunction with the Hermit Crab parser, it seems that I may have been wrong in what I wrote about that. I have FW 7.3 installed, so this feature may be a bit further away from release to the public than I indicated in my previous message.
>
> Andy, if you see this, maybe you can confirm or correct what I've written.

Kevin, I wonder if you were thinking of the ability to turn rules on and
off which is slated to come out with version 7.3.

From what I can tell, the Hermit Crab parser does not have an ability
to handle optional phonological rules.

There was an issue reported earlier where Hermit Crab was failing to try
two or more allomorphs when these shared the same or similar
environments. It thus was blocking free variation based on allomorphy.
This bug has been fixed and should be available with version 7.2.4.

--Andy

Kevin Warfel

Apr 30, 2012, 11:47:59 AM

to flex...@googlegroups.com

Sorry for the confusion.

I was referring to the ability to turn rules on or off, yes. I interpreted Sasha's reference to "optional Phonological Rules" as meaning "ability to disregard one or more rules" when trying to track down the reason the parser is behaving differently than expected. Thanks for clearing this up.

Craig Farrow

Apr 30, 2012, 9:50:06 PM

to flex...@googlegroups.com, Andy Black

On 30/04/2012 11:43 p.m., Andy Black wrote:

>
> Kevin, I wonder if you were thinking of the ability to turn rules on
> and off which is slated to come out with version 7.3.

Will this include the ability to turn on/off Affix Templates? That would
be really helpful for debugging, too.

Craig.

Snofriacus

Apr 30, 2012, 11:09:59 PM

to flex...@googlegroups.com

Hey guys,

I had to check out this thread to see if the subject line was meant as
just a noun phrase or as a statement of HC's relative usefulness :) On
that topic, comparing HC to the default Ample parser...

Up to now, Ample is the only parser I'm familiar with, but from things
I've heard here and there, it sounds like it might be productive for me
to try out HC. The general impression I get from comments I've read is
that HC can accomplish anything that Ample can, and maybe more
intuitively and with less tedious work. Would that be an accurate
statement?

Allan

Andy Black

May 1, 2012, 11:30:08 AM

to flex...@googlegroups.com

Yes, Craig, it will. It also will include the ability to turn on/off
compound rules and ad hoc rules.

--Andy

Andy Black

May 1, 2012, 11:33:03 AM

to flex...@googlegroups.com

On 4/30/2012 8:09 PM, Snofriacus wrote:
> Hey guys,
>
> I had to check out this thread to see if the subject line was meant as
> just a noun phrase or as a statement of HC's relative usefulness :)

It was meant to be a noun phrase in this case...

> On that topic, comparing HC to the default Ample parser...
>
> Up to now, Ample is the only parser I'm familiar with, but from things
> I've heard here and there, it sounds like it might be productive for
> me to try out HC. The general impression I get from comments I've read
> is that HC can accomplish anything that Ample can, and maybe more
> intuitively and with less tedious work. Would that be an accurate
> statement?

Almost. See appendix B in the conceptual intro to parsing document for
more on HC, but especially section B.3 for the known limitations.

--Andy

Kevin Warfel

May 1, 2012, 2:29:47 PM

to flex...@googlegroups.com

Allan,

I will preface my remarks by freely admitting that I am strongly biased in favor of the HC parser, so I like your alternative interpretation of the subject line. As a result of my strong preference for the HC parser, I have worked much more extensively with it than with the XAMPLE parser, so my representation of the XAMPLE parser is likely to be incomplete or even unfair, so just keep that in mind as you read on. But having said that, I will say that my bias is due in large part to the frustration I experienced as a *linguist* trying to use the XAMPLE parser to handle the morphophonemic processes that were present in Puguli, the West African language I was trying to parse, so maybe that makes the bias valid - or maybe not, I'll let you decide.

I definitely agree with your statement that the HC parser is more intuitive than the XAMPLE parser, *IF* your intuition runs along linguistic lines (i.e., you're a linguist and/or knowledgeable about the linguistic workings of the language you're trying to parse). This is the aspect of the HC parser that I *love* - it mimics very closely the way I think about morphophonemic processes.

I am less sure about your statement about HC being less tedious than XAMPLE. Most of the tedium involved in the implementation of HC is in the initial setup (*rigorous* and *exhaustive* definition of phonemes as sets of phonological features), but after that, the slope of the curve becomes much flatter and the going is much easier. My experience with XAMPLE is that the slope of the curve remains about the same the whole way along, at least until you're confronted with a complex morphophonemic process or the interplay of several morphophonemic processes, at which point the curve becomes much steeper. In my case, I finally got to the point where the curve was so steep that I felt obligated to try the HC parser, which was experimental at that time, even though I am not at all an "early adopter".

As I consult with FLEx users who are interested in using one of the parsers, I usually advise them to use XAMPLE if the morphophonemics in the language they are working with is relatively uncomplicated (but most of those who come to me for help are not in this category). But as soon as I see a fair amount of complexity, especially where two or more processes affect the same morpheme boundary and the order in which they are applied is important, I recommend HC. It just handles situations like that with "elegance," and I have a great appreciation for an elegant and eloquent synopsis of a piece of linguistic reality.

As an example of this, there is a verb form in Puguli created by the suffixation of -ɔɔ̀ to the verb root (if these characters don’t come through, these are “open o”s or “backwards c”s and indicate low back round vowels, which in West Africa are pronounced with retracted tongue root), and that’s where the morphophonemics kick in. This suffix must be “harmonized” to the root vowel(s) for ATR value (if the root vowel is +ATR, the suffix vowels will change to “oo”), and for height (if the root vowel is “i” the suffix vowels will change to “uu”). If the root is monosyllabic, regressive spread of the high tone of the suffix “overwrites” the tone of the root vowel, making it high tone as well. If the root ends in a vowel, one of the vowels of the suffix must delete. And maybe there’s another process that I’m forgetting at the moment. In any case, that’s at least four processes that apply across the morpheme boundary in this context. Each of those processes can be described independently and made to interact appropriately in conjunction with HC. With XAMPLE, you would have to resort to describing the net effect of each possible combination of these multiple processes, resulting in a very long list of environments and conditions that does not intuitively correlate to the linguistic reality of four independent, but interactive processes.

When the morphophonemic process in question is one like the devoicing of a consonant when it ends up next to a voiceless consonant, the representation for either parser is fairly straightforward:

XAMPLE

Lexeme –bi

Allophone –pi / [S]__

Lexeme –gu

Allophone –ku / [S]__

where [S] is a Natural Class defined as the set of voiceless consonants in the language

HC

[+vcd] --> [-vcd] / [-vcd]+__

However, even in a ‘simple’ situation like this, I prefer HC, for several reasons:

1) The symbol “[-vcd]” is more explicit as to what it represents than is “[S]”.

2) It is much more obvious in the HC expression than in the XAMPLE ones that the process at work in these examples is consonant devoicing. This fact would be much more evident if the lexemes and allophones under XAMPLE were written in a script that you don’t know how to read! (If I were looking at a database where these forms were written in an Indic script, for example, it would take me a very long time to figure out what was going on, whereas the HC representation would remain transparent.)

3) Using XAMPLE, every lexeme which changes form, depending on its morphophonemic context, must have one or more allomorphs declared, even if the pattern is widespread in the language. Thus, if suffixes –ga, -ge, -gi, -go, and –gu exist in the language and each one has an allomorph beginning with /k/, which occurs in the context described above, each suffix must have an allomorph explicitly listed in its lexical entry. In the case of HC, one rule suffices for all occurrences of the same process, even if there are hundreds of lexemes affected.

4) When adding a new lexeme to a database in which the parser has already been gotten to work, adding the lexeme is all that needs to be done in order for words built on that lexeme to parse with HC. With XAMPLE, all relevant allomorphs must also be added to the lexical entry before forms built on that lexeme can be parsed.

5) The expression of a linguistic process in a succinct statement with a scope of application that parallels its range of operation in the language is simply “elegant”!

As the morphophonological processes involved become more complex, the contrast between the two approaches becomes increasingly marked, and the argument for HC being preferred to XAMPLE becomes more and more compelling (at least to me).

And there you have a short introduction to my enamorment with the Hermit Crab parser and a few of the reasons why I prefer it to XAMPLE. Others who have more experience with XAMPLE should feel free to correct or rebut my undoubtedly skewed representation of reality. (Note that Andy has already alluded to some situations where HC cannot yet do what XAMPLE is capable of doing.)

Blessings,

Kevin Warfel

Snofriacus

May 1, 2012, 8:57:52 PM

to flex...@googlegroups.com

On 5/2/2012 2:29 AM, Kevin Warfel wrote:

XAMPLE
Lexeme –bi
Allophone –pi / [S]__

Lexeme –gu
Allophone –ku / [S]__

where [S] is a Natural Class defined as the set of voiceless consonants in the language

HC
[+vcd] à [-vcd] / [-vcd]+__

Hi Kevin,

Thanks for these good details. Could you give a prose statement of what this HC rule is saying? I think I'm understanding parts of it, but not fully getting it.

Allan

Kevin Warfel

May 2, 2012, 8:29:38 AM

to flex...@googlegroups.com

Hi Allan,

I’ve inserted the prose statement between the two versions of the “rule” below. (I put “rule” in quotes here because XAMPLE uses allomorphs rather than rules to formulate the effects of a morphophonological pattern in the language.)

Kevin

From: flex...@googlegroups.com [mailto:flex...@googlegroups.com] On Behalf Of Snofriacus
Sent: Tuesday, May 01, 2012 8:58 PM
To: flex...@googlegroups.com
Subject: Re: [FLEx] HC rules

On 5/2/2012 2:29 AM, Kevin Warfel wrote:

XAMPLE
Lexeme bi
Allophone pi / [S]__

Lexeme gu
Allophone ku / [S]__

where [S] is a Natural Class defined as the set of voiceless consonants in the language

Both of these illustrate the fact that a voiced phoneme becomes devoiced when it follows a voiceless phoneme. In actual practice, the rule as expressed for HC is likely to be too general because it will apply the devoicing to all phonemes (including vowels, for example) and the application is likely to be less general than that, so other features would have to be bundled together with the “voicing” feature to restrict its application to consonants only or to stops only, according to the scope of the process in the language.

HC
[+vcd] --> [-vcd] / [-vcd]+__ (I reinserted the arrow that disappeared along the way.)

The “+” in this rule indicates a morpheme boundary, something that XAMPLE does not (cannot) take into consideration, as I understand it.
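For readers who find code easier to follow than rule notation, here is a minimal Python sketch of what the HC-style rule says. It is not Hermit Crab's rule engine: the function name `devoice`, the tiny phoneme inventory and the stem "tak" are all invented for illustration. Following the caveat above, the rule is restricted to stops so that vowels are not devoiced, and it only looks at segments that meet across a morpheme boundary.

```python
# Hypothetical inventory, for illustration only.
VOICED_TO_VOICELESS = {"b": "p", "d": "t", "g": "k"}  # voiced stop -> voiceless counterpart
VOICELESS = set("ptksf")                               # assumed voiceless consonants

def devoice(morphemes):
    """Apply [+vcd] --> [-vcd] / [-vcd]+__ (stops only) to a list of morphemes.

    Each list boundary plays the role of the "+" in the rule: only a segment
    that starts a morpheme, right after a voiceless segment ending the
    previous morpheme, is changed.
    """
    out = [morphemes[0]]
    for morpheme in morphemes[1:]:
        prev = out[-1]
        if prev and morpheme and prev[-1] in VOICELESS and morpheme[0] in VOICED_TO_VOICELESS:
            morpheme = VOICED_TO_VOICELESS[morpheme[0]] + morpheme[1:]
        out.append(morpheme)
    return "".join(out)

print(devoice(["tak", "bi"]))  # boundary after voiceless k
print(devoice(["tan", "bi"]))  # boundary after voiced n
```

One rule covers -bi, -gu and any other suffix beginning with a voiced stop, which is the point made in reason 3) of my earlier message; no per-suffix allomorph list is needed.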

Alexandre Arkhipov

May 3, 2012, 7:08:59 PM

to flex...@googlegroups.com

Dear Andy and Kevin,

Thank you for clearing up this question. Yes, I was thinking of stating that
a rule by itself is optional, and I now know it is not possible. The two
other things -- turning rules and templates on and off for debugging, and
allowing free variation based on allomorphy -- will be great, thanks to all
the team!

Best,
Sasha

J V C

May 4, 2012, 4:00:26 PM

to flex...@googlegroups.com

Kevin,

I really appreciate you taking the time to lay all of that out. But you didn't sound "strongly biased" to me--maybe that means I'm hyper biased! :)

I've not used either parser extensively, but it seems to me that HC should be the default parser. I didn't find the learning curve to be too tedious up front and was able to start using it a bit prior to doing the "*rigorous* and *exhaustive* definition of phonemes" that you mentioned. (I've asked the HC developers to consider automating this manual approach: treat each orthographic element as a vanilla but valid phoneme until the user has finished defining everything rigorously; then add natural class info gradually, as needed for writing rules.)

Maybe it depends on the language, but I was trying to model Indonesian early on and immediately hit a huge snag with XAMPLE. Indonesian is a relatively simple language in terms of phonology and morphophonemics, but it does have one process that's complex (two-step) and affects a huge number of verbs. The active voice prefix blows away the first consonant of the verb root if that consonant is voiceless and not part of a cluster.

meN- potong --> memotong
meN- tutup --> menutup
meN- sisir --> menyisir
meN- kasihi --> mengasihi

So, the nasal assimilates first, and then the consonant deletes (perhaps because the nasal is carrying its place feature anyway). This two-step process is typically called "nasal substitution".
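To make the two steps concrete, here is a small Python sketch of the ordering. It is only a toy over orthographic strings, not an HC rule set; `NASAL_FOR`, `assimilate`, `delete_initial` and `nasal_substitution` are names invented here, and the mapping covers just the four consonants in the examples above.

```python
# Orthographic nasal that N assimilates to, per root-initial voiceless consonant
# (limited to the four examples above).
NASAL_FOR = {"p": "m", "t": "n", "s": "ny", "k": "ng"}
VOWELS = set("aeiou")

def assimilate(root):
    """Step 1: N in meN- takes the place of the root-initial consonant."""
    return "me" + NASAL_FOR[root[0]]

def delete_initial(prefix, root):
    """Step 2: the root-initial voiceless consonant deletes after the nasal."""
    return prefix + root[1:]

def nasal_substitution(root):
    """Return the meN- form, or None if the root is outside the pattern
    (not a voiceless consonant from the table, or part of a cluster)."""
    if len(root) < 2 or root[0] not in NASAL_FOR or root[1] not in VOWELS:
        return None
    return delete_initial(assimilate(root), root)

for root in ["potong", "tutup", "sisir", "kasihi"]:
    print(root, "-->", nasal_substitution(root))
```

The two functions are stated once and apply to any root that fits, which is the contrast with listing an extra allomorph on every affected verb root.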

This is similar to the problem you described under (3) in your message, but instead of giving every suffix an extra allomorph, I would have had to give extra allomorphs to hundreds of verb roots! (I.e. to every root that is affected by the deletion rule.) It's much more satisfying to be able to use rules.

Jon

Andy Black

May 4, 2012, 4:52:36 PM

to flex...@googlegroups.com

On 5/4/2012 1:00 PM, J V C wrote:

I've not used either parser extensively, but it seems to me that HC should be the default parser.

The main reason we have not made HC be the default parser is that it has not been "exercised" anywhere near as much as the Ample part of XAmple has been. So the conceptual intro to parsing document dubbed it the "experimental" parser. And we have had HC users report bugs. I'm not recalling any users reporting XAmple bugs (at least in the same way)...

Maybe it depends on the language, but I was trying to model Indonesian

For what it's worth, the conceptual intro to parsing document discusses some challenging aspects of Indonesian morpho-phonology. See section B.1.2.1.4 "Nasal Assimilation".

--Andy

Kevin Warfel

May 8, 2012, 11:24:48 AM

to flex...@googlegroups.com

Jon,

You wrote:

Maybe it depends on the language, but I was trying to model Indonesian early on and immediately hit a huge snag with XAMPLE. Indonesian is a relatively simple language in terms of phonology and morphophonemics, but it does have one process that's complex (two-step) and affects a huge number of verbs. The active voice prefix blows away the first consonant of the verb root if that consonant is voiceless and not part of a cluster.

meN- potong --> memotong
meN- tutup --> menutup
meN- sisir --> menyisir
meN- kasihi --> mengasihi

So, the nasal assimilates first, and then the consonant deletes (perhaps because the nasal is carrying its place feature anyway). This two-step process is typically called "nasal substitution".

This is similar to the problem you described under (3) in your message, but instead of giving every suffix an extra allomorph, I would have had to give extra allomorphs to hundreds of verb roots! (I.e. to every root that is affected by the deletion rule.) It's much more satisfying to be able to use rules.

In response, I would say that this is an excellent example of a real-life situation where one can get the XAMPLE parser to “come up with the right answer,” but where the HC parser is not only more intuitive – it’s less work, too.

Kevin

Kevin Warfel

May 8, 2012, 11:42:04 AM

to flex...@googlegroups.com

Thanks, Andy, for putting this positive spin on the number of bugs that have been coming to light in the HC parser. (Maybe you didn’t see what you wrote as positive, but I’m interpreting it that way.) To me this means that more people are trying to use HC (Hallelujah!), and an increasing number of individuals working in a variety of languages translates into “stressing” aspects of the parser that haven’t been tested quite that same way before, so we find the bugs. That in turn means – assuming that the bugs will get fixed – that we are hastening the day when the HC parser can be touted as the default, a goal of mine for some time now. And so I see this as a positive thing. But I agree with you that we are not yet at the point of being able to recommend HC as the default parser.

Kevin

Snofriacus

May 8, 2012, 10:38:35 PM

to flex...@googlegroups.com

I'm not in a position to actually start experimenting with this in Hermit Crab right now, but want to ask this while I'm thinking of it. I don't think there are any words that I've been unable to parse using the Ample parser, but some of the solutions don't seem to represent the linguistic reality very well. The case I've found most unsatisfying is similar to what Jon presented below, but with the additional dimension of a reduplication. I used to have some examples with the nasal "N" assimilating to other consonants besides glottal, but the two below are the only ones I'm remembering right now:

mangingisda (Tagalog)
maN- CV isda

pangongoman (Botolan Sambal)
paN- CV oman

Note that I haven't specified what sort of affix the CV is. Typically CV reduplication before a root is viewed as a prefix. If this CV were a prefix, we'd get this:

CV- isda -> iisda (The C of the CV is a glottal, which in this context isn't written in the orthography)
maN- iisda -> mangiisda (The nasal "N" of the prefix becomes "ng" before a glottal)

But this isn't what we get. The syllable that's reduplicated is "ngi", which doesn't exist until after "maN-" has been joined to and assimilated to "isda". So to get mangingisda, the order of operations has to be reversed:

maN- isda -> mangisda
<CV> mangisda -> mangi<ngi>sda

But for the CV to apply at this point in the process, it needs to be applied as an infix, not a prefix.

So my question for Hermit Crab - and what I'd try if I were set up to do some testing - is would it be capable of modeling this process in a way that reflects these linguistic observations?

Allan

J V C

May 9, 2012, 3:29:05 AM

to flex...@googlegroups.com

You mentioned that there really is a phonemic glottal consonant there, even though the orthography doesn't reflect it. Would the linguistic analysis be more satisfying if the phonological rules made direct reference to that? Just taking a wild stab at it, something along these lines...?
maN- CV 'isda
maN- 'i'isda   redup
mang'i'isda   nasal assim.
mangi'isda   glottal deletion
mangingisda   consonant ?influence?

That way, you wouldn't have to treat the reduplication as infixing. The deletion of the first glottal seems easy enough, although you'd have to explain why the second glottal changes to /ng/. Still, both /ng/ and glottal are [+back], right? And you could appeal to the nearby influence of the first /ng/.

Of course, if you're just parsing orthographic forms and not writing up a phonology, you might not want the parser outputting those glottals as it breaks things down in the reverse order. But maybe there's a way around that, such as a "glottal" insertion rule producing 'isda from isda. Also, I guess there could be problems if certain roots really do begin with a vowel and no glottal, but that may or may not be relevant for these languages.

Jon

Snofriacus

May 9, 2012, 5:05:20 AM

to flex...@googlegroups.com

Interesting thoughts. But those last couple of steps don't seem much happier than the workarounds needed with Ample. Avoiding a reduplicating infix isn't what I'm after at all. What I'm looking for is a parser that -can- handle this analysis. I'm hoping that maybe HC is it.

If I find my other examples that don't involve glottals I'll post those. With those it should be easier to clearly see what's happening, and what features of a parser would be needed to correctly model it.

Oh, ok. Here they are:

"panonokho" (temptation), from "tokho"
"ampangangailangan" (needing), from "kailangan"
"pamumuhay" (living), from "buhay"

Take the first one, "panonokho". The straightforward analysis using processes that are known to happen in the language is:

paN- tokho -> panokho (nasal assimilation and subsequent deletion of the consonant it assimilated to)
-CV- panokho -> panonokho (I don't think it's possible to determine which "no" is the original and which is the copy)
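Just to pin down the ordering I mean, here is a toy Python sketch of the derivation: attach the nasal prefix first, then copy the resulting CV as an infix. It works on orthographic strings only and is not a claim about how HC would represent this. The function names and the `NASAL_FOR` table are made up, the table only covers the consonants in my examples, and a vowel-initial spelling is treated as having an unwritten glottal.

```python
# Nasal that replaces each root-initial consonant (only those in the examples).
NASAL_FOR = {"t": "n", "k": "ng", "b": "m"}
VOWELS = set("aeiou")

def attach_nasal_prefix(prefix, root):
    """Step 1: paN-/maN- plus root. Returns (prefix, nasal, rest of root).

    `prefix` is the part before N, e.g. "pa" or "ma".
    """
    if root[0] in VOWELS:
        # Unwritten initial glottal: N surfaces as "ng", nothing visible is lost.
        return prefix, "ng", root
    return prefix, NASAL_FOR[root[0]], root[1:]

def reduplicate_cv(prefix, nasal, rest):
    """Step 2: copy the nasal plus the following vowel and insert the copy
    between the prefix and the nasal-initial stem."""
    cv = nasal + rest[0]
    return prefix + cv + nasal + rest

def derive(prefix, root):
    return reduplicate_cv(*attach_nasal_prefix(prefix, root))

print(derive("pa", "tokho"))
print(derive("ma", "isda"))
```

With the steps in this order the copied syllable is the one that only exists after assimilation ("no", "ngi"), which is what the prefix-first analysis cannot produce.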

So I'm hoping to find out whether HC can model this. Some might argue that a reduplicating infix -isn't- a process that's known to happen in the language, since most of the CV reduplication happens as prefixes. But the CV reduplication that we model as prefixes is actually ambiguous as to order. Every case can just as easily be interpreted as infixing reduplication. We have traditionally chosen to call it a prefix just because it seems easier to parse this way. So if you have lots of cases which can be viewed as either prefixing or infixing reduplication, and a few diagnostic cases which demand an infixing interpretation, the nice solution is to simply view it all as infixing reduplication.

I wasn't intending to get into linguistic argumentation here, and shouldn't spend too much time on this right now. Basically just wanting to get a feel for what HC can model by throwing some interesting data at it.

Allan

On 5/9/2012 3:29 PM, J V C wrote:

You mentioned that there really is a phonemic glottal consonant there, even though the orthography doesn't reflect it. Would the linguistic analysis be more satisfying if the phonological rules made direct reference to that? Just taking a wild stab at it, something along these lines...?
maN- CV 'isda

maN- 'i'isdaï¿½ï¿½ redup
mang'i'isdaï¿½ï¿½ nasal assim.
mangi'isdaï¿½ï¿½ glottal deletion
mangingisdaï¿½ï¿½ consonant ?influence?

That way, you wouldn't have to treat the reduplication as infixing. The deletion of the first glottal seems easy enough, although you'd have to explain why the second glottal changes to /ng/. Still, both /ng/ and glottal are [+back], right? And you could appeal to the nearby influence of the first /ng/.

Of course, if you're just parsing orthographic forms and not writing up a phonology, you might not want the parser outputting those glottals as it breaks things down in the reverse order. But maybe there's a way around that, such as a "glottal" insertion rule producing 'isda from isda. Also, I guess there could be problems if certain roots really do begin with a vowel and no glottal, but that may or may not be relevant for these languages.

Jon

On 05/09/2012 10:38 AM, Snofriacus wrote:

I'm not in a position to actually start experimenting with this in Hermit Crab right now, but want to ask this while I'm thinking of it. I don't think there are any words that I've been unable to parse using the Ample parser, but some of the solutions don't seem to represent the linguistic reality very well. The case I've found most unsatisfying is similar to what Jon presented below, but with the additional dimension of a reduplication. I used to have some examples with the nasal "N" assimilating to other consonants besides glottal, but the two below are the only ones I'm remembering right now:

mangingisda (Tagalog)
maN- CV isda

pangongoman (Botolan Sambal)
paN- CV oman

Note that I haven't specified what sort of affix the CV is. Typically CV reduplication before a root is viewed as a prefix. If this CV were a prefix, we'd get this:

CV- isda -> iisda (The C of the CV is a glottal, which in this context isn't written in the orthography)

maN- iisda -> mangiisda (The nasal "N" of the prefix becomes "ng" before a glottal)

But this isn't what we get. The syllable that's reduplicated is "ngi", which doesn't exist until after "maN-" has been joined to and assimilated to "isda". So to get mangingisda, the order of operations has to be reversed:

maN- isda -> mangisda

<CV> mangisda -> mangi<ngi>sda

But for the CV to apply at this point in the process, it needs to be applied as an infix, not a prefix.
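The two orderings above can be compared in a small sketch. This is only an illustration in Python, not anything HC does internally: the segment lists, the `'` symbol for the unwritten glottal, and all function names are assumptions made for the example, and it only covers glottal-initial roots.

```python
GLOTTAL = "'"  # stand-in symbol for the unwritten glottal stop (assumption)

def spell(segments):
    """Join segments into orthography, which does not write the glottal."""
    return "".join(s for s in segments if s != GLOTTAL)

def assimilate(stem):
    """maN-/paN- before a glottal: N surfaces as "ng" and the glottal is dropped."""
    if stem[0] != GLOTTAL:
        raise ValueError("this sketch only covers glottal-initial stems")
    return ["ng"] + stem[1:]

def reduplicate_cv(stem):
    """Copy the first consonant and vowel of whatever stem is passed in."""
    return stem[:2] + stem

def prefix_first(prefix_body, root):
    """CV- applies to the bare root, then the nasal prefix is attached."""
    return spell(prefix_body + assimilate(reduplicate_cv(root)))

def infix_order(prefix_body, root):
    """The nasal prefix is attached first, then CV copies the new first syllable."""
    return spell(prefix_body + reduplicate_cv(assimilate(root)))

isda = [GLOTTAL, "i", "s", "d", "a"]
print(prefix_first(list("ma"), isda))  # the unattested form
print(infix_order(list("ma"), isda))   # the attested form
```

Only the second ordering copies "ngi", because "ngi" does not exist until the prefix has been attached.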

So my question for Hermit Crab - and what I'd try if I were set up to do some testing - is would it be capable of modeling this process in a way that reflects these linguistic observations?

Allan

On 5/8/2012 11:24 PM, Kevin Warfel wrote:

Jon,

You wrote:

Maybe it depends on the language, but I was trying to model Indonesian early on and immediately hit a huge snag with XAMPLE. Indonesian is a relatively simple language in terms of phonology and morphophonemics, but it does have one process that's complex (two-step) and affects a huge number of verbs. The active voice prefix blows away the first consonant of the verb root if that consonant is voiceless and not part of a cluster.

meN- potong --> memotong
meN- tutup --> menutup
meN- sisir --> menyisir
meN- kasihi --> mengasihi

So, the nasal assimilates first, and then the consonant deletes (perhaps because the nasal is carrying its place feature anyway). This two-step process is typically called "nasal substitution".
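As a rough illustration of the two steps, here is a minimal Python sketch. It is not XAMPLE or HC code; the table and the function name `men_prefix` are made up for the example, and it only handles the four voiceless consonants shown in the data above.

```python
# Nasal that meN- surfaces as before each voiceless consonant in the examples.
PLACE_NASAL = {"p": "m", "t": "n", "s": "ny", "k": "ng"}
VOWELS = "aeiou"

def men_prefix(root):
    """Attach meN- using two ordered steps: assimilation, then deletion."""
    first = root[0]
    nasal = PLACE_NASAL.get(first)
    if nasal is None:
        raise ValueError("sketch only covers roots starting with p, t, s or k")
    # Step 1: the nasal takes on the place of the following consonant.
    assimilated = "me" + nasal + root
    # Step 2: the consonant deletes, but only if it is not part of a cluster.
    if len(root) > 1 and root[1] in VOWELS:
        return "me" + nasal + root[1:]
    return assimilated

for root in ["potong", "tutup", "sisir", "kasihi"]:
    print(root, "->", men_prefix(root))
```

With rules like these the lexicon needs only one form per root, which is the saving described above.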

This is similar to the problem you described under (3) in your message, but instead of giving every suffix an extra allomorph, I would have had to give extra allomorphs to hundreds of verb roots! (I.e. to every root that is affected by the deletion rule.) It's much more satisfying to be able to use rules.

In response, I would say that this is an excellent example of a real-life situation where one can get the XAMPLE parser to “come up with the right answer,” but where the HC parser is not only more intuitive – it’s less work, too.

Kevin

--
You received this message because you are subscribed to the discussion group "FLEx list". This group is hosted by Google Groups and is open for anyone to browse.
To post to this group, send email to flex...@googlegroups.com
To unsubscribe from this group, send email to flex-list+...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/flex-list

J V C

May 9, 2012, 8:26:13 AM5/9/12

to flex...@googlegroups.com

I've been wondering about this some more and thinking about a similar situation in Indonesian:
meN- DUP pukul --> memukul-mukul
meN- DUP singgung --> menyinggung-nyinggung
and so forth

I went and looked at the document Andy mentioned (Help, Resources, Intro to Parsing), and lo and behold, he's already got a rule for exactly that scenario; rule 197. (Section B.1.2.1.4.2 Unspecified nasal and full reduplication in Bahasa Indonesia) I suspect something very similar would work for yours as well.

Wow. I need to go re-read that document as an HC user, rather than from an XAmple perspective.

Jon

On 05/09/2012 5:05 PM, Snofriacus wrote:

Interesting thoughts. But those last couple of steps don't seem much happier than the workarounds needed with Ample. Avoiding a reduplicating infix isn't what I'm after at all. What I'm looking for is a parser that -can- handle this analysis. I'm hoping that maybe HC is it.

If I find my other examples that don't involve glottals I'll post those. With those it should be easier to clearly see what's happening, and what features of a parser would be needed to correctly model it.

Oh, ok. Here they are:

"panonokho" (temptation), from "tokho"
"ampangangailangan" (needing), from "kailangan"
"pamumuhay" (living), from "buhay"

Take the first one, "panonokho". The straightforward analysis using processes that are known to happen in the language is:

paN- tokho -> panokho (nasal assimilation and subsequent deletion of the consonant it assimilated to)

-CV- panokho -> panonokho (I don't think it's possible to determine which "no" is the original and which is the copy)

So I'm hoping to find out whether HC can model this. Some might argue that a reduplicating infix -isn't- a process that's known to happen in the language, since most of the CV reduplication happens as prefixes. But the CV reduplication that we model as prefixes is actually ambiguous as to order. Every case can just as easily be interpreted as infixing reduplication. We have traditionally chosen to call it a prefix just because it seems easier to parse this way. So if you have lots of cases which can be viewed as either prefixing or infixing reduplication, and a few diagnostic cases which demand an infixing interpretation, the nice solution is to simply view it all as infixing reduplication.

I wasn't intending to get into linguistic argumentation here, and shouldn't spend too much time on this right now. Basically just wanting to get a feel for what HC can model by throwing some interesting data at it.

Allan

Snofriacus

May 9, 2012, 8:29:50 PM5/9/12

to flex...@googlegroups.com

Hi Jon,

Thanks for finding this. The Indonesian data in Andy's document does seem to be using the same kind of process, only with a bigger piece being reduplicated. I need to study this too. On first look the parsing solution appears to be doing essentially what you were suggesting - first reduplicating, and then doing the assimilation and deletion twice.

I don't yet fully understand how the rule works. For the second assimilation and deletion, is it specific enough to match only the reduplicated initial consonant? In other words, supposing that the root were "pipil", would it correctly give me "memipil-mipil", not "memimil-pipil" or "memimil-mimil"? If so, it does seem to be a workable approach.

Ok, it appears to be depending on the orthographic hyphen to correctly identify the second "p" to be changed to "m". So it should work for generating "memipil-mipil" from "meN + DUP + pipil". If I'm understanding this correctly, then it might not reliably work for generating "panonokho" from "paN + CV + tokho", where there's no orthographic cue showing where the second copy of the CV begins. Though in this case maybe it would be possible to specify "change 't' to 'n' only for the very next consonant". If there's a way for HC to keep track of morpheme breaks, it seems like some reference to that would be helpful. Could HC put in "-" as a temporary morpheme break marker and then match and delete it when it does the second assimilation & deletion?

As for linguistic reality though, is anybody else bothered by having to specify the assimilation and deletion twice? Or is that just me? The first assimilation and deletion makes sense. We can see how the context motivates it. But this analysis has the second assimilation and deletion occurring after it's been removed from the context that motivates it. Isn't there some other way HC could approach this in which the assimilation and deletion happens only once, and then the result is just duplicated? Maybe my calling it "infixing reduplication" is overkill. Maybe that's what scares people off from this analysis. But there's got to be a way to model it in a more straightforward way.

Allan

On 5/9/2012 8:26 PM, J V C wrote:

I've been wondering about this some more and thinking about a similar situation in Indonesian:

meN- DUP pukul --> memukul-mukul

meN- DUP singgung --> menyinggung-nyinggung
and so forth

I went and looked at the document Andy mentioned (Help, Resources, Intro to Parsing), and lo and behold, he's already got a rule for exactly that scenario; rule 197. (Section B.1.2.1.4.2 Unspecified nasal and full reduplication in Bahasa Indonesia) I suspect something very similar would work for yours as well.

Wow. I need to go re-read that document as an HC user, rather than from an XAmple perspective.

Jon

Andy Black

May 11, 2012, 2:50:53 PM5/11/12

to flex...@googlegroups.com

On 5/9/2012 5:29 PM, Snofriacus wrote:

Hi Jon,

...The Indonesian data in Andy's document does seem to be using the same kind of process, only with a bigger piece being reduplicated. ... On first look the parsing solution appears to be doing essentially what you were suggesting - first reduplicating, and then doing the assimilation and deletion twice.

The current implementation of phonological rules in FLEx is (old-fashioned) SPE-style: one must build the entire word and then make one pass through the ordered set of phonological rules. This means that, yes, one must put all the morphemes on the form and then apply the rules.
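To make the single-pass idea concrete, here is a minimal Python sketch using the Indonesian full-reduplication data from earlier in the thread. It is an assumption-laden illustration, not rule 197 from the Introduction to Parsing document and not HC's rule formalism: the regular expressions, the `N` symbol for the unspecified nasal, and the function names are all invented for the example.

```python
import re

NASAL = {"p": "m", "t": "n", "s": "ny", "k": "ng"}

def rule_copy(form):
    """Nasalize the first consonant of the copy, located via the hyphen."""
    pattern = r"(N\+)([ptsk])([aeiou][a-z]*-)\2"
    return re.sub(
        pattern,
        lambda m: m.group(1) + m.group(2) + m.group(3) + NASAL[m.group(2)],
        form,
    )

def rule_prefix(form):
    """Nasal substitution at the prefix boundary (assimilation plus deletion)."""
    return re.sub(r"N\+([ptsk])(?=[aeiou])", lambda m: NASAL[m.group(1)], form)

# The whole word is built first, then each rule gets exactly one turn, in order.
ORDERED_RULES = [rule_copy, rule_prefix]

def derive(underlying):
    form = underlying
    for rule in ORDERED_RULES:
        form = rule(form)
    return form.replace("+", "")

print(derive("meN+pukul-pukul"))
print(derive("meN+pipil-pipil"))
```

Note that the nasal change has to be stated twice, once per copy, and that the second statement leans on the hyphen to find the copy. The root-internal "p" of "pipil" is left alone because neither rule's context matches it.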

... [The second assimilation and deletion rule] appears to be depending on the orthographic hyphen to correctly identify the second "p" to be changed to "m". So it should work for generating "memipil-mipil" from "meN + DUP + pipil". If I'm understanding this correctly, then it might not reliably work for generating "panonokho" from "paN + CV + tokho", where there's no orthographic cue showing where the second copy of the CV begins. Though in this case maybe it would be possible to specify "change 't' to 'n' only for the very next consonant". If there's a way for HC to keep track of morpheme breaks, it seems like some reference to that would be helpful.

Phonological rules can make reference to morpheme breaks in the environment of the rule (via the + symbol). So yes, there is a way for HC to keep track of them.

Could HC put in "-" as a temporary morpheme break marker and then match and delete it when it does the second assimilation & deletion?

One could make the "underlying form" of the reduplication affix include a hyphen. When the rules are applied they will then see this hyphen. Naturally, one would need to add another rule that would delete the hyphen since it would not surface.

As for linguistic reality though, is anybody else bothered by having to specify the assimilation and deletion twice? Or is that just me? The first assimilation and deletion makes sense. We can see how the context motivates it. But this analysis has the second assimilation and deletion occurring after it's been removed from the context that motivates it. Isn't there some other way HC could approach this in which the assimilation and deletion happens only once, and then the result is just duplicated?

The kind of thinking you are doing here is what I understand helped motivate the development of Lexical Phonology. Basically, putting all the morphemes together in a word and then making a single pass through the phonological rules does not always account for data natural languages have and/or it means writing rules that seem to miss potential generalizations. See http://www.sil.org/linguistics/GlossaryOfLinguisticTerms/WhatIsLexicalPhonology.htm for one web site on it.

HC actually has the capability to model at least a part of what Lexical Phonology needs. In fact, the FieldWorks conceptual model also already includes what would be needed. We just have not implemented these capabilities yet.

What I'm hearing you say is that if we could have a level (or stratum) where the meN- morpheme would be attached and the assimilation and deletion rules would be applied, and then in another level, we'd do the reduplication without having to do the assimilation and deletion again, your linguistic intuitions would be happier. If so, you are definitely not alone.
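A two-stratum derivation of that kind might look roughly like the Python sketch below. This is purely hypothetical: it is not how HC or FLEx would implement strata, and the tables, the three-part return value, and the function names are assumptions chosen to keep the example short.

```python
# Per-language tables, limited to consonants that appear in the thread's examples.
SAMBAL = {"t": "n", "b": "m", "k": "ng"}
INDONESIAN = {"p": "m", "t": "n", "s": "ny", "k": "ng"}

def stratum1(prefix, root, nasal_for):
    """Attach a prefix ending in N and apply nasal substitution exactly once."""
    if not prefix.endswith("N"):
        raise ValueError("expected a prefix ending in the unspecified nasal N")
    onset = nasal_for[root[0]]
    # Keep the pieces apart so the next stratum knows where the stem starts.
    return prefix[:-1], onset, root[1:]

def stratum2_cv(prefix_body, onset, rest):
    """CV reduplication: copy the stem's (already nasalized) onset and first vowel."""
    return prefix_body + onset + rest[0] + onset + rest

def stratum2_full(prefix_body, onset, rest):
    """Full reduplication: copy the whole (already nasalized) stem."""
    stem = onset + rest
    return f"{prefix_body}{stem}-{stem}"

print(stratum2_cv(*stratum1("paN", "tokho", SAMBAL)))
print(stratum2_full(*stratum1("meN", "pukul", INDONESIAN)))
```

Here the assimilation and deletion are stated once, and the second stratum simply copies their result.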

(On the other hand, I personally am struck by how such a complicated set of data as Indonesian has can actually be handled by the current FLEx implementation of HC in a way that far outshines what one would have to do with XAmple.)

--Andy

Kevin Warfel

May 16, 2012, 11:05:24 AM5/16/12

to flex...@googlegroups.com

Hi Allan,

Andy has already responded that the HC parser cannot yet do what we would really like to describe transparently in this situation (though the potential is there in the design). In these forms, the shape of the reduplicative affix depends on the result of the interaction between a nasal consonant and a stop, yet in the final form the reduplicative affix ends up separating the two. So there seem to be strata in the formation of the word, with morphophonological rules applying within each stratum before the result is passed to the next one.

My goal in this response, then, is to attempt to get at a solution via HC with its current capabilities and limitations. The questions I asked myself were two: “Is it possible to produce a correct parse using HC in this situation?” and (assuming that the answer to the first question is ‘yes’) “How linguistically inelegant would the solution have to be in order to work?”

Now, I am the first to admit that my attempts at problem-solving often result in solutions that are more cumbersome than they would need to be, so I don’t often get points for “elegance” (though I do value elegance). You (or anyone else) are more than welcome to suggest improvements to my proposed way of handling this. But I think I have found two different approaches that would enable HC to correctly parse forms of the type you described.

In my first approach, I have tried to accomplish the task without the use of an ‘artificial hyphen’ to condition the nasal assimilation. What I came up with will probably appear a bit complicated to someone who is unfamiliar with HC, so my second approach uses the ‘artificial hyphen’ which someone else suggested earlier. That approach simplifies the appearance of the first rule, but makes it necessary to add an extra rule to delete the unwanted hyphen once it has triggered the desired changes.

First of all, your data (where “DUP-” represents the reduplicative morpheme):

panonokho = paN- DUP- tokho

ampangangailangan = aN- paN- DUP- kailangan

pamumuhay = paN- DUP- buhay

pangongoman = paN- DUP- oman

(I believe these are all words from Botolan Sambal; I have set aside the example from Tagalog, so as to try to stay within the same language, though it seems to work the same way.)

APPROACH #1 (no hyphen)

My assumptions based on the data provided:

1) The reduplicative morpheme (DUP-) will be represented in the lexicon by means of an Affix Process Rule (APR) whereby DUP- + root of form CV... ==> CV-+CV... (i.e., a copy of the initial consonant and vowel of the root to which it is affixed. The hyphen shown here is the one inserted automatically by FLEx to indicate that it is a prefix; it is not part of the form that HC sees).

2) The glottal stop in roots that have one in the initial position (e.g., “oman”) will be represented in the lexicon with a character in the lexeme form, but without that initial character in the citation form. This gives HC a ‘consonant’ to parse with, but it doesn’t show up in the dictionary or orthography. The exact character chosen to represent the glottal stop is unimportant, so long as it is declared as a phoneme and has the phonological features of a stop with a glottal point of articulation.

3) Only stops undergo the assimilation described (unlike the example in Bahasa Indonesia in the Introduction to Parsing document referred to, where /s/ also becomes nasalized).

4) All phonemes which have the feature “nas:+” (i.e., which are nasalized) are consonants.

5) The phonological system of this language will use at least the following phonological features, which have values of “+” or “-”:

consonantal (cons)

continuant (cont)

sonorant (son)

voiced (voiced)

nasal (nas)

In addition, I’m proposing the use of a custom phonological feature Point of Articulation (POA), which has values of “labial”, “alveolar”, “velar”, “glottal”, and any others that might be necessary (e.g., palatal). This is, so far as I can determine, the same thing as is called OrthPlace in Introduction to Parsing, but happens to be my term of choice. The precise term used is not important; *how* it is used is!

And finally, I created some custom features that I hope will help my illustration. “CFx”, “CFy”, and “CFz” are intended to represent “consonant features x, y, and z” so that you can see how any other features that I omitted, but which might need to be included, would fit into my proposed solution. “VFa”, “VFb”, “VFc”, and “VFd” are abbreviations for “vowel features a, b, c, and d” and are intended to save me the risk of hazarding a guess as to what the relevant vowel features for this language might be.

And now for my proposed solution:

In prose:

Rule 1: nasalize the initial consonant of the root

Rule 2: nasalize the initial consonant of the reduplicative affix, DUP-

Rule 3: “correct” the POA of the nasal resulting from a glottal stop (from glottal to velar, since there is no glottal nasal consonant)

Rule 4: delete the nasal consonant that triggered the assimilation in rules 1 and 2

In code:

Rule 1:

assimilation+reduplication rule.jpg

Interpretation: A stop will be transformed into a voiced nasal consonant at the same POA under certain conditions, namely when it is morpheme-initial (symbolized by the ‘+’ to left of the ___), and preceded by a consonant-vowel sequence which exactly matches it and the vowel that follows it; furthermore, that morpheme must also be preceded by a nasalized phoneme (i.e., a nasal consonant).

Rule 2:

assimilation+reduplication rule 6.jpg

Interpretation: A stop will be transformed into a voiced nasal consonant at the same POA under certain conditions, namely when it is morpheme-initial (symbolized by the ‘+’ to left of the ___), and preceded by a nasalized phoneme (i.e., a nasal consonant).

Rule 3:

assimilation+reduplication rule 3.jpg

Interpretation: A nasal consonant with glottal POA is ‘tweaked’ to make it a velar nasal (and thereby matching a phoneme in the language, instead of having a combination of features that doesn’t match anything).

Rule 4:

assimilation+reduplication rule 4.jpg

Interpretation: A nasal consonant deletes when it is morpheme-final and followed by another nasal consonant (the transformed stop, in our case).
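Since the rule images are not reproduced here, the following Python sketch is one possible reading of the four interpretations above, run in the generation direction only. It is not HC notation and has not been checked against HC: the regular expressions, the symbols `N` (unspecified nasal), `'` (glottal stop), `ɴ` (placeholder for a nasal with glottal POA) and `ŋ` (velar nasal), and the `underlying` helper are all assumptions. It does not attempt the aN- prefix of "ampangangailangan".

```python
import re

STOPS = "ptkbdg'"   # "'" = glottal stop kept in the lexeme form (assumption 2)
VOWELS = "aeiou"
NASALS = "Nmnŋɴ"
TO_NASAL = {"p": "m", "b": "m", "t": "n", "d": "n", "k": "ŋ", "g": "ŋ", "'": "ɴ"}

def underlying(prefix, root):
    """Build prefix + DUP + root, with "+" marking morpheme boundaries."""
    return f"{prefix}+{root[:2]}+{root}"

def rule1(form):
    """Root-initial stop -> nasal after a matching CV morpheme that follows a nasal."""
    pattern = rf"(?<=[{NASALS}]\+)([{STOPS}])([{VOWELS}])\+\1(?=\2)"
    return re.sub(
        pattern,
        lambda m: f"{m.group(1)}{m.group(2)}+{TO_NASAL[m.group(1)]}",
        form,
    )

def rule2(form):
    """Morpheme-initial stop -> nasal when directly preceded by a nasal."""
    return re.sub(rf"(?<=[{NASALS}]\+)[{STOPS}]", lambda m: TO_NASAL[m.group(0)], form)

def rule3(form):
    """A nasal with glottal POA is corrected to the velar nasal."""
    return form.replace("ɴ", "ŋ")

def rule4(form):
    """A morpheme-final nasal deletes before another nasal."""
    return re.sub(rf"[{NASALS}]\+(?=[{NASALS}])", "+", form)

def derive(prefix, root):
    form = underlying(prefix, root)
    for rule in (rule1, rule2, rule3, rule4):  # one pass, in order
        form = rule(form)
    # Spell out: drop boundaries, write the velar nasal as "ng", omit the glottal.
    return form.replace("+", "").replace("ŋ", "ng").replace("'", "")

for root in ["tokho", "buhay", "'oman"]:
    print(derive("paN", root))
```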

APPROACH #2 (with hyphen)

My assumptions based on the data provided:

1) The reduplicative morpheme (DUP-) will be represented in the lexicon by means of an Affix Process Rule (APR) whereby DUP- + root of form CV... ==> CV--+CV... (i.e., a copy of the initial consonant and vowel of the root to which it is affixed. The second hyphen shown here is the one inserted automatically by FLEx to indicate that it is a prefix; the first one is part of the form that HC sees and uses to parse).

2) The glottal stop in roots ... (same as in approach #1).

3) Only stops undergo ... (same as in approach #1).

4) All phonemes which have the feature “nas:+” ... (same as in approach #1).

5) The phonological system of this language ... (same as in approach #1)

In addition, I’m proposing the use of a custom phonological feature Point of Articulation (POA), ... (same as in approach #1)

For this approach, I have also created a Natural Class [A] which is defined to include all consonant and vowel phonemes in the language. Most importantly, this natural class specifically excludes the hyphen phoneme.

And now for my proposed solution:

In prose:

Rule 1: nasalize the initial consonant of the root

Rule 2: nasalize the initial consonant of the reduplicative affix, DUP-

Rule 3: “correct” the POA of the nasal resulting from a glottal stop (from glottal to velar, since there is no glottal nasal consonant)

Rule 4: delete the nasal consonant that triggered the assimilation in rules 1 and 2

Rule 5: delete any hyphens

In code:

Rule 1:

assimilation+reduplication rule 5.jpg

Interpretation: A stop will be transformed into a voiced nasal consonant at the same POA under certain conditions, namely when it is morpheme-initial (symbolized by the ‘+’ to left of the ___) and preceded by a morpheme composed of any number of consonant and/or vowel phonemes and ending with a hyphen (indicated by all that is between the two “+” signs, which symbolize morpheme boundaries). Furthermore, that morpheme must itself be preceded by a nasalized phoneme (i.e., a nasal consonant).

Rule 2:

assimilation+reduplication rule 6.jpg

Interpretation: A stop will be transformed into a voiced nasal consonant at the same POA under certain conditions, namely when it is morpheme-initial (symbolized by the ‘+’ to left of the ___), and preceded by a nasalized phoneme (i.e., a nasal consonant).

Rule 3:

assimilation+reduplication rule 3.jpg

Interpretation: A nasal consonant with glottal POA is ‘tweaked’ to make it a velar nasal (thereby matching a phoneme in the language, instead of having a combination of features that doesn’t match anything).

Rule 4:

assimilation+reduplication rule 4.jpg

Interpretation: A nasal consonant deletes when it is morpheme-final and followed by another nasal consonant (the transformed stop, in our case).

Rule 5:

assimilation+reduplication rule 7.jpg

Interpretation: Any hyphen, regardless of context, is deleted. (In actual practice, such a general rule might provide HC with so many possibilities to look at that parsing anything would take a very long time. So, it might be necessary to narrow down the context somewhat to make the work manageable.)
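To make the rule ordering in approach #2 concrete, here is a rough Python sketch that applies the five rules to plain strings. It is not Hermit Crab and not FLEx's rule notation, only a toy string rewriter. The prefix maŋ-, the roots, the stop inventory, and all function names are made up for illustration and are not Ayta Mag-anchi data. The symbol "*" is just a stand-in for the feature bundle [nas:+, POA:glottal], which matches no phoneme until Rule 3 repairs it.

```python
# Toy simulation of the five ordered rules in approach #2.
# ASSUMPTIONS: the inventory, the prefix and the roots are hypothetical.
# A word is a list of non-empty morpheme strings, one character per phoneme.

# Stop -> nasal at the same POA. "*" stands in for the unmatched
# feature bundle [nas:+, POA:glottal] produced from a glottal stop.
STOP_TO_NASAL = {"p": "m", "b": "m", "t": "n", "d": "n",
                 "k": "ŋ", "g": "ŋ", "ʔ": "*"}
NASALS = {"m", "n", "ŋ", "*"}
HYPHEN = "-"


def reduplicate(root):
    """Affix Process Rule sketch: copy the root-initial CV and add a hyphen."""
    return root[:2] + HYPHEN


def rule1(m):
    """Nasalize a morpheme-initial stop after [nasal] + [A]*hyphen + ___."""
    out = list(m)
    for i in range(2, len(m)):
        dup, before = m[i - 1], m[i - 2]
        # [A] excludes the hyphen, so only the final segment may be a hyphen.
        dup_ok = dup.endswith(HYPHEN) and HYPHEN not in dup[:-1]
        if dup_ok and before[-1] in NASALS and m[i][0] in STOP_TO_NASAL:
            out[i] = STOP_TO_NASAL[m[i][0]] + m[i][1:]
    return out


def rule2(m):
    """Nasalize a morpheme-initial stop directly preceded by a nasal."""
    out = list(m)
    for i in range(1, len(m)):
        if m[i - 1][-1] in NASALS and m[i][0] in STOP_TO_NASAL:
            out[i] = STOP_TO_NASAL[m[i][0]] + m[i][1:]
    return out


def rule3(m):
    """Repair the glottal nasal placeholder to a velar nasal."""
    return [x.replace("*", "ŋ") for x in m]


def rule4(m):
    """Delete a morpheme-final nasal when the next morpheme starts with a nasal."""
    out = list(m)
    for i in range(len(m) - 1):
        if m[i][-1] in NASALS and m[i + 1][0] in NASALS:
            out[i] = m[i][:-1]
    return out


def rule5(m):
    """Delete every hyphen, regardless of context."""
    return [x.replace(HYPHEN, "") for x in m]


def derive(prefix, root):
    """Build prefix + DUP- + root, then apply rules 1 to 5 in order."""
    morphemes = [prefix, reduplicate(root), root]
    for rule in (rule1, rule2, rule3, rule4, rule5):
        morphemes = rule(morphemes)
    return "".join(morphemes)


print(derive("maŋ", "takbo"))  # mananakbo
print(derive("maŋ", "ʔalis"))  # maŋaŋalis
```

For the made-up input maŋ + DUP- + takbo, the intermediate forms are maŋ+ta-+takbo, then maŋ+ta-+nakbo (Rule 1), maŋ+na-+nakbo (Rule 2), no change (Rule 3), ma+na-+nakbo (Rule 4), and finally mananakbo (Rule 5). Note that Rule 4 has to come after Rules 1 and 2 in this sketch, since it removes the nasal that triggers them.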

FINAL REMARKS:

I have not tested the above proposals to verify that they do indeed correctly parse the words that you provided as data for this discussion, Allan. I’ve tried to reason them through carefully, but it’s possible that there is something that I overlooked. This is probably at least a good approximation of how “ugly” or “linguistically inelegant” the solution could be in using HC to parse these assimilating and reduplicating forms; someone may be able to improve them, but I don’t think it will get any worse than what I’ve presented here.

There are three things I don’t particularly like about these proposals. The obvious one is the fact that they don’t reflect the linguistic reality as transparently as we would hope for. The second is the complexity of Rule 1 in approach #1 or the use of the hyphen in approach #2. The third is the fact that I was unable to find a way to combine Rules 1 and 2 in either approach.

But there you have a couple of concrete proposals as to how HC might handle the language data you presented. I hope this is helpful to you, and perhaps others, as well.

Best wishes,

Kevin

Snofriacus

May 17, 2012, 12:57:52 AM

to flex...@googlegroups.com

Hi Kevin,

Thanks for all your work on this. I look forward to testing out these ideas. At the moment I'm in the process of closing up files and computers and packing things into boxes, getting ready for a major change of location. Not sure when I'll be able to open up these files again, but hopefully that time isn't too far away. I do want to try out HC for a full analysis of the Ayta Mag-anchi language, and I expect that these things you've written up will help me get a jump start on understanding how it all works.

If some of the rest of you have examples of this sort and would like to take Kevin's ideas and run with them, that would be great. Please report back to the list on how it goes :)

Take care,
Allan
