--
You received this message because you are subscribed to the discussion group "FLEx list". This group is hosted by Google Groups and is open for anyone to browse.
To post to this group, send email to flex...@googlegroups.com
To unsubscribe from this group, send email to flex-list+...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/flex-list
Another couple of HC-related questions if you please.
I believe I remember reading in some FLEx news/release announcements since last summer that Hermit Crab now supports optional rules (i.e. a way to handle free variation). But I cannot find how to make a rule optional, nor can I find any trace of that news anymore, with Google or otherwise. Was it only in my dreams?
Also, how can one specify a zero morpheme in the rules? The reason I ask is that I made a final devoicing rule which works well with bare stems with no inflection (e.g. adverbs), and also with suffixes, but not with stems followed by a zero suffix. E.g. "vdruk" (< adv. "vdrug" 'suddenly') parses OK, but "luk" (< "lug" 'meadow') does not, failing during resynthesis.
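To make the expected behaviour concrete, here is a minimal Python sketch of final devoicing. It is not HC notation and says nothing about how HC works internally; the five-consonant devoicing map and the suffix "a" are assumptions made for illustration. The point is only that a zero suffix contributes no segments, so the stem-final consonant is still word-final.

```python
# Assumed, simplified voiced -> voiceless map (not taken from the project).
DEVOICE = str.maketrans("bdgvz", "ptkfs")

def final_devoicing(morphemes):
    """Concatenate the morphemes, then devoice the word-final segment."""
    word = "".join(morphemes)
    if not word:
        return word
    return word[:-1] + word[-1].translate(DEVOICE)

print(final_devoicing(["vdrug"]))      # bare stem
print(final_devoicing(["lug", ""]))    # stem + zero suffix
print(final_devoicing(["lug", "a"]))   # stem + overt (hypothetical) suffix
```

Under these assumptions the bare stem and the stem with a zero suffix are treated identically, which is the behaviour the rule is meant to have.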
In another project (Alutor) I've got several phonological rules for HC. One of them is supposed to delete dental consonants in word-initial position before (some) other consonants. So for example the verb root -tkepl- loses t- if there is no prefix, as in the infinitive "keplek" (< tkepl + k; e = schwa). However, the infinitive does not parse. In the Try a word window, the last regex for the rules unapplied does include an optional initial "t?", but all the boxes below which show affix guesses and remnants, start just with k.
Finally I simplified the rule to only have the specified phonemes instead of classes/features ("t > zero / # _ k"), but the result is the same. On the other hand, a similar rule which cuts the word-final vowel works OK. Is there something special about the word-initial rules, or am I overlooking something else?
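As a rough illustration of what the simplified rule should do in both directions, here is a small Python sketch using plain regular expressions. It is not HC's algorithm; the function names are made up, and "tkeplek" is assumed to be the form the rule sees (i.e. with the schwa already in place).

```python
import re

def apply_initial_deletion(form):
    """Synthesis direction: t > zero / # _ k."""
    return re.sub(r"^t(?=k)", "", form)

def unapply_initial_deletion(surface):
    """Analysis direction: list the forms the rule could have started from."""
    candidates = [surface]
    if surface.startswith("k"):
        # The deleted t may or may not have been there (the optional "t?").
        candidates.append("t" + surface)
    return candidates

print(apply_initial_deletion("tkeplek"))
print(unapply_initial_deletion("keplek"))
```

If the analysis step works as sketched, a candidate beginning with "t" should be available when the remnants are matched against the lexicon, which is what appears to be missing in the Try a word output.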
I'm attaching the output from the Try window; can send the backup if needed, of course.
All best,
Sasha
Fri, 27 Apr 2012 11:42:20 -0700 from Andy Black <andy_...@sil.org>:
Allan,
I will preface my remarks by freely admitting that I am strongly biased in favor of the HC parser, so I like your alternative interpretation of the subject line. As a result of my strong preference for the HC parser, I have worked much more extensively with it than with the XAMPLE parser, so my representation of the XAMPLE parser is likely to be incomplete or even unfair, so just keep that in mind as you read on. But having said that, I will say that my bias is due in large part to the frustration I experienced as a *linguist* trying to use the XAMPLE parser to handle the morphophonemic processes that were present in Puguli, the West African language I was trying to parse, so maybe that makes the bias valid - or maybe not, I'll let you decide.
I definitely agree with your statement that the HC parser is more intuitive than the XAMPLE parser, *IF* your intuition runs along linguistic lines (i.e., you're a linguist and/or knowledgeable about the linguistic workings of the language you're trying to parse). This is the aspect of the HC parser that I *love* - it mimics very closely the way I think about morphophonemic processes.
I am less sure about your statement about HC being less tedious than XAMPLE. Most of the tedium involved in the implementation of HC is in the initial setup (*rigorous* and *exhaustive* definition of phonemes as sets of phonological features), but after that, the slope of the curve becomes much flatter and the going is much easier. My experience with XAMPLE is that the slope of the curve remains about the same the whole way along, at least until you're confronted with a complex morphophonemic process or the interplay of several morphophonemic processes, at which point the curve becomes much steeper. In my case, I finally got to the point where the curve was so steep that I felt obligated to try the HC parser, which was experimental at that time, even though I am not at all an "early adopter".
As I consult with FLEx users who are interested in using one of the parsers, I usually advise them to use XAMPLE if the morphophonemics in the language they are working with is relatively uncomplicated (but most of those who come to me for help are not in this category). But as soon as I see a fair amount of complexity, especially where two or more processes affect the same morpheme boundary and the order in which they are applied is important, I recommend HC. It just handles situations like that with "elegance," and I have a great appreciation for an elegant and eloquent synopsis of a piece of linguistic reality.
As an example of this, there is a verb form in Puguli created by the suffixation of -ɔɔ̀ to the verb root (if these characters don’t come through, these are “open o”s or “backwards c”s and indicate low back round vowels, which in West Africa are pronounced with retracted tongue root), and that’s where the morphophonemics kick in. This suffix must be “harmonized” to the root vowel(s) for ATR value (if the root vowel is +ATR, the suffix vowels will change to “oo”), and for height (if the root vowel is “i” the suffix vowels will change to “uu”). If the root is monosyllabic, regressive spread of the high tone of the suffix “overwrites” the tone of the root vowel, making it high tone as well. If the root ends in a vowel, one of the vowels of the suffix must delete. And maybe there’s another process that I’m forgetting at the moment. In any case, that’s at least four processes that apply across the morpheme boundary in this context. Each of those processes can be described independently and made to interact appropriately in conjunction with HC. With XAMPLE, you would have to resort to describing the net effect of each possible combination of these multiple processes, resulting in a very long list of environments and conditions that does not intuitively correlate to the linguistic reality of four independent, but interactive processes.
When the morphophonemic process in question is one like the devoicing of a consonant when it ends up next to a voiceless consonant, the representation for either parser is fairly straightforward:
XAMPLE
Lexeme –bi
Allophone –pi / [S]__
Lexeme –gu
Allophone –ku / [S]__
where [S] is a Natural Class defined as the set of voiceless consonants in the language
HC
[+vcd] --> [-vcd] / [-vcd]+__
However, even in a ‘simple’ situation like this, I prefer HC, for several reasons:
1) The symbol “[-vcd]” is more explicit as to what it represents than is “[S]”.
2) It is much more obvious in the HC expression than in the XAMPLE ones that the process at work in these examples is consonant devoicing. This fact would be much more evident if the lexemes and allophones under XAMPLE were written in a script that you don’t know how to read! (If I were looking at a database where these forms were written in an Indic script, for example, it would take me a very long time to figure out what was going on, whereas the HC representation would remain transparent.)
3) Using XAMPLE, every lexeme which changes form, depending on its morphophonemic context, must have one or more allomorphs declared, even if the pattern is widespread in the language. Thus, if suffixes –ga, -ge, -gi, -go, and –gu exist in the language and each one has an allomorph beginning with /k/, which occurs in the context described above, each suffix must have an allomorph explicitly listed in its lexical entry. In the case of HC, one rule suffices for all occurrences of the same process, even if there are hundreds of lexemes affected.
4) When adding a new lexeme to a database in which the parser is already working, adding the lexeme is all that needs to be done in order for words built on that lexeme to parse with HC. With XAMPLE, all relevant allomorphs must also be added to the lexical entry before forms built on that lexeme can be parsed.
5) The expression of a linguistic process in a succinct statement with a scope of application that parallels its range of operation in the language is simply “elegant”!
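Points 3 and 4 can be sketched in a few lines of Python. This is only an illustration of the two bookkeeping styles, not of how either parser is implemented; the stem "tak", the consonant sets, and the function names are hypothetical, while the suffixes come from the examples above.

```python
VOICELESS = set("ptksf")                    # assumed natural class [S]
DEVOICE = {"b": "p", "d": "t", "g": "k"}    # assumed voiced/voiceless pairs

# Allomorph style: every affected suffix carries its own listed alternant.
ALLOMORPHS = {"bi": "pi", "gu": "ku"}

def attach_by_allomorph(stem, suffix):
    if stem[-1] in VOICELESS and suffix in ALLOMORPHS:
        return stem + ALLOMORPHS[suffix]
    return stem + suffix

# Rule style: one statement covers every suffix that fits the description.
def attach_by_rule(stem, suffix):
    if stem[-1] in VOICELESS and suffix[0] in DEVOICE:
        return stem + DEVOICE[suffix[0]] + suffix[1:]
    return stem + suffix

# "ga" is a newly added suffix with no allomorph listed yet.
print(attach_by_allomorph("tak", "gu"), attach_by_rule("tak", "gu"))
print(attach_by_allomorph("tak", "ga"), attach_by_rule("tak", "ga"))
```

The newly added suffix comes out devoiced only under the rule-based version; the allomorph table has to be extended by hand before it catches up.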
As the morphophonological processes involved become more complex, the contrast between the two approaches becomes increasingly marked, and the argument for HC being preferred to XAMPLE becomes more and more compelling (at least to me).
And there you have a short introduction to my enamorment with the Hermit Crab parser and a few of the reasons why I prefer it to XAMPLE. Others who have more experience with XAMPLE should feel free to correct or rebut my undoubtedly skewed representation of reality. (Note that Andy has already alluded to some situations where HC cannot yet do what XAMPLE is capable of doing.)
Blessings,
Kevin Warfel
-----Original Message-----
From: flex...@googlegroups.com [mailto:flex...@googlegroups.com] On Behalf Of Snofriacus
Sent: Monday, April 30, 2012 11:10 PM
To: flex...@googlegroups.com
Subject: Re: [FLEx] HC rules
Hey guys,
--
XAMPLE
Lexeme –bi
Allophone –pi / [S]__
Lexeme –gu
Allophone –ku / [S]__
where [S] is a Natural Class defined as the set of voiceless consonants in the language
HC
[+vcd] à [-vcd] / [-vcd]+__
Hi Allan,
I’ve inserted the prose statement between the two versions of the “rule” below. (I put “rule” in quotes here because XAMPLE uses allomorphs rather than rules to formulate the effects of a morphophonological pattern in the language.)
Kevin
From: flex...@googlegroups.com [mailto:flex...@googlegroups.com] On Behalf Of Snofriacus
Sent: Tuesday, May 01, 2012 8:58 PM
To: flex...@googlegroups.com
Subject: Re: [FLEx] HC rules
On 5/2/2012 2:29 AM, Kevin Warfel wrote:
XAMPLE
Lexeme bi
Allophone pi / [S]__
Lexeme gu
Allophone ku / [S]__
where [S] is a Natural Class defined as the set of voiceless consonants in the language
Both of these illustrate the fact that a voiced phoneme becomes devoiced when it follows a voiceless phoneme. In actual practice, the rule as expressed for HC is likely to be too general, because it applies the devoicing to all phonemes (including vowels, for example), whereas the actual process is likely to be narrower than that. Other features would therefore have to be bundled together with the “voicing” feature to restrict its application to consonants only or to stops only, according to the scope of the process in the language.
HC
[+vcd] --> [-vcd] / [-vcd]+__ (I reinserted the arrow that disappeared along the way.)
The “+” in this rule indicates a morpheme boundary, something that XAMPLE does not (cannot) take into consideration, as I understand it.
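The over-generality mentioned above can be shown with a toy feature table in Python. This is a sketch under stated assumptions, not HC's matching code: the six-phoneme inventory, the feature name "cons", and the test forms are all hypothetical, and each form is expected to contain exactly one "+" boundary.

```python
# Hypothetical mini-inventory: each phoneme is a bundle of binary features.
FEATURES = {
    "p": {"vcd": False, "cons": True},
    "b": {"vcd": True,  "cons": True},
    "k": {"vcd": False, "cons": True},
    "g": {"vcd": True,  "cons": True},
    "a": {"vcd": True,  "cons": False},
    "i": {"vcd": True,  "cons": False},
}

def matches(bundle, segment):
    """True if the segment has every feature value named in the bundle."""
    return all(FEATURES[segment][feat] == val for feat, val in bundle.items())

def rule_fires(target, context, form):
    """Check the description: target / context + __ (one '+' boundary)."""
    left, right = form.split("+")
    return matches(context, left[-1]) and matches(target, right[0])

CONTEXT = {"vcd": False}                    # [-vcd]
GENERAL = {"vcd": True}                     # [+vcd]
RESTRICTED = {"vcd": True, "cons": True}    # [+vcd, +cons]

print(rule_fires(GENERAL, CONTEXT, "ak+a"))      # the vowel is targeted too
print(rule_fires(RESTRICTED, CONTEXT, "ak+a"))   # the vowel is left alone
print(rule_fires(RESTRICTED, CONTEXT, "ak+bi"))  # the consonant is targeted
```

Adding one more feature to the target bundle is enough to keep vowels out of the rule's scope in this toy setup.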
Hi Kevin,
Thanks for these good details. Could you give a prose statement of what this HC rule is saying? I think I'm understanding parts of it, but not fully getting it.
Allan
--
I've not used either parser extensively, but it seems to me that HC should be the default parser.
Maybe it depends on the language, but I was trying to model Indonesian
Jon,
You wrote:
Maybe it depends on the language, but I was trying to model Indonesian early on and immediately hit a huge snag with XAMPLE. Indonesian is a relatively simple language in terms of phonology and morphophonemics, but it does have one process that's complex (two-step) and affects a huge number of verbs. The active voice prefix blows away the first consonant of the verb root if that consonant is voiceless and not part of a cluster.
meN- potong --> memotong
meN- tutup --> menutup
meN- sisir --> menyisir
meN- kasihi --> mengasihi
So, the nasal assimilates first, and then the consonant deletes (perhaps because the nasal is carrying its place feature anyway). This two-step process is typically called "nasal substitution".
This is similar to the problem you described under (3) in your message, but instead of giving every suffix an extra allomorph, I would have had to give extra allomorphs to hundreds of verb roots! (I.e. to every root that is affected by the deletion rule.) It's much more satisfying to be able to use rules.
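The two-step process described above can be written out as a short Python sketch. It covers only the four voiceless consonants in the examples and is not a full account of the prefix; the function name and the vowel set are assumptions, and roots starting with any other segment are deliberately rejected.

```python
VOWELS = set("aiueo")
# Step 1 outcomes, read off the four examples: nasal matching the consonant.
NASAL_FOR = {"p": "m", "t": "n", "s": "ny", "k": "ng"}

def attach_meN(root):
    first = root[0]
    if first not in NASAL_FOR:
        raise ValueError("root-initial segment not covered by this sketch")
    # Step 1: the nasal assimilates to the root-initial consonant.
    nasal = NASAL_FOR[first]
    # Step 2: the consonant deletes, unless it is part of a cluster.
    if len(root) > 1 and root[1] in VOWELS:
        root = root[1:]
    return "me" + nasal + root

for root in ["potong", "tutup", "sisir", "kasihi"]:
    print(root, "-->", attach_meN(root))
```

One rule pair handles every affected root, which is the saving being described: no per-root allomorphs are needed. A cluster-initial input keeps its consonant in this sketch because step 2 is skipped.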
In response, I would say that this is an excellent example of a real-life situation where one can get the XAMPLE parser to “come up with the right answer,” but where the HC parser is not only more intuitive – it’s less work, too.
Kevin
Thanks, Andy, for putting this positive spin on the number of bugs that have been coming to light in the HC parser. (Maybe you didn’t see what you wrote as positive, but I’m interpreting it that way.) To me this means that more people are trying to use HC (Hallelujah!), and an increasing number of individuals working in a variety of languages translates into “stressing” aspects of the parser that haven’t been tested quite that same way before, so we find the bugs. That in turn means – assuming that the bugs will get fixed – that we are hastening the day when the HC parser can be touted as the default, a goal of mine for some time now. And so I see this as a positive thing. But I agree with you that we are not yet at the point of being able to recommend HC as the default parser.
Kevin
From: flex...@googlegroups.com [mailto:flex...@googlegroups.com] On Behalf Of Andy Black
Sent: Friday, May 04, 2012 4:53 PM
To: flex...@googlegroups.com
Subject: Re: [FLEx] HC rules
On 5/4/2012 1:00 PM, J V C wrote:
You mentioned that there really is a phonemic glottal consonant there, even though the orthography doesn't reflect it. Would the linguistic analysis be more satisfying if the phonological rules made direct reference to that? Just taking a wild stab at it, something along these lines...?
maN- CV 'isda
maN- 'i'isda     redup
mang'i'isda      nasal assim.
mangi'isda       glottal deletion
mangingisda      consonant ?influence?
That way, you wouldn't have to treat the reduplication as infixing. The deletion of the first glottal seems easy enough, although you'd have to explain why the second glottal changes to /ng/. Still, both /ng/ and glottal are [+back], right? And you could appeal to the nearby influence of the first /ng/.
Of course, if you're just parsing orthographic forms and not writing up a phonology, you might not want the parser outputting those glottals as it breaks things down in the reverse order. But maybe there's a way around that, such as a "glottal" insertion rule producing 'isda from isda. Also, I guess there could be problems if certain roots really do begin with a vowel and no glottal, but that may or may not be relevant for these languages.
Jon
On 05/09/2012 10:38 AM, Snofriacus wrote:
I'm not in a position to actually start experimenting with this in Hermit Crab right now, but want to ask this while I'm thinking of it. I don't think there are any words that I've been unable to parse using the Ample parser, but some of the solutions don't seem to represent the linguistic reality very well. The case I've found most unsatisfying is similar to what Jon presented below, but with the additional dimension of a reduplication. I used to have some examples with the nasal "N" assimilating to other consonants besides glottal, but the two below are the only ones I'm remembering right now:
mangingisda (Tagalog)
maN- CV isda
pangongoman (Botolan Sambal)
paN- CV oman
Note that I haven't specified what sort of affix the CV is. Typically CV reduplication before a root is viewed as a prefix. If this CV were a prefix, we'd get this:
- CV- isda -> iisda (The C of the CV is a glottal, which in this context isn't written in the orthography)
- maN- iisda -> mangiisda (The nasal "N" of the prefix becomes "ng" before a glottal)
But this isn't what we get. The syllable that's reduplicated is "ngi", which doesn't exist until after "maN-" has been joined to and assimilated to "isda". So to get mangingisda, the order of operations has to be reversed:
- maN- isda -> mangisda
- <CV> mangisda -> mangi<ngi>sda
But for the CV to apply at this point in the process, it needs to be applied as an infix, not a prefix.
So my question for Hermit Crab - and what I'd try if I were set up to do some testing - is would it be capable of modeling this process in a way that reflects these linguistic observations?
Allan
Interesting thoughts. But those last couple of steps don't seem much happier than the workarounds needed with Ample. Avoiding a reduplicating infix isn't what I'm after at all. What I'm looking for is a parser that -can- handle this analysis. I'm hoping that maybe HC is it.
If I find my other examples that don't involve glottals I'll post those. With those it should be easier to clearly see what's happening, and what features of a parser would be needed to correctly model it.
Oh, ok. Here they are:
"panonokho" (temptation), from "tokho"
"ampangangailangan" (needing), from "kailangan"
"pamumuhay" (living), from "buhay"
Take the first one, "panonokho". The straightforward analysis using processes that are known to happen in the language is:
- paN- tokho -> panokho (nasal assimilation and subsequent deletion of the consonant it assimilated to)
- -CV- panokho -> panonokho (I don't think it's possible to determine which "no" is the original and which is the copy)
So I'm hoping to find out whether HC can model this. Some might argue that a reduplicating infix -isn't- a process that's known to happen in the language, since most of the CV reduplication happens as prefixes. But the CV reduplication that we model as prefixes is actually ambiguous as to order. Every case can just as easily be interpreted as infixing reduplication. We have traditionally chosen to call it a prefix just because it seems easier to parse this way. So if you have lots of cases which can be viewed as either prefixing or infixing reduplication, and a few diagnostic cases which demand an infixing interpretation, the nice solution is to simply view it all as infixing reduplication.
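The two orderings discussed in this thread can be compared in a small Python sketch. It is not an HC rule and makes several assumptions: the substitution table is read off only the three examples above, a vowel-initial root is assumed to surface with "ng", the prefix is passed in without its "N", and the "am-" of "ampangangailangan" is left out. All function names are hypothetical.

```python
VOWELS = "aeiou"
SUBSTITUTE = {"t": "n", "k": "ng", "b": "m"}   # from the three examples only

def nasal_stem(root):
    """Assimilation plus deletion: tokho -> nokho, isda -> ngisda."""
    if root[0] in VOWELS:
        return "ng" + root
    return SUBSTITUTE[root[0]] + root[1:]

def first_cv(form):
    """Leading consonant letters plus the first vowel (ngisda -> ngi)."""
    i = 0
    while form[i] not in VOWELS:
        i += 1
    return form[: i + 1]

def prefix_then_infix(prefix, root):
    """Attach the nasal prefix first, then copy the CV right after it."""
    stem = nasal_stem(root)
    return prefix + first_cv(stem) + stem

def redup_then_prefix(prefix, root):
    """Reduplicate the root as a prefix first, then attach the nasal prefix."""
    return prefix + nasal_stem(first_cv(root) + root)

for prefix, root in [("pa", "tokho"), ("pa", "buhay"), ("ma", "isda")]:
    print(root, prefix_then_infix(prefix, root), redup_then_prefix(prefix, root))
```

In this sketch only the prefix-then-infix order yields the attested forms; the other order gives "mangiisda" for "isda", the form rejected earlier in the thread.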
I wasn't intending to get into linguistic argumentation here, and shouldn't spend too much time on this right now. Basically just wanting to get a feel for what HC can model by throwing some interesting data at it.
Allan
I've been wondering about this some more and thinking about a similar situation in Indonesian:
meN- DUP pukul --> memukul-mukul
meN- DUP singgung --> menyinggung-nyinggung
and so forth
I went and looked at the document Andy mentioned (Help, Resources, Intro to Parsing), and lo and behold, he's already got a rule for exactly that scenario; rule 197. (Section B.1.2.1.4.2 Unspecified nasal and full reduplication in Bahasa Indonesia) I suspect something very similar would work for yours as well.
Wow. I need to go re-read that document as an HC user, rather than from an XAmple perspective.
Jon
Hi Jon,
...The Indonesian data in Andy's document does seem to be using the same kind of process, only with a bigger piece being reduplicated. ... On first look the parsing solution appears to be doing essentially what you were suggesting - first reduplicating, and then doing the assimilation and deletion twice.
... [The second assimilation and deletion rule] appears to be depending on the orthographic hyphen to correctly identify the second "p" to be changed to "m". So it should work for generating "memipil-mipil" from "meN + DUP + pipil". If I'm understanding this correctly, then it might not reliably work for generating "panonokho" from "paN + CV + tokho", where there's no orthographic cue showing where the second copy of the CV begins. Though in this case maybe it would be possible to specify "change 't' to 'n' only for the very next consonant". If there's a way for HC to keep track of morpheme breaks, it seems like some reference to that would be helpful.
Could HC put in "-" as a temporary morpheme break marker and then match and delete it when it does the second assimilation & deletion?
As for linguistic reality though, is anybody else bothered by having to specify the assimilation and deletion twice? Or is that just me? The first assimilation and deletion makes sense. We can see how the context motivates it. But this analysis has the second assimilation and deletion occurring after it's been removed from the context that motivates it. Isn't there some other way HC could approach this in which the assimilation and deletion happens only once, and then the result is just duplicated?
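The ordering being asked about (assimilate and delete once, then copy the result) can be sketched in plain Python. This is only an illustration of the idea, not something HC is known to support; the function names and the stop-to-nasal table are assumptions, "ng" is the orthographic velar nasal, and "ʔ" stands in for an unwritten root-initial glottal stop.

```python
# Illustrative only: "substitute first, copy afterwards" for paN- + CV- + root.
# The table below is an assumption based on the Botolan Sambal examples.
NASAL_FOR = {"p": "m", "b": "m", "t": "n", "d": "n", "k": "ng", "g": "ng", "ʔ": "ng"}
VOWELS = set("aeiou")


def substitute_once(root: str) -> str:
    """First stratum: the N of paN- fuses with the root-initial stop."""
    # Only stop-initial roots are covered; anything else raises KeyError.
    return NASAL_FOR[root[0]] + root[1:]


def copy_cv(stem: str) -> str:
    """Second stratum: copy everything up to and including the first vowel."""
    first_vowel = next(i for i, ch in enumerate(stem) if ch in VOWELS)
    return stem[: first_vowel + 1] + stem


def derive(root: str) -> str:
    return "pa" + copy_cv(substitute_once(root))


for root in ("tokho", "buhay", "ʔoman"):
    print(root, "->", derive(root))
```

Here the assimilation and deletion are stated once, and the copy simply inherits the nasal.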
Hi Allan,
Andy has already responded to the effect that the HC parser is not yet able (though the potential is there in the design) to do what we would really like to describe transparently in this situation: that the form of a reduplicative affix is based on the result of the interaction between a nasal consonant and a stop that end up separated by the reduplicative affix in the final form. In other words, there seem to be strata in the formation of the word, with morphophonological rules applying to each stratum before the result is passed to the next one.
My goal in this response, then, is to attempt to get at a solution via HC with its current capabilities and limitations. The questions I asked myself were two: “Is it possible to produce a correct parse using HC in this situation?” and (assuming that the answer to the first question is ‘yes’) “How linguistically inelegant would the solution have to be in order to work?”
Now, I am the first to admit that my attempts at problem-solving often result in solutions that are more cumbersome than they would need to be, so I don’t often get points for “elegance” (though I do value elegance). You (or anyone else) are more than welcome to suggest improvements to my proposed way of handling this. But I think I have found two different approaches that would enable HC to correctly parse forms of the type you described.
In my first approach, I have tried to accomplish the task without the use of an ‘artificial hyphen’ to condition the nasal assimilation. What I came up with will probably appear a bit complicated to someone who is unfamiliar with HC, so my second approach uses the ‘artificial hyphen’ which someone else suggested earlier. That approach simplifies the appearance of the first rule, but makes it necessary to add an extra rule to delete the unwanted hyphen once it has triggered the desired changes.
First of all, your data (where “DUP-” represents the reduplicative morpheme):
panonokho = paN- DUP- tokho
ampangangailanga = aN- paN- DUP- kailangan
pamumuhay = paN- DUP- buhay
pangongoman = paN- DUP- oman
(I believe these are all words from Botolan Sambal; I have set aside the example from Tagalog, so as to try to stay within the same language, though it seems to work the same way.)
APPROACH #1 (no hyphen)
My assumptions based on the data provided:
1) The reduplicative morpheme (DUP-) will be represented in the lexicon by means of an Affix Process Rule (APR) whereby DUP- + root of form CV... ==> CV-+CV... (i.e. a copy of the initial consonant and vowel of the root to which it is affixed; the hyphen used here is the one inserted automatically by FLEx to indicate that it is a prefix, and it is not part of the form that HC sees).
2) The glottal stop in roots that have one in the initial position (e.g., “oman”) will be represented in the lexicon with a character in the lexeme form, but without that initial character in the citation form. This gives HC a ‘consonant’ to parse with, but it doesn’t show up in the dictionary or orthography. The exact character chosen to represent the glottal stop is unimportant, so long as it is declared as a phoneme and has the phonological features of a stop with a glottal point of articulation.
3) Only stops undergo the assimilation described (unlike the example in Bahasa Indonesia in the Introduction to Parsing document referred to, where /s/ also becomes nasalized).
4) All phonemes which have the feature “nas:+” (i.e., which are nasalized) are consonants.
5) The phonological system of this language will use at least the following phonological features, which have values of “+” or “-”:
consonantal (cons)
continuant (cont)
sonorant (son)
voiced (voiced)
nasal (nas)
In addition, I’m proposing the use of a custom phonological feature Point of Articulation (POA), which has values of “labial”, “alveolar”, “velar”, “glottal”, and any others that might be necessary (e.g., palatal). This is, so far as I can determine, the same thing as is called OrthPlace in Introduction to Parsing, but happens to be my term of choice. The precise term used is not important; *how* it is used is!
And finally, I created some custom features that I hope will help my illustration. “CFx”, “CFy”, and “CFz” are intended to represent “consonant features x, y, and z” so that you can see how any other features that I omitted, but which might need to be included, would fit into my proposed solution. “VFa”, “VFb”, “VFc”, and “VFd” are abbreviations for “vowel features a, b, c, and d” and are intended to save me the risk of hazarding a guess as to what the relevant vowel features for this language might be.
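As a rough illustration of how these features interact (plain Python, not FLEx data), the sketch below stores a few phonemes as feature bundles and shows what happens when son, voiced, and nas are set to "+" while POA is left alone. The feature values and the small inventory are my guesses for illustration only; they are not taken from an actual Botolan Sambal project.

```python
from typing import Dict, Optional

Features = Dict[str, str]


def stop(poa: str, voiced: str) -> Features:
    """Feature bundle for an oral stop (values are illustrative guesses)."""
    return {"cons": "+", "cont": "-", "son": "-", "voiced": voiced, "nas": "-", "POA": poa}


def nasal(poa: str) -> Features:
    """Feature bundle for a nasal consonant."""
    return {"cons": "+", "cont": "-", "son": "+", "voiced": "+", "nas": "+", "POA": poa}


# Hypothetical partial inventory; "ʔ" is the glottal stop of assumption 2.
PHONEMES: Dict[str, Features] = {
    "p": stop("labial", "-"), "b": stop("labial", "+"),
    "t": stop("alveolar", "-"), "k": stop("velar", "-"),
    "ʔ": stop("glottal", "-"),
    "m": nasal("labial"), "n": nasal("alveolar"), "ŋ": nasal("velar"),
}


def find_phoneme(features: Features) -> Optional[str]:
    """Return the phoneme whose features match exactly, or None."""
    for symbol, feats in PHONEMES.items():
        if feats == features:
            return symbol
    return None


def nasalize(symbol: str) -> Optional[str]:
    """Set son, voiced and nas to '+', keeping POA and everything else."""
    changed = dict(PHONEMES[symbol], son="+", voiced="+", nas="+")
    return find_phoneme(changed)


for s in ("p", "b", "t", "k", "ʔ"):
    print(s, "->", nasalize(s))
```

The glottal stop comes back as `None` because no phoneme combines nas:+ with a glottal POA, which is the gap that a separate POA-repair rule has to fill.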
And now for my proposed solution:
In prose:
Rule 1: nasalize the initial consonant of the root
Rule 2: nasalize the initial consonant of the reduplicative affix, DUP-
Rule 3: “correct” the POA of the nasal resulting from a glottal stop (from glottal to velar, since there is no glottal nasal consonant)
Rule 4: delete the nasal consonant that triggered the assimilation in rules 1 and 2
In code:
Rule 1: 
Interpretation: A stop will be transformed into a voiced nasal consonant at the same POA under certain conditions, namely when it is morpheme-initial (symbolized by the ‘+’ to the left of the ___), and preceded by a consonant-vowel sequence which exactly matches it and the vowel that follows it; furthermore, the morpheme containing that consonant-vowel sequence must also be preceded by a nasalized phoneme (i.e., a nasal consonant).
Rule 2:

Interpretation: A stop will be transformed into a voiced nasal consonant at the same POA under certain conditions, namely when it is morpheme-initial (symbolized by the ‘+’ to the left of the ___), and preceded by a nasalized phoneme (i.e., a nasal consonant).
Rule 3:

Interpretation: A nasal consonant with glottal POA is ‘tweaked’ to make it a velar nasal (and thereby matching a phoneme in the language, instead of having a combination of features that doesn’t match anything).
Rule 4:

Interpretation: A nasal consonant deletes when it is morpheme-final and followed by another nasal consonant (the transformed stop, in our case).
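For anyone who wants to step through the four rules by hand, here is a rough string-level simulation in plain Python. It is not HC rule notation, and it has not been run against HC. Words are lists of morphemes, "N" is the prefix nasal, "ʔ" is the glottal stop, and "Ŋ" is a made-up symbol for the featurally glottal nasal that Rule 3 repairs; all function names are hypothetical.

```python
# Illustrative only: Approach #1 as operations on a list of morphemes.
VOWELS = set("aeiou")
# Stop -> nasal at the same POA. "Ŋ" (glottal nasal) is an invented symbol.
HOMORGANIC = {"p": "m", "b": "m", "t": "n", "d": "n", "k": "ŋ", "g": "ŋ", "ʔ": "Ŋ"}
NASALS = set("mnŋNŊ")


def underlying(root):
    """paN- + DUP- + root, where the APR copies the root's initial CV."""
    return ["paN", root[:2], root]


def rule1(morphs):
    """Nasalize a root-initial stop after a matching CV copy that follows a nasal."""
    out = list(morphs)
    for i in range(2, len(out)):
        copy, root = out[i - 1], out[i]
        if (out[i - 2][-1] in NASALS and len(copy) == 2 and copy[1] in VOWELS
                and root[:2] == copy and root[0] in HOMORGANIC):
            out[i] = HOMORGANIC[root[0]] + root[1:]
    return out


def rule2(morphs):
    """Nasalize a morpheme-initial stop directly preceded by a nasal."""
    out = list(morphs)
    for i in range(1, len(out)):
        if out[i - 1][-1] in NASALS and out[i][0] in HOMORGANIC:
            out[i] = HOMORGANIC[out[i][0]] + out[i][1:]
    return out


def rule3(morphs):
    """Repair the glottal nasal to a velar nasal."""
    return [m.replace("Ŋ", "ŋ") for m in morphs]


def rule4(morphs):
    """Delete a morpheme-final nasal before a morpheme-initial nasal."""
    out = list(morphs)
    for i in range(len(out) - 1):
        if out[i][-1] in NASALS and out[i + 1][0] in NASALS:
            out[i] = out[i][:-1]
    return out


def derive(root):
    morphs = underlying(root)
    for rule in (rule1, rule2, rule3, rule4):
        morphs = rule(morphs)
    return "".join(morphs).replace("ŋ", "ng")  # spell the velar nasal as "ng"


for root in ("tokho", "buhay", "ʔoman"):
    print(root, "->", derive(root))
```

Rule 1 has to run before Rule 2 in this sketch: once the copy's consonant has been nasalized, it no longer matches the root's initial consonant, and Rule 1 could not recognize the reduplication.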
APPROACH #2 (with hyphen)
My assumptions based on the data provided:
1) The reduplicative morpheme (DUP-) will be represented in the lexicon by means of an Affix Process Rule (APR) whereby DUP- + root of form CV... ==> CV--+CV... (i.e. a copy of the initial consonant and vowel of the root to which it is affixed; the second hyphen used here is the one inserted automatically by FLEx to indicate that it is a prefix, while the first one is part of the form that HC sees and uses to parse).
2) The glottal stop in roots ... (same as in approach #1).
3) Only stops undergo ... (same as in approach #1).
4) All phonemes which have the feature “nas:+” ... (same as in approach #1).
5) The phonological system of this language ... (same as in approach #1)
In addition, I’m proposing the use of a custom phonological feature Point of Articulation (POA), ... (same as in approach #1)
For this approach, I have also created a Natural Class [A] which is defined to include all consonant and vowel phonemes in the language. Most importantly, this natural class specifically excludes the hyphen phoneme.
And now for my proposed solution:
In prose:
Rule 1: nasalize the initial consonant of the root
Rule 2: nasalize the initial consonant of the reduplicative affix, DUP-
Rule 3: “correct” the POA of the nasal resulting from a glottal stop (from glottal to velar, since there is no glottal nasal consonant)
Rule 4: delete the nasal consonant that triggered the assimilation in rules 1 and 2
Rule 5: delete any hyphens
In code:
Rule 1:

Interpretation: A stop will be transformed into a voiced nasal consonant at the same POA under certain conditions, namely when it is morpheme-initial (symbolized by the ‘+’ to the left of the ___), and preceded by a morpheme composed of any number of consonant and/or vowel phonemes and ending with a hyphen (indicated by all that is between the two “+” signs, which symbolize morpheme boundaries); furthermore, that morpheme must also be preceded by a nasalized phoneme (i.e., a nasal consonant).
Rule 2:

Interpretation: A stop will be transformed into a voiced nasal consonant at the same POA under certain conditions, namely when it is morpheme-initial (symbolized by the ‘+’ to the left of the ___), and preceded by a nasalized phoneme (i.e., a nasal consonant).
Rule 3:

Interpretation: A nasal consonant with glottal POA is ‘tweaked’ to make it a velar nasal (and thereby matching a phoneme in the language, instead of having a combination of features that doesn’t match anything).
Rule 4:

Interpretation: A nasal consonant deletes when it is morpheme-final and followed by another nasal consonant (the transformed stop, in our case).
Rule 5:

Interpretation: Any hyphen, regardless of context, is deleted. (In actual practice, such a general rule might provide HC with so many possibilities to look at that parsing anything would take a very long time. So, it might be necessary to narrow down the context somewhat to make the work manageable.)
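The five rules of this approach can be approximated with regular expressions in plain Python, which may help in reasoning through a derivation. Again, this is a sketch and not HC notation: "+" marks a morpheme boundary, "-" is the artificial hyphen, "Ŋ" is an invented symbol for the glottal nasal, and the inventory in `CLASS_A` is a guess made only so that the example runs.

```python
import re

# Illustrative only: Approach #2 as regex rewrites over a boundary-marked string.
HOMORGANIC = {"p": "m", "b": "m", "t": "n", "d": "n", "k": "ŋ", "g": "ŋ", "ʔ": "Ŋ"}
STOPS = "[" + "".join(HOMORGANIC) + "]"
NASALS = "[mnŋNŊ]"
# Natural Class [A]: every consonant and vowel, but NOT the hyphen.
# The inventory itself is a hypothetical stand-in.
CLASS_A = "[pbtdkgʔmnŋNŊshlrwyaeiou]"


def to_nasal(match):
    """Keep the left context (group 1) and nasalize the stop (group 2)."""
    return match.group(1) + HOMORGANIC[match.group(2)]


def derive(root):
    form = "paN+" + root[:2] + "-+" + root  # APR output: CV copy plus hyphen
    # Rule 1: stop -> nasal / [nas:+] + [A]* - + ___
    form = re.sub("(" + NASALS + r"\+" + CLASS_A + r"*-\+)(" + STOPS + ")", to_nasal, form)
    # Rule 2: stop -> nasal / [nas:+] + ___
    form = re.sub("(" + NASALS + r"\+)(" + STOPS + ")", to_nasal, form)
    # Rule 3: glottal nasal -> velar nasal
    form = form.replace("Ŋ", "ŋ")
    # Rule 4: nasal -> 0 / ___ + [nas:+]
    form = re.sub(NASALS + r"(?=\+" + NASALS + ")", "", form)
    # Rule 5: delete any hyphen
    form = form.replace("-", "")
    # Drop boundaries and spell the velar nasal as "ng".
    return form.replace("+", "").replace("ŋ", "ng")


for root in ("tokho", "buhay", "ʔoman"):
    print(root, "->", derive(root))
```

Because `CLASS_A` excludes the hyphen, the Rule 1 context can only stretch across a single hyphen-final morpheme, which is the job the artificial hyphen is there to do.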
FINAL REMARKS:
I have not tested the above proposals to verify that they do indeed correctly parse the words that you provided as data for this discussion, Allan. I’ve tried to reason them through carefully, but it’s possible that there is something that I overlooked. This is probably at least a good approximation of how “ugly” or “linguistically inelegant” the solution could be in using HC to parse these assimilating and reduplicating forms; someone may be able to improve them, but I don’t think it will get any worse than what I’ve presented here.
There are three things I don’t particularly like about these proposals. The obvious one is the fact that they don’t reflect the linguistic reality as transparently as we would hope for. The second is the complexity of Rule 1 in approach #1 or the use of the hyphen in approach #2. The third is the fact that I was unable to find a way to combine Rules 1 and 2 in either approach.
But there you have a couple of concrete proposals as to how HC might handle the language data you presented. I hope this is helpful to you, and perhaps others, as well.
Best wishes,
Kevin
--