--
"FLEx list" messages are public. Only members can post.
flex_d...@sil.org
http://groups.google.com/group/flex-list.
---
You received this message because you are subscribed to the Google Groups "FLEx list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to flex-list+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/flex-list/fa78c5fa-112e-4e71-8f1a-42de6f95770an%40googlegroups.com.
Dear Andy, all,
A quick follow-up and one further question.
Thanks, Andy, for replying and pointing this out.
I was aware of these empty positions. They crept in when, perhaps with some update of FLEX, some natural classes I used in my phonological and affix rules were replaced by “automatically generated class for… [some rule]”. I deleted those classes, which left the slots empty.
Still, these empty slots and the errors they generated were not responsible for the errors that I reported here a few weeks ago, because some parses worked despite them.
Just a follow-up: With a lot of work, I was able to make the parsing of my FLEX project functional again. In some cases, I had to delete a lexical entry and re-create it with apparently exactly the same content. I also discovered that, apparently, the ordering of affix rules is also significant, not only the ordering of regular allomorph forms restricted to environment.
The only remnant of the “crisis” that I am aware of is a number of spurious entries that show up when I ask to see all the detailed steps in parsing. They are described as “Automatically generated null affix for the […] irregularly inflected form”. I do not know why these entries were created, how to access them, or how to delete them. Can you give me a hint here? I suspect they slow the parsing down.
Thanks again for your support,
Sebastian
Thanks a lot, Andy!
As to the null affixes – I see. They were not there a few months back, so it seems that the parsing mechanism was altered in some way in one of the updates.
I hope it works from now on…
Thanks again for your support!
Sebastian
Dear Andy, Mike, all,
thanks so much for your very helpful replies, again.
My answers, and open problems, below here:
The question about the (optional) glottal stop deletion rule has the following background:
Glottal stops are phonemic in Awetí and occur only before vowels, mostly between vowels, since they are often elided in the (rare) cases where they would meet a consonant.
There are many minimal pairs of stems one of which begins with the glottal stop, the other one without, such as /ok/ 'house' vs. /?ok/ <'ok> 'tuber'. They take different prefix-allomorphs (cf. tok 'his house' vs. i'ok 'its (the plant's) tuber').
Now, word initially, the glottal stop is often dropped, at least in writing. So some may write 'ok 'tuber (in general)', but others write ok, which becomes ambiguous, also meaning 'house (in general)'.
The two rules were my attempt at formulating this optional rule.
I tested that many months ago, and it seemed to work. My problem
with using a phonological rule is that the rule cannot be optional
(I asked this group about optional "phonological" (in fact, rather
orthographic) rules), which means that the variant spelling <'ok>
would not be recognized. The implementation as an affix also has
the advantage that I can mark the resulting form as
"not-relative", which means that it can not be combined with
prefixes. I need that category independently.
It is certainly a very good idea to reduce the search space by
using ' [V] X as the input.
Andy, I also followed your other hints about reducing the
empty-affix rules, some of which were spurious leftovers from
earlier formulations. Thanks for checking this so thoroughly.
To my surprise, indeed, leaving only the glottal stop deletion
rule explicit still does make it optional, as the general empty
affix rule apparently also holds (which I find weird).
With your suggested changes, parsing has indeed improved somewhat, but other aspects still do not work. For instance, the form tupejat 'the one who is staying', composed of t- up -eju -at, does not parse at all; I tried letting it run for hours and hours. Not even toat (to -at) 'the one who goes/went' seems to parse, at least not in a reasonable time (I gave up after 5 minutes). I believe that an initial and/or final t makes the parsing extremely slow, but I do not know how to fix that.
What I also see and do not understand is how empty affixes are dealt
with by the 'try a word' procedure. The outcome is quite
heterogeneous: in some cases the empty affix appears explicitly, in others not.
Take the two forms ok and 'ok, which I mentioned earlier.
Here are the results for ok:
The output is correct, in principle: besides the absolute form of ok 'house' (the third parse), it can be the unmarked-for-person form of both 'ok and ok, and the parsing details show that the empty 0.pers-prefix has been applied also in the case of the first parse, but there it is not shown in the results (which is what I prefer, actually), but is made explicit in the second parse. Why?
(There could also be a parse recognizing that the form could be the
spelling variant, without the glottal stop, of the absolute (prefix-less)
form of 'ok, but I have not formulated any rule for that,
because that would imply formulating either a phonological rule, which
cannot be optional, or an empty affix rule, but that form is
affix-less, which is exactly the point.)
Here is the output for 'ok:
This is also correct, the form can be analyzed as the absolute form, or as the unmarked-for-person relative form, and that is what is happening underneath, as the parsing details show. They do not show the empty prefix in the results themselves (which is what I prefer), but why are these then shown in the case of the second parse of ok?
So these are the open questions. The latter ones are more for my understanding of how the parsing works internally, but the failure to parse tupejat or toat is indeed a problem.
Thanks again in advance for your help.
Best, Sebastian
Sometimes a language has free fluctuation between two allomorphs of a morpheme. In such cases, one should create both allomorphs and condition them exactly the same way (in terms of environments and inflection classes). One should also order them one after the other. The FieldWorks Language Explorer default parser will try both forms in such cases.
Thanks a lot indeed for your time and effort, Andy.
Really much appreciated.
A general comment: I am trying to model the linguistic analysis
that I believe to be accurate for Awetí in FLEX, with some
necessary adaptions. For one thing, I actually do not believe in
0-morphemes (just in affix-less forms), but in order to implement
situations where affixes only occur with a certain class of stems
or only in forms that belong to a certain category (inflection
feature), while other forms do not have any affix, I understand
that 0-morphemes are necessary. If I can do without, I am all in
for that.
To your individual observations and suggestions:
Looking through the data, I notice that there are several affixes which have both allomorphs and variant forms where the variant forms are the same as the allomorphs. Since both the main entry (and all its allomorphs) and the variant forms are tried, this produces double work any time any of these forms is found. Is there a reason why you need affixes to have variant forms?
Usually, the variants are marked as "abstract forms" and are, if I understood that correctly, not considered by the parser. They are there in order to appear in the dictionary, so that somebody who does not know the language and comes across such a different allomorph has a chance to find the relevant main entry with the main headword.
Am I wrong assuming that "abstract forms" are ignored by the parser?
Also, try using environments to try and restrict forms as much as possible. This is especially important for any null forms. If these happen to only occur word initially, then add that to the environment. For example, something that is word initial and before a vowel would have an environment of
/# _ [V]
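To illustrate why such an environment reduces the parser's work, here is a minimal sketch in Python. It is a toy model, not FLEx's actual matching code: the function names and the vowel set are placeholders invented for this example, and the real Awetí vowel inventory is not taken from this thread. It counts the positions in a word where a null (zero-length) affix could be posited, with and without the restriction /# _ [V].

```python
# Toy model only: names and the vowel set are assumptions for illustration.
VOWELS = set("aeiou")  # placeholder for the natural class [V]


def token_matches(token, word, index):
    """Return True if the segment at `index` satisfies one environment token."""
    if token is None:      # no restriction on this side of the underscore
        return True
    if token == "#":       # word boundary: the index must fall outside the word
        return index < 0 or index >= len(word)
    if token == "[V]":     # the neighbouring segment must be a vowel
        return 0 <= index < len(word) and word[index] in VOWELS
    raise ValueError(f"unknown environment token: {token}")


def null_affix_sites(word, left=None, right=None):
    """List the positions where a zero-length affix is allowed by the environment."""
    return [
        i
        for i in range(len(word) + 1)
        if token_matches(left, word, i - 1) and token_matches(right, word, i)
    ]


# Unrestricted: every gap between segments is a candidate site.
print(null_affix_sites("tupejat"))              # 8 candidate positions
# Restricted to /# _ [V]: only word-initial, pre-vocalic sites survive.
print(null_affix_sites("tupejat", "#", "[V]"))  # no site at all
print(null_affix_sites("ok", "#", "[V]"))       # exactly one site
```

In this toy model an unrestricted null form is a candidate at every position of every word, while the restricted one is ruled out immediately for any word that begins with a consonant.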
Fine, that's a good hint. I will
implement that for all affixes which occur only at the very
beginning or end of words then.
Whenever possible use allomorph forms instead of affix process rules. This is especially the case when something can only occur word initially or finally because then you can set the environment to say so. This can greatly reduce the work of the parser.
Yes, fine. On the other hand, Mike
Maxwell said that in his experience, mixing in the same entry
allomorphs with environments and allomorph processes is usually
a bad idea, which is why I switched to use only processes in
entries where at least one process was needed. Would you advise
me to not do so?
One case I tested just now is one of
the alphabetically first affixes, -ap. When I use the
affix allomorph ap in the environment /[V]_ , I
cannot get both toap and twap to parse,
which is what I need. (Besides inconsistencies among the different
transcribers, the rules which determine whether /o/ turns to /w/
are too complex to implement in FLEX, and they refer to the
position of the word accent, which is not written in Awetí
texts.) The latter form has to be produced by a process
X [ou] => 1 w + ap. If I put the regular affix after/below that
process, only twap parses, not toap, and if I
put it above, the reverse holds. So I have returned to indicating
two processes side by side, and indeed, then both forms parse.
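The behaviour described here is what one would expect if ordered forms within an entry are tried disjunctively (the first applicable one wins), while forms placed side by side are all tried. The following Python sketch models only that assumption; it is not taken from FLEx, and the function names and the vowel set are hypothetical.

```python
# Toy model of the -ap case. Assumption: ordered forms are disjunctive,
# forms given side by side are all tried. Not FLEx's actual implementation.
VOWELS = set("aeiou")  # placeholder vowel class


def plain_suffix(stem):
    """Allomorph ap in the environment /[V]_ : attach after a vowel-final stem."""
    return stem + "ap" if stem[-1] in VOWELS else None


def glide_process(stem):
    """Process X [ou] => 1 w + ap : a final o/u is replaced by w before ap."""
    return stem[:-1] + "wap" if stem[-1] in "ou" else None


def first_match(stem, rules):
    """Disjunctive ordering: only the first applicable rule produces a form."""
    for rule in rules:
        form = rule(stem)
        if form is not None:
            return {form}
    return set()


def all_matches(stem, rules):
    """Side-by-side rules: every applicable rule produces a form."""
    forms = set()
    for rule in rules:
        form = rule(stem)
        if form is not None:
            forms.add(form)
    return forms


print(first_match("to", [glide_process, plain_suffix]))  # process first
print(first_match("to", [plain_suffix, glide_process]))  # allomorph first
print(all_matches("to", [plain_suffix, glide_process]))  # both side by side
```

Under this assumption, whichever form is listed first blocks the other, which matches the observation that only one of toap and twap parses until the two are given equal status.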
When you have variation (like some of the glottal initial forms), you can add a form with the same environments as the form that does not have the variation. One place to do this is with the 'tuber' entry. The conceptual intro document has this section that can help here:
3.1.4.1 Free Fluctuation
Yes, thanks, I saw that. Problem with the glottal stop is that
it can be dropped only word-initially, not between prefix and
stem. If I indicate this to be free variation, all word forms of
'house' ok would incorrectly also parse as possible
variant forms of 'ok 'tuber'.
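The over-generation described here can be made concrete with a small Python sketch. It is a toy model with a hypothetical lexicon format: each stem allomorph carries a flag saying whether it may occur only word-initially. Prefix allomorphy (t- versus i-) is deliberately ignored, and whether FLEX can express such a restriction on a stem allomorph is not established in this thread.

```python
# Toy model: (allomorph, word_initial_only) pairs per stem. All names are
# hypothetical; prefix selection is not modelled.
def parses(word, lexicon, prefixes=("", "t", "i")):
    """Return the set of (prefix, gloss) analyses that spell out `word`."""
    found = set()
    for gloss, allomorphs in lexicon.items():
        for form, initial_only in allomorphs:
            for prefix in prefixes:
                if prefix + form != word:
                    continue
                if initial_only and prefix:
                    continue  # this allomorph may not follow a prefix
                found.add((prefix, gloss))
    return found


# Unrestricted free variation: 'tuber' has both 'ok and ok everywhere.
FREE = {
    "house": [("ok", False)],
    "tuber": [("'ok", False), ("ok", False)],
}
# Glottal-less variant of 'tuber' allowed only at the start of the word.
INITIAL_ONLY = {
    "house": [("ok", False)],
    "tuber": [("'ok", False), ("ok", True)],
}

print(parses("tok", FREE))          # 'house' and, wrongly, 'tuber'
print(parses("tok", INITIAL_ONLY))  # only 'house'
print(parses("ok", INITIAL_ONLY))   # both readings of the bare form
```

With unrestricted variation, tok is analysed as a form of 'tuber' as well as of 'house'; with the word-initial restriction, only the bare form ok stays ambiguous, which is the intended result.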
Consider removing the null t.v. entry altogether. You can make the slot it's in optional. I see that it does occur in an intransitive template that requires more derivation, but couldn't you make the now optional nom slot be required for this case and leave the TematicV/Syl slot optional?
I will try that. I have checked, there are only very few
instances where the same root can be both, of the inflectional
class e-Verb (where the -e thematic vowel is required) and a
regular verb, and in most cases they are synonymous.
Perhaps I can also do without the 0-prefix for person, making the person slot for the modal forms optional (which comes closer to my analysis anyways). It would not capture that the absolute forms like mo 'hand' only occur without a possessor while the relative unmarked for person forms like po 'hand (of)' only occur after a possessor, but this would not be such a hindrance in parsing the texts.
I will do these tests and come back to you. Thanks again.
Sebastian...