How to parse unwritten morphemes? (Persian & Arabic-based script issue)

Erin SanGregory

Apr 24, 2025, 7:33:19 PM

to FLEx list

Hello everyone,

I've run into an issue with parsing morphemes that aren't written. The language I'm working on sometimes uses a linking particle [-i] (called ezafa) to connect modifiers to a head noun within a noun phrase. For example:

ʃtrʊç kʊʈɐk-i mɐʐ
flour put-EZ 1SG
'my engagement' (lit., 'my flour-throwing')

But this is how it looks in my FLEx database:

[Screenshot: fyDOYDjQ0heIXmd4.png]

The ezafa does not show up to be parsed or glossed here (or elsewhere) because it is not written in the local orthography. However, I can usually tell from the grammar when an ezafa is needed, and I can confirm from audio recordings.

In the constructions where it is used, the ezafa is essential. It is ungrammatical to leave it out. So it's important to include it in the interlinearizations in order to have an accurate record, both for posterity's sake and for grammatical analysis.

On the rare occasions when ezafa is written in Persian, it is written with ِ (kasra). I have tried adding this to the baseline to see if I could parse it out (e.g. کُټَکِ), but apparently FLEx can't parse individual Arabic diacritics as morphemes.

Has anyone encountered this type of issue before? If so, how have you resolved it? I'm open to any and all suggestions.

Thanks,
Erin

kathleen...@sil.org

Apr 24, 2025, 7:53:48 PM

to flex...@googlegroups.com

Erin,

Several things to look at. Have you defined the diacritic as a word-forming character? This is done by going to Vernacular Writing System Properties → Character tab.

It also needs to be defined as a phoneme with a set of features distinct from that of every other phoneme in the language.

This is done in the grammar section. First, you will need to define a new phonological feature. It may be best to give the diacritic its own feature (perhaps "ezafa", abbreviated "ez"). This phoneme would then be +ez and minus for all other features, and all other phonemes in the language would be -ez, which makes it a uniquely defined phoneme. (The values for all of the features can be changed in Bulk Edit Phoneme Features.) This is the kind of strategy that has been used for hyphens in languages where hyphens are used orthographically in cases like reduplication.
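
To make the "uniquely defined phoneme" idea concrete, here is a minimal Python sketch of the check involved. It is only an illustration, not how FLEx or Hermit Crab stores features: the inventory, the feature names "cons" and "voice", and both helper functions are hypothetical. The kasra is +ez and minus for everything else, and every other phoneme is -ez.

```python
# Hypothetical inventory: symbols and feature names are invented for illustration.
PHONEMES = {
    "k": {"cons": "+", "voice": "-", "ez": "-"},
    "g": {"cons": "+", "voice": "+", "ez": "-"},
    "i": {"cons": "-", "voice": "+", "ez": "-"},
    "\u0650": {"cons": "-", "voice": "-", "ez": "+"},  # kasra: +ez, minus elsewhere
}

def duplicate_feature_sets(phonemes):
    """Return groups of phonemes that share an identical feature bundle."""
    seen = {}
    for symbol, feats in phonemes.items():
        key = tuple(sorted(feats.items()))
        seen.setdefault(key, []).append(symbol)
    return [group for group in seen.values() if len(group) > 1]

def only_positive(phonemes, feature):
    """Return the phonemes that carry a plus value for the given feature."""
    return [s for s, f in phonemes.items() if f.get(feature) == "+"]

print(duplicate_feature_sets(PHONEMES))  # an empty list means every bundle is unique
print(only_positive(PHONEMES, "ez"))
```

If two phonemes ever end up with the same bundle, the first function reports them together, which is the situation the dedicated ez feature is meant to prevent.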

Kathleen

--
"FLEx list" messages are public. Only members can post.
flex_d...@sil.org
http://groups.google.com/group/flex-list.
---
You received this message because you are subscribed to the Google Groups "FLEx list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to flex-list+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/flex-list/70f6ca66-d63f-47e4-8f16-7fea6b221d35%40sil.org.

Kevin Warfel

Apr 24, 2025, 9:14:13 PM

to flex...@googlegroups.com

Erin,

The verb "parse" is used in at least two different ways by different people, and I'm not sure exactly what you mean by it. Are you using one of FLEx's automated parsers (XAmple or Hermit Crab)? Or are you dividing words manually by inserting hyphens? Kathleen's response to your post indicates that she interpreted your use of "parse" to mean that you are using Hermit Crab, as her instructions are specific to that parser. Personally, I can interpret what you wrote to mean either manual parsing or parsing with the aid of one of FLEx's automated parsers.

Either way, I think that your entry for the ezafa needs to have a null allomorph. That should allow the parser to include ^0- or ∅- (whichever option you choose for representing a null morpheme) in the Word line of your interlinear, whether Hermit Crab recognizes it and puts it there or you insert it when separating the word into morphemes manually. If you *are* using Hermit Crab, then Kathleen's advice should be followed.

Best wishes,

Kevin

Beth-docs Bryson

Apr 25, 2025, 5:33:22 AM

to flex...@googlegroups.com

Diacritics can definitely be separate morphemes, but there are tricks for working with them in the interface.

In the Lexicon Edit area, you can create a morpheme that consists of just that diacritic.

When working in the interlinear and using the cursor to move over characters, normally it would move over both a base character and diacritic in one click. If you want it to move over combining diacritics separately from base characters, there is a setting called “arrow by character” that makes this possible. The way to activate this is via a registry setting.

I believe that if you search for “arrow by character” in the FLEx helps, it will tell you how to find the file that will allow you to activate that setting.
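
For anyone curious why the cursor treats the letter and its diacritic as one unit, here is a minimal Python sketch of what the text looks like at the character level. It says nothing about FLEx internals. The spelling of the example word is an assumption (keheh, damma, teh with ring, fatha, keheh, kasra), and `describe` and `split_final_mark` are hypothetical helpers.

```python
import unicodedata

# Assumed spelling of the example word with a written ezafa.
WORD = "\u06A9\u064F\u067C\u064E\u06A9\u0650"
KASRA = "\u0650"

def describe(word):
    """Return (code point, Unicode name, is_combining_mark) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch) != 0)
        for ch in word
    ]

def split_final_mark(word, mark=KASRA):
    """Split a word-final combining mark off as its own piece."""
    if word.endswith(mark):
        return word[: -len(mark)], mark
    return word, ""

for row in describe(WORD):
    print(row)

stem, suffix = split_final_mark(WORD)
print(len(WORD), len(stem), repr(suffix))
```

The kasra is a separate code point that happens to be a combining mark, so it can be cut off from the stem like any other character once the cursor is allowed to stop between the two.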

-Beth

Erin SanGregory

Apr 25, 2025, 12:41:24 PM

to flex...@googlegroups.com

Kevin,

You are right, I wasn't clear about my use of "parse." I've had some issues trying to set up the environments and templates that would be needed to use one of the automated parsers with the Arabic-based orthography. So I'm currently parsing by hand, inserting hyphens while working in the Analyze tab of the Interlinear Texts area.

I do like the idea of adding a null allomorph so that I can indicate where the ezafa is spoken even though it's not usually written. The null character -∅ doesn't work with the Arabic-based script though, and that's what I'm using as the baseline, which means that's also what I end up parsing in the Analyze tab. Plus, I do actually want the spoken form -i to show up in the IPA morphemes line. What do you think of using the Arabic script numeral for zero (۰) to represent the allomorph? It's not technically a null, but it does give me a way to orthographically represent the morpheme. Or is there another alternative that might be better?

Thanks,
Erin

Andreas_Joswig

Apr 28, 2025, 6:21:10 AM

to flex...@googlegroups.com

Hi Erin,

For zero allomorphs, there is no need to use the symbol -∅. You can type -^0 as a placeholder, and FLEx will recognize this as a zero allomorph.

Andreas

ron_lo...@sil.org

Apr 28, 2025, 7:49:44 AM

to flex...@googlegroups.com

Erin,

The null character ∅ should work fine as a zero morpheme regardless of script. I often just use 0, though, since it’s easier to type.

I also have no problems using Arabic diacritics as morphemes. It gets a bit tricky to position your cursor between a letter and a diacritic, but FLEx has a shortcut for moving the cursor just one character over. That is the F7 key going RTL or the F8 key going LTR. Once you are between the letter and the diacritic, you can press the hyphen (-) key or the spacebar to split the morpheme.

I suggest you make an entry that is the diacritic as a suffix and then add an allomorph that is 0. Then in your texts you can manually add a -0 for the places the ezafa is not written.
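
As a rough illustration of that setup (not FLEx's data model), the Python sketch below treats the ezafa as one suffix entry with two allomorphs, the written kasra and a null placeholder, and builds the hand-typed morpheme break for a head word. `SuffixEntry`, `break_word`, and the sample spelling are hypothetical. The placeholder is "0" here, but "^0" or "∅" would work the same way in the sketch.

```python
from dataclasses import dataclass, field

KASRA = "\u0650"
NULL = "0"  # placeholder for the unwritten ezafa; "^0" and "∅" were also suggested

@dataclass
class SuffixEntry:
    gloss: str
    allomorphs: list = field(default_factory=list)

# One entry, two allomorphs: the written diacritic and the null form.
EZAFA = SuffixEntry(gloss="EZ", allomorphs=[KASRA, NULL])

def break_word(word, has_ezafa, entry=EZAFA):
    """Return the morpheme break an analyst would type for a head word."""
    if not has_ezafa:
        return word
    written, null = entry.allomorphs
    if word.endswith(written):
        # Ezafa is written: split the diacritic off the stem.
        return f"{word[:-len(written)]}-{written}"
    # Ezafa is spoken but not written: add the null placeholder.
    return f"{word}-{null}"

unwritten = "\u06A9\u067C\u06A9"  # assumed everyday spelling without diacritics
print(break_word(unwritten, has_ezafa=True))
print(break_word(unwritten + KASRA, has_ezafa=True))
```

Whether the ezafa is present is still the analyst's call, based on the grammar and the audio; the sketch only shows that both spellings end up pointing at the same entry.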

Ron
