non-congruence between number of text analyses and search concordance

Françoise Rose

Nov 17, 2025, 2:09:17 AM
to flex...@googlegroups.com

Dear all,

Following a previous thread on how to count tokens of morphemes in texts, I’ve come across a mystery. Could anyone help? The specific question is at the end of this mail, after the different numbers are explained.

I am taking the example of a morpheme with 5 senses, 3 allomorphs and 2 variants (for reasons internal to the grammar and phonology of Mojeño Trinitario, this is common in that language).

The results of “number of text analyses (entry)” and “number of text analyses (sense)” are given below, with a total of 135 for the last column. The first column (126) does not include the 9 occurrences of variants, so these two numbers match (126 + 9 = 135).

[Image image001.png (screenshot of the two “number of text analyses” columns) not preserved]

The results of the Search Concordance do not correspond neatly. By form, it gives:

Lexeme form: 16 (includes neither allomorphs nor variants)

Variant 1: 5
Variant 2: 4
= 9 variants

Allomorph 1: 80
Allomorph 2: 31
Allomorph 3: 1
= 112 allomorphs

Lexeme form + variants + allomorphs = 137

By sense, it gives:

Sense 1: 114
Sense 2: 9
Sense 3: 11
Sense 4: 3
Sense 5: 0
Total: 137

Why does the total number of Search Concordance hits (137) not match the “number of text analyses” total (135)?
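
For readers who want to re-check the arithmetic, here is a minimal Python sketch that tallies the figures quoted in this message. The numbers are copied from the message; the variable names are illustrative only and do not correspond to anything in FLEx.

```python
# Counts copied from the message; names are illustrative, not FLEx identifiers.
concordance_by_form = {
    "lexeme form": 16,
    "variant 1": 5,
    "variant 2": 4,
    "allomorph 1": 80,
    "allomorph 2": 31,
    "allomorph 3": 1,
}
concordance_by_sense = {1: 114, 2: 9, 3: 11, 4: 3, 5: 0}

text_analyses_entry = 126  # "number of text analyses (entry)", variants excluded
variant_hits = 9           # occurrences of the two variants

form_total = sum(concordance_by_form.values())
sense_total = sum(concordance_by_sense.values())
text_analyses_total = text_analyses_entry + variant_hits

print("concordance by form: ", form_total)
print("concordance by sense:", sense_total)
print("text analyses:       ", text_analyses_total)
print("unexplained hits:    ", form_total - text_analyses_total)
```

Both concordance totals agree with each other, so the gap of two hits lies between the concordance and the text-analysis columns rather than between forms and senses.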

 

Françoise

Andreas_Joswig

Nov 17, 2025, 7:13:32 AM
to flex...@googlegroups.com
My first hunch is that the search concordance counts all instances in the corpus, whereas the text analyses count only distinct analyses and therefore skip every corpus instance where the same analysis is used a second or subsequent time.
Andreas
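
The distinction behind this hunch can be illustrated with a small Python sketch. The toy data below is made up, and whether FLEx actually counts this way is not confirmed anywhere in the thread; the sketch only shows how a token count and a distinct-analysis count diverge.

```python
# Hypothetical toy corpus: one (wordform, analysis) pair per token in the texts.
tokens = [
    ("wordA", "root-SUFFIX"),
    ("wordA", "root-SUFFIX"),  # same analysis reused for a second token
    ("wordB", "PREFIX-root"),
    ("wordA", "root-SUFFIX"),  # and a third time
]

token_count = len(tokens)             # every corpus instance is counted
distinct_analyses = len(set(tokens))  # each analysis is counted once

print(token_count, distinct_analyses)
```

In this toy corpus the two counts are 4 and 2. If the hypothesis held, the size of the gap would depend on how often analyses are reused.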

Natalia

Nov 18, 2025, 9:26:39 AM
to FLEx list
Dear Françoise,

My hunch is that this difference is due to the problem Kevin pointed to in another thread yesterday: one of the variant forms is probably linked to more than one entry. It could even be your head form that is linked as a variant of another entry. Thus, when you count by form instead of by sense, you also get the two examples of the form which are unrelated to this entry but which got linked in the process of adding variants (because of the wording on the button Kevin mentioned).

Here is how I would proceed to find the two extra examples that are linked to this entry when you search by form:
1. Go to the Lexicon module in Lexicon edit view
2. Show the column "Variant of" as the first column
3a. Check the rows of the two variant forms of your entry: only one entry should be listed in the "Variant of" column
3b. Check the row of the main form of your entry (let's call it main_entry1): there should be no form listed in the "Variant of" column

If in step 3 you found either two entries linked to a variant or another main entry linked to your main_entry1, click on that "extra" linked entry. There:
4. Go to the variant list of the extra main entry (main_entry2)
5. Copy the form of the variant that led you there (let's call it homophone_variant1), click on "insert variant" and paste the form. Then, instead of clicking the "add variant" button, click on "create", which should create a new homophonous variant entry (homophone_variant2).

Before you unlink homophone_variant1, change the analysis of the wordforms linked to main_entry2 that contain homophone_variant1 so that they contain homophone_variant2 instead. After that, when you go back to main_entry1 and redo your counts, you should no longer get 2 extra examples when you search by form.
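
The check in steps 3a and 3b amounts to looking for forms whose "Variant of" column lists more entries than expected. A minimal Python sketch of that logic follows, assuming the column has been copied by hand into a dictionary. The dictionary and the name `variant_b` are hypothetical, and nothing here calls FLEx itself.

```python
# Hypothetical hand-made copy of the "Variant of" column:
# form -> list of main entries it is linked to as a variant.
variant_of = {
    "homophone_variant1": ["main_entry1", "main_entry2"],  # linked twice
    "variant_b": ["main_entry1"],
    "main_entry1": [],  # a main form should list nothing here (step 3b)
}

# Step 3a: a variant form should be linked to exactly one main entry.
shared = {form: entries for form, entries in variant_of.items() if len(entries) > 1}

for form, entries in shared.items():
    print(form, "is linked to", ", ".join(entries))
```

Any form reported this way is a candidate for the split into two homophonous variant entries described in step 5.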

I hope this is helpful!

Best,

Natalia

Craig

Nov 20, 2025, 5:09:49 AM
to flex...@googlegroups.com
It is a long-standing issue that the occurrence numbers are off when the same word appears more than once in a sentence. See this Jira issue for an example. It would be better to count all the occurrences, but that hasn't made it onto a priority list for fixing.

Craig.
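
To make the effect concrete, here is a minimal Python sketch contrasting the two ways of counting. It assumes, purely for illustration, that the lower figure comes from counting a word at most once per sentence; the thread does not spell out the exact mechanism, and the tokens are invented placeholders.

```python
# Invented placeholder sentences, each a list of word tokens.
sentences = [
    ["x", "y", "x"],  # target occurs twice in this sentence
    ["y", "x"],
    ["x", "x", "z"],  # and twice again here
]
target = "x"

# Count every occurrence of the target word.
all_occurrences = sum(sentence.count(target) for sentence in sentences)

# Count the target at most once per sentence (the assumed undercounting).
once_per_sentence = sum(1 for sentence in sentences if target in sentence)

print(all_occurrences, once_per_sentence)
```

Here the counts are 5 and 3: each sentence that repeats the word loses one occurrence under the second method.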