TAALES: The Interpretation of Some Indices

Shanshan Li

Jun 7, 2023, 11:00:49 PM

to Suite of automatic linguistic analysis tools

I read the research papers in which your team tested how well TAALES 1.0 and TAALES 2.0 indices predict lexical proficiency. The papers reported that some indices predict lexical proficiency well, so I am trying to use these indices to find out whether there are significant differences among three groups of writers in the same genre. However, I am unsure how to interpret the values of some indices.

KF_Ncats_cw: does a higher index score mean that the words used in the text are less sophisticated?
COCA_magazine_tri_2_MI: does a higher index score mean that lexical proficiency is higher?
BNC_Written_Trigram_Freq_Normed_Log: does a higher index score mean that the trigrams used in the text are more frequent and that lexical proficiency is higher?

#TAALES #Interpretations of some indices

Thanks for everything!

Scott Crossley

Jun 8, 2023, 3:02:14 PM

to Shanshan Li, Suite of automatic linguistic analysis tools

Have you looked at the index description spreadsheet?

--
You received this message because you are subscribed to the Google Groups "Suite of automatic linguistic analysis tools" group.
To unsubscribe from this group and stop receiving emails from it, send an email to linguistic-analysi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/linguistic-analysis-tools/0c70c76f-43e2-4c69-a271-82a8e9dbb054n%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--

Scott Crossley

Professor, Department of Special Education

Data Science Institute | Peabody College

Vanderbilt University

https://www.linguisticanalysistools.org/

https://learlab.org/

Shanshan Li

Jun 23, 2023, 7:03:04 AM

to Suite of automatic linguistic analysis tools

Thank you for your reply! It helped a lot. I have read the index description spreadsheet, and it resolved some of my questions. However, I am still confused about three indices: WN_Mean_Accuracy, BNC_Written_Trigram_Freq_Normed_Log, and Aoe_inverse_linear_regression_slope.

For WN_Mean_Accuracy, the description in the spreadsheet is “average naming accuracy of all participants for this word” and no equation is provided. So if the score of this index is higher, the words in the text elicit more accurate responses, and texts with a higher score on this index are therefore lexically less sophisticated. I’m not sure my understanding is right.
For BNC_Written_Trigram_Freq_Normed_Log, the description in the spreadsheet is “Mean frequency score” and the equation is sum logged trigram frequency score/number of trigrams in text with frequency score. For my self-built corpus, the values of this index are negative. Can I still interpret this index like the other n-gram frequency indices? That is, if the value of this index is higher, the trigrams in the text are more frequent.
For Aoe_inverse_linear_regression_slope, the description in the spreadsheet is “incremental Age of exposure (AOE) for words across 13 grade level using LDA modeling” and the equation is “1/slope of linear regression based on LDA cosine values”. After reading the spreadsheet, I’m still not clear on how to interpret this index. Kyle et al. (2018), in their article The tool for the automatic analysis of lexical sophistication (TAALES): Version 2.0, found that aoe_inverse_linear_regression_slope “explained a small amount of the variance (0.3%) in lexical proficiency scores. The results indicate that texts including words that have lower co-occurrence patterns at later grade level tended to earn higher scores”. Firstly, does "scores" in "earn higher scores" refer to the index scores or the lexical proficiency scores? Secondly, does it mean that a text with a higher index score contains words that learners are exposed to later, and that its lexical proficiency is accordingly higher? I am still confused about this index.

#TAALES #Interpretations of some indices

Thanks for everything!

Kyle, K., Crossley, S., & Berger, C. (2018). The tool for the automatic analysis of lexical sophistication (TAALES): Version 2.0. Behavior Research Methods, 50(3), 1030–1046. https://doi.org/10.3758/s13428-017-0924-4

Scott Crossley

Jun 23, 2023, 12:20:10 PM

to Shanshan Li, Suite of automatic linguistic analysis tools

Quick answers below. Hope they help.

For WN_Mean_Accuracy, the description in the spreadsheet is “average naming accuracy of all participants for this word” and no equation is provided. So if the score of this index is higher, the words in the text elicit more accurate responses, and texts with a higher score on this index are therefore lexically less sophisticated. I’m not sure my understanding is right.

That is correct

For BNC_Written_Trigram_Freq_Normed_Log, the description in the spreadsheet is “Mean frequency score” and the equation is sum logged trigram frequency score/number of trigrams in text with frequency score. For my self-built corpus, the values of this index are negative. Can I still interpret this index like the other n-gram frequency indices? That is, if the value of this index is higher, the trigrams in the text are more frequent.

That is correct
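
As an illustration of why negative values are not a problem here, the following is a minimal Python sketch of the equation quoted above. The trigram frequencies are invented, the base-10 log is an assumption, and the function name mean_log_trigram_freq is hypothetical; the actual normalization and log base used by TAALES should be checked against the spreadsheet. The log of any normalized frequency below 1 is negative, so the mean is negative too, but a higher (less negative) mean still reflects more frequent trigrams.

```python
import math

def mean_log_trigram_freq(trigrams, norm_freq):
    """Sum of logged frequency scores / number of trigrams that have a score."""
    # Trigrams missing from the reference list are skipped, matching the
    # "number of trigrams in text with frequency score" denominator.
    logs = [math.log10(norm_freq[t]) for t in trigrams if t in norm_freq]
    if not logs:
        return None  # no trigram in the text has a frequency score
    return sum(logs) / len(logs)

# Hypothetical normalized frequencies (NOT real BNC values).
norm_freq = {
    ("one", "of", "the"): 0.5,
    ("as", "well", "as"): 0.2,
    ("in", "stark", "contrast"): 0.001,
    ("a", "moot", "point"): 0.0005,
}

# One text built from frequent trigrams, one from rare trigrams.
common_text = [("one", "of", "the"), ("as", "well", "as"), ("not", "in", "list")]
rare_text = [("in", "stark", "contrast"), ("a", "moot", "point")]

common_score = mean_log_trigram_freq(common_text, norm_freq)
rare_score = mean_log_trigram_freq(rare_text, norm_freq)

# Both scores are negative; the text with frequent trigrams scores higher.
print(common_score, rare_score)
```
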

For Aoe_inverse_linear_regression_slope, the description in the spreadsheet is “incremental Age of exposure (AOE) for words across 13 grade level using LDA modeling” and the equation is “1/slope of linear regression based on LDA cosine values”. After reading the spreadsheet, I’m still not clear on how to interpret this index. Kyle et al. (2018) published an article named The tool for the automatic analysis of lexical sophistication (TAALES): Version 2.0 and they found that aoe_inverse_linear_regression_slope “explained a small amount of the variance (0.3%) in lexical proficiency scores. The results indicate that texts including words that have lower co-occurrence patterns at later grade level tended to earn higher scores”. Firstly, does "scores" in "earn higher scores" refer to the index scores or the lexical proficiency scores? Secondly, does it mean that a text with a higher index score suggests the words in that text are exposed later, and the lexical proficiency is accordingly higher? I still feel confused with this index.

That is also correct.
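
The arithmetic behind “1/slope of linear regression” can be sketched in Python as follows. This is only an illustration: the cosine values are invented, one value per grade level across 13 levels is assumed, the helper names ols_slope and inverse_slope are hypothetical, and the LDA modeling itself is not reproduced. The sketch shows only that a word whose cosine values rise more slowly across grade levels (a shallower slope) receives a larger inverse-slope value.

```python
def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line fitted to (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def inverse_slope(cosines):
    """1 / slope of the regression of cosine values on grade level (1..n)."""
    grades = list(range(1, len(cosines) + 1))
    return 1.0 / ols_slope(grades, cosines)

# Hypothetical cosine trajectories over 13 grade levels (NOT real AoE data).
steep = [0.30 + 0.05 * g for g in range(13)]    # rises quickly, slope 0.05
shallow = [0.10 + 0.01 * g for g in range(13)]  # rises slowly, slope 0.01

steep_inv = inverse_slope(steep)      # approximately 20
shallow_inv = inverse_slope(shallow)  # approximately 100

# The shallower trajectory yields the larger inverse-slope value.
print(steep_inv, shallow_inv)
```
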

Scott


Shanshan Li

Jun 23, 2023, 9:53:54 PM

to Suite of automatic linguistic analysis tools

Thank you so much! Your reply was helpful. I will try to analyze the data based on the spreadsheet and your reply.
