TAALES DATA INTERPRETATION

JeanMarie Farrow

Oct 11, 2024, 6:52:46 PM

to Suite of automatic linguistic analysis tools

I am currently analyzing data on teacher-child conversations in prekindergarten classrooms within underserved communities, specifically examining linguistic features such as clausal density, lexical diversity, and lexical sophistication.

For my analysis, I have transcribed book reading conversations and processed them through the TAALES and TAALED software. I am using the subordination index (SI; SALT) to assess clausal density, MATTR for lexical diversity, and I am exploring the COCA Academic frequency measure (all words, log transformation) to capture lexical sophistication. However, I am having difficulty understanding the output of TAALES. The frequency counts surpass the total word count. In looking at the articles posted and the spreadsheet, it seems that words are ranked by their frequency, with less frequent words on the academic word list receiving a higher rank, and the sum is then divided by the total word count. Thus, higher scores would indicate higher use of academic words. Am I interpreting this process correctly?

Additionally, do you consider this measure appropriate for assessing lexical sophistication in spoken conversations with children? My preliminary findings suggest that SI, MATTR, and the COCA frequency measure capture distinct dimensions, each potentially influencing children's vocabulary growth in different ways.
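For readers less familiar with MATTR (moving-average type-token ratio), the general idea can be sketched as follows. This is only an illustrative sketch and not TAALED's own code: the function name `mattr`, the fallback for short texts, and the window size of 50 are assumptions made for the example.

```python
def mattr(tokens, window=50):
    """Average the type-token ratio over every window of `window` tokens.

    Assumption for this sketch: texts shorter than the window fall back
    to a plain type-token ratio; an actual tool may handle this differently.
    """
    tokens = [t.lower() for t in tokens]
    if not tokens:
        return 0.0
    if len(tokens) < window:
        return len(set(tokens)) / len(tokens)
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


# Toy utterances (invented for illustration, not from the transcripts).
repetitive = "the dog saw the dog and the dog ran".split()
varied = "a small brown fox quietly watched three noisy geese".split()

print(mattr(repetitive, window=5))  # lower: many repeated words
print(mattr(varied, window=5))      # higher: every word is distinct
```

Because each window has the same length, the score is far less sensitive to the total number of words than a plain type-token ratio, which matters when transcripts differ in length.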

Thank you very much for any feedback!

JeanMarie

JeanMarie Farrow

Oct 12, 2024, 5:17:15 PM

to Suite of automatic linguistic analysis tools

Thank you so much--very interesting!

I used the COCA_spoken_Frequency_Log_AW and the SUBTLEXus_Freq_AW_Log as you suggested. It seems these measures of teachers' talk do not predict children's vocab, but lexical diversity and complex syntax (measured by the subordination index) do (controlling for other variables, including the total number of words). There's also a negative correlation between teachers' MATTR and each of the lexical sophistication measures. Complex syntax (SI) and lexical diversity (MATTR) are positively correlated.

Could you just clarify the lexical sophistication measures:

DO

Higher scores = more frequent words used in the corpus, which means that higher scores suggest less lexical sophistication? (I am seeing a negative correlation between both the COCA and SUBTLEXus spoken corpus scores and teachers' MATTR scores.)

OR

Higher scores = higher frequency of sophisticated words (greater lexical sophistication in teachers' convos with children).

I appreciate the clarification.

Scott Crossley

Oct 15, 2024, 3:56:11 PM

to JeanMarie Farrow, Suite of automatic linguistic analysis tools

Hi JeanMarie,

Thanks for using our tools.

The raw frequency counts will surpass the word counts of most texts because they are based on counts from an external reference corpus, not on counts from your own transcripts. I would suggest using the logged, normed scores instead. And, yes, higher scores mean more frequent words. I would also suggest using a spoken corpus (COCA spoken or SUBTLEXus) rather than an academic corpus for this analysis.
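To make the point about external counts concrete, here is a rough sketch of how a mean frequency score over all words might be computed. It is not the TAALES implementation: the frequency table holds made-up numbers (not real COCA or SUBTLEXus values), the function names are hypothetical, and skipping words that are missing from the table is an assumption of the sketch.

```python
import math

# Hypothetical reference-corpus counts; invented for illustration only.
REFERENCE_COUNTS = {
    "the": 5_000_000,
    "dog": 40_000,
    "ran": 30_000,
    "enormous": 3_000,
    "canine": 300,
    "sprinted": 200,
}


def mean_raw_frequency(tokens, counts=REFERENCE_COUNTS):
    """Mean reference-corpus count of the tokens found in the table."""
    found = [counts[t] for t in tokens if t in counts]
    return sum(found) / len(found) if found else None


def mean_log_frequency(tokens, counts=REFERENCE_COUNTS):
    """Mean log10 reference-corpus count of the tokens found in the table."""
    found = [math.log10(counts[t]) for t in tokens if t in counts]
    return sum(found) / len(found) if found else None


common = ["the", "dog", "ran"]
rare = ["the", "enormous", "canine", "sprinted"]

# The raw score is far larger than the 3-word text, because the counts
# come from the reference corpus rather than from the text itself.
print(mean_raw_frequency(common))

# The logged score is higher for the text made of more frequent words.
print(mean_log_frequency(common), mean_log_frequency(rare))
```

Under these assumptions, the text built from everyday words gets the higher logged score, so a higher score points to more frequent (less sophisticated) vocabulary.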

Good luck,

Scott

--

Scott Crossley

Professor, Department of Special Education

Data Science Institute | Peabody College

Vanderbilt University

https://www.linguisticanalysistools.org/

https://learlab.org/

Scott Crossley

Oct 15, 2024, 3:56:23 PM

to JeanMarie Farrow, Suite of automatic linguistic analysis tools

> I used the COCA_spoken_Frequency_Log_AW and the SUBTLEXus_Freq_AW_Log as you suggested. It seems these measures of teachers' talk do not predict children's vocab, but lexical diversity and complex syntax (measured by the subordination index) do (controlled for other variables, including total amount of words). There's also a negative correlation between teachers' MATTR and each of the lexical sophistication measures. Complex syntax (SI) and lexical diversity (MATTR) are positively correlated.

There may be an interesting relationship with Academic COCA depending on your research question, so don't give up on it.

> Could you just clarify the lexical sophistication measures:
> DO
> Higher scores = more frequent words used in the corpus, which means that higher scores suggest less lexical sophistication? (I am seeing a negative correlation between both the COCA and SUBTLEXus spoken corpus scores and teachers' MATTR scores)
> OR
> Higher scores = higher frequency of sophisticated words (greater lexical sophistication in teachers' convos with children).

This is the answer:

Higher scores = more frequent words used in the corpus, which means that higher scores suggest less lexical sophistication.

> I appreciate the clarification.
