Wordcounts


Pleun Van Der Werf

Mar 20, 2024, 4:17:32 PM
to chibolts
Hi, I am using EVAL for analyses of Dutch data. However, CLAN shows that it recognizes only 700 words out of 1400, even though these are all existing Dutch words. How is this possible, and is the analysis still reliable? Can somebody help me out?

Pleun

Brian Macwhinney

Mar 20, 2024, 7:37:32 PM
to ChiBolts, Pleun Van Der Werf
Pleun,
Could you please send me the files you are trying to analyze, so I can see what is happening and perhaps replicate it?

— Brian MacWhinney
Teresa Heinz Professor of Cognitive Psychology,
Language Technologies and Modern Languages, CMU


Leonid Spektor

Mar 20, 2024, 8:21:23 PM
to chib...@googlegroups.com, Pleun Van Der Werf
Hi,

EVAL does not count words. In the EVAL output you can see MLU_Words, which is the MLU word count divided by the MLU utterance count, both taken from the %mor tier. MLU does not count every word; see chapter "7.19 MLU" of the CLAN manual (https://talkbank.org/manuals/CLAN.pdf) for the rules used to select which words and utterances are counted. EVAL also reports a FREQ_tokens count from the %mor tier. So, if your %mor tier is not accurate, the results will not be accurate either. EVAL also excludes whole utterances, and all words in those utterances, if they have the "[+ exc]" post-code on them. Finally, if you have chosen to analyze only a specific gem, EVAL looks only at that gem in the data file.
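
To make the arithmetic concrete, here is a rough Python sketch of the ratio described above. It is an illustration only and not CLAN's implementation: the function name `mlu_words`, the sample lines, and the simplified rule that any %mor token containing "|" counts as a word are all assumptions. The real selection rules are the ones in manual chapter 7.19.

```python
def mlu_words(chat_lines):
    """Return (words, utterances, ratio) counted from %mor tiers only.

    Simplified, hypothetical rules (NOT CLAN's full MLU rule set):
      - an utterance whose main line carries "[+ exc]" is skipped entirely
      - a %mor token counts as a word only if it contains "|"
      - an utterance with no %mor tier contributes nothing
    """
    words = 0
    utterances = 0
    excluded = False
    for line in chat_lines:
        if line.startswith("*"):
            # Main tier: remember whether this utterance is excluded.
            excluded = "[+ exc]" in line
        elif line.startswith("%mor:") and not excluded:
            tokens = line[len("%mor:"):].split()
            # Punctuation such as "." or "!" has no "|" and is not counted.
            n = sum(1 for t in tokens if "|" in t)
            if n:
                words += n
                utterances += 1
    ratio = words / utterances if utterances else 0.0
    return words, utterances, ratio


# Made-up sample in a CHAT-like layout, for illustration only.
sample = [
    "*CHI:\tik zie een hond .",
    "%mor:\tpro|ik v|zie det|een n|hond .",
    "*CHI:\tja . [+ exc]",
    "%mor:\tco|ja .",
    "*CHI:\tgrote hond !",
    "%mor:\tadj|groot n|hond !",
]

words, utterances, ratio = mlu_words(sample)
print(words, utterances, ratio)
```

In this sketch the excluded utterance and the punctuation tokens add nothing to the totals, so the counts can be lower than the number of words typed on the main tier.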

As Brian asked, please send us the file you are referring to in your email, and please include the command line you were using.


Leonid.
