Hello again,
Some of my students pointed out today that the mlu command gives different results when it is run in the browser than when it is run in CLAN on downloaded transcripts. The effect seems to be widespread, not limited to one corpus: they first noticed discrepancies in the Tardif corpus but then found more.
Taking Eve (Brown corpus) file 020000a as an example, in the browser the command
mlu +t*CHI 020000a.cha
yields:
From file <childes/Eng-NA/Brown/Eve/020000a.cha>
MLU for Speaker: *CHI:
MLU (xxx, yyy and www are EXCLUDED from the utterance and morpheme counts):
Number of: utterances = 424, morphemes = 3687
Ratio of morphemes over utterances = 8.696
Standard deviation = 5.953
That can't be correct: the utterance count matches the CLAN result below, but the morpheme count is more than double.
Running the same command in CLAN on the downloaded transcript yields:
From file <C:\talkbank\clan\Brown\Eve\020000a.cha>
MLU for Speaker: *CHI:
MLU (xxx, yyy and www are EXCLUDED from the utterance and morpheme counts):
Number of: utterances = 424, morphemes = 1468
Ratio of morphemes over utterances = 3.462
Standard deviation = 1.975
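For what it's worth, a quick check of the arithmetic suggests that each ratio is consistent with its own counts, so the difference seems to come from the morpheme counts rather than from the division. This is only a minimal Python sketch using the numbers reported above; the variable names are illustrative, and it is not how CLAN itself computes MLU:

```python
# Counts copied from the two mlu outputs above for 020000a.cha.
# Names are illustrative only; this is not CLAN's own computation.
utterances = 424
browser_morphemes = 3687
clan_morphemes = 1468

# MLU = morphemes / utterances, rounded to three decimals as in the output.
browser_mlu = round(browser_morphemes / utterances, 3)
clan_mlu = round(clan_morphemes / utterances, 3)

print(browser_mlu)  # 8.696
print(clan_mlu)     # 3.462

# How many times larger the browser morpheme count is than the CLAN count.
print(round(browser_morphemes / clan_morphemes, 2))  # 2.51
```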
Any advice would be appreciated.
Thanks,
Jenny