VOCD manual

Amber

Aug 19, 2015, 10:36:05 AM

to chibolts

Dear all

The CLAN manual contains the following link, which no longer appears to work.

http://childes.psy.cmu.edu/manuals/vocd.doc

Is there a new link to the VOCD manual/another similar article?

Many thanks
Amber

Amber

Aug 19, 2015, 10:37:03 AM

to chibolts

VOCD article I mean, not manual...

Amber

Aug 19, 2015, 10:46:58 AM

to chibolts

In fact, perhaps I should just post my question on here. The CHAT manual talks about calculating D and states that it can be done on samples as small as 50 tokens. However, reliability is lower for small samples. Could anyone advise me on whether VOCD is still a more valid measure than TTR, even for small transcripts (e.g., from a shared-book reading session)? I assume that as long as the sample is large enough for KIDEVAL to give me VOCD values, it is OK?
Thanks
Amber

Gaby Silva

Aug 19, 2015, 11:32:18 AM

to chib...@googlegroups.com

Dear Amber,

A study by McCarthy & Jarvis (2007) found that VOCD performs reasonably well in the range of 100 to 400 tokens (correlation with number of tokens, r = .22). I haven't checked that source in a while, but perhaps it has more information about smaller samples.

The source is:

McCarthy, P. M., & Jarvis, S. (2007). Vocd: A theoretical and empirical evaluation. Language Testing, 24(4), 459-488.

Good luck!

Gaby Silva

Dra. Gabriela Silva Maceda

Profesor-Investigador

Facultad de Psicología, UASLP.

Carretera Central Km. 424.5

San Luis Potosí, S.L.P., 78494

México

Tel. +52 (444)832-1000 Ext. 9358

Date: Wed, 19 Aug 2015 07:46:58 -0700
From: ambal...@gmail.com
To: chib...@googlegroups.com
Subject: Re: VOCD manual

--
You received this message because you are subscribed to the Google Groups "chibolts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+u...@googlegroups.com.
To post to this group, send email to chib...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/chibolts/322d0805-4490-4ed5-8514-7048385d997f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Amber

Aug 19, 2015, 12:04:40 PM

to chibolts

Dear Gaby

Thanks very much for the reference - I'll check it out!

Best wishes
Amber

Brian MacWhinney

Aug 19, 2015, 12:22:53 PM

to chib...@googlegroups.com, Amber

Dear Amber,

Thanks for noting that old link in the manual. I have removed it and replaced it with a pointer to

Malvern, D., Richards, B., Chipere, N., & Durán, P. (2004). Lexical diversity and language development. New York: Palgrave Macmillan.

With regard to sample size, VOCD requires 100 utterances. For smaller samples, and more generally, you may wish to consider using MATTR, as described in my previous ChiBolts message on this topic. There I gave a reference to

Covington, M. A., & McFall, J. D. (2010). Cutting the Gordian knot: The moving-average type–token ratio (MATTR). Journal of Quantitative Linguistics, 17(2), 94-100.

and you may wish to read the recent comparison of VOCD, TTR, and MATTR

Fergadiotis, G., Wright, H., & Green, S. (2015). Psychometric evaluation of lexical diversity indices: Assessing length effects. Journal of Speech, Language, and Hearing Research, 58, 840.

My take-home from this is that people should only use TTR when comparing across samples of the same length, and even then VOCD or MATTR would be better. In general, researchers should prefer MATTR to VOCD. In CLAN, you run MATTR using this option in FREQ:

+bN This option calculates the lexical diversity of a sample using the Moving Average Type-Token Ratio (MATTR). This index is based on a moving window that computes TTRs for each successive window of fixed length (N). Initially, a window length is selected (e.g., 10 words) and the TTR for words 1-10 is estimated. Then, the TTR is estimated for words 2-11, then 3-12, and so on to the end of the text. For the final score, the estimated TTRs are averaged.
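
The moving-window procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not CLAN's implementation: the function names `ttr` and `mattr`, the whitespace tokenization, and the choice to raise an error when the sample is shorter than the window are all assumptions made for the example.

```python
from typing import Sequence


def ttr(tokens: Sequence[str]) -> float:
    """Plain type-token ratio: distinct words divided by total words."""
    if not tokens:
        raise ValueError("need at least one token")
    return len(set(tokens)) / len(tokens)


def mattr(tokens: Sequence[str], window: int) -> float:
    """Average the TTR over every successive window of `window` tokens.

    Windows cover tokens 1..N, 2..N+1, and so on to the end of the text.
    Raising an error for samples shorter than the window is an assumption
    of this sketch; it is not taken from CLAN.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(tokens) < window:
        raise ValueError("sample is shorter than the window")
    ratios = [
        ttr(tokens[start:start + window])
        for start in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


# Hypothetical 11-word sample, tokenized on whitespace for illustration.
sample = "the dog saw the cat and the cat saw the dog".split()

# Two windows of 10 words (words 1-10 and 2-11), each with 5 distinct words.
print(mattr(sample, 10))  # 0.5
```

When the window equals the sample length there is only one window, so the result is the same as the plain TTR. Because every window has the same length, scores from transcripts of different lengths are computed on the same footing, which plain TTR does not guarantee.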

—Brian MacWhinney

Amber

Aug 19, 2015, 1:14:38 PM

to chibolts, ambal...@gmail.com

Dear Brian

Thank you. When I try to run the +bN command, it does not work. I think I need to download a newer version of CLAN (I can see from previous posts that MATTR has only recently been implemented). However, I want to avoid having to re-MOR and check all my transcripts to ensure their compatibility with the newer version of the software. Is there any way to update my current CLAN software (without downloading the newer version) so that it will allow me to run +bN?

Best wishes
Amber

Leonid Spektor

Aug 19, 2015, 1:56:02 PM

to chib...@googlegroups.com

Amber,

CHECK does not look at the %mor tier. KIDEVAL does check the one-to-one correspondence of words between the %mor tier and the main speaker tier. But if you have already run KIDEVAL with your current version of CLAN without any errors, then the latest version of CLAN will not find errors either. You do not need to re-MOR your data to use MATTR analyses, because FREQ looks only at the main speaker tiers and ignores the %mor tiers. But if you want more accurate results from KIDEVAL, for example, then you should get the new MOR grammar and re-MOR your data. In fact, you should do that even if you do not update your CLAN, because we constantly improve the MOR grammar and this will give you more accurate results.

Leonid.

Amber

Aug 19, 2015, 2:08:15 PM

to chibolts

Thanks Leonid! Going to download a newer version.

Best
Amber

Amber

Aug 19, 2015, 5:22:38 PM

to chibolts

I've done it and it is working - thank you. One more question - I wondered if it is possible to have the MATTR output formatted like KIDEVAL is, in a spreadsheet with each transcript listed in a row and the MATTR in the column next to it. This would make it so much easier to copy + paste into my stats software.

Thanks,
Amber

spektor

Aug 19, 2015, 9:37:54 PM

to chib...@googlegroups.com

Yes, it is possible. Just add the +d3 option to the FREQ command line.

Leonid.

Amanda Owen Van Horne

Aug 20, 2015, 9:31:03 AM

to chibolts

Also see

Owen, A. J., & Leonard, L. B. (2002). Lexical Diversity in the Spontaneous Speech of Children With Specific Language Impairment: Application of D. Journal of Speech, Language, and Hearing Research, 45(5), 927-937.

Bottom line is that VOCD is better than TTR, but less transparent. It does still vary with sample size, but to a lesser degree.

McCarthy, P. M., & Jarvis, S. (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. Behavior Research Methods, 42(2), 381-392.

Compares various methods of computing D. Recommends a variety of measures.

Koizumi, R. (2012). Relationships between text length and lexical diversity measures: Can we use short texts of less than 100 tokens? Vocabulary Learning and Instruction, 1(1), 60-69.

Recommends MTLD.

Brian MacWhinney

Aug 20, 2015, 11:19:23 AM

to ChiBolts

Dear ChiBolts,

Comparing across these various studies and their new analyses, Fergadiotis et al. support the conclusion of the earlier study by Owen and Leonard that VOCD is still sensitive to sample size, although less than TTR. However, both MATTR and MTLD seem to avoid this problem, to the extent that they do not correlate with TTR but rather with each other. Given that both MATTR and MTLD are relatively sample-size-independent measures, one may want to turn to other criteria to decide which to use. The advantage of MATTR is that its computation and interpretation are so transparent. The advantage of MTLD is that it has been used more widely in the recent literature (outside of child language). We have implemented MATTR in CLAN. Whether or not we should implement MTLD is unclear.

Best,

—Brian MacWhinney

Amber

Aug 24, 2015, 5:05:10 PM

to chibolts

Great, thank you v. much.
