Hello,
I am trying to get MATTR values from the %mor tier of transcripts in which we have marked speech to be ignored with < > [% g], as in the following:
*151: <vale> [% g] .
*151: esta es un [//] una historia acerca de dos hermanos, Gustavo y
Jorge [^c] .
*151: <&um Jorge es el hermano mayor &eh quien se traslado a otra ciudad
en el año dos mil porque empezó su carrera universitaria> [% g] .
*151: &ehm cuando salió Jorge [^c] Gustavo se sentía muy solo [^c] porque
antes ju(gaba) [/] jugaba siempre con Jorge [^c] .
I can use the command freq @ +t*1* +t%mor +b10 +sm;*,o% -sm|neo +d3, which outputs MATTR, but I realized that it includes the < > [% g] coded text.
I tried adding the switch -s"<% g>", which works with a simple FREQ command, as follows, but I received the same Type/Token/MATTR values as with the command above.
freq @ +t*1* -s"<% g>" +t%mor +b10 +sm;*,o% -sm|neo +d3
I also tried using the -s"<% g>" switch during the MOR step (mor -s"<% g>"+t*1* @), but received a message saying to use only language codes with the -s option.
Is there a way to ignore the < > [% g] coded text when running MOR? Or if not, is there a way to ignore the < > [% g] coded text when calculating MATTR with the FREQ command?
Thank you for your help!
Amanda
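For readers unfamiliar with the measure, here is a minimal Python sketch of how a moving-average type-token ratio is generally defined, to show why tokens that should have been excluded change the result. It is not CLAN's implementation: the function name mattr, the fallback for short samples, and the sample tokens are assumptions made for illustration, and the window of 10 simply mirrors the +b10 setting in the command above.

```python
def mattr(tokens, window=10):
    """Mean type-token ratio over every run of `window` consecutive tokens."""
    if not tokens:
        return 0.0
    if len(tokens) < window:
        # Assumption: with fewer tokens than one window, fall back to plain TTR.
        return len(set(tokens)) / len(tokens)
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Hypothetical token lists; real input would come from the %mor tier.
    kept = ["este", "ser", "uno", "historia", "acerca", "de",
            "dos", "hermano", "Gustavo", "y", "Jorge"]
    with_excluded = ["vale"] + kept
    print(mattr(kept), mattr(with_excluded))
```

Because every window contributes to the mean, any excluded word that stays in the token stream shifts the windows around it, which is why the coded text has to be removed before the measure is computed.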
Leonid and Brian,
Thank you very much! Because we exclude both partial and whole utterances, the [e] will work best for us. I was able to batch update [% g] to [e] easily with chstring +w +cCODEchange.cut, and now the MATTR results only include the language we’re interested in.
Much appreciated!!
Amanda
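For anyone who wants to do the same recoding outside CLAN, the following is a rough Python sketch of the substitution described above, replacing every [% g] with [e] across a folder of transcripts. It is only an illustration and not how chstring works internally: the function names, the *.cha pattern, the output folder, and the UTF-8 encoding are all assumptions, and the originals are left untouched.

```python
from pathlib import Path


def recode_exclusions(text, old="[% g]", new="[e]"):
    """Replace every occurrence of the old exclusion code with the new one."""
    return text.replace(old, new)


def recode_folder(folder, out_folder, pattern="*.cha"):
    """Write recoded copies of matching transcripts into out_folder.

    Returns the names of the files whose content actually changed.
    """
    out_dir = Path(out_folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    changed = []
    for path in sorted(Path(folder).glob(pattern)):
        original = path.read_text(encoding="utf-8")  # assumed encoding
        updated = recode_exclusions(original)
        (out_dir / path.name).write_text(updated, encoding="utf-8")
        if updated != original:
            changed.append(path.name)
    return changed
```

A call such as recode_folder("transcripts", "transcripts_recoded") (both folder names hypothetical) would leave the source files as they are and return the names of the files in which a code was replaced. MOR would still need to be rerun on the recoded files before calculating MATTR.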