Exclude marked text when getting MATTR from MOR

Amanda Huensch

Dec 7, 2022, 10:32:31 AM
to chibolts

Hello,

I am attempting to get MATTR values from the MOR line of transcripts in which we have coded speech to be ignored using < > [% g] as in the following:

*151:     <vale> [% g] .
*151:     esta es un [//] una historia acerca de dos hermanos, Gustavo y
                Jorge [^c] .
*151:     <&um Jorge es el hermano mayor &eh quien se traslado a otra ciudad
                en el año dos mil porque empezó su carrera universitaria> [% g] .
*151:     &ehm cuando salió Jorge [^c] Gustavo se sentía muy solo [^c] porque
                antes ju(gaba) [/] jugaba siempre con Jorge [^c] .

The following command outputs MATTR, but I realized that it includes the < > [% g] coded text:

freq @ +t*1* +t%mor +b10 +sm;*,o% -sm|neo +d3

I tried adding the switch -s"<% g>", which works with a simple FREQ command, as follows, but I received the same Type/Token/MATTR values as with the command above:

freq @ +t*1* -s"<% g>" +t%mor +b10 +sm;*,o% -sm|neo +d3

I also tried using the -s"<% g>" switch during the MOR step (mor -s"<% g>"+t*1* @), but received a message saying that only language codes can be used with the -s option.

Is there a way to ignore the < > [% g] coded text when running MOR? Or if not, is there a way to ignore the < > [% g] coded text when calculating MATTR with the FREQ command?
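For readers following along, the effect described here can be seen in a minimal Python sketch of a moving-average type-token ratio. It is not CLAN's implementation: the function name `mattr`, the fallback for short inputs, and the toy token lists are assumptions made for illustration, and a window of 4 is used only to keep the example small (the command above uses +b10).

```python
def mattr(tokens, window=10):
    """Moving-average type-token ratio: mean TTR over every window of `window` tokens."""
    if not tokens:
        return 0.0
    if len(tokens) < window:
        # Assumption of this sketch: with fewer tokens than one window,
        # fall back to the plain type-token ratio.
        return len(set(tokens)) / len(tokens)
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


# Hypothetical token lists: the speech of interest, and marked material
# that should have been left out of the computation.
narrative = ["esta", "es", "una", "historia", "acerca", "de", "dos", "hermanos"]
marked = ["vale", "vale", "vale", "vale"]

print(mattr(narrative, window=4))           # only the speech of interest
print(mattr(marked + narrative, window=4))  # marked material included
```

In this toy case the repeated marked words pull the average down, which is why the tokens have to be removed before the windows are formed rather than afterwards.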

Thank you for your help!

Amanda

Brian MacWhinney

Dec 7, 2022, 12:56:27 PM
to ChiBolts, Amanda Huensch, Leonid Spektor
Amanda,

I’ll let Leonid answer this in detail, but I would say that, if your goal is to systematically exclude certain utterances, the usual method is to add the [+ exc] postcode and then use the -s switch. Leonid will probably give a more adequate answer.

— Brian MacWhinney
Teresa Heinz Professor of Cognitive Psychology,
Language Technologies and Modern Languages, CMU

Leonid Spektor

Dec 7, 2022, 1:11:34 PM
to chib...@googlegroups.com
Amanda,

The < > [% g] coding does not affect the creation of the %mor tier. I would suggest replacing it with < > [e]. This will prevent words marked with [e] from being placed on the %mor tier, and your FREQ MATTR command will then work the way you want. Of course, this will create problems for other analyses if the words within [e] need to be analyzed by MLU, KIDEVAL, or other commands. You might need to keep two copies of your data: one with [e] for MATTR analyses and another without [e] for other analyses.


Leonid.

Leonid Spektor

Dec 7, 2022, 1:44:20 PM
to chib...@googlegroups.com
Amanda,

One more suggestion: if you only ever want to exclude whole utterances from the MATTR computation, then Brian's suggestion of adding the [+ exc] postcode to the end of the utterance will do the job. For the MATTR computation, you would use the -s"[+ exc]" option to tell FREQ to exclude all utterances with the "[+ exc]" postcode. This way you need just one copy of the data. If you want to exclude only specific word(s) within an utterance from MATTR, then the "[+ exc]" postcode will not work.
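The whole-utterance limitation can be illustrated with a small Python sketch. This is not how FREQ handles the option internally; the helper name `drop_excluded` and the two example lines (adapted from the sample transcript, with the postcode added by hand) are assumptions for illustration only.

```python
def drop_excluded(utterances, postcode="[+ exc]"):
    """Keep only main-tier lines that do not carry the exclusion postcode."""
    return [u for u in utterances if postcode not in u]


# Hypothetical main-tier lines; the first is marked for exclusion as a whole.
utterances = [
    "*151:\tvale . [+ exc]",
    "*151:\testa es una historia acerca de dos hermanos .",
]

kept = drop_excluded(utterances)
print(kept)
```

Because the postcode attaches to the utterance as a whole, a filter of this kind can only keep or drop entire lines; it has no way to remove a marked stretch from inside an utterance that is otherwise kept.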


Leonid.

Amanda Huensch

Dec 7, 2022, 3:52:28 PM
to chib...@googlegroups.com

Leonid and Brian,

Thank you very much! Because we exclude both partial and whole utterances, the [e] will work best for us. I was able to batch update [% g] to [e] easily with chstring +w +cCODEchange.cut, and now the MATTR results only include the language we’re interested in.
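For anyone without CLAN at hand, the same kind of batch recoding can be sketched in Python. This is not what chstring does internally, and the contents of CODEchange.cut are not shown in this thread; the function names, the `*.cha` pattern, and the folder arguments are assumptions for illustration. Following Leonid's advice about keeping two copies, the sketch writes recoded files to a separate folder and leaves the originals untouched.

```python
from pathlib import Path


def recode(text, old="[% g]", new="[e]"):
    """Replace every occurrence of the old code with the new one."""
    return text.replace(old, new)


def recode_folder(src, dst, pattern="*.cha"):
    """Write a recoded copy of each matching file in `src` into `dst`."""
    dst = Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    for path in Path(src).glob(pattern):
        text = path.read_text(encoding="utf-8")
        (dst / path.name).write_text(recode(text), encoding="utf-8")


print(recode("*151:\t<vale> [% g] ."))
```

The recoded copies would still need to be run through MOR again so that the %mor tier reflects the new [e] coding before FREQ is used.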

Much appreciated!!

Amanda

