Counting single-word utterances & dividing transcripts


ran...@g.harvard.edu

Dec 30, 2020, 2:41:43 PM
to chibolts
Hello all,

Hope you're enjoying the holiday season! I have two quick questions and would really appreciate your kind help.

1) I have a batch of coded transcripts. In each transcript, some utterances were coded as A (using a dependent tier %cat: $A), and the remaining ones did not receive that code (i.e. no %cat tier). Is there a way to run VOCD/MLU on the A utterances and no-code utterances separately? Essentially I'm hoping to divide up the transcripts based on the code A.

2) Is there a convenient way to count the number of single-word utterances (e.g., "Sit.") or multi-word utterances in a transcript? All transcripts are in English.

Thank you and stay healthy!

Best,
Ran


Leonid Spektor

Dec 30, 2020, 5:28:24 PM
to chib...@googlegroups.com
Hi Ran,

Happy holidays.

1) You need to break up your data files into two sets: one with the $A utterances and one without them. To create the $A file set, run this command:

kwal +d +o@ +t% +s"$A" +f$A filenames

Then run any other command on the $A files created by the command above: "*.$A.cex"


Next, to create the no-$A file set, run this command:

kwal +d +o@ +t% -s"$A" +fno$A filenames

Now run any other command on the no-$A files created by the command above: "*.no$A.cex"
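If it helps to see what the two file sets should contain, here is a rough Python sketch of the same split. It is not part of CLAN and does not replace kwal. It assumes plain CHAT text where main tiers start with "*", the code sits on a %cat tier directly under its utterance, and continuation lines can be ignored. The function name split_by_code is made up for illustration.

```python
def split_by_code(chat_text, tier="%cat", code="$A"):
    """Return (coded, uncoded) lists of main-tier lines.

    Assumption: each dependent tier belongs to the closest main tier
    above it. Continuation lines (starting with a tab) are ignored.
    """
    utterances = []  # each entry: [main_tier_line, has_code]
    for line in chat_text.splitlines():
        if line.startswith("*"):
            # A new speaker line starts a new utterance.
            utterances.append([line, False])
        elif line.startswith(tier + ":") and utterances:
            # Mark the current utterance if the tier carries the code.
            if code in line.split():
                utterances[-1][1] = True
    coded = [u for u, flag in utterances if flag]
    uncoded = [u for u, flag in utterances if not flag]
    return coded, uncoded


if __name__ == "__main__":
    sample = "*CHI:\tsit .\n%cat:\t$A\n*MOT:\tyou want to sit down ?\n"
    coded, uncoded = split_by_code(sample)
    print(len(coded), "coded,", len(uncoded), "uncoded")
```

Comparing these counts with the number of utterances in the *.$A.cex and *.no$A.cex files is one quick way to check that the split did what you expected.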


2) Many commands take the +x option. +x=0w counts utterances with 0 words, +x=1w counts utterances with just one word, and +x>1w counts utterances with more than one word. For example, you can use these options with the mlt command: mlt +x>1w filenames
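For a quick sanity check outside CLAN, the same three-way tally can be sketched in Python. This is only an approximation: it treats any whitespace-separated token containing a letter or digit as a word, so its counts can differ from CLAN's own word rules (special forms, fillers, time codes and so on). The function name count_by_length is made up for illustration.

```python
import re


def count_by_length(chat_text):
    """Tally main-tier utterances with zero, one, or several words.

    Assumption: a "word" is any whitespace-separated token that
    contains at least one letter or digit, so bare punctuation such
    as "." or "?" is not counted.
    """
    counts = {"zero": 0, "single": 0, "multi": 0}
    for line in chat_text.splitlines():
        if not line.startswith("*"):
            continue  # skip headers and dependent tiers
        # Drop the speaker code ("*CHI:") and keep the utterance text.
        _, _, utterance = line.partition(":")
        words = [t for t in utterance.split() if re.search(r"\w", t)]
        if len(words) == 0:
            counts["zero"] += 1
        elif len(words) == 1:
            counts["single"] += 1
        else:
            counts["multi"] += 1
    return counts


if __name__ == "__main__":
    sample = "*CHI:\tsit .\n*MOT:\tyou want to sit down ?\n"
    print(count_by_length(sample))
```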



Leonid.
