Running FREQ for bilingual transcripts

Lulu

Jun 28, 2016, 5:10:46 PM

to chibolts

Hi Brian and team members,

I ran the freq command

freq +tTCH +s"[- zho]" *.cha

for transcripts that contain bilingual utterances (e.g., *TCH: [- zho] this@s$n 星). The dominant language of the transcripts was English, so we marked utterances that contained Chinese with [- zho]. The output types and tokens included all the English words that were marked with @s. I thought I would get the types and tokens of all the Chinese words by running the above command. Is the problem with the transcript or the command?

Thank you!

Lulu

Brian MacWhinney

Jun 28, 2016, 10:15:26 PM

to chib...@googlegroups.com

Dear Lulu,

Without seeing your transcripts, I can’t say exactly what is wrong. However, if you run this similar command on the CharlotteEng folder in the YipMatthews corpus, you get good results:

freq +t*CHI +s"[- yue]" *.cha

The idea is that this will include all words on the [- yue] lines, including those with @s, although the latter are pretty rare. If you want to exclude those, just add -s"*@s".

-- Brian MacWhinney

--
You received this message because you are subscribed to the Google Groups "chibolts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+u...@googlegroups.com.
To post to this group, send email to chib...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/chibolts/0e5d867c-79b1-4d36-87be-1303f390a83b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Lulu

Jun 28, 2016, 10:22:40 PM

to chibolts

Dear Brian,

That just did magic! Thank you so much!

Best,
Lulu

Lulu

Jun 28, 2016, 10:34:19 PM

to chibolts

Hi Brian,

I tried to run the reverse command on the same transcript (mostly English with a dozen words in Chinese)

freq +tTCH -s"[- zho]" +s"*@s*" *.cha (I added * after @s because my transcript also tags whether the @s word is a noun or a verb)

hoping to add the few @s English words embedded in [- zho] lines to the English word counts, but I got only zeros. With +s"*@s*" removed, I get good results, but they don't include the @s English words. Not sure how I can fix this.

Thanks!

Lulu
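A minimal Python sketch of why the trailing * matters. It uses Python's fnmatch module as an analogy only and assumes that CLAN's * wildcard behaves like a shell-style * for simple patterns such as these. The word list is hypothetical. The sketch does not explain why combining -s"[- zho]" with +s"*@s*" returned zeros.

```python
from fnmatch import fnmatchcase

# Hypothetical word forms in the style of the thread's transcripts.
WORDS = ["this@s$n", "heart@s", "heart", "星"]


def matching(pattern: str) -> list[str]:
    """Return the words that a shell-style pattern accepts (case-sensitive)."""
    return [w for w in WORDS if fnmatchcase(w, pattern)]


print(matching("*@s"))   # ['heart@s']: only forms that END in @s
print(matching("*@s*"))  # ['this@s$n', 'heart@s']: also forms with a $n tag after @s
```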

Brian MacWhinney

Jun 29, 2016, 2:41:46 PM

to chib...@googlegroups.com

Dear Lulu,

I think you want +s"[- zho]" in this case, not -s"[- zho]". When I run

freq +s"[- yue]" +t*CHI *.cha +u

on CharlotteEng, I get both the English words marked as @s and the Cantonese.

--Brian

Lulu

Jun 29, 2016, 4:56:36 PM

to chibolts

Dear Brian,

But I'd like to get separate counts for English and Chinese words. Let me rephrase my question. In a predominantly English transcript, I'd like to get a count of ALL English words, including the ones embedded in [- zho] lines marked with @s. I can now achieve this by running two separate commands (TCH is teacher):

freq +tTCH -s"[- zho-yue]" -s"[- zho]" *.cha
freq +tTCH +s*@s* *.cha

There are two issues with the 2-command solution:
1. I get two sets of counts that need to be summed manually.
2. The same word with and without @s is counted as two different types.

I was wondering if there's a way to combine these two commands and resolve these two issues (or at least one). Below is an excerpt of my transcript (for IRB reasons, I cannot provide the whole transcript):

*TCH: you can take this heart .
*TCH: [- zho] this@s$n 星 .
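A minimal Python sketch of the count Lulu is after, run on the two-line excerpt above. It is not how CLAN's FREQ works internally. It assumes that each utterance fits on one line, that a [- zho] precode opens the line, and that @s marks a word from the line's other language. The names count_by_language, END_PUNCT and AT_S are hypothetical. Real transcripts contain codes, retracings and continuation lines that this sketch ignores.

```python
import re
from collections import Counter

# Hypothetical helper names; this is NOT how CLAN's FREQ is implemented.
END_PUNCT = {".", "?", "!"}
# @s, optionally followed by a language tag (:eng) and/or a POS tag ($n).
AT_S = re.compile(r"(.*?)@s(?:[:$].*)?$")


def count_by_language(lines, speaker="TCH", base="eng", other="zho"):
    """Count words per language for one speaker in a two-language CHAT file.

    Assumes one utterance per line, a [- other] precode at the start of the
    line, and that @s marks a word from the line's other language.
    """
    counts = {base: Counter(), other: Counter()}
    prefix = f"*{speaker}:"
    precode = f"[- {other}]"
    for line in lines:
        if not line.startswith(prefix):
            continue
        body = line[len(prefix):].strip()
        line_lang = base
        if body.startswith(precode):
            line_lang = other
            body = body[len(precode):]
        for token in body.split():
            if token in END_PUNCT:
                continue
            match = AT_S.match(token)
            if match:
                # Code-switched word: credit it to the other language, bare form.
                word_lang = other if line_lang == base else base
                word = match.group(1)
            else:
                word_lang, word = line_lang, token
            counts[word_lang][word] += 1
    return counts


excerpt = [
    "*TCH: you can take this heart .",
    "*TCH: [- zho] this@s$n 星 .",
]
result = count_by_language(excerpt)
for lang, counter in result.items():
    # Prints "eng types: 5 tokens: 6", then "zho types: 1 tokens: 1".
    print(lang, "types:", len(counter), "tokens:", sum(counter.values()))
```

Because "this@s$n" is reduced to "this" before counting, it is added to the same English type as the "this" in the first line. This addresses both issues listed above, but only for this simplified input.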

Thank you so much for your patience and kindness.

Lulu
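If the two FREQ runs are kept, the manual summing can be scripted. The following is a minimal Python sketch that assumes each FREQ output has already been parsed into a word-to-count mapping (parsing FREQ's output format is not shown). It also assumes that removing @s plus any trailing :eng or $n tag yields the plain English form. The function names are hypothetical.

```python
import re
from collections import Counter

# @s plus any trailing language/POS tags, e.g. "@s", "@s$n", "@s:eng$n".
AT_S_SUFFIX = re.compile(r"@s(?:[:$].*)?$")


def bare_form(word: str) -> str:
    """Strip the @s code-switch marker so 'this@s$n' and 'this' become one type."""
    return AT_S_SUFFIX.sub("", word)


def merge_counts(*freq_lists: dict) -> Counter:
    """Sum several word -> count mappings after normalizing @s forms."""
    merged = Counter()
    for counts in freq_lists:
        for word, n in counts.items():
            merged[bare_form(word)] += n
    return merged


# Hypothetical counts, as if parsed from the output of the two FREQ commands.
english_lines = {"you": 1, "can": 1, "take": 1, "this": 1, "heart": 1}
switched_words = {"this@s$n": 1}
merged = merge_counts(english_lines, switched_words)
print(len(merged), sum(merged.values()))  # 5 6  (types, tokens)
```

The regex only strips @s when it is followed by the end of the word, ":" or "$". Other markers that merely start with @s, such as @sl, are therefore left untouched.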

Brian MacWhinney

Jun 29, 2016, 5:19:55 PM

to chib...@googlegroups.com

Dear Lulu,

In that case, you can add the +l switch, which marks each word overtly for its language. The command that would work for CharlotteEng is:

freq +l +s*@s:eng *.cha +u +f +t*CHI

Leonid Spektor

Jun 29, 2016, 5:41:36 PM

to chib...@googlegroups.com

Lulu,

The +l option has a bug that causes it to fail on some words in your data. The problem is with words that contain the $ character, as in your example word "this@s$n". I have fixed this bug and will install a new version of CLAN on the CHILDES website later today. You will need to download and install the new CLAN on your computer. In your case, the +s option would be "+s*@s:eng*". If you want to see what CLAN sees when the +l option is used, try the "kwal +d +l *.cha" command.

Leonid.

Lulu

Jun 30, 2016, 1:29:43 AM

to chibolts

Hi Brian and Leonid,

Thanks so much for your responses. They were very helpful. I also tried the FREQMERG command to add the English words embedded in Chinese utterances to the total English word list for a transcript. However, I noticed that FREQMERG only produced a frequency list, without the total types/tokens or the type/token ratios that FREQ produces. Is there a way to get FREQMERG to produce the totals?

Thanks!

Lulu
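If all you have is a merged frequency list, the totals themselves are simple arithmetic. The following is a minimal Python sketch that assumes the list has already been parsed into a word-to-count mapping. It takes the type/token ratio to be types divided by tokens; whether FREQ rounds or computes its ratio in exactly this way is an assumption. The name freq_totals is hypothetical.

```python
from collections import Counter


def freq_totals(counts: dict) -> tuple:
    """Return (types, tokens, type/token ratio) for a word -> count mapping."""
    types = sum(1 for n in counts.values() if n > 0)
    tokens = sum(counts.values())
    ratio = types / tokens if tokens else 0.0
    return types, tokens, ratio


# Hypothetical merged list, e.g. parsed by hand from FREQMERG output.
merged = Counter({"you": 2, "can": 1, "take": 1})
types, tokens, ratio = freq_totals(merged)
print(f"{types} types, {tokens} tokens, TTR = {ratio:.3f}")  # 3 types, 4 tokens, TTR = 0.750
```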

Leonid Spektor

Jun 30, 2016, 10:56:32 AM

to chib...@googlegroups.com

Lulu,

FREQMERG is not meant to produce totals. It is just a very simplified subset of the FREQ command. You can get the same effect with the FREQ command:

freq +u +o3 *.cha

Leonid.

Lulu

Jul 1, 2016, 12:39:45 AM

to chibolts

Leonid,

Thanks so much for the reply!

Some questions from a programmer who is helping me:
Is there a way to make BATCH stop on the first error it encounters?
Is there a way to run CLAN commands from cmd.exe / from Python without the GUI?

Lulu
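On the programmer's questions, a minimal Python sketch of a "stop on the first error" runner built on the standard subprocess module. It does not use CLAN's BATCH command. It assumes that a command-line build of CLAN's programs (for example a freq executable) is installed and on PATH, and that those programs return a non-zero exit status on failure; neither assumption is confirmed in this thread. The name run_batch is hypothetical.

```python
import subprocess
import sys


def run_batch(commands):
    """Run commands in order and stop at the first non-zero exit status.

    Returns (completed_commands, failed_command_or_None).
    """
    completed = []
    for cmd in commands:
        result = subprocess.run(cmd)  # no shell, so no quoting issues
        if result.returncode != 0:
            print(f"stopped: {cmd!r} exited with {result.returncode}", file=sys.stderr)
            return completed, cmd
        completed.append(cmd)
    return completed, None
```

Example usage, with a hypothetical file name: run_batch([["freq", "+t*TCH", "+s[- zho]", "sample.cha"]]). Because each argument is passed as a separate list item, +s[- zho] needs no quotes. Without a shell, however, *.cha is not expanded, so unless the program expands wildcards itself, build the file list with glob.glob("*.cha").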