bilingual

Ying

Dec 30, 2013, 6:46:16 PM

to chib...@googlegroups.com

Dear Leonid,

I want to get MLU, the number of different words, and TTR from some Mandarin (Putonghua)-English bilingual narrative data. I also added some word-level and utterance-level codes and want to summarize them. For example, take the following sample (I am attaching the transcript after running the commands):

*CHI: [- zho] 我去了一个 <一个> [/] park@s yesterday@s. [+ CS]
*CHI: It is a very big one.
*EXA: Nice.
*CHI: My mom say [* tense] “We will come from time to time”. [+ GE]
Note:
The precode [- zho] marks Mandarin/Putonghua, just as [- yue] marks Cantonese.
[+ CS] is an utterance-level code indicating code-switched sentences.
[+ GE] is an utterance-level code indicating sentences with grammatical errors.
[* tense] is a word-level code indicating a tense error.

Here are the commands I used:
mor +s"[- zho]" sample_English.cha +1
post sample_English.cha +1
mor -s"[- zho]" sample_English.cha +1
post sample_English.cha +1
Esc_L
freq +s"[% *]" *.cha

Questions I have:
(1) May I ignore an error and move on to the next one when I run CHECK?
(2) For code-switched words within an utterance, I don't care about MOR information such as noun or verb, but I do want to calculate MLU and TTR. If I use park@s and don't bother to write park@s$n, will CLAN give me the correct results?
(3) I can count the codes [- zho], [+ CS], and [+ GE] using FREQ, but not [* tense]. How can I count the occurrences of [* tense]? Moreover, can I find out whether it is the same verb (e.g., say) that is coded [* tense]?

Thank you very much!
Happy New Year!

Sincerely,
Ying

sample_English.cha

Leonid Spektor

Dec 31, 2013, 7:17:36 AM

to ChiBolts

Dear Ying,

First, I would suggest that you use the +s"[- zho]" or -s"[- zho]" option with both the MOR and POST commands:

mor +s"[- zho]" sample_English.cha +1

post +s"[- zho]" sample_English.cha +1

mor -s"[- zho]" sample_English.cha +1

post -s"[- zho]" sample_English.cha +1

Here are answers to your questions:

1. You can fix errors in any order you like, as long as CHECK reports no errors in the end. If you are running CHECK through ESC-L, however, you cannot skip the first error found, because ESC-L CHECK always starts from the top of the file. It has no option to continue from the current location.

2. If you use "park@s" instead of "park@s$n", MLU will still give the correct result.

The TTR results could be wrong, depending on your FREQ command. If you run FREQ only with the +s"[- zho]" or -s"[- zho]" option, the result will be correct with either "park@s" or "park@s$n". If you run FREQ without a +/-s"[- zho]" option, you force FREQ to compare words such as "park@s" and "park", which are not the same, and that will inflate the TTR. If you use "park@s$n" and run FREQ on the %mor tier, i.e. with the "+t%mor -t*" options, the result will be more accurate.
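The inflation described above can be illustrated with a small Python sketch. This is not how FREQ is implemented; it only assumes that TTR is the number of distinct word forms divided by the number of word tokens, and the token list is a made-up example rather than data from the attached transcript.

```python
def ttr(tokens):
    """Type-token ratio: distinct word forms divided by total word tokens."""
    return len(set(tokens)) / len(tokens)

# Hypothetical tokens: "park" occurs once as a code-switched word inside a
# Mandarin utterance (park@s) and twice in plain English utterances.
tokens = ["park@s", "the", "park", "is", "big", "the", "park"]

# Comparing raw forms: "park@s" and "park" count as two different types.
raw_ttr = ttr(tokens)

# Dropping the @s marker first merges them into a single type.
merged_ttr = ttr([t.split("@")[0] for t in tokens])

print(f"raw: {raw_ttr:.3f}, merged: {merged_ttr:.3f}")
# prints "raw: 0.714, merged: 0.571"
```

With the marker left in place there are 5 types over 7 tokens; once "park@s" and "park" are treated as the same word there are 4 types, so the ratio drops.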

3. Your sample file had the code "[*tense]" instead of "[* tense]"; notice the missing space character. That may be why the search for "[* tense]" failed. After adding the space, I got the correct result searching for "[* tense]" with this command:

freq +s"[* tense]" sample_English.cha

If you want to count the actual words associated with the code "[* tense]", use this command:

freq +s"<* tense>" sample_English.cha
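For a quick cross-check outside CLAN, the same tally can be approximated in Python. This is a rough sketch, not a replacement for FREQ: the function name tense_error_words is hypothetical, the sample lines are adapted from the excerpt in the question, and it only handles a code that directly follows a single word (not codes scoped over angle-bracketed material). The pattern deliberately accepts both "[* tense]" and "[*tense]", so a missing space does not hide a code.

```python
import re
from collections import Counter

# A word immediately followed by the error code. \s* also accepts the
# variant without a space ("[*tense]").
PATTERN = re.compile(r"(\S+)\s+\[\*\s*tense\]")

def tense_error_words(lines):
    """Count the words that directly precede a [* tense] code on main tiers."""
    counts = Counter()
    for line in lines:
        if line.startswith("*"):  # speaker tiers only; skips %mor, @headers
            counts.update(m.group(1).lower() for m in PATTERN.finditer(line))
    return counts

sample = [
    "*CHI:\tIt is a very big one .",
    '*CHI:\tMy mom say [* tense] "We will come from time to time" . [+ GE]',
    "*EXA:\tNice .",
]
counts = tense_error_words(sample)
print(counts)  # prints Counter({'say': 1})
```

The total of the counts gives the number of [* tense] codes found, and the individual keys show whether the same verb is being flagged repeatedly.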

I hope this helps and Happy New Year!

Leonid.

Ying Lu

Dec 31, 2013, 11:16:01 AM

to chib...@googlegroups.com

Dear Leonid,

Thank you so much for your prompt and very helpful reply! Wish you a happy new year!

Best wishes!

Ying
