within-word coding in Mandarin transcripts


Pei-Tzu Tsai

Oct 23, 2021, 1:01:05 AM
to chibolts
Hi,
Is there a recommended way of coding Mandarin transcripts written in Chinese characters at the sound level while still running MOR successfully? For example, an initial-sound repetition in 狗 (↫g↫gou3) 看 到 了 ("the dog saw [it]"). We tried replacing the character with Pinyin, but MOR doesn't recognize it.
Thanks,
Peitzu

Brian Macwhinney

Oct 23, 2021, 12:14:10 PM
to ChiBolts, Nan Bernstein Ratner
Dear Peitzu,
There are two ways to do this. If you just want to do this occasionally for one or two words, you can use the form
↫g↫gou3 [: 狗]
then MOR will ignore the ↫g↫gou3 and only make use of 狗.
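A rough illustration of what the replacement form does: the Python sketch below (not part of CLAN; the regular expression and the name `split_replacements` are hypothetical) reads a main tier the way a replacement-aware analyzer would, keeping 狗 for analysis while recording the original ↫g↫gou3 it stands in for. The disfluent form stays in the transcript even though the analyzer sees only the replacement.

```python
import re

# Hypothetical pattern for "original [: replacement]" on a CHAT-style main tier.
# For illustration only; CLAN's own parser handles far more of the CHAT syntax.
REPLACEMENT = re.compile(r"(\S+)\s+\[:\s*([^\]]+?)\s*\]")


def split_replacements(tier: str) -> tuple[str, list[tuple[str, str]]]:
    """Return the tier as a replacement-aware analyzer would read it,
    plus the (original, replacement) pairs that were swapped out."""
    pairs: list[tuple[str, str]] = []

    def swap(match: re.Match) -> str:
        # Remember the original form, then keep only the replacement.
        pairs.append((match.group(1), match.group(2)))
        return match.group(2)

    analyzed = REPLACEMENT.sub(swap, tier)
    return analyzed, pairs


analyzed, pairs = split_replacements("↫g↫gou3 [: 狗] 看 到 了 .")
print(analyzed)  # 狗 看 到 了 .
print(pairs)     # [('↫g↫gou3', '狗')]
```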
Alternatively, if you want to study phonology systematically, you can create a complete %pho tier.
Your example also makes the important point that the systematic coding of disfluencies faces some challenges in Chinese and other languages with whole-character coding. I see that you are using disfluency coding for the repetition of the initial consonant. If you wanted to study this extensively, it might almost be best to create a secondary %ort line that would be the basis of a complete Pinyin transcription.
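One reason a Pinyin line helps is that repetitions of an initial consonant can only be tallied per syllable once the syllable is spelled out. The minimal Python sketch below assumes the ↫…↫ markup from the example in this thread (repeated segment between two ↫ marks, full syllable after them, extra iterations separated by hyphens); the function name and the sample lines are invented for illustration, and this is not how FLUCALC or FREQ work internally.

```python
import re
from collections import Counter

# Assumed markup, following the example in this thread: the repeated part of
# the syllable sits between two ↫ marks, e.g. ↫g↫gou3. Hyphen-separated
# segments (e.g. ↫k-k↫kan4) are counted as separate iterations.
SOUND_REP = re.compile(r"↫([^↫]+)↫(\S+)")


def count_sound_repetitions(utterances: list[str]) -> Counter:
    """Tally part-word repetition iterations per target syllable."""
    tally: Counter = Counter()
    for utt in utterances:
        for segment, syllable in SOUND_REP.findall(utt):
            tally[syllable] += len(segment.split("-"))
    return tally


# Invented sample lines, for illustration only.
sample = [
    "↫g↫gou3 [: 狗] 看 到 了 .",
    "↫k-k↫kan4 [: 看] 到 了 .",
    "↫g↫gou3 [: 狗] 跑 了 .",
]
print(count_sound_repetitions(sample))  # Counter({'gou3': 2, 'kan4': 2})
```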

— Brian MacWhinney

Pei-Tzu Tsai

Oct 25, 2021, 12:39:11 PM
to chibolts
Thank you, Brian. I'm guessing the first option would then prevent FLUCALC from identifying the disfluencies, since the coding is replaced. Please advise if there is a way around this. If we instead convert all samples entirely to Pinyin, using the segmenter/translater command that runs in the terminal, is there a way to transfer the characters from the main tier to the %ort tier at the same time?
Peitzu


Brian Macwhinney

Oct 25, 2021, 12:51:04 PM
to ChiBolts, Pei-Tzu Tsai
Dear Peitzu,
     I am not recommending that you replace the main line with the %ort line, but rather that you add a %ort line in Pinyin to support disfluency coding. There is software that can automatically recode Hanzi as Pinyin. I could discuss that with you separately if you decide to go that way.
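As a rough picture of what automatic Hanzi-to-Pinyin recoding involves, here is a minimal Python sketch. It is not the software mentioned above; the table `HANZI_TO_PINYIN`, the function names, the tone-5 spelling of the neutral tone, and the tab after `%ort:` are assumptions for illustration. A real converter needs a full dictionary and must choose among the readings of polyphonic characters such as 了 (le or liao).

```python
# Tiny hand-written character-to-Pinyin table (hypothetical, for illustration).
# Neutral tone is written as 5 here; that spelling is an assumption.
# Polyphonic characters such as 了 (le5 vs. liao3) are NOT disambiguated.
HANZI_TO_PINYIN = {
    "狗": "gou3",
    "看": "kan4",
    "到": "dao4",
    "了": "le5",
}


def token_to_pinyin(token: str) -> str:
    """Convert a word token character by character; tokens containing any
    character missing from the table (punctuation, codes) are kept unchanged."""
    if all(ch in HANZI_TO_PINYIN for ch in token):
        return "".join(HANZI_TO_PINYIN[ch] for ch in token)
    return token


def make_ort_tier(main_tier: str) -> str:
    """Build a Pinyin %ort line from a space-separated Hanzi main tier."""
    return "%ort:\t" + " ".join(token_to_pinyin(t) for t in main_tier.split())


print(make_ort_tier("狗 看 到 了 ."))
```

Disfluency codes such as ↫g↫ would still have to be added by hand to the generated line, which is where the sound-level coding would live.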

— Brian MacWhinney
Teresa Heinz Professor of Cognitive Psychology, 
Computational Linguistics, 
and Modern Languages, CMU

Leonid Spektor

Oct 25, 2021, 2:54:25 PM
to chib...@googlegroups.com, Pei-Tzu Tsai
I have to add that FLUCALC will not work correctly whichever option you choose, because it was designed for English only.


Leonid.

Brian Macwhinney

Oct 25, 2021, 3:05:09 PM
to ChiBolts
Correct. Moreover, we have no comparison fluency data set for other languages, although there may eventually be one for Dutch. Supporting fluency analysis for Chinese would be a great thing for fluency research, but it would be a big project, hopefully supported by grants from China.

— Brian 

Pei-Tzu Tsai

Oct 26, 2021, 12:50:31 PM
to chibolts
Expanding FLUCALC to other languages would certainly help speed up fluency research across languages. I'd be happy to continue the discussion separately about a Mandarin fluency data set to support modifying FLUCALC. For now, we'll rely on FREQ to get part of the analysis done. Thanks for all the input.
Peitzu