How to search for Chinese signs when transcribers randomly added spaces in the transcript?


Eva Berglund

Aug 21, 2025, 4:53:03 AM
to chib...@googlegroups.com
Hello,

I am currently analyzing Mandarin transcripts for third-person pronouns. Some plurals are written 他們, but I have noticed that some transcribers inserted one or more spaces between the signs, so these forms are not counted correctly when I use FREQ. Is it possible to write a command in CLAN that finds the instances containing one or more spaces, so that the plurals are counted as they should be?

Best regards

Eva Berglund

Leonid Spektor

Aug 21, 2025, 9:10:48 AM
to chib...@googlegroups.com
Eva,

If there are many spaces between words, you can use the command "chstring -q +d *.cha" to remove the extra spaces.

If there are still unwanted space characters between signs after that, those spaces need to be removed too. CLAN relies entirely on spaces to determine how to separate text into words. I am not familiar with the Chinese language, so I can't give you a specific suggestion. However, the CHSTRING command can find space characters and remove them if necessary. For example, to remove a space character before a particular sign, you can use this command:

chstring -w +s" s" "s" *.cha

In the above example, I am using the letter 's' to represent the plural Chinese sign that you might want to join with the immediately preceding sign, creating a plural word with no space character between the signs.
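
For readers who want to see the effect of such a string-oriented replacement outside CLAN, here is a minimal Python sketch. It is not part of CLAN and only illustrates the idea: the function name `join_before` and the sample line are hypothetical, and it assumes that any run of spaces directly before the given sign should be removed.

```python
import re


def join_before(line, sign):
    """Remove the spaces between a sign and the character before it.

    Hypothetical helper for illustration only, not a CLAN function.
    The lookbehind requires a non-whitespace character before the spaces,
    so the tab after the speaker code (e.g. "*CHI:\t") is left untouched.
    """
    pattern = r"(?<=\S) +(?=" + re.escape(sign) + r")"
    return re.sub(pattern, "", line)


# Made-up example line in CHAT-like format
sample = "*CHI:\t他   們 來 了 ."
print(join_before(sample, "們"))
```

Note that this joins the sign to whatever precedes it, so it should only be applied to signs that never stand alone as separate words in the transcripts.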

Hopefully someone with knowledge of the Chinese language can give you better advice.


Leonid.
> --
> You received this message because you are subscribed to the Google Groups "chibolts" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+u...@googlegroups.com.
> To view this discussion visit https://groups.google.com/d/msgid/chibolts/ABB3C942-A4CF-4345-88FE-67BC555C23CF%40gmail.com.

Brian MacWhinney

Aug 21, 2025, 10:04:48 AM
to chib...@googlegroups.com
Eva,
Another approach would be to create a CHSTRING list of forms from which you wish to remove spaces.
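
The list-based idea can be sketched in Python as a rough illustration. This is not CHSTRING itself, and the names `FORMS`, `spaced_pattern`, `join_forms`, and `count_forms` are hypothetical. It assumes the list holds the correctly joined forms and that any number of spaces between their characters should be collapsed on main tiers (lines starting with `*`).

```python
import re

# Assumed list of target forms; extend as needed for your data.
FORMS = ["他們", "她們", "它們"]


def spaced_pattern(form):
    """Build a regex matching the form with zero or more spaces between characters."""
    return re.compile(" *".join(re.escape(ch) for ch in form))


def join_forms(line, forms=FORMS):
    """Rewrite every spaced-out occurrence of a listed form as the joined form."""
    for form in forms:
        line = spaced_pattern(form).sub(form, line)
    return line


def count_forms(lines, forms=FORMS):
    """Count listed forms on main tiers, whether or not they contain spaces."""
    counts = {form: 0 for form in forms}
    for line in lines:
        if not line.startswith("*"):
            continue  # skip headers and dependent tiers
        for form in forms:
            counts[form] += len(spaced_pattern(form).findall(line))
    return counts


# Made-up example lines
lines = ["*CHI:\t他  們 和 她 們 .", "*MOT:\t他們 好 ."]
print([join_forms(l) for l in lines])
print(count_forms(lines))
```

Counting with the space-tolerant pattern gives the totals without changing the files, while `join_forms` shows what a cleaned line would look like before running FREQ.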

— Brian MacWhinney