Participant ID

Elnaz Kia

Aug 11, 2021, 3:16:25 AM
to chibolts
Hello again,

I have a question about the participant ID in CHAT. In our corpus, all the files are single-speaker, and the participant ID has 5 characters: one letter representing the language and 4 digits representing a unique student ID. For example:

@Begin
@Languages: zho
@Participants: c0002 IDc0002 Student
@ID: zho|corpus|c0002|0;00.00||||Student|||
@Media: c0002cadr01_1, audio
@Transcriber: Xiqiang Wang
@Comment: hum, ahn, ah, eh signal hesitation
@Situation: unspecified
*c0002: eh (..) .
*c0002: wo3 (..) .
*c0002: eh .
@End

According to the CHAT Manual (2021, p. 21), "After the asterisk on the main line comes a three-letter code in upper case letters for the participant who was the speaker of the utterance being coded."

My questions are: 
1. How does the fact that our participant IDs are 5 characters and include digits affect CLAN functions?
2. If having more than 3 characters is okay, does it matter whether the letters are uppercase or lowercase (c0002 vs. C0002)? Which one is preferred?

I would appreciate any help!

Thanks,
Elnaz


Leonid Spektor

Aug 11, 2021, 4:47:25 AM
to chib...@googlegroups.com
Hi Elnaz,

Your question has two aspects:

From a technical point of view, there is nothing wrong with your speaker codes as long as they are shorter than 9 characters, including the * character at the beginning. They can contain numbers and/or letters in any combination you want. One way to see whether what you are doing is okay is to run the CHECK command on your data files. If the result is "ALL FILES CHECKED OUT OK!", then all is good. Otherwise, you should fix whatever error is reported.
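
As a quick pre-check of the length rule above, something like the following Python sketch could be used. It is only an illustration and not a substitute for CHECK: the function name speaker_codes_too_long and the constant are made up for this example, and the only thing it tests is the "shorter than 9 characters including the *" rule.

```python
import re

# Assumption for this sketch: a speaker code is acceptable when it is
# shorter than 9 characters, counting the leading "*".
MAX_LEN_WITH_ASTERISK = 8

# A main line starts with "*", then the speaker code, then ":".
MAIN_LINE = re.compile(r"^(\*[^:\s]+):")


def speaker_codes_too_long(chat_text):
    """Return the sorted speaker codes (with "*") that break the length rule."""
    too_long = set()
    for line in chat_text.splitlines():
        match = MAIN_LINE.match(line)
        if match and len(match.group(1)) > MAX_LEN_WITH_ASTERISK:
            too_long.add(match.group(1))
    return sorted(too_long)


sample = "@Begin\n*c0002: eh (..) .\n*c0002: wo3 (..) .\n@End\n"
# "*c0002" is 6 characters long, so nothing is reported for this sample.
print(speaker_codes_too_long(sample))
```

Running CHECK remains the real test, since it looks at much more than the length of the code.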

As for the non-technical question of which form is preferred, I will leave that to Brian MacWhinney, who designed the CHAT format.

Leonid.


Brian Macwhinney

Aug 11, 2021, 11:26:12 AM
to ChiBolts
Dear Elnaz,
Leonid pointed out that your IDs are technically okay. However, unless you really envision recruiting 10,000 participants, I think it would be much better to stick with two digits for the subject ID and maybe consider whether the inclusion of “c” is really necessary. Perhaps it is redundant with something else? So, maybe you only need *02 and such.

— Brian MacWhinney

Leonid Spektor

Aug 11, 2021, 12:36:29 PM
to chib...@googlegroups.com
There is another reason to keep speaker codes simple and uniform. Some commands require you to specify the speaker code with the +t option in order to run analyses on data files. If you have many different speaker codes, such as c0002, c0003, c0004 and so on, then whenever you run a command on multiple data files, for example *.cha, you will have to use the +t option to specify every speaker code that occurs in those files. The best solution is to use one consistent *STU: code for all students and to distinguish one subject from another by filename or by the @ID header. That way you can run a command like KWAL or FREQ on *.cha with a single +t*STU option. You can edit @ID headers with the "Tiers->ID headers" menu.
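
To make the difference concrete, here is a small Python sketch that collects the speaker codes found in a set of transcripts and assembles the command line that would be needed. The helper names speaker_codes and build_command are hypothetical, and writing one +t option per speaker code is my reading of the description above, so treat the exact command strings as an assumption rather than a reference.

```python
import re

# A main line starts with "*", then the speaker code, then ":".
MAIN_LINE = re.compile(r"^\*([^:\s]+):")


def speaker_codes(chat_texts):
    """Collect the distinct speaker codes used on main lines."""
    codes = set()
    for text in chat_texts:
        for line in text.splitlines():
            match = MAIN_LINE.match(line)
            if match:
                codes.add(match.group(1))
    return sorted(codes)


def build_command(program, codes, pattern="*.cha"):
    """Assemble a command string with one +t option per speaker code."""
    options = " ".join("+t*" + code for code in codes)
    return f"{program} {options} {pattern}"


per_student = ["*c0002: eh .\n", "*c0003: eh .\n", "*c0004: eh .\n"]
uniform = ["*STU: eh .\n", "*STU: eh .\n", "*STU: eh .\n"]

# One option per participant: the command grows with the corpus.
print(build_command("freq", speaker_codes(per_student)))
# One option in total, however many files there are.
print(build_command("freq", speaker_codes(uniform)))
```

With per-student codes the option list grows with every new participant, while the uniform code keeps the command the same for any number of files.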

For example, your data file would look like this:

@Begin
@Languages: zho
@Participants: STU IDc0002 Student
@ID: zho|corpus|STU|0;00.00||||Student||IDc0002|
@Media: c0002cadr01_1, audio
@Transcriber: Xiqiang Wang
@Comment: hum, ahn, ah, eh signal hesitation
@Situation: unspecified
*STU: eh (..) .
*STU: wo3 (..) .
*STU: eh .
@End
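
If there are many files to change, the edit shown above could be scripted. The following Python sketch is one possible way to do it under a few assumptions: each file has a single speaker, the old code appears as the third field of @ID, and the tenth field of @ID is free to hold the old ID. The function name use_uniform_code is made up for this example; please run CHECK on the results afterwards.

```python
import re


def use_uniform_code(chat_text, old_code, new_code="STU"):
    """Rewrite a single-speaker transcript so its speaker code is new_code.

    Only @Participants, @ID and main lines are touched, so other headers
    such as @Media keep the old ID in the media file name.
    """
    code_token = re.compile(r"(?<!\w)" + re.escape(old_code) + r"(?!\w)")
    out = []
    for line in chat_text.splitlines():
        if line.startswith("@Participants:"):
            # Replace the code itself, not the name "ID<old_code>".
            line = code_token.sub(new_code, line, count=1)
        elif line.startswith("@ID:"):
            fields = line.split("|")
            if len(fields) >= 10 and fields[2] == old_code:
                fields[2] = new_code
                if not fields[9]:
                    # Assumption: keep the old ID in the otherwise empty tenth field.
                    fields[9] = "ID" + old_code
                line = "|".join(fields)
        elif line.startswith("*" + old_code + ":"):
            line = "*" + new_code + line[len(old_code) + 1:]
        out.append(line)
    return "\n".join(out) + "\n"


original = (
    "@Begin\n"
    "@Languages: zho\n"
    "@Participants: c0002 IDc0002 Student\n"
    "@ID: zho|corpus|c0002|0;00.00||||Student|||\n"
    "@Media: c0002cadr01_1, audio\n"
    "*c0002: eh (..) .\n"
    "@End\n"
)
print(use_uniform_code(original, "c0002"))
```

The sketch works on text in memory; reading and writing the actual .cha files is left out on purpose, so that nothing is overwritten before the output has been inspected.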


Leonid.

Brian Macwhinney

Aug 11, 2021, 12:38:27 PM
to ChiBolts, Leonid Spektor
Leonid,
Thanks for pointing this out. Yes, this is totally correct.
— Brian

Elnaz Kia

Aug 11, 2021, 12:55:08 PM
to chibolts, Brian Macwhinney, Leonid Spektor
Dear Leonid and Brian,

I cannot thank you enough for the help. Thanks for the clear explanations and instructions.

Thank you,
Elnaz


