Using the gem and freq commands on bilingual data to get types and tokens by language for different activities

Sarah Surrain

Feb 24, 2020, 8:18:12 PM
to chibolts
Hello,

I am working with Spanish-English bilingual data from parent-child dyads. In the header, I have specified the languages as
@Languages:    spa, en
and I have used [- eng] precodes for English utterances and @s tags on English words embedded in Spanish utterances.

I would like to use gem markers to segment the transcripts by activity (such as book reading) and then run a freq command to count the number of tokens in Spanish and English used by the parent and child during that activity.

I used this command to retrieve only the book reading activities and create new CHAT files with headers:
gem +sbook +d1 +f *.cha

Then I tried these commands on the output to create an Excel file with the types, tokens, TTR and MATTR for each language:
freq +l +s*@s:eng +d3 +b10 *.cha
freq +l +s*@s:spa +d3 +b10 *.cha

However, I am getting these errors:
Language "eng" is not defined on "@Languages:" header tier.
and
Illegal use of "@s", no alternative language in position 1 defined on @Language: tier.

I can fix this by manually pasting the @Languages line into the header in the new file that I created using the gem command. Is there a way to automatically create CHAT files using the gem command that retain the @Languages line?

(I also tried the gemfreq command (gemfreq +sbook +l +s*@s:eng +d3 +b10 *.cha), but I wasn't able to create an Excel worksheet with the types, tokens, etc. for each participant. I got the error: The only +d levels allowed are 0–1.)

Thank you!

Sarah Surrain

Sarah Surrain, Ed.M.
Ph.D. Candidate
Harvard University FAS | GSE
https://scholar.harvard.edu/sarahsurrain

Leonid Spektor

Feb 24, 2020, 10:08:07 PM
to chib...@googlegroups.com
Sarah,

The English language code is three letters, as all other language codes are. For English the code is "eng".


Leonid.

--
You received this message because you are subscribed to the Google Groups "chibolts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/chibolts/11b1fe6c-0bf5-421f-98b1-2e76aeaa71bd%40googlegroups.com.

Sarah Surrain

Feb 25, 2020, 7:44:56 AM
to chibolts
Dear Leonid,

My apologies for the typo! The languages header I used was

@Languages:    spa, eng

-Sarah

Leonid Spektor

Feb 25, 2020, 10:36:09 AM
to chib...@googlegroups.com
Sarah,

In this case the problem is that GEM does not copy the @Languages: header to its output. Add the option +t@ or +t@Languages to the GEM command.
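
For reference, applied to the gem command from the first post, this would presumably look like the line below. It is only a sketch: it uses +t@Languages (+t@ is the other option mentioned), and all other options are copied unchanged from the original command.

```
gem +sbook +d1 +t@Languages +f *.cha
```

The two freq commands from the first post can then be run on the resulting files as before.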


Leonid.


Sarah Surrain

Feb 25, 2020, 10:46:49 AM
to chibolts
Thank you! That fixed my problem.

-Sarah