excluding utterances with code-switching

Janet Bang

Nov 19, 2018, 6:52:45 PM
to chib...@googlegroups.com

Hello, 


We are working on bilingual transcriptions and had a question about code-switched utterances. Apologies if I've missed this in the manual. 


One of our goals is to obtain an MLU for Spanish-only utterances, excluding mixed utterances. For example: 

*MOT: ahorita tienes que comer.

*MOT: no es time@s to@s sleep@s. 


We would like to obtain an MLU (on the %mor line) that excludes the utterance with code-switching. We've tried the following command, but it still includes both utterances and only drops the English words, whereas we'd like the output to consider only the Spanish-only line. 

mlu -s"[- eng]" -s"L2|*"


It seems like our options are: 

1) go back to our transcripts and add a postcode for any code-switched utterances to use the +s switch with postcodes

2) use kwal to exclude utterances with the @s symbol similar to what is seen here


I wanted to know if there is a way to use the switches to exclude utterances with the @s symbol, or a way to automatically add a postcode in our transcripts to every utterance with the @s symbol? 
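
To make the second part of the question concrete, here is a minimal Python sketch of what automating the postcode might look like. It is an illustration only, not a CLAN feature: the postcode `[+ cs]` and the function name `tag_line` are hypothetical, and the sketch assumes each utterance sits on a single main-tier line (no tab-indented continuation lines).

```python
import re

# A word carrying the @s marker, optionally with a language code such as @s:eng.
# The lookahead keeps other markers that merely start with "@s" from matching.
SWITCH_WORD = re.compile(r"\S+@s(:\S+)?(?=[\s.?!]|$)")

# Hypothetical postcode chosen for this sketch; use whatever your lab prefers.
POSTCODE = "[+ cs]"


def tag_line(line):
    """Append POSTCODE to a main-tier line that contains an @s word."""
    if not line.startswith("*"):
        return line  # dependent tiers (%mor etc.) and headers are left alone
    if POSTCODE in line or not SWITCH_WORD.search(line):
        return line  # already tagged, or no code-switched word
    return line.rstrip() + " " + POSTCODE


lines = [
    "*MOT:\tahorita tienes que comer.",
    "*MOT:\tno es time@s to@s sleep@s.",
]
tagged = [tag_line(l) for l in lines]
for l in tagged:
    print(l)
```

Only the second example utterance gets the postcode, so the postcode-based selection from option 1 could then be applied to the tagged files.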


Thank you in advance, 

Janet







Leonid Spektor

Nov 19, 2018, 9:08:11 PM
to chib...@googlegroups.com
Janet,

MLU and most other commands work on a word-by-word basis, so the -s"L2|*" or -s"*@s*" options will exclude only those words, not the whole utterance. KWAL and COMBO can select or exclude whole utterances based on the words they contain. If you have the "[- eng]" pre-code and @s symbols on words, and you want to exclude every utterance that has the "[- eng]" pre-code or at least one word with @s, then you need to use KWAL to extract the utterances you want first and then run MLU on the output of the KWAL command. These KWAL and MLU commands should do what you want:

kwal -s"[- eng]" -s*@s* +o@ID +o% -d +f  filename(s).cha
mlu *.kwal.cex
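
To illustrate the difference between word-level and utterance-level exclusion, here is a rough Python sketch using the two example utterances from the question. It is not how CLAN is implemented: it counts words on the main tier rather than morphemes on the %mor line, and the helper names (`words`, `is_switched`, `has_switch`, `mean_length`) are made up for the illustration.

```python
import re

# Main-tier lines from the example in this thread.
utterances = [
    "*MOT:\tahorita tienes que comer.",
    "*MOT:\tno es time@s to@s sleep@s.",
]


def words(line):
    """Words of a main-tier line, without speaker code, bracketed codes, or final punctuation."""
    text = line.split(":", 1)[1]
    text = re.sub(r"\[[^\]]*\]", " ", text)  # drop codes such as [- eng]
    return [w.strip(".?!") for w in text.split() if w.strip(".?!")]


def is_switched(word):
    """True for a word marked @s (optionally @s:lang)."""
    return re.search(r"@s(:\S+)?$", word) is not None


def has_switch(line):
    """True if the utterance has the [- eng] pre-code or at least one @s word."""
    return "[- eng]" in line or any(is_switched(w) for w in words(line))


def mean_length(utts):
    """Mean number of words per non-empty utterance."""
    lengths = [len(u) for u in utts if u]
    return sum(lengths) / len(lengths) if lengths else 0.0


# Word-level exclusion: the mixed utterance stays, minus its English words.
word_level = [[w for w in words(u) if not is_switched(w)] for u in utterances]

# Utterance-level exclusion: the mixed utterance is dropped entirely.
utt_level = [words(u) for u in utterances if not has_switch(u)]

print(mean_length(word_level))  # 3.0: "no es" still counts as a 2-word utterance
print(mean_length(utt_level))   # 4.0: only the Spanish-only utterance remains
```

The first figure is pulled down by the leftover "no es" fragment, which is the behaviour described in the question; filtering whole utterances first and then measuring length avoids that.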

If this doesn't do what you want, then please email me a sample of your data files, so that I can see how you have coded them, and give me more details on what you want to achieve.


Leonid.


Janet Bang

Nov 19, 2018, 10:12:15 PM
to chib...@googlegroups.com
Hi Leonid,

Yes, thank you, this will work out great!

Janet
