language alternation search


A Cristia

Mar 23, 2017, 2:19:53 PM
to chibolts, gladys
Dear clan users,

In a bilingual corpus, is there a way to search for pairs of sentences where a language switch has occurred? A search for the tagged language will only reveal switches from the minor to the major language, but we'd like to extract both:

*FAC:    ʔaqaixana .
*FAC:    ten qaica naxa qaicaʔ .
*FAC:    [- spa] vamos afuera . <---- LANGUAGE SWITCH FROM THE PREVIOUS SENTENCE TO THIS SENTENCE (major to minor -- can be found searching for [- spa])
*FAC:    ñaq qaica ten  paʔatauec na . <---- LANGUAGE SWITCH FROM THE PREVIOUS SENTENCE TO THIS SENTENCE (minor to major -- can it be found?)
*FAC:    ñaq qaica ten .



Thank you in advance,

Gladys Ojea and Alex Cristia


Leonid Spektor

Mar 23, 2017, 3:27:02 PM
to chib...@googlegroups.com, gladys

Alex,

    I am not sure what you mean by "LANGUAGE SWITCH", but you can use the +s"[- spa]" option to analyze only utterances with the "[- spa]" code and the -s"[- spa]" option to analyze only utterances that do not have the "[- spa]" code. If this doesn't help, then please email me more example input data files and examples of the output that you want to get.

Leonid.


A Cristia

Mar 24, 2017, 6:50:01 AM
to chibolts, glady...@gmail.com
Dear Leonid,

Thank you for the fast response. What Gladys would like to extract are *pairs* of sentences, one spoken in one language, the other in another. Imagine a sequence like this:
  1. English
  2. English
  3. French
  4. French
  5. French
  6. English
Gladys would like to extract sentences 2-3 (switch Eng->Fr), and 5-6 (switch Fr->Eng).

Of course, this can be approximated by using kwal, extracting the [- spa] sentences with some context, and then looking through by hand to see if the context is also in Spanish (so not a switch) or in Qom (yes, it's a switch, and thus part of what we would like to extract). I wonder if there is an elegant solution for this in CLAN already.

If I were to do this in bash, I'd do something not very elegant like (imagining there is only the content of the transcription):
sed -E '/\[- spa\]/!s/^/[- qom] /' |          # add [- qom] to all lines NOT marked with [- spa] (brackets escaped so they match literally)
   tr '\n' '~' |                              # next replace the line breaks by a placeholder (assumes "~" never occurs in the text)
   sed -E 's/~(\[- (spa|qom)\])/\1~\1/g' |    # duplicate the language marker on each side of the placeholder
   tr '~' '\n' |                              # translate the placeholder back into line breaks
grep -E -A 1 '^\[- qom\].*\[- spa\]$|^\[- spa\].*\[- qom\]$'   # finally extract each line whose own marker differs from the next line's, plus that next line
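
The same idea could also be written as a single pass that remembers the previous utterance, which avoids the placeholder trick. This is only a rough sketch under several assumptions: the function name find_switches is made up for illustration, each utterance sits on one main-tier line starting with "*", Spanish utterances carry "[- spa]", and every unmarked utterance counts as Qom. It compares consecutive utterances regardless of speaker and ignores dependent tiers and continuation lines.

```shell
# Hypothetical helper (not part of CLAN): print every pair of consecutive
# main-tier utterances whose language differs, preceded by the direction.
find_switches() {
  awk '
    /^\*/ {                                        # main tiers start with "*"
      # Assumption: "[- spa]" marks Spanish, anything unmarked is Qom.
      lang = ($0 ~ /\[- spa\]/) ? "spa" : "qom"
      if (prev_line != "" && lang != prev_lang) {
        print prev_lang "->" lang                  # direction of the switch
        print prev_line                            # utterance before the switch
        print $0                                   # utterance after the switch
        print ""                                   # blank line between pairs
      }
      prev_lang = lang
      prev_line = $0
    }
  ' "$@"
}
```

It would be called as find_switches sample.cha (the file name is a placeholder), or with the transcript piped in on standard input. A sentence that sits between two switches is printed twice, once as the second member of one pair and once as the first member of the next.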

Does that make more sense? Thank you in advance,

Alex

Leonid Spektor

Mar 24, 2017, 11:15:54 AM
to chib...@googlegroups.com, glady...@gmail.com

Alex,

Here are two commands that will do what I think you want. They are not extremely elegant, but then again nothing involving regular-expression search is. For an English to French switch, try this command:

combo +b2 -l +s"\**:^*s:eng^*^\**:^*s:fra" *.cha

And for a French to English switch, try this command:

combo +b2 -l +s"\**:^*s:fra^*^\**:^*s:eng" *.cha

If this is not working well for you and Gladys, then I really need you to email me a sample of your data file directly, so that I can see all the tags and how they are used in the file and suggest a more precise command. I understand that this feature is very valuable for studying bilingual data, so we might even try to add some new features to CLAN to do a better job of searching for language switching.


Leonid.
