Having difficulty using KWAL command to search for strings which involve periods

Peredur Webb-Davies

Apr 17, 2017, 10:26:58 PM
to chibolts
Hi,

I'm trying to analyse our corpus of Welsh linguistic data, which has been coded using CHAT. The corpus has multiple secondary tiers, and one of them is called %aut ('autoglossed tier'), which gives word-by-word morphological glossing for each item in the main tier.

I want to use the KWAL command in CLAN to find certain items in the %aut tier, namely (at the moment) pronouns. The aim is then to extract the data from the relevant main tier, so that I have collated all tokens of that pronoun from a given set of data. These pronouns have been glossed in %aut in the format e.g. your.ADJ.POSS.2S for a 2nd person singular pronoun. My assumption was that a command such as the following would find all items matching the search string in the %aut tier in (e.g.) the file called davies1ag.cha:

kwal +t%aut +syour.ADJ.POSS.2S davies1ag.cha

However, this gives no results, even though I know there are instances of this token in that transcript. If you search for just +syour, for example, it finds those results (but cannot disambiguate between 2nd singular and plural forms, among other issues). I assume the problem is the presence of the periods/full stops in the %aut tier.

I have tried using "" around the search string, I have tried to replace the stops with * and _ and ^_^, and I have also tried using COMBO instead of KWAL, but nothing helps. It says no items found.

I assume that there is an obvious answer that my very limited knowledge of coding is keeping from me, but if anyone can assist I would be hugely grateful!

I can supply more details as required.

Many thanks,

Peredur Webb-Davies

PS For info, here is a weblink to an example of a transcript, to show how it has been glossed etc.: http://bangortalk.org.uk/chats/siarad/davies2.cha

Brian MacWhinney

Apr 18, 2017, 12:20:30 AM
to chib...@googlegroups.com

Yes, the problem is your use of periods.  CLAN considers the period to be a sentence delimiter.  I replaced them with a dash and then used the command 

kwal +s"see-V-INFIN*"  +t%aut test.cha

There were no problems. The example with one line changed is attached. It should be easy enough to fix these using a GREP command.
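As an alternative to GREP, the same period-to-dash conversion can be scripted. The following is a minimal Python sketch, not part of CLAN, assuming UTF-8 CHAT files in which each dependent tier starts with its label (e.g. `%aut:`), continuation lines start with a tab, and the utterance terminator is separated from the last gloss by a space. It only rewrites periods that have a letter or digit on both sides, so every dot inside a gloss like `your.ADJ.POSS.2S` is replaced in a single pass, while terminators such as ` .` or `+...` are left alone. The script name `dash_aut.py` and the function names are hypothetical.

```python
import re
import sys
from pathlib import Path

# A period with a word character (letter, digit, underscore) on both sides,
# e.g. each dot in "your.ADJ.POSS.2S". Free-standing terminators such as
# " ." or "+..." do not match. In Python 3, \w also covers letters like ŵ.
GLOSS_PERIOD = re.compile(r"(?<=\w)\.(?=\w)")


def dash_glosses(lines, tier="%aut"):
    """Return a copy of `lines` with gloss-internal periods on `tier` replaced by dashes."""
    out = []
    in_tier = False
    for line in lines:
        if line.startswith(tier + ":"):
            in_tier = True
        elif not line.startswith("\t"):
            # Any line that is not a tab-indented continuation ends the tier.
            in_tier = False
        out.append(GLOSS_PERIOD.sub("-", line) if in_tier else line)
    return out


def convert_file(src, dst, tier="%aut"):
    """Write a converted copy of `src` to `dst`; `src` itself is not modified."""
    text = Path(src).read_text(encoding="utf-8")
    new_lines = dash_glosses(text.splitlines(keepends=True), tier)
    Path(dst).write_text("".join(new_lines), encoding="utf-8")


if __name__ == "__main__":
    # Usage (hypothetical script name): python dash_aut.py input.cha output.cha
    convert_file(sys.argv[1], sys.argv[2])
```

Writing to a separate output file keeps the original transcript intact. The converted file can then be searched in the same way as in the example above, e.g. `kwal +t%aut +s"your-ADJ-POSS-2S" output.cha`.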

-Brian MacWhinney

Attachment: test.cha

Peredur Webb-Davies

Apr 18, 2017, 1:16:26 AM
to chibolts
Thanks Brian. Can you please point me to where I can learn about GREP commands? I can't find a reference to them in the CHAT or CLAN manuals.

Peredur

Brian MacWhinney

Apr 18, 2017, 1:44:36 AM
to chib...@googlegroups.com

Peredur,

You can also do this with CHSTRING using this command:

chstring +t%aut +s"*.*" "*-*" *.cha +1

Make a complete copy of all of your files before trying this, because the +1 switch overwrites the originals, and there could be mistakes.

You have to run that command several times to make sure you change all the periods.

GREP is a part of UNIX that uses REGEX (regular expressions), and there are books about this.
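To illustrate why periods need care in regular expressions: an unescaped `.` matches any single character, so it must be escaped as `\.` to mean a literal full stop. The Python sketch below is an alternative to CLAN, not a CLAN feature, and searches the unconverted files directly. It uses `re.escape` to escape the periods, requires the gloss to be a whole whitespace-delimited token, and prints each main tier whose `%aut` tier contains it. It assumes UTF-8 CHAT files where main tiers start with `*`, dependent tiers start with their label (e.g. `%aut:`), and continuation lines start with a tab; the script name `find_gloss.py` and the function names are hypothetical.

```python
import re
import sys


def tiers(lines):
    """Group CHAT lines into tiers, folding tab-indented continuation lines
    into the tier they belong to."""
    current = None
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("\t") and current is not None:
            current += " " + line.strip()
        else:
            if current is not None:
                yield current
            current = line
    if current is not None:
        yield current


def find_gloss(lines, gloss, tier="%aut"):
    """Yield (main_tier, dependent_tier) pairs where `tier` contains `gloss`
    as a whole, whitespace-delimited token."""
    # re.escape turns each "." into "\." so it matches only a literal period.
    # (?<!\S) and (?!\S) stop the gloss from matching inside a longer token.
    pattern = re.compile(r"(?<!\S)" + re.escape(gloss) + r"(?!\S)")
    main = None
    for t in tiers(lines):
        if t.startswith("*"):
            main = t  # remember the most recent main tier
        elif t.startswith(tier + ":") and pattern.search(t):
            yield main, t


if __name__ == "__main__":
    # Usage (hypothetical script name): python find_gloss.py your.ADJ.POSS.2S davies1ag.cha
    gloss, path = sys.argv[1], sys.argv[2]
    with open(path, encoding="utf-8") as f:
        for main, dep in find_gloss(f, gloss):
            print(main)
            print(dep)
```

Because the whole gloss is matched literally, `your.ADJ.POSS.2S` and `your.ADJ.POSS.2P` are kept apart, which is the distinction a plain search for `your` cannot make.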

-- Brian

Peredur Webb-Davies

Apr 21, 2017, 1:42:26 AM
to chibolts
Thanks again Brian. I'll give this a try.

Peredur