Some problems with extracting error-free utterances and verbs from CHAT files


Li Zeng

Jun 28, 2018, 12:48:18 PM
to chibolts
Hi there, 

I have run into some problems extracting utterances and verbs from CHAT files.

First, I have tagged ungrammatical *CHI utterances with [*], [* aux], or [* wh]. Now I want to count the utterances that have none of those tags ([*], [* aux], [* wh]) and that do not contain www or yyy. I tried the following command: trim -s"[*_ ]" +1, but it did not work.

Second, I would like to extract all the verbs of *CHI (including copulas, modals, and auxiliaries as well as regular verbs) in the file. I found that on the %mor tier, "walking" is coded not as a verb but as "PART|". In that case, I guess I also need to include "PART|", right? What would be a comprehensive command for extracting all the verbs mentioned above?

Thank you. 

Li 

Leonid Spektor

Jun 28, 2018, 1:37:50 PM
to chib...@googlegroups.com
Li,

1. Codes like [*] or [* aux] refer to the word before them. If you want your codes to refer to the whole utterance, they need to start with "[+ ". You can change your codes to [+ *], [+ *aux], [+ *wh], and then trim those utterances with the command: trim -s"[+ \**]".

2. If you used the latest MOR grammar on your data, then the comprehensive option for all verbs is: +sm|v,|cop,|aux,|mod,|mod:*,|part
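
To make the part-of-speech list in point 2 concrete, here is a rough Python sketch (not CLAN itself) that picks out the words on a %mor line whose tag is v, cop, aux, mod, mod:* or part. The function names and the sample %mor lines are made up for illustration, and the token handling (splitting on whitespace, "~" and "$", dropping a "prefix#" part) is a simplifying assumption about the %mor format rather than a description of how CLAN searches.

```python
import fnmatch
import re

# Part-of-speech patterns taken from the +sm option above; "*" is a wildcard.
VERB_POS = ["v", "cop", "aux", "mod", "mod:*", "part"]


def is_verb_tag(pos):
    """Return True if a part-of-speech tag matches one of the verb patterns."""
    return any(fnmatch.fnmatchcase(pos, pattern) for pattern in VERB_POS)


def verbs_on_mor_line(mor_line):
    """Return the 'pos|stem' items on one %mor line that carry a verb tag.

    Assumption: the tier name and its content are separated by a tab, and
    clitics are joined to their host with '~' or '$'.
    """
    _tier, _sep, content = mor_line.partition("\t")
    found = []
    for token in content.split():
        for piece in re.split(r"[~$]", token):
            if "|" not in piece:
                continue  # punctuation such as "." has no tag
            left, _bar, _stem = piece.partition("|")
            pos = left.split("#")[-1]  # ignore any "prefix#" before the tag
            if is_verb_tag(pos):
                found.append(piece)
    return found


if __name__ == "__main__":
    # Hypothetical %mor lines, not taken from any real transcript.
    print(verbs_on_mor_line("%mor:\tpro:sub|he aux|be&3S part|walk-PRESP ."))
    print(verbs_on_mor_line("%mor:\tpro:sub|I mod|can v|go ."))
```

The sketch is only meant to show why "part" belongs in the list: a form like "walking" carries the part tag, so leaving it out would miss it.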


Leonid.


Li Zeng

Jun 29, 2018, 1:06:27 AM
to chibolts
Leonid, thank you very much for your response. What is the purpose of the "\" in trim -s"[+ \**]"?

Li

Leonid Spektor

Jun 29, 2018, 10:10:53 AM
to chib...@googlegroups.com
In searches, the "*" character is used as a wildcard that matches zero or more characters. Because error codes have a "*" character in them, you need to tell CLAN that in this case you are looking for the actual "*" character and not the wildcard. The "\" character means that the following character is a literal character and not a wildcard, so "\**" means that you want to find a "*" character followed by zero or more characters of any kind. This will let you match [*] and [* aux]. CLAN searches have three wildcard characters: *, % and _ .
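
The escaping rule described above can be illustrated with a small Python sketch that turns a search string into a regular expression. This is only an approximation for illustration, not how CLAN is implemented: it models just the "*" wildcard and the "\" escape, leaves "%" and "_" as ordinary characters, and the function names are hypothetical.

```python
import re


def clan_pattern_to_regex(pattern):
    """Translate a search string into a compiled regular expression.

    Only two rules are modelled (an assumption, not CLAN's full behaviour):
      - "\\" makes the next character literal
      - "*" matches zero or more characters
    Everything else, including "%" and "_", is treated literally here.
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # escaped: literal
            i += 2
        elif ch == "*":
            parts.append(".*")  # unescaped: wildcard
            i += 1
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts))


def matches(pattern, text):
    """Return True if the whole text matches the search string."""
    return clan_pattern_to_regex(pattern).fullmatch(text) is not None


if __name__ == "__main__":
    for code in ["[+ *]", "[+ *aux]", "[+ *wh]", "[+ trn]"]:
        # Escaped star: only codes with a real "*" after "[+ " match.
        print(code, matches(r"[+ \**]", code))
```

Without the backslash, the search string "[+ **]" would consist of two wildcards, so under this sketch it would also match a code such as [+ trn] that contains no "*" at all.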


Leonid.

Li Zeng

Jun 29, 2018, 9:01:11 PM
to chibolts
I've got it. Many thanks. 

Li 