Frequency of verb forms by verb type

Nicole Tracy-Ventura

Oct 8, 2013, 1:16:24 PM

to chibolts

Dear all,

I'm interested in getting frequency counts of how often different verb forms are used by verb type. For example, imagine I want to know how often the verb 'walk' is used in the present tense, the present progressive, simple past, etc. But in this case I don't have a specific list of verbs. Rather I want to compare all verb types to see if some occur more often in one form over the others. Is it possible to do this in CLAN? I'm working with Spanish data that has been tagged with MOR so the information I need is there with the lemmas and verb tag.

Many thanks in advance for any suggestions.

Best wishes,

Nicole

Nicole Tracy-Ventura, Ph.D.
Assistant Professor
Department of World Languages
University of South Florida
4202 East Fowler Ave, CPR419
Tampa, FL 33620-5700
Email: n...@usf.edu

Leonid Spektor

Oct 8, 2013, 1:57:22 PM

to chib...@googlegroups.com

Nicole,

Try this command:

freq +s"@|-v,|-cop,|-aux,|-mod,|-mod:*,|-part" *.cha

The +s option is this complicated because there are many ways to mark different kinds of verbs. This assumes that your data is in English and has a %mor tier that was created with the latest MOR grammar from the CHILDES web site. Otherwise, I don't think CLAN can do anything without an explicit list of words.

Leonid.
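Outside CLAN, a rough shell sketch of the same count is to pull every verb token off the %mor tiers and tally lemma/form pairs, so that, for instance, walk/PAST and walk/BASE come out as separate rows. It assumes the tiers are already unwrapped (one tier per line) and that verb tokens look like "v|lemma-SUFFIX" or "v|lemma&FUSION"; the helper name "count_verb_forms" is hypothetical, and the "v|" pattern and suffix handling would need adjusting for the Spanish MOR tags and for verbs that carry other tags (cop, aux, mod, part) in Leonid's +s string.

```shell
#!/bin/sh
# count_verb_forms [FILE...]: tally lemma/form pairs for verb tokens on
# %mor tiers. Hypothetical helper; assumes one tier per line and verb
# tokens of the form "v|lemma-SUFFIX" or "v|lemma&FUSION".
count_verb_forms() {
  awk '
    /^%mor:/ {
      for (i = 2; i <= NF; i++) {        # field 1 is the "%mor:" label
        if ($i !~ /^v\|/) continue       # keep only tokens tagged "v"
        rest = substr($i, 3)             # strip the "v|" prefix
        lemma = rest; form = "BASE"      # no suffix means a bare stem
        if (match(rest, /[-&]/)) {       # first suffix or fusion marker
          lemma = substr(rest, 1, RSTART - 1)
          form = substr(rest, RSTART + 1)
        }
        print lemma "\t" form
      }
    }' "$@" | sort | uniq -c | sort -rn
}
```

Running "count_verb_forms *.cha" would print the most frequent lemma/form pairs first, one count per row.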

--
You received this message because you are subscribed to the Google Groups "chibolts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+u...@googlegroups.com.
To post to this group, send email to chib...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/chibolts/CA%2B3CKJ4mXGwwXoNYjY9hewpAFM7Ktrs1zTf%3DRz3akQTQZxja%3Dw%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.

Donna Jackson

Oct 8, 2013, 3:31:01 PM

to chib...@googlegroups.com

I did this many years ago with my Spanish data, and I have two published articles with the results on verb tense and aspect. When I ran the data, MOR was not working well with Spanish, so I ran FREQ, pulled out the verbs, and then ran KWAL with the verb root and *. Don't know if that is of any use to you.

Donna Jackson-Maldonado
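The FREQ-then-KWAL approach, searching for a verb root followed by "*", can be roughed out in the shell too. This sketch (the helper name "forms_of_root" is hypothetical) lists and counts the word forms on main speaker tiers that begin with a given root. It assumes one tier per line and does no morphological analysis, so a root such as "camin" would count the noun "camino" alongside the verb forms; the output still needs the manual checking described above.

```shell
#!/bin/sh
# forms_of_root ROOT [FILE...]: count word forms on main speaker tiers
# that begin with ROOT, roughly like KWAL with "ROOT*". Hypothetical
# helper; assumes each tier is on a single line.
forms_of_root() {
  root=$1; shift
  awk -v root="$root" '
    /^\*[^:]*:/ {                           # main speaker tiers only
      for (i = 2; i <= NF; i++)             # field 1 is the "*SPK:" label
        if (index($i, root) == 1) print $i  # token starts with the root
    }' "$@" | sort | uniq -c | sort -rn
}
```

For example, "forms_of_root camin *.cha" would list camino, caminé, caminaba and so on with their counts.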

2013/10/8 Nicole Tracy-Ventura <nicole.tra...@gmail.com>


--
Donna Jackson-Maldonado
Centro de Estudios Lingüísticos y Literarios
Facultad de Lenguas y Letras, Universidad Autónoma de Querétaro
Campus Aeropuerto, Circuito Fray Junípero Serra Km 8
Santiago de Querétaro, Qro., México 76140
web: http://www.donnajackson.weebly.com
e-mail: djacksonq r...@gmail.com

tel: 52 442 192 1200 ext. 61200 or 61140
home: 52 442 2180264

Nicole Tracy-Ventura

Oct 8, 2013, 3:37:16 PM

to chibolts

Thank you, Leonid. I will try that.

Do you also happen to know if there's a way to make a copy of a .cha file where a certain speaker tier is always excluded? For example, I want only the participant's data in a new file. Then I plan to use the cp2utf +d1 +d2 command to create .txt files that I can analyze with another program that, unlike CLAN, can't distinguish between speaker tiers (I'm only interested in the participant's speech). I hope that makes sense.

Many thanks,

Nicole


Kevin Donnelly

Oct 8, 2013, 3:52:53 PM

to chib...@googlegroups.com

::::On Tuesday 08 October 2013 Nicole Tracy-Ventura said::::

> Do you also happen to know if there's a way to make a copy of a .cha file
> where a certain speaker tier is always excluded? For example, I want to
> have the participant's data only in a new file.

grep '^\*HYW:' my_original_file.cha > myHYW.cha
where HYW is the speaker ID.

grep comes with GNU/Linux and Mac OS X.

On Microsoft Windows you could use something like Wingrep
(http://wingrep.com).

--
Pob hwyl / Best wishes

Kevin Donnelly
kevindonnelly.org.uk
bangortalk.org.uk

Leonid Spektor

Oct 8, 2013, 4:16:12 PM

to chib...@googlegroups.com

To include only specified tiers in the output, you should use the KWAL command. For example, to create a file with only *PAR: tiers, use this command:

kwal +t*PAR +d +f *.cha

Leonid.


Nicole Tracy-Ventura

Oct 8, 2013, 4:27:31 PM

to chibolts

Thanks for both of those ideas! I just quickly tried the one Leonid sent and it did exactly that. I appreciate the quick replies!

Nicole


Leonid Spektor

Oct 8, 2013, 4:30:08 PM

to chibolts

Kevin,

Your suggestion to use Unix commands, or any other non-CLAN commands, on CHAT data is not a good idea unless you take extra precautions, because the CHAT format allows long tiers to wrap onto continuation lines. Non-CLAN commands will often work, but they will fail on wrapped tiers and leave out the wrapped part of the tier. If you really want to use non-CLAN commands, you should first run CLAN's LONGTIER command to remove all the tier wrapping and make sure that each whole tier is on one line.

Leonid.
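For anyone who prefers to stay outside CLAN, a rough shell stand-in for the unwrapping step is to append every continuation line (in CHAT these begin with a tab) to the tier it continues. This is a sketch with a hypothetical function name, "unwrap_tiers", not a replacement for LONGTIER; it only handles the tab-continuation case.

```shell
#!/bin/sh
# unwrap_tiers [FILE...]: join CHAT continuation lines (lines that start
# with a tab) onto the tier they continue, so every tier sits on one line.
# Hypothetical helper approximating what LONGTIER does.
unwrap_tiers() {
  awk '
    /^\t/ {                      # continuation of the previous tier
      sub(/^\t+/, " ")           # replace the leading tab(s) with a space
      buf = buf $0
      next
    }
    {
      if (NR > 1) print buf      # flush the previous, now complete tier
      buf = $0
    }
    END { if (NR > 0) print buf }' "$@"
}
```

After unwrapping, a line-oriented filter is safe, e.g. "unwrap_tiers my_original_file.cha | grep '^\*HYW:'".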

Kevin Donnelly

Oct 8, 2013, 5:54:25 PM

to chib...@googlegroups.com

Hi Leonid

::::On Tuesday 08 October 2013 Leonid Spektor said::::

> Your suggestion to use Unix commands or any other non-CLAN commands
> on CHAT data is not a good idea, unless you take extra precautions,
> because CHAT format allows long tiers to wrap-around. Non-CLAN commands
> will work often, but they will fail on wrapped-around tiers and will leave
> out the part of tier that is wrapped. If you really want to use non-CLAN
> commands, then you should first run CLAN's LONGTIER command to remove all
> the tier wrapping and to make sure that the whole tier is on one line.

Sure, good point - you need to use LONGTIER or something like Sed to
straighten the lines (and in fact that might usefully be added to CLAN as an
option). But I think it's important to remember that one of the CHAT format's
strengths is that it is a vanilla plain text file, and therefore does not
actually require CLAN programs to analyse - using other programs is eminently
possible. It's maybe worth highlighting this because there was a recent
exchange on an R (stats language) list where the OP seemed to be under the
impression that CHAT files could ONLY be handled by CLAN - in fact, R will
consume them with no problem at all, just as it will consume any other text
file.

Leonid Spektor

Oct 8, 2013, 10:18:59 PM

to chib...@googlegroups.com

Kevin,

You are absolutely right that CHAT is not a CLAN-specific format. But, being plain text, it tends to have a lot of extraneous data in between. For example, if someone wants to look only at a speaker tier or the %mor tier, they would have to filter out the rest. If you are looking for the %mor tier of just one particular speaker, it becomes even more complicated. Using CLAN to filter out unneeded data is the easiest solution. After that, CHAT is just plain text.

For those who do not want to use CLAN at all but still want an easy way to parse the data, we have XML-CHAT on our server. Just look for "XML" in the "Database" section on our web server's home page.

Leonid.


Kevin Donnelly

Oct 9, 2013, 5:17:13 AM

to chib...@googlegroups.com

Hi Leonid

::::On Wednesday 09 October 2013 Leonid Spektor said::::

> You are absolutely right that CHAT is not CLAN specific format. But, being
> a plain text format makes it prone to have a lot of extraneous data in
> between. For example if someone wants to look at a speaker tier or %mor
> tier only you would have to filter out the rest.

The grep line I gave earlier handles the speaker. Adjust it a bit to get %mor:
grep '^%mor:' my_original_file.cha > mymor.cha
Again, the file should have its lines straightened first using LONGTIER or Sed.

> If you are looking for
> %mor tier of just one particular speaker only, then it becomes even more
> complicated. Using CLAN to filter unneeded data is the easiest solution.
> After that CHAT is just a plain text.

Sure, the more complicated your question, the more complicated your tools may
have to be, and CLAN may do nearly everything you want out of the box,
provided you get the switches right. For your use-case, I would use the
following 12-line PHP script:
===
<?php

// Assumes the tiers were straightened first (e.g. with LONGTIER),
// so every tier is on a single line.

// Open a new file to write to.
$fp = fopen("mymor.txt", "w");

// Open the source file.
$lines = file("path/to/my/file.cha");

// Marker: 1 while we are inside an utterance by the wanted speaker.
$getmor = 0;

// Read through each line.
foreach ($lines as $line)
{
    // If it's a speaker line matching the speaker you want ...
    if (preg_match("/^\*HYW:/", $line))
    {
        $getmor = 1; // ... set a marker.
    }
    // If it's any other speaker line ...
    elseif (preg_match("/^\*/", $line))
    {
        $getmor = 0; // ... revert the marker.
    }

    // If it's a %mor line and the marker is set ...
    // (ie the last speaker was HYW) ...
    if (preg_match("/^%mor:/", $line) && $getmor == 1)
    {
        echo $line;          // ... show the line ...
        fwrite($fp, $line);  // ... write it to the new file ...
        $getmor = 0;         // ... revert the marker.
    }
}

// Close the new file.
fclose($fp);

?>
===

Not everyone will want to do this, I know, but the benefit of using PHP (or
grep, or Python, or R, or whatever) is that researchers can learn something
they can re-use in other contexts, or where other tools fit in better with
their workflow, or where CLAN, for all its versatility, can't produce what they
need.
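For comparison, the same per-speaker %mor extraction can be sketched as a short awk function (the name "mor_for_speaker" is hypothetical). Because it tracks which tier it is currently in rather than matching single lines, it also keeps the tab-indented continuation lines of a wrapped %mor tier, so the straightening step is not needed for this particular task. It assumes speaker tiers look like "*HYW:" and that every new tier starts with "*", "%" or "@".

```shell
#!/bin/sh
# mor_for_speaker SPEAKER [FILE...]: print the %mor tiers that follow
# utterances by SPEAKER (e.g. HYW), including any continuation lines.
# Hypothetical helper; a new tier is any line starting with *, % or @.
mor_for_speaker() {
  spk=$1; shift
  awk -v spk="$spk" '
    /^\*/ { keep = (index($0, "*" spk ":") == 1) }  # new utterance: wanted speaker?
    /^[%*@]/ { printing = keep && /^%mor:/ }         # any new tier resets the state
    printing { print }                               # %mor line or its continuation
  ' "$@"
}
```

For example, "mor_for_speaker HYW my_original_file.cha > mymor.txt" would write HYW's %mor tiers to a new file.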

> For those who do not want to use CLAN at all and still have an easy
> way to parse the data we have XML-CHAT on our server. Just look for "XML"
> in "Database" section on our web server's home page.

Hmm - I've always found XML harder to parse than text, but maybe that's just
me! :-)
