verb lemmas and their frequencies


Naomi Shin

Mar 12, 2018, 4:11:52 PM
to chibolts
Hi all,

I'm quite new to working with CHILDES.

I am trying to extract all verb lexemes and their associated frequencies from all the Spanish files on CHILDES. So far, I have created a folder containing only the Spanish files that have a %mor tier, and I have been able to run:

freq +sm"r-v,o-%" *.cha

The program runs, but the output is 0 for each file, even though when I open files at random I do see verbs coded as verbs on the %mor tier.

I'd be so grateful for any suggestions you have. I'd also be very happy to hire a tutor if you know of anyone who might be interested and who has the relevant expertise.

Thank you so much,
Naomi Shin

Leonid Spektor

Mar 12, 2018, 4:56:27 PM
to chib...@googlegroups.com
Naomi,

The "r-v" tells FREQ to find all stems that are literally "v", which is why every file returned 0.

To find all verbs you need this command:
freq +sm|v,o-% *.cha

To find all verbs, listed by stem, you need this command:
freq +sm|v,;*,o-% *.cha

For more explanation of the "+sm" option, and for more examples, type the command "freq +sm".
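The difference between "r-v" and "|v" can be illustrated with a small Python sketch. This is not how FREQ is implemented and is not part of CLAN. It assumes a simplified %mor format in which each word is written "pos|stem", optionally followed by suffix or fusion markers beginning with "-", "&" or "=", with clitics joined by "~". The sample tier line and the function names are invented for illustration. Matching on the part of speech (what "|v" asks for) finds the verb, while matching on a stem spelled "v" (what "r-v" asks for) finds nothing.

```python
import re
from collections import Counter


def parse_mor_word(word):
    """Split one %mor word into (part_of_speech, stem).

    Assumption: a simplified format 'pos|stem' plus optional suffix/fusion
    markers starting with '-', '&' or '=' (e.g. 'v|abri-PRES&1S').
    Returns None for tokens without '|', such as punctuation or the tier label.
    """
    pos, sep, rest = word.partition("|")
    if not sep:
        return None
    stem = re.split(r"[-&=]", rest, maxsplit=1)[0]
    return pos, stem


def count_matches(mor_line, predicate):
    """Count stems of all %mor words for which predicate(pos, stem) is true."""
    counts = Counter()
    for token in mor_line.split():
        for word in token.split("~"):  # assumption: '~' joins clitics to a host
            parsed = parse_mor_word(word)
            if parsed and predicate(*parsed):
                counts[parsed[1]] += 1
    return counts


# Invented example tier line, for illustration only.
line = "%mor:\tpro|yo v|abri-PRES&1S det|la n|puerta ."

by_pos = count_matches(line, lambda pos, stem: pos == "v")    # like "|v"
by_stem = count_matches(line, lambda pos, stem: stem == "v")  # like "r-v"
print(by_pos)   # Counter({'abri': 1})
print(by_stem)  # Counter()
```

Real %mor codes carry more structure (prefixes, compounds, richer clitic markup) than this sketch handles, so it is only meant to show why a stem filter of "v" matches nothing.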


Leonid.

--
You received this message because you are subscribed to the Google Groups "chibolts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+u...@googlegroups.com.
To post to this group, send email to chib...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/chibolts/ddd49f13-0526-423c-b219-3f754a4f1eb6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Naomi Shin

Mar 12, 2018, 5:32:07 PM
to chib...@googlegroups.com
Thank you thank you thank you, Leonid!!!! This is terrific!
Is there any way to calculate frequencies for each verb form (or even better, verb lexeme) based on the output file? I was able to get frequencies of each verb stem for EACH file using freq +sm|v,;*,o-% *.cha. 
Thank you again!
-Naomi

Leonid Spektor

Mar 12, 2018, 5:38:05 PM
to chib...@googlegroups.com
Naomi,

I don't understand exactly what you want, because the command you are using already calculates the frequency of each verb lemma. Perhaps you want the following command:

freq +sm|v,;* *.cha


Leonid.

Naomi Shin

Mar 12, 2018, 6:15:54 PM
to chib...@googlegroups.com
Hi, 
Sorry for being opaque. I DID get the frequency of each lemma, but the frequencies are computed separately for each file (and each speaker), so I got output like what I've pasted below. What I'm asking is how to get the frequency ACROSS all the speakers and files. For example, in the first file there are 3 tokens of the verb stem abri from the mother and 1 token from the child. Imagine that across all the files I'm looking at there is a total of 50 tokens of the stem abri. Is there a way to extract that number automatically (for each verb stem), without having to go through the output and add up the abri counts file by file (i.e. 3 + 1 + ...)? In other words, what I want is the TOTAL number of tokens of the verb stem abri, including all speakers and all the Spanish files that have %mor tiers. That is a few hundred files, since there is often more than one file per child (I've put them all in one folder).
I hope this clarifies the question.
Thanks!
-Naomi


A small portion of the current output:

From file <diegoU030614a.cha>
Speaker: *MOT:
  3 v|abri
  5 v|cabe
  3 v|cerra
  1 v|coge
  1 v|da
  1 v|dormi
  1 v|empuja
  1 v|entra
  1 v|falta
  6 v|gusta
  4 v|habe
  2 v|hace
 12 v|i
  1 v|importa
  2 v|junta
  1 v|marcha
  3 v|mira
  1 v|monta
  2 v|move
  1 v|necesita
  2 v|parece
  8 v|pode
  4 v|pone
  1 v|prepara
  4 v|quere
  1 v|regala
  3 v|sabe
  2 v|saca
  1 v|sali
 16 v|tene
  1 v|tira
  1 v|toca
  4 v|trae
  3 v|ve
  1 v|veni
------------------------------
   35  Total number of different item types used
  104  Total number of items (tokens)
0.337  Type/Token ratio

Speaker: *CHI:
  1 v|abri
  1 v|aparca
  3 v|baja
  6 v|cabe
  5 v|cerra
  1 v|coge
  3 v|come
  1 v|deja
  1 v|desperta
  1 v|entra
  3 v|espera
  7 v|habe
  2 v|hace
 31 v|i
  1 v|mira
  2 v|oí
  1 v|parece
  3 v|pode
  2 v|pone
  2 v|queda
  1 v|queja
  1 v|sabe
  1 v|saca
  2 v|senta
  4 v|tene
  1 v|tira
  1 v|toca
  1 v|trae
  2 v|vale
  1 v|ve
------------------------------
   30  Total number of different item types used
   92  Total number of items (tokens)
0.326  Type/Token ratio

From file <diegoU030614b.cha>
Speaker: *CHI:
  2 v|dispara
  1 v|escapa
  2 v|espera
  2 v|habe
 16 v|i
  1 v|lanza
  1 v|mete
  4 v|mira
  2 v|oí
  4 v|parece
 14 v|pode
  4 v|pone
  1 v|quere
  1 v|saca
  1 v|senta
  1 v|tira
------------------------------
   16  Total number of different item types used
   57  Total number of items (tokens)
0.281  Type/Token ratio

Speaker: *GUI:
  4 v|apreta
  5 v|da
  1 v|deja
  1 v|echa
  3 v|espera
  1 v|explica
  2 v|habe
  2 v|hace
  7 v|i
  4 v|mira
  1 v|oí
  1 v|pode
  6 v|pone
  1 v|sali
  1 v|sujeta
  2 v|tene
  2 v|tira
  2 v|veni
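Because FREQ reports a separate block for each file and speaker, the cross-file totals asked about here can be obtained by summing those blocks. The Python sketch below does that on FREQ's text output. It is a hedged illustration, not a CLAN feature: it assumes the output has been saved as a UTF-8 text file with the layout shown above (a "Speaker: *XXX:" header followed by lines of the form "count item"), and it skips the "From file" headers and summary lines. The function names are hypothetical, and the sample is an excerpt of the output above.

```python
import re
from collections import Counter, defaultdict

# Lines such as "  3 v|abri": a count followed by a single pos|stem item.
ITEM_RE = re.compile(r"^\s*(\d+)\s+(\S+\|\S*)\s*$")
# Lines such as "Speaker: *MOT:".
SPEAKER_RE = re.compile(r"^Speaker:\s*\*(\w+):")


def aggregate_freq_output(lines):
    """Sum FREQ counts across all files, overall and per speaker.

    Summary lines ("Total number of ...", "Type/Token ratio") and
    "From file" headers do not match ITEM_RE and are ignored.
    """
    total = Counter()
    by_speaker = defaultdict(Counter)
    speaker = None
    for line in lines:
        m = SPEAKER_RE.match(line)
        if m:
            speaker = m.group(1)
            continue
        m = ITEM_RE.match(line)
        if m:
            count, item = int(m.group(1)), m.group(2)
            total[item] += count
            by_speaker[speaker][item] += count
    return total, by_speaker


def aggregate_freq_file(path):
    """Hypothetical helper: read FREQ output saved as a UTF-8 text file."""
    with open(path, encoding="utf-8") as f:
        return aggregate_freq_output(f)


# Excerpt of the FREQ output pasted above.
SAMPLE = """\
From file <diegoU030614a.cha>
Speaker: *MOT:
  3 v|abri
 12 v|i
------------------------------
   35  Total number of different item types used
Speaker: *CHI:
  1 v|abri
 31 v|i
From file <diegoU030614b.cha>
Speaker: *CHI:
 16 v|i
"""

total, by_speaker = aggregate_freq_output(SAMPLE.splitlines())
for item, n in total.most_common():
    print(f"{n:5d} {item}")
```

On this excerpt, v|abri sums to 4 (3 + 1) and v|i to 59 (12 + 31 + 16). Running aggregate_freq_file on the full saved output would give one total per stem across all files, while by_speaker keeps per-speaker totals in case those are still useful.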


