Re: using CLAN to find the frequency of nouns and verbs

Leonid Spektor

Oct 16, 2013, 4:33:53 PM

to chibolts, stephanie...@gmail.com

Stephanie,

If you want to analyze data from our server, then we have many data sets that have already been tagged with the MOR grammar. Our data are located on two servers, at these URLs:

http://childes.talkbank.org/data/

http://talkbank.org/data/local.html

If you look at the "http://childes.talkbank.org/data/" web page, you will see data sets whose names contain the string "-MOR". These data have MOR tags. You just need to download them and run FREQ commands to compute the frequency of nouns and verbs. I will give examples of the FREQ commands below. First you need to decide exactly which words you consider to be nouns and verbs. To make the explanation easier to follow, I recommend that you download the grammar for English, or for the language of your choice, from our server at this URL:

http://childes.talkbank.org/morgrams/

I will use English data as an example, because you did not specify which language you are interested in. After you download the MOR grammar from the link above, unzip it. For the English grammar you will get an "eng" folder; move it to your hard disk, preferably into the "CLAN" folder. On a Mac that is the "/Applications/CLAN" folder, and on a Windows PC it is the "c:\TalkBank\CLAN" folder. If you installed CLAN in a custom location, then you know where CLAN is located on your computer.

Now open the folder "eng/lex". Here you will see files that group words by part of speech. You can see that there are a number of files with "n-" and "v-" in their names. This is where you need to decide which words count as nouns and which as verbs. For example, all nouns and all verbs in the broadest sense can be found with this FREQ command:

freq +s"@|-n,|-n:*,|-v,|-cop,|-aux,|-mod,|-mod:*,|-part" *.cha

This search includes nouns, pronouns, verbs, auxiliaries, participles, and other variations of nouns and verbs. You can open each file in the "eng/lex" folder to see a list of all the words for each part of speech.

If you do not want to include pronouns or other variations of nouns in your count, then use this command:

freq +s"@|-n,|-v,|-cop,|-aux,|-mod,|-mod:*,|-part" *.cha

In the strictest sense, nouns and verbs alone are counted with this command:

freq +s"@|-n,|-v" *.cha

But, if you want to count auxiliary and participle verbs along with basic verbs, then use this command:

freq +s"@|-n,|-v,|-aux,|-part" *.cha
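The commands above differ only in which part-of-speech tags appear in the search pattern. As a minimal sketch (not part of CLAN), the Python helper below assembles such a command line from a list of tags, copying the "@", "|-" and comma layout used in the commands above. The function name build_freq_command is made up for illustration.

```python
def build_freq_command(pos_tags, files="*.cha"):
    """Assemble a FREQ command line for the given part-of-speech tags.

    Hypothetical helper: it only builds the text of the command, following
    the pattern layout shown in the commands above. It does not run CLAN.
    """
    if not pos_tags:
        raise ValueError("at least one part-of-speech tag is required")
    # Each tag is written as "|-tag"; the items follow "@" and are comma-separated.
    pattern = "@" + ",".join(f"|-{tag}" for tag in pos_tags)
    return f'freq +s"{pattern}" {files}'


# Rebuild two of the commands shown above.
print(build_freq_command(["n", "v"]))
print(build_freq_command(["n", "v", "aux", "part"]))
```

The resulting text can be pasted into CLAN's "Commands" window; changing the tag list is all that is needed to widen or narrow the search.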

As you can see, you can fine-tune your search to your particular specifications. All of the FREQ commands above output the full form of each word. If you want to know only the count for each part of speech, then replace the four commands above with the following four commands:

freq +s"@|-n,|-n:*,|-v,|-cop,|-aux,|-mod,|-mod:*,|-part,o-%" *.cha
freq +s"@|-n,|-v,|-cop,|-aux,|-mod,|-mod:*,|-part,o-%" *.cha
freq +s"@|-n,|-v,o-%" *.cha
freq +s"@|-n,|-v,|-aux,|-part,o-%" *.cha

There are simpler ways to look for verbs and nouns with the FREQ command, but the more complex search patterns in the commands above give you the most precision. If you want to see the meaning of all those "|-" and "o-" symbols, just type the "freq +s@" command, or for a fuller explanation look in the CLAN manual.
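To make the idea of a per-category count concrete, here is a rough Python sketch that tallies part-of-speech tags on "%mor" lines. It is only an approximation of what a count-only search reports, not CLAN's own algorithm. It assumes each item on the tier has the form "pos|stem", matches tags exactly (so "n" does not match "n:prop"), and ignores clitics and compounds. The sample lines are invented for the example.

```python
from collections import Counter

# Invented %mor tier lines, assuming the usual "pos|stem" layout.
SAMPLE_MOR_LINES = [
    "%mor:\tpro:sub|I v|want det:art|a n|cookie .",
    "%mor:\tn|dog v|eat-3S n|food .",
]


def count_pos(mor_lines, wanted=("n", "v")):
    """Count how many items on the %mor lines carry each wanted tag."""
    counts = Counter()
    for line in mor_lines:
        if not line.startswith("%mor:"):
            continue  # skip speaker tiers and other dependent tiers
        for item in line[len("%mor:"):].split():
            if "|" not in item:
                continue  # punctuation such as "." has no tag
            pos = item.split("|", 1)[0]
            if pos in wanted:
                counts[pos] += 1
    return counts


print(dict(count_pos(SAMPLE_MOR_LINES)))
```

For real analyses the FREQ commands above remain the right tool; the sketch only shows the kind of tally they produce.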

If you want to analyze your own data, or data that do not have MOR tags, then after you download and unzip the MOR grammar you need to set the "mor lib" directory to the location on your hard drive where you placed the grammar folder. In my example above that is the "/Applications/CLAN/eng" folder on a Mac and the "c:\TalkBank\CLAN\eng" folder on a PC. In CLAN's "Commands" window, click on the "mor lib" button, navigate to the location of the language grammar on your hard drive, and select that folder. Now you need to run the following two commands:

mor +1 *.cha

post +1 *.cha

This will add a "%mor" tier to all your data files, and you will be ready to run your analyses. If you have any CLAN questions, please post them to the chib...@googlegroups.com address and someone will be able to help you.
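As a quick check outside CLAN, a short Python sketch can list which ".cha" files in a folder do not yet contain a "%mor" tier. It assumes only that a tagged file has at least one line beginning with "%mor:". The function name files_missing_mor and the demo file names are made up for illustration.

```python
from pathlib import Path
import tempfile


def files_missing_mor(folder):
    """Return the names of .cha files in folder that have no %mor tier line."""
    missing = []
    for path in sorted(Path(folder).glob("*.cha")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if not any(line.startswith("%mor:") for line in text.splitlines()):
            missing.append(path.name)
    return missing


# Demo on two throwaway files: one already tagged, one not.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "tagged.cha").write_text(
        "*CHI:\tdog .\n%mor:\tn|dog .\n", encoding="utf-8"
    )
    Path(tmp, "untagged.cha").write_text("*CHI:\tdog .\n", encoding="utf-8")
    print(files_missing_mor(tmp))
```

Any file the sketch reports would still need the two commands above run on it before a FREQ search for part-of-speech tags can find anything.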

Leonid.

On Oct 16, 2013, at 11:27, stephanie...@gmail.com wrote:

Hi,

I'm new to the CHILDES community. I've been reading the CLAN Manual, but I would like to ask a question. I would like to analyse the frequency of nouns and verbs. I do not seem to be using the correct command, and I cannot seem to find it. Could you please guide me to where I should look?

Could you also tell me where I should download MOR, please? If I understood correctly, I need to download it separately, and it is needed to analyse the nouns and verbs.

Thank you for your help

Stephanie


stephanie...@gmail.com

Oct 17, 2013, 1:02:15 PM

to chib...@googlegroups.com

I'm not sure if you are receiving my emails, as I hadn't been accepted to the group before. I managed to do as you said and yes, sorry for not mentioning it before, I am looking at English. When I tried the commands you gave me, this message came up:

> freq +s"@|-n,|-v" *.cha

**** WARNING: No file matching "*.cha" was found.

DATA ASSOCIATED WITH CODES [/], [//], [///], [/-], [/?] IS EXCLUDED BY DEFAULT.
TO INCLUDE THIS DATA PLEASE USE "+r6" OPTION.

The English files I downloaded are all .cut files. I'm guessing that is wrong. Do I have to download something else apart from CLAN to convert them, please?

Is there a way to sort the data by age, please? I would like to look at older children.

Leonid Spektor

Oct 17, 2013, 1:41:05 PM

to Stephanie Mifsud, chibolts

Stephanie,

The "*.cha" in the command refers to CHAT data files, which have the file extension ".cha". This is how FREQ knows which files you want to analyze. If you do have CHAT files, then the warning appeared because you did not set the "working" directory in CLAN's "Commands" window to the location on your hard disk where those data files are stored.

First I need to know what format your data files are in. Are they in CHAT format or plain text? Did you download them from our server? I assume that the .cut files you are referring to are the English grammar files you downloaded. Those are not data files. CLAN can only work on plain-text encoded files.

To set the "working" directory, click on the "working" button and navigate to the location on your hard disk where the data files are stored. Alternatively, when you type "freq" in the "Commands" window, you will see the "File In" button appear above where you are typing. Click on that button and it will let you navigate to the directory where your data files are located and select any data files you want the FREQ command to analyze. If you always use the "File In" button, then you will never need to set the "working" directory. CLAN has many different ways to do the same thing, and you can choose whichever way is easiest for you. If you use the "File In" button, then your command will be:

freq +s"@|-n,|-v" @

The '@' symbol will be automatically added to command by "File In" button.
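Before pointing the "working" directory at a folder, it can help to see which kinds of files the folder actually holds. The Python sketch below (not part of CLAN) counts files by extension, so a folder containing only ".cut" grammar files and no ".cha" data files is easy to spot. The function name extension_summary and the demo file names are made up for illustration.

```python
from collections import Counter
from pathlib import Path
import tempfile


def extension_summary(folder):
    """Count the regular files in folder by lower-cased file extension."""
    return Counter(
        p.suffix.lower() for p in Path(folder).iterdir() if p.is_file()
    )


# Demo on a throwaway folder with invented file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("grammar1.cut", "grammar2.cut", "sample.cha"):
        Path(tmp, name).write_text("", encoding="utf-8")
    summary = extension_summary(tmp)
    print(summary[".cha"], "CHAT file(s),", summary[".cut"], ".cut file(s)")
```

If the count for ".cha" is zero, a search over "*.cha" in that folder has nothing to match, which is consistent with the warning quoted in this thread.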

It would be very helpful if you told me which computer and OS you are using. Is it a Mac, or Windows XP, Windows 7, Windows 8, or something else? Since your questions are getting very specific, please email me directly instead of posting to the chibolts group.

Leonid.

On Oct 17, 2013, at 10:25, Stephanie Mifsud wrote:

Hi Leonid,

I'm not sure if you are receiving my emails, as I cannot send from Google Groups; maybe I haven't been accepted yet. I managed to do as you said and yes, sorry for not mentioning it before, I am looking at English. When I tried the commands you gave me, this message came up:

> freq +s"@|-n,|-v" *.cha

**** WARNING: No file matching "*.cha" was found.

DATA ASSOCIATED WITH CODES [/], [//], [///], [/-], [/?] IS EXCLUDED BY DEFAULT.
TO INCLUDE THIS DATA PLEASE USE "+r6" OPTION.

The English files I downloaded are all .cut files. I'm guessing that is wrong. Do I have to download something else apart from CLAN to convert them, please?

Thanking you in advance,

Stephanie

On Wed, Oct 16, 2013 at 10:53 PM, Stephanie Mifsud <stephanie...@gmail.com> wrote:

Wow, OK, I will try this tomorrow :) Thank you very much. It seems much clearer now.
