Frequency of words

Brielle Stark

Sep 13, 2017, 1:49:37 PM
to chib...@googlegroups.com
Hello all.

I have a question about calculating word frequency. We work with participants with aphasia, who often make errors. When they do, we put the intended word in [: target] if we know what the intention was. However, I do not want [: target] words counted in the frequency tally. For example, if someone said furry [: fairy] in one instance, and I am looking for a frequency count of the correctly spoken 'fairy,' I want the count for 'fairy' to be 0, ignoring the word in the target. I'd also like to count lemmas rather than inflected forms. In other words, if I'm looking for "stair," I want 'stairs' to be counted toward the frequency of 'stair.'

Detail:

When I run the command:

freq -sm** -sm@* +sCinderella +sstair +sfairy

on the attached transcript [completely made up, by the way], it evaluates the %mor line but doesn't ignore the target [: target] words like I thought it would. It does do the correct job in tagging 'stair' even though the participant said 'stairs,' a correct usage from the %mor line. Output of frequency for this command was:
Cinderella: 1
stair: 1
fairy: 1

However, as I said, I wouldn't want the incorrect furry [: fairy] to count. So, I tried:

freq -sm** -sm@* +t*PAR +sCinderella +sstair +sfairy

Now that I've told CLAN to stick to the speaker tier, it then ignores 'stair' because 'stairs' was written, which isn't what we were going for. However, it correctly does not look within the [: target] and correctly states that 'fairy' was said 0 times. As an added point, I've also found that when I run the above command on transcripts, it sometimes gets the counts incorrect. For this command, I get the count:
Cinderella: 1
stair: 0
fairy: 0

So basically, is there any way to tell CLAN to run the analysis on the %mor tier for frequencies of words [specifically, lemmas], but somehow to specify to ignore [: target] words on the speaker tier? 

In an ideal world, from the attached transcript, I'd be getting the frequency counts as:
Cinderella: 1
stair: 1
fairy: 0

Thank you very much,

Brie

--
Brielle Stark, PhD
Post Doctoral Fellow in Communication Sciences and Disorders, University of South Carolina
t: +1 803-777-9240
alternate email: sta...@mailbox.sc.edu
Aphasia Lab: http://web.asph.sc.edu/aphasia/
Center for the Study of Aphasia Recovery: http://web.asph.sc.edu/cstar/
Content_Tester.cha

Nan Bernstein Ratner

Sep 13, 2017, 2:55:58 PM
to chib...@googlegroups.com
A partial response, as I rush off to a meeting:

I always encourage mispronunciations (furry for fairy) to be coded as the actual target. If it's just a pronunciation problem and that is not the focus of our current work, I might make a %com tier comment so as to find it later if we do post-hoc phonological analysis, but typing wabbit for rabbit will screw with the MOR program, and typing furry for fairy will screw up lexical analyses. 

N


Nan Bernstein Ratner, F-, H-ASHA, F-AAAS, ABCLD
Professor
Hearing and Speech Sciences
University of Maryland
0100 Lefrak Hall
College Park, MD 20742


ADVANCE Professor, College of Behavioral and Social Sciences
Director, University of Maryland Autism Research Consortium (UMARC), www.autism.umd.edu
Faculty, Language Science (languagescience.umd.edu), Neuroscience & Cognitive Neuroscience (NACS, nacs.umd.edu), Developmental Science Field Committee



--
You received this message because you are subscribed to the Google Groups "chibolts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+unsubscribe@googlegroups.com.
To post to this group, send email to chib...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/chibolts/CAEs2yToSuaOv1de5DWc4CS3h6HR7YEdgUYm0SQ1oxBDC1%2BRcFg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Brielle Stark

Sep 13, 2017, 3:00:19 PM
to chib...@googlegroups.com
This is why we've always used -sm** and -sm@*, which tell the program not to evaluate things that are within brackets, like [: targets]. We've transcribed orthographically, with an error code directly after, such that: furry [: fairy] [* phon] or similar. Therefore the lexical analysis of mor doesn't evaluate 'furry'...

Thanks,

Brie



Leonid Spektor

Sep 13, 2017, 4:51:39 PM
to chib...@googlegroups.com
Brie,

The answer depends on whether you are interested in words on the speaker tier or lemmas on the %mor tier. Your command lines ask for both. I have changed your sample file "Content_Tester.cha" by adding the word "fairy" in a place where it is not an error or a replacement, so you should get a count of 1 for "fairy" in the output of the following two command lines:

For "fairy" lemmas, excluding errors and target replacements, you want:

freq -sm** -sm@* +sm;fairy Content_Tester.cha

For "fairy" words on the speaker tier, excluding errors and target replacements, you want:

freq -s<**> -s<:*> +sfairy Content_Tester.cha
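
To make the speaker-tier case concrete, here is a rough Python sketch of the counting rule described above: count a word on the main tier only when it is not followed by a [: target] replacement or a [* error] code. This is an illustration only, not how CLAN implements FREQ. The function name count_spoken_words and the two sample utterances are made up for this example.

```python
import re

# Hypothetical helper, not part of CLAN. A token is either a bracketed
# code such as "[: fairy]" or a run of non-space, non-bracket characters.
TOKEN = re.compile(r"\[[^\]]*\]|[^\s\[\]]+")


def count_spoken_words(main_tier, targets):
    """Count target words on one main tier line, skipping flagged words.

    A word is flagged when the codes right after it include a
    replacement "[: ...]" or an error code "[* ...]".
    """
    text = main_tier.split(":", 1)[1]  # drop the "*PAR:" prefix
    tokens = TOKEN.findall(text)
    counts = {t: 0 for t in targets}
    i = 0
    while i < len(tokens):
        word = tokens[i]
        i += 1
        codes = []
        # Collect every bracketed code that directly follows this word.
        while i < len(tokens) and tokens[i].startswith("["):
            codes.append(tokens[i])
            i += 1
        if word.startswith("["):
            continue  # a code with no word in front of it
        flagged = any(c.startswith(("[:", "[*")) for c in codes)
        if not flagged and word in counts:
            counts[word] += 1
    return counts


# Made-up utterances in the style used in this thread.
error_line = "*PAR:\tCinderella saw the furry [: fairy] [* phon] on the stairs ."
clean_line = "*PAR:\tthe fairy helped her ."

print(count_spoken_words(error_line, ["Cinderella", "stairs", "fairy"]))
# {'Cinderella': 1, 'stairs': 1, 'fairy': 0}
print(count_spoken_words(clean_line, ["fairy"]))
# {'fairy': 1}
```

Note that this works on surface forms only, so "stairs" has to be searched for as "stairs", just as on the speaker tier in CLAN.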


Leonid.

Content_Tester.cha

Brielle Stark

Sep 14, 2017, 7:28:45 AM
to chib...@googlegroups.com
Yes, so I suppose what I am asking is whether there is something that can tell the mor command to be computed ignoring the [: target] errors? This is the only way I can think of solving my question.
Thanks,
Brie


Leonid Spektor

Sep 14, 2017, 9:08:41 AM
to chib...@googlegroups.com
Besides the command lines that I gave you in my previous email, the other solution is to use "furry [:: fairy]" coding. Notice two ':' characters instead of one. This code tells the MOR command to put the word "furry" on the %mor tier instead of the word "fairy". This solution only works if the actual word spoken by a subject is a real word. Otherwise, you will get "?|..." on the %mor tier, meaning that the word is not recognized. Also, this solution will not ignore the error word as you seem to want to do.

The question is whether you want to simply ignore the [: target] error word, like the word "fairy", or whether you want to ignore any erroneously spoken words altogether, like the whole "furry [:: fairy]" structure.

If you want to ignore the whole erroneously spoken word, then you need to use one of the command lines below, depending on whether you want words from the speaker tier or lemmas from the %mor tier:

freq -sm** -sm@* +sm;fairy +sm;Cinderella +sm;stair Content_Tester.cha

freq -s"<**>" -s"<: *>" +sfairy +sCinderella +sstairs Content_Tester.cha
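
For the lemma case (the first of the two command lines above), the idea can be sketched in Python as follows. This is only an illustration under simplifying assumptions, not CLAN's implementation: each %mor token is assumed to look like pos|stem, optionally followed by -SUFFIX or &FUSION, with no clitics or compounds, and %mor tokens are assumed to line up one-to-one with the spoken words on the main tier. The helper names and the sample tiers are made up.

```python
import re

# Hypothetical helpers, not part of CLAN.
TOKEN = re.compile(r"\[[^\]]*\]|[^\s\[\]]+")
PUNCT = {".", "?", "!"}


def flagged_words(main_tier):
    """Return one bool per spoken word: True if a [: ...] or [* ...] code follows it."""
    flags = []
    for tok in TOKEN.findall(main_tier.split(":", 1)[1]):
        if tok.startswith("["):
            if flags and tok.startswith(("[:", "[*")):
                flags[-1] = True
        elif tok not in PUNCT:
            flags.append(False)
    return flags


def lemma(mor_token):
    """Extract the stem: 'n|stair-PL' gives 'stair', 'v|see&PAST' gives 'see'."""
    stem = mor_token.split("|", 1)[1]
    return re.split(r"[-&]", stem, maxsplit=1)[0]


def count_lemmas(main_tier, mor_tier, targets):
    """Count target lemmas on %mor, skipping words flagged on the main tier."""
    flags = flagged_words(main_tier)
    mor_tokens = [t for t in mor_tier.split(":", 1)[1].split() if t not in PUNCT]
    if len(flags) != len(mor_tokens):
        raise ValueError("tiers do not align one-to-one")
    counts = {t: 0 for t in targets}
    for is_flagged, tok in zip(flags, mor_tokens):
        lem = lemma(tok)
        if not is_flagged and lem in counts:
            counts[lem] += 1
    return counts


# Made-up tiers in the style used in this thread.
main = "*PAR:\tCinderella saw the furry [: fairy] [* phon] on the stairs ."
mor = "%mor:\tn:prop|Cinderella v|see&PAST det:art|the n|fairy prep|on det:art|the n|stair-PL ."

print(count_lemmas(main, mor, ["Cinderella", "stair", "fairy"]))
# {'Cinderella': 1, 'stair': 1, 'fairy': 0}
```

Here "stairs" is counted under the lemma "stair", while the "fairy" that only appears as the target of "furry" is skipped.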


Leonid.


Brielle Stark

Sep 14, 2017, 9:18:01 AM
to chib...@googlegroups.com
Fantastic. I think the freq -s"<**>" -s"<: *>" +sfairy +sCinderella +sstairs Content_Tester.cha is what I want!

Brie


