Frequency of words

Brielle Stark

Sep 13, 2017, 1:49:37 PM

to chib...@googlegroups.com

Hello all.

I have a question about calculating word frequency. We're working with participants with aphasia who often make errors; when they do, we put the intended word in [: target] if we know what the intention was. However, I do not want [: target] words counted in the frequency tally. For example, if someone said furry [: fairy] in one instance, and I am looking for a frequency count of the correctly spoken 'fairy,' I want the count for 'fairy' to be 0, ignoring the word in the target. I'd also like to count lemmas rather than inflected forms: if I'm looking for "stair," I want 'stairs' to be counted toward the frequency of 'stair'.

Detail:

When I run the command:

freq -sm** -sm@* +sCinderella +sstair +sfairy

on the attached transcript [completely made up, by the way], it evaluates the %mor line but doesn't ignore the target [: target] words like I thought it would. It does do the correct job in tagging 'stair' even though the participant said 'stairs,' a correct usage from the %mor line. Output of frequency for this command was:

Cinderella: 1

stair: 1

fairy: 1

However, as I said, I wouldn't want the incorrect furry [: fairy] to count. So, I tried:

freq -sm** -sm@* +t*PAR +sCinderella +sstair +sfairy

Now that I've told CLAN to stick to the speaker tier, it then ignores 'stair' because 'stairs' was written, which isn't what we were going for. However, it correctly does not look within the [: target] and correctly states that 'fairy' was said 0 times. As an added point, I've also found that when I run the above command on transcripts, it sometimes gets the counts incorrect. For this command, I get the count:

Cinderella: 1

stair: 0

fairy: 0

So basically, is there any way to tell CLAN to run the analysis on the %mor tier for frequencies of words [specifically, lemmas], but somehow to specify to ignore [: target] words on the speaker tier?

In an ideal world, from the attached transcript, I'd be getting the frequency counts as:

Cinderella: 1

stair: 1

fairy: 0
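To make the counting rule I'm after concrete, here is a minimal Python sketch (this is not CLAN, and the utterance, the %mor line, and the function names are made up for illustration). It assumes a simplified CHAT layout in which each word on the speaker tier lines up with one item on the %mor tier, and it counts a lemma only when the aligned spoken word was not followed by a [: target] replacement.

```python
import re

def main_tier_words(utterance):
    """Return (word, was_replaced) pairs for a simplified speaker tier."""
    tokens = re.findall(r"\[[^\]]*\]|[^\s\[\]]+", utterance)
    words = []
    for tok in tokens:
        if tok.startswith("["):
            # "[: target]" marks the preceding word as replaced by a target
            if re.match(r"\[:\s", tok) and words:
                words[-1] = (words[-1][0], True)
        elif re.search(r"\w", tok):  # skip bare punctuation such as "."
            words.append((tok, False))
    return words

def mor_lemmas(mor_line):
    """Extract lemmas from simplified %mor items like 'n|stair-PL'."""
    lemmas = []
    for item in mor_line.split():
        if "|" not in item:
            continue  # punctuation has no part-of-speech prefix
        stem = item.split("|", 1)[1]
        lemmas.append(re.split(r"[-&~]", stem)[0])
    return lemmas

def lemma_freq(utterance, mor_line, targets):
    """Count target lemmas, ignoring words that carry a [: target] code."""
    words = main_tier_words(utterance)
    lemmas = mor_lemmas(mor_line)
    if len(words) != len(lemmas):
        raise ValueError("speaker tier and %mor tier do not align")
    counts = {t: 0 for t in targets}
    for (_, replaced), lemma in zip(words, lemmas):
        if not replaced and lemma in counts:
            counts[lemma] += 1
    return counts

# Hypothetical utterance and %mor line, not taken from the attached file
PAR = "Cinderella went down the stairs with the furry [: fairy] [* p:w] ."
MOR = "n:prop|Cinderella v|go&PAST adv|down det:art|the n|stair-PL prep|with det:art|the n|fairy ."
print(lemma_freq(PAR, MOR, ["Cinderella", "stair", "fairy"]))
```

On this made-up utterance the sketch gives Cinderella 1, stair 1, fairy 0, which is the pattern I'd like to get out of CLAN.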

Thank you very much,

Brie

--

Brielle Stark, PhD

Post Doctoral Fellow in Communication Sciences and Disorders, University of South Carolina
t: +1 803-777-9240, alternate email: sta...@mailbox.sc.edu

Aphasia Lab: http://web.asph.sc.edu/aphasia/
Center for the Study of Aphasia Recovery: http://web.asph.sc.edu/cstar/

Content_Tester.cha

Nan Bernstein Ratner

Sep 13, 2017, 2:55:58 PM

to chib...@googlegroups.com

A partial response, as I rush off to a meeting:

I always encourage mispronunciations (furry for fairy) to be coded as the actual target. If it's just a pronunciation problem and that is not the focus of our current work, I might make a %com tier comment so as to find it later if we do post-hoc phonological analysis, but typing wabbit for rabbit will screw with the MOR program, and typing furry for fairy will screw up lexical analyses.

N

Nan Bernstein Ratner, F-, H-ASHA, F-AAAS, ABCLD

Professor

Hearing and Speech Sciences

University of Maryland

0100 Lefrak Hall

College Park, MD 20742

nra...@umd.edu, 301-405-4217

Co-director: FluencyBank (www.fluency.talkbank.org); http://languagefluency.umd.edu/

ADVANCE Professor, College of Behavioral and Social Sciences

Director, University of Maryland Autism Research Consortium (UMARC), www.autism.umd.edu

Faculty, Language Science (languagescience.umd.edu), Neuroscience & Cognitive Neuroscience (NACS, nacs.umd.edu), Developmental Science Field Committee

http://hesp.umd.edu/facultyprofile/bernstein%20ratner/nan

My PubMed Bibliography

--
You received this message because you are subscribed to the Google Groups "chibolts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+unsubscribe@googlegroups.com.
To post to this group, send email to chib...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/chibolts/CAEs2yToSuaOv1de5DWc4CS3h6HR7YEdgUYm0SQ1oxBDC1%2BRcFg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Brielle Stark

Sep 13, 2017, 3:00:19 PM

to chib...@googlegroups.com

This is why we've always had -sm** and -sm@* which tells the program to not evaluate things that are within brackets, like [: targets]. We've transcribed orthographically, with an error code directly behind, such that: furry [: fairy] [* phon] or similar. Therefore the lexical analysis of mor doesn't evaluate 'furry'...

Thanks,

Brie

Leonid Spektor

Sep 13, 2017, 4:51:39 PM

to chib...@googlegroups.com

Brie,

The answer depends on whether you are interested in words on the speaker tier or lemmas on the %mor tier. Your command lines ask for both. I have changed your sample file "Content_Tester.cha" by adding the word "fairy" in a place where it is not an error or a replacement, so you should get a count of 1 for "fairy" in the output of each of the following two command lines:

For "fairy" lemmas, excluding errors and target replacements, you want:

freq -sm** -sm@* +sm;fairy Content_Tester.cha

For "fairy" words on the speaker tier, excluding errors and target replacements, you want:

freq -s<**> -s<:*> +sfairy Content_Tester.cha
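As a rough illustration of the intended effect of the speaker-tier command (not of how FREQ is implemented), here is a minimal Python sketch. The function name and the example utterance are hypothetical, and it assumes a simplified CHAT line in which error and replacement codes apply only to the single word before them; scoped material in angle brackets is not handled. Any word followed by a [* ...] or [: ...] code is dropped before counting, and for simplicity [:: ...] is treated the same way.

```python
import re
from collections import Counter

def speaker_word_counts(utterance):
    """Count speaker-tier words, skipping words that carry an error or replacement code."""
    tokens = re.findall(r"\[[^\]]*\]|[^\s\[\]]+", utterance)
    kept = []
    for tok in tokens:
        if tok.startswith("["):
            # "[: ...]", "[:: ...]" and "[* ...]" all exclude the preceding word here
            if re.match(r"\[[:*]", tok) and kept:
                kept[-1] = None
        elif re.search(r"\w", tok):  # ignore bare punctuation
            kept.append(tok)
    return Counter(w for w in kept if w is not None)

# Hypothetical utterance: one correct "fairy" and one error with a target
counts = speaker_word_counts("the fairy helped the furry [: fairy] [* p:w] .")
print(counts["fairy"], counts["furry"])
```

Here the correctly spoken "fairy" is counted once, while "furry" and its [: fairy] target contribute nothing.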

Leonid.

Content_Tester.cha

Brielle Stark

Sep 14, 2017, 7:28:45 AM

to chib...@googlegroups.com

Yes, so I suppose what I am asking is whether there is something that can tell the mor command to be computed ignoring the [: target] errors? This is the only way I can think of solving my question.

Thanks,

Brie

Leonid Spektor

Sep 14, 2017, 9:08:41 AM

to chib...@googlegroups.com

Besides the command lines that I gave you in my previous email, the other solution is to use "furry [:: fairy]" coding. Notice the two ':' characters instead of one. This code tells the MOR command to put the word "furry" on the %mor tier instead of the word "fairy". This solution only works if the word actually spoken by the subject is a real word. Otherwise, you will get "?|..." on the %mor tier, meaning that the word is not recognized. Also, this solution will not ignore the error word as you seem to want to do.
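The difference between the two codes, as described in this thread, can be sketched in a few lines of Python. This is only an illustration with a hypothetical function name, not the MOR program itself: with "[: target]" the target word is what gets analyzed, while with "[:: target]" the spoken word is kept.

```python
import re

def words_sent_to_mor(utterance):
    """Return the words that would be analyzed, under the rule described above."""
    tokens = re.findall(r"\[[^\]]*\]|[^\s\[\]]+", utterance)
    out = []
    for tok in tokens:
        single = re.fullmatch(r"\[:\s+([^\]]+)\]", tok)
        if single and out:
            # "[: target]": the target replaces the spoken word
            out[-1] = single.group(1).strip()
        elif tok.startswith("["):
            # "[:: target]" and other codes: the spoken word stays as it is
            continue
        elif re.search(r"\w", tok):  # ignore bare punctuation
            out.append(tok)
    return out

print(words_sent_to_mor("the furry [: fairy] [* p:w] ."))
print(words_sent_to_mor("the furry [:: fairy] ."))
```

The first call yields "the" and "fairy"; the second yields "the" and "furry", which is why a frequency count on the %mor tier sees "fairy" in one case and "furry" in the other.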

The question is whether you want to ignore only the [: target] replacement word, like the word "fairy", or to ignore erroneously spoken words altogether, like the whole "furry [:: fairy]" structure.

If you want to ignore the whole erroneously spoken word, then you need to use one of the command lines below, depending on whether you want words from the speaker tier or lemmas from the %mor tier:

freq -sm** -sm@* +sm;fairy +sm;Cinderella +sm;stair Content_Tester.cha

freq -s"<**>" -s"<: *>" +sfairy +sCinderella +sstairs Content_Tester.cha

Leonid.

Brielle Stark

Sep 14, 2017, 9:18:01 AM

to chib...@googlegroups.com

Fantastic. I think the freq -s"<**>" -s"<: *>" +sfairy +sCinderella +sstairs Content_Tester.cha is what I want!

Brie
