RE: number of utterances

Nan Bernstein Ratner

Jul 19, 2012, 5:42:22 PM

to chib...@googlegroups.com

If the goal is to find the number of utterances, as opposed to determining WHAT is an utterance during transcription, try MLT; it prints the number of utterances, and you can specify speaker tiers.
Is that what you wanted?
N

Nan Bernstein Ratner, Professor and Chairman
Department of Hearing and Speech Sciences
0100 Lefrak Hall
University of Maryland, College Park
College Park, MD 20742
301-405-4213
http://www.bsos.umd.edu/hesp/facultyStaff/ratnern.htm

-----Original Message-----
From: chib...@googlegroups.com [mailto:chib...@googlegroups.com] On Behalf Of Misha Becker
Sent: Thursday, July 19, 2012 4:13 PM
To: chib...@googlegroups.com
Subject: number of utterances

I'm wondering how to calculate the number of utterances a given speaker produces in each file (I will be searching for *MOT and *FAT). I have a note from many years ago that the way to do this is with the following command:

freq +y +s\** [filename]

But this doesn't actually do what I want. It seems to give the number of words produced by each speaker in a file. How do I find out the number of *utterances*? I've looked through the latest version of the CLAN manual but haven't found the answer.

Many thanks,
Misha

--
You received this message because you are subscribed to the Google Groups "chibolts" group.
To view this discussion on the web visit https://groups.google.com/d/msg/chibolts/-/Fxz3IFqq9XoJ.
To post to this group, send email to chib...@googlegroups.com.
To unsubscribe from this group, send email to chibolts+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/chibolts?hl=en.

Leonid Spektor

Jul 19, 2012, 7:49:45 PM

to chib...@googlegroups.com

Misha,

The command 'freq +y +s"\**:" [filename]' gives you a breakdown of how many utterances each speaker has in a file; the total number of utterances in the file appears next to the label "Total number of items (tokens)". The +s"\**:" option instructs freq to look for speaker tier names only. This utterance count depends on three conditions:

1. No word inside any utterance starts with the '*' character and ends with the ':' character.
2. No utterance has been interrupted and then continued by the same speaker, as indicated by the "+," and "+." codes.
3. Every speaker tier contains only one utterance. If your data file passes CLAN's CHECK, this condition holds.

If some of your utterances have been interrupted and then continued, or if you have more than one utterance per speaker tier, then you should use MLT, as Nan suggested in her reply. Or, for a stricter count following Brown's criteria, use MLU.
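For anyone who wants to cross-check the freq count outside CLAN, here is a minimal Python sketch of the same idea: count the lines that begin with a speaker tier name. It is not part of CLAN, and the function name count_speaker_tiers and the sample lines are made up for illustration. Like the freq approach, it counts tiers rather than utterances, so conditions 2 and 3 above still apply.

```python
import re
from collections import Counter

# A main speaker tier in a CHAT file starts with '*', a speaker code, and ':'.
SPEAKER_TIER = re.compile(r"^\*([A-Za-z0-9]+):")


def count_speaker_tiers(lines):
    """Count main-tier lines per speaker code (hypothetical helper, not a CLAN API)."""
    counts = Counter()
    for line in lines:
        match = SPEAKER_TIER.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Invented sample lines; headers ('@') and tab-indented continuation lines are skipped.
sample = [
    "@Begin",
    "*MOT:\tlook at this .",
    "*CHI:\tball .",
    "\tcontinuation of the previous tier",
    "*MOT:\tyes .",
    "*FAT:\thello .",
    "@End",
]

counts = count_speaker_tiers(sample)
for speaker in ("MOT", "FAT"):
    print(speaker, counts[speaker])
```

To run it on a real transcript, pass an open file object instead of the sample list, since iterating over a file yields its lines.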

Leonid.

Becker, Misha K

Jul 20, 2012, 2:54:04 PM

to chib...@googlegroups.com

Thanks everyone! I think I was thrown off by the word 'tokens' and thought it referred to words instead of utterances. Both the freq +y +s"\**:" and the mlt +t*mot commands give me what I want.

Thank you again,
Misha

Rui Huang

Jul 15, 2014, 5:14:06 PM

to chib...@googlegroups.com

Hi Leonid,

I am counting the total utterances in Peter01 (from Bloom70), and I find that one speaker tier contains three utterances:

*PAT: you mustn't touch it (.) you just look at it (.) okay ?

When I look at the XML file, these three sentences share a single utterance ID: <u who="PAT" uID="u56">

As far as I know, "you mustn't touch it" and "you just look at it" are complete sentences, so they could be two utterances. Why are they put together?

Another question: what's the difference between "," and "(.)"?

e.g:

*CHI: no, Mommy no go.

*CHI: no (.) Mommy go.

(I got the example from the CHAT manual, page 57.)

Thank you!

Rui

Brian MacWhinney

Jul 15, 2014, 5:30:16 PM

to ChiBolts, Rui Huang

Dear Rui,

Back in the 1970s, people thought that it was wrong to separate out sentences. Instead, they focused on units such as turns and utterances. Roger Brown did not take this approach, but Bloom and some others did. It makes little sense to relive the debates of the 1970s on this. Instead, we just need to fix the corpora.

The corpora with the most extreme problems are not Peter, but Kuczaj and Belfast, along with some of the corpora in the MPI collections that are not (yet) available through CHILDES.

I try to fix the most obvious cases, and this is certainly one. However, I would only break this into two utterances and leave the "okay" as a final communicator on the second utterance.

The difference between the comma and (.) is that the comma only indicates an intonational contour, whereas (.) marks an actual pause in the speech.
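For locating cases like the Peter01 line, one option is a rough Python sketch that flags speaker tiers containing several (.) pause markers so they can be reviewed by hand. This is only a heuristic under stated assumptions: a pause inside a tier does not by itself mean the tier holds more than one utterance, and the function name tiers_with_internal_pauses and the min_pauses threshold are invented for illustration, not part of CLAN or CHAT.

```python
import re

# Capture the speaker code and the text of a main tier such as "*PAT:\t...".
SPEAKER_TIER = re.compile(r"^\*([A-Za-z0-9]+):\s*(.*)$")


def tiers_with_internal_pauses(lines, min_pauses=1):
    """Return (line number, speaker, text) for tiers with at least min_pauses '(.)' markers.

    Hypothetical helper for manual review; it does not decide utterance boundaries.
    """
    flagged = []
    for number, line in enumerate(lines, start=1):
        match = SPEAKER_TIER.match(line)
        if match and match.group(2).count("(.)") >= min_pauses:
            flagged.append((number, match.group(1), match.group(2)))
    return flagged


# Sample lines taken from the examples quoted in this thread.
sample = [
    "*PAT:\tyou mustn't touch it (.) you just look at it (.) okay ?",
    "*CHI:\tno (.) Mommy go .",
    "*CHI:\tno , Mommy no go .",
]

flagged = tiers_with_internal_pauses(sample, min_pauses=2)
for number, speaker, text in flagged:
    print(number, speaker, text)
```

With min_pauses=2 only the *PAT line is reported; lowering the threshold to 1 would also report the first *CHI line, which is probably too noisy for most corpora.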

Best regards,

— Brian MacWhinney
