Find and replace question to fix filled pause marking

Nicole Tracy-Ventura

May 25, 2021, 11:15:19 AM

to chibolts

Hello all, I'm looking for a recommendation to save us some time. We have a number of transcripts where the filled pauses were marked with & only (e.g., &um instead of &-um). We want to use FLUCALC, so these need to be recoded as &- so that they will be counted automatically. I thought a simple find-and-replace would do it, but the files already have MOR lines, and there are plenty of & there. Also, there are some places where &=laughs is used. One last complication is that several different filled pauses are used (e.g., um, eh, uh, ehm). Is there a simple way of fixing this? Many thanks in advance for any ideas.

Nicole

Brian Macwhinney

May 25, 2021, 11:22:39 AM

to ChiBolts, Nicole Tracy-Ventura

Dear Nicole,
My approach to working with transcripts that have been run through MOR is to rely on the fact that one can simply re-run MOR after making changes to the main line. So, in your case, you would fix all of the filled pauses on the main lines and then just re-run MOR. This works well for all of the languages that use MOR, except perhaps Japanese, which seems to require hand attention when running MOR.

— Brian MacWhinney
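The "fix the main lines, then re-run MOR" approach can be sketched in Python for readers who want to see the logic spelled out. This is a minimal sketch, not a CLAN tool: it assumes standard CHAT layout (main tiers start with `*`, dependent tiers such as `%mor` start with `%`, headers start with `@`, and a line starting with a tab continues the previous tier), and the `FILLED_PAUSES` list and `fix_filled_pauses` function are hypothetical names for illustration. It rewrites only bare `&um`-style tokens on main tiers, so `%mor` lines, already-correct `&-um`, and events like `&=laughs` are left alone.

```python
import re

# Hypothetical inventory of filled pauses; adjust to match your transcripts.
FILLED_PAUSES = ["um", "uh", "eh", "ehm"]

# Match "&" + a listed filler as a whole token: the "&" must start a token,
# and the filler must end at a word boundary (so "&umm" is not touched).
# "&-um" and "&=laughs" never match because "-" and "=" are not fillers.
PAUSE_RE = re.compile(
    r"(?<!\S)&(" + "|".join(map(re.escape, FILLED_PAUSES)) + r")\b"
)


def fix_filled_pauses(lines):
    """Return a copy of CHAT lines with &um -> &-um etc. on main tiers only."""
    out = []
    on_main_tier = False
    for line in lines:
        if line.startswith("*"):
            on_main_tier = True
        elif line.startswith(("%", "@")):
            on_main_tier = False
        # Tab-initial lines keep the state of the tier they continue.
        if on_main_tier:
            line = PAUSE_RE.sub(r"&-\1", line)
        out.append(line)
    return out
```

To apply it to a file, read the file's lines, pass them through `fix_filled_pauses`, and write the result to a new file, keeping the originals until a few outputs have been checked. The `%mor` tiers are deliberately not edited; re-running MOR afterwards regenerates them from the corrected main lines.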


Leonid Spektor

May 25, 2021, 11:55:59 AM

to chib...@googlegroups.com

Hi,

I would recommend using the CHSTRING command. You can put all the &... filled pauses that need to be changed into a file and then run the CHSTRING command on the data files to make the changes. This assumes that you have a limited and known list of all the filled pauses that you want to change. Here is an example of a CHSTRING changes file, "chstring.cut":

"&um" "&-um"
"&eh" "&-eh"
"&ehm" "&-ehm"
"&uh" "&-uh"

and the command line would be:

chstring +cchstring.cut *.cha

You can run CHSTRING on a few files first to check that it makes the changes you want, and then run it with the +1 option to automatically replace all the original files with the new, changed ones.
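Since the CHSTRING route depends on knowing the full list of filled pauses, a small script can help build that list and the changes file. The Python below is a minimal sketch under the same CHAT-layout assumptions as usual (main tiers start with `*`, dependent tiers with `%`, headers with `@`, tab-initial lines continue the previous tier); `bare_amp_codes` and `chstring_changes` are hypothetical helper names, not part of CLAN. It counts `&word` tokens on main tiers that have no `-`, `=`, or other symbol after the `&`, then writes one `"&x" "&-x"` pair per line in the format of the example above. The counted list should be reviewed by hand before use, because in some transcripts a bare `&` may mark something other than a filled pause (for example, a fragment).

```python
import re
from collections import Counter

# "&" starting a token and followed directly by a letter, so "&-um",
# "&=laughs" and similar codes are not counted.
BARE_AMP_RE = re.compile(r"(?<!\S)&([^\W\d_]\w*)")


def bare_amp_codes(lines):
    """Count bare &word tokens on main tiers (and their tab continuations)."""
    counts = Counter()
    on_main_tier = False
    for line in lines:
        if line.startswith("*"):
            on_main_tier = True
        elif line.startswith(("%", "@")):
            on_main_tier = False
        if on_main_tier:
            counts.update(BARE_AMP_RE.findall(line))
    return counts


def chstring_changes(words):
    """Return changes-file text with one "&x" "&-x" pair per line."""
    return "".join(f'"&{w}" "&-{w}"\n' for w in words)
```

For example, after collecting the lines of several transcripts into one list, `print(chstring_changes(sorted(bare_amp_codes(lines))))` prints a candidate changes file that can be edited and saved as "chstring.cut".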

Leonid.
