FAVE vs manual alignment time

Sophie Holmes-Elliott

Oct 7, 2014, 5:27:35 AM

to fave-...@googlegroups.com

Hi FAVE-group

I am currently trying to get our department to buy us a high-spec Mac that we can use for FAVE in-house (we already have it up and running on various laptops), so that we can train the undergrads and postgrads on it and make it available for them to use for dissertations, projects, etc.

One of the points I want to stress is how much time it saves, so I was thinking about how long it would take to do the equivalent alignment manually for, say, a one-hour sociolinguistic interview. My guess would be months... I just wondered if anyone else had thought about this and had a rough idea of how much time FAVE saves compared with manual alignment?

Thanks!

Sophie

Daniel Ezra Johnson

Oct 7, 2014, 6:22:44 AM

to fave-...@googlegroups.com

Yeah probably a while. It's maybe hard to say because in manual segmentation/alignment, I don't think anyone ever aligned the stressed vowel of nearly every word, whereas FAVE not only aligns every vowel but also every consonant, which is essentially overkill with respect to any actual study. But still, even if you compared apples to apples, it must be at least a thousand times faster...

--
You received this message because you are subscribed to the Google Groups "FAVE (Force Alignment and Vowel Extraction) Users Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to fave-users+...@googlegroups.com.
To post to this group, send email to fave-...@googlegroups.com.
Visit this group at http://groups.google.com/group/fave-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/fave-users/cfde8d91-a9d4-47cf-8107-599dcfc51d14%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Josef Fruehwald

Oct 7, 2014, 7:37:15 AM

to fave-...@googlegroups.com

I would contact Mark Liberman about this, because he's probably got the most experience both with managing an alignment process, and with costing it.

-Joe

Keelan Evanini

Oct 7, 2014, 9:14:17 AM

to fave-...@googlegroups.com

The article below (on which Mark Liberman is a coauthor) cites two rates for manual phonetic transcription in its Introduction section: 400 times real time and 30 seconds per phone.

http://www.speech.sri.com/papers/interspeech2013-phone-segmentation.pdf

However, I looked through the two cited articles and wasn't able to figure out exactly how these rates were obtained, so you may want to check the citations yourself (or ask Mark).

Leung & Zue (1984) also provide an estimate of about 30 seconds per phone. They indicate that it took two experienced transcribers about 15 hours to phonetically transcribe 65 sentences, corresponding to approximately 1950 phones (= 27.7 seconds / phone).

Leung, Hong C. and Victor W. Zue. 1984. A procedure for automatic alignment of phonetic transcriptions with continuous speech. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 1984), pp. 73-76.
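
For anyone who wants to check the arithmetic behind these rates, here is a minimal Python sketch. It uses only the figures quoted in this thread (15 hours, 1950 phones, 400 times real time). The 40-hour working week used to convert hours into weeks is my own assumption, not something from the cited papers, and the function names are purely illustrative.

```python
# Figures as quoted in this thread; see the cited papers for how they were obtained.
HOURS_SPENT = 15          # "about 15 hours" (Leung & Zue 1984)
PHONES = 1950             # "approximately 1950 phones"
REAL_TIME_FACTOR = 400    # "400 times real time" for manual phonetic transcription

# Assumption (not from the thread): a full-time working week of 40 hours.
HOURS_PER_WORK_WEEK = 40


def seconds_per_phone(hours: float, phones: int) -> float:
    """Average manual effort per phone, in seconds."""
    return hours * 3600 / phones


def manual_hours(recording_hours: float, real_time_factor: float) -> float:
    """Hours of manual work needed for a recording of the given length."""
    return recording_hours * real_time_factor


rate = seconds_per_phone(HOURS_SPENT, PHONES)
effort = manual_hours(1, REAL_TIME_FACTOR)  # a one-hour interview

print(f"{rate:.1f} seconds per phone")
print(f"{effort:.0f} hours, or {effort / HOURS_PER_WORK_WEEK:.0f} full-time weeks")
```

Under these assumptions, a one-hour interview at 400 times real time comes to 400 hours of work, i.e. roughly ten full-time weeks, which is in line with the original guess of "months".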

--Keelan

Kyle Gorman

Oct 7, 2014, 6:54:21 PM

to fave-...@googlegroups.com

> I am currently trying to get our department to buy us a high-spec Mac that we can use for FAVE in-house (we already have it up and running on various laptops), so that we can train the undergrads and postgrads on it and make it available for them to use for dissertations, projects, etc.

The way FAVE is built, it’s somewhat hard to get significantly better performance from a higher-spec machine. FAVE runs fastest on a computer with a very fast processor and a very fast hard disk (RAM doesn’t seem to matter much, assuming you have at least a few GB), but nowadays even the cheapest Mac laptops have “very fast” drives (i.e., SSDs), and you can upgrade the Mac Mini drive to an SSD for US$200. And even the priciest processors, like those in the baseline Mac Pro (US$2,999), are only going to be about 1.5x faster than the one in a baseline Mac Mini (which costs a fifth as much). Both FAVE-align and FAVE-extract are written to process data serially, so they don’t take advantage of the Mac Pro’s strengths unless your lab intends to regularly run more than 4 batches at a time (and if that’s the case, you might want to just purchase more low-end Macs instead). That’s not to discourage you from buying hardware to run alignment and extraction, of course.

> Leung & Zue (1984) also provide an estimate of about 30 seconds per phone. They indicate that it took two experienced transcribers about 15 hours to phonetically transcribe 65 sentences, corresponding to approximately 1950 phones (= 27.7 seconds / phone).
>
> Leung, Hong C. and Victor W. Zue. 1984. A procedure for automatic alignment of phonetic transcriptions with continuous speech. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 1984), pp. 73-76.

Here’s how J.P. Hosom (Speaker-independent phoneme alignment using transition-dependent states. Speech Communication 51: 352-368, 2009) summarized that study (I can’t find the original paper, either; does anyone on the list have a PDF of it?):

Leung and Zue (1984) evaluated five American English sentences from the Harvard list of phonetically-balanced sentences, aligned by two people. Manual alignment required about 30 s per phoneme, and they reported approximately 80% agreement within 10 ms, 87% agreement within 15 ms, and 93% agreement within 20ms.

Under perfect conditions (high-quality recordings, native adult speakers, little overlap between speakers, etc.), it takes maybe 10 minutes to do word-level (“orthographic”) transcription of one minute of speech. By my back-of-the-envelope calculation (making educated guesses about phonemes per second, etc.), orthographic transcription would thus be at least 10x faster than phoneme alignment.
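
A back-of-the-envelope calculation of this kind can be sketched in a few lines of Python. The 30 seconds per phoneme and the 10 minutes per minute of speech come from this thread. The phonemes-per-second values are illustrative assumptions of mine (the original guesses are not stated), so treat the output as a rough range rather than a measurement.

```python
SECONDS_PER_PHONE = 30    # manual alignment rate quoted in this thread
ORTHOGRAPHIC_RTF = 10     # 10 minutes of work per 1 minute of speech

# Assumption: plausible average phoneme rates for recorded speech, including
# pauses. These values are hypothetical and chosen only to show the range.
ASSUMED_PHONES_PER_SECOND = (4, 10, 13)


def alignment_rtf(phones_per_second: float) -> float:
    """Seconds of manual alignment work per second of recorded speech."""
    return SECONDS_PER_PHONE * phones_per_second


for pps in ASSUMED_PHONES_PER_SECOND:
    rtf = alignment_rtf(pps)
    ratio = rtf / ORTHOGRAPHIC_RTF
    print(f"{pps} phones/s: alignment ~{rtf:.0f}x real time, "
          f"~{ratio:.0f}x slower than orthographic transcription")
```

Even at the low assumed rate of 4 phonemes per second, manual alignment comes out about 12 times slower than orthographic transcription, which is consistent with the "at least 10x" estimate.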

Kyle
