Collection of spoken English words


Robert Meier

Nov 23, 2024, 6:25:08 PM
to 'Darrell Lee' via Upstate Carolina Linux Users Group
Does anyone have or know of a collection of a few hundred to a thousand spoken English word samples?

Ideally, I'm looking for a tarball, zip, directory, or other bundle of a few hundred to a thousand files, each containing a single spoken English/American word in mp3/ogg/wav/...

Does anyone know a good source?

Is there an easy way to construct such a collection from a recording? Using what software?
Or from text, using text-to-speech software? Which software?

mp4/mkv files would also work as I can extract audio using ffmpeg.
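
A minimal sketch of that extraction step, assuming ffmpeg is installed and on PATH. The function names and the file names in the usage comment are hypothetical; the only ffmpeg options used are -i (input) and -vn (drop the video stream), and the output file's extension selects the audio format.

```python
import subprocess
from pathlib import Path


def ffmpeg_extract_cmd(video: Path, audio: Path) -> list[str]:
    """Build an ffmpeg command that keeps only the audio of a video file."""
    # -i names the input file; -vn disables video in the output.
    # The extension of `audio` (.wav, .ogg, .mp3, ...) picks the format.
    return ["ffmpeg", "-i", str(video), "-vn", str(audio)]


def extract_audio(video: Path, audio: Path) -> None:
    """Run ffmpeg; raises CalledProcessError if ffmpeg reports failure."""
    subprocess.run(ffmpeg_extract_cmd(video, audio), check=True)


# Usage (hypothetical file names):
#   extract_audio(Path("clip.mkv"), Path("clip.wav"))
```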

Any help appreciated.

Thank you.

Glen Peterson

Nov 24, 2024, 9:02:03 AM
to uc...@googlegroups.com
You could search "how to pronounce" with a few English words to find sites you might be able to scrape for this.  Sorry, I don't know of any collections.

--
You received this message because you are subscribed to the Google Groups "Upstate Carolina Linux Users Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to uclug+un...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/uclug/CADuw60sHc10LxzX7FFkYCXbHcK4daNK40ST79BNJaSgLQSmahw%40mail.gmail.com.

Dr. Robert Meier

Nov 24, 2024, 2:49:43 PM
to uc...@googlegroups.com

That is a good idea.  So far I've manually scraped 20 words in 40 minutes, from "google: britannica how to pronounce <word>" and "https://media.merriam-webster.com/audio/prons/en/us/mp3/<w>/<word><ext>.mp3".  I'm still looking for a larger collection or an automation method, but this is a start.
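
One way the manual step might be automated is sketched below. It only fills in the URL pattern quoted above and saves the result. Two things are assumptions, not facts from this thread: that <w> is the first letter of the word, and that the caller already knows the right <ext> suffix for each word (the sketch does not try to guess it). The names pron_url and fetch are hypothetical helpers.

```python
import urllib.request
from pathlib import Path

BASE = "https://media.merriam-webster.com/audio/prons/en/us/mp3"


def pron_url(word: str, ext: str) -> str:
    """Fill in the <w>/<word><ext>.mp3 pattern for one word."""
    word = word.lower()
    # ASSUMPTION: <w> is the word's first letter.
    # <ext> must be supplied by the caller; how it is chosen is not known here.
    return f"{BASE}/{word[0]}/{word}{ext}.mp3"


def fetch(word: str, ext: str, out_dir: Path) -> Path:
    """Download one pronunciation file to out_dir/<word>.mp3."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"{word.lower()}.mp3"
    # Raises urllib.error.HTTPError if the guessed URL does not exist.
    urllib.request.urlretrieve(pron_url(word, ext), target)
    return target
```

Looping fetch over a word list with a short pause between requests would replace the two-minutes-per-word manual process, provided the URL guesses hold.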

Mihai Kulcsar

Nov 24, 2024, 4:59:39 PM
to uc...@googlegroups.com
Not sure of the purpose of this, but you could have ChatGPT generate a
list of words that are similar in length for pronunciation, then plug
them into an online text-to-speech tool and download the result as one
mp3 file. Then chop it up into shorter mp3s: since the length of each
word is predictable and the spacing between words is the same, you
could estimate a second or two per word or something. That might be a
bit faster than scraping a word every 2 minutes.
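
The chopping step could look roughly like the sketch below. It assumes the long recording has first been converted to WAV (Python's standard library reads WAV but not mp3), that the words are spoken in the same order as the list, and that every word occupies the same fixed slot. The function name split_wav and its parameters are hypothetical.

```python
import wave
from pathlib import Path


def split_wav(source: Path, out_dir: Path, words: list[str],
              seconds_per_word: float) -> list[Path]:
    """Cut `source` into equal-length slices, one file per word, in order."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(source), "rb") as src:
        params = src.getparams()
        # Number of audio frames in one fixed-length word slot.
        frames_per_word = int(src.getframerate() * seconds_per_word)
        for word in words:
            frames = src.readframes(frames_per_word)
            if not frames:
                break  # recording ended before the word list did
            target = out_dir / f"{word}.wav"
            with wave.open(str(target), "wb") as dst:
                dst.setparams(params)    # same channels, width, rate
                dst.writeframes(frames)  # header frame count is fixed up
            written.append(target)
    return written
```

If the spacing turns out not to be uniform, a fixed slot will clip or merge words, so the slot length needs checking against a few samples by ear.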