--
You received this message because you are subscribed to the Google Groups "Plover" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ploversteno...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
This is a word frequency list, based on over 9,379,000 words of contemporary fiction gathered online.
Regular plurals are combined with their singular forms (tree, trees; box, boxes). Variations of a verb ending in -ed, -ing or -(e)s are lumped together with their root verb (smile, smiled, smiling, smiles). Adjective forms ending in -er or -est are included with their positive form (sad, sadder, saddest). And words ending in -'s are grouped with the form without the apostrophe (boy, boy's; everything, everything's), except for a few common contractions (it's; that's).
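The description above doesn't say how the grouping was actually done. Here is one minimal way to fold counts the same way, sketched in Python. It assumes you already have raw word counts and a hand-built `lemma_map` (hypothetical) that maps each variant to its root. The `-'s` rule and the contraction exceptions follow the description. The exception set below holds only the two examples given ("it's", "that's"), not the list's full set.

```python
from collections import Counter

# Contractions kept as separate entries. Only the two examples from the
# description are listed; the real list's exceptions are unknown (assumption).
KEEP_APOSTROPHE_S = {"it's", "that's"}

def group_forms(counts, lemma_map, keep=KEEP_APOSTROPHE_S):
    """Fold inflected-form counts into their root word.

    counts    -- Counter of raw word -> frequency
    lemma_map -- hypothetical hand-built dict of variant -> root,
                 e.g. {"trees": "tree", "smiled": "smile", "sadder": "sad"}
    """
    grouped = Counter()
    for word, n in counts.items():
        base = word
        # Group "-'s" forms with the bare word (boy's -> boy),
        # except for the listed contractions.
        if base.endswith("'s") and base not in keep:
            base = base[:-2]
        # Map plurals, verb forms, and -er/-est forms to their root;
        # words not in the map are kept as-is.
        base = lemma_map.get(base, base)
        grouped[base] += n
    return grouped

if __name__ == "__main__":
    counts = Counter({"smile": 4, "smiled": 3, "smiling": 2,
                      "boy": 5, "boy's": 1, "it's": 7})
    lemma_map = {"smiled": "smile", "smiling": "smile"}
    print(group_forms(counts, lemma_map).most_common())
```

A hand-built map is crude, but it handles irregular forms (sadder, boxes) that naive suffix stripping would get wrong.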
Hey, I've been doing online transcription (on QWERTY) for about 6 months, and I've collected most of my transcripts into a single .txt file so that I could analyze the word frequency and figure out which words I should prioritize as text abbreviations. It's a better approach than learning/abbreviating ten random words a day. At last count I was up to 1.8 million words and about 13.3k minutes of audio.

Here's a list of ~7,500 words sorted by frequency: American English, includes proper names, all lowercase. Words occurring two or fewer times are excluded.

https://docs.google.com/spreadsheets/d/1se1aZmRl_b7FjZlFdc3UgeL4kNFPUTWVE5sIUQ3bDpc/edit?usp=sharing

I used a free program called kfNgram to analyze the text. The transcripts are typically two-person interviews, single-speaker presentations, and a handful of sermons. It's a decent word frequency list based on how people actually talk, as opposed to, say, a list compiled from literature. I spell most numbers out.
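For anyone who wants to run the same kind of count on their own transcripts without kfNgram, here's a minimal Python sketch. It is a stand-in, not how kfNgram works internally. It assumes a plain UTF-8 .txt file and a simple tokenizer: lowercase letters plus internal apostrophes, so "don't" stays one word and digits are skipped. Skipping digits is mostly harmless if numbers are spelled out. Words seen fewer than three times are dropped, matching the "two or fewer" cutoff. The function names are made up for illustration.

```python
import re
from collections import Counter

# Tokenizer assumption: runs of letters with optional internal apostrophes.
# Digits and punctuation are ignored.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")

def word_frequencies(text, min_count=3):
    """Return (word, count) pairs sorted by descending frequency,
    dropping words that occur fewer than min_count times."""
    words = WORD_RE.findall(text.lower())
    counts = Counter(words)
    kept = [(w, c) for w, c in counts.items() if c >= min_count]
    # Sort by count (highest first), then alphabetically to break ties.
    kept.sort(key=lambda wc: (-wc[1], wc[0]))
    return kept

def word_frequencies_from_file(path, min_count=3):
    """Same as word_frequencies, reading one big transcript .txt file."""
    with open(path, encoding="utf-8") as f:
        return word_frequencies(f.read(), min_count)

if __name__ == "__main__":
    sample = "So, I think... I think that's right. So I said so."
    for word, count in word_frequencies(sample):
        print(word, count)  # prints "i 3" then "so 3"
```

Calling `word_frequencies_from_file("transcripts.txt")` (hypothetical filename) gives the ranked (word, count) pairs, which can be pasted straight into a spreadsheet.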