Counting syllables


Tennessee

Sep 4, 2009, 1:45:58 AM
to nltk-users
Hi all,

I'll introduce myself shortly, but I just wanted to jot these thoughts
down on paper before I forgot :)

I did a quick list-search to see if syllables had ever been discussed,
and it didn't come up.

I'm looking at doing a simple Python implementation of the Flesch-
Kincaid Reading Ease calculation (for the English language). The code
is trivial except for the counting of syllables in a word. I started
with the simplest thing that could possibly work, with the number of
syllables in a word equalling 1 plus len(word) mod 3, which works
better than you might imagine :)
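
For reference, a minimal sketch of the Flesch Reading Ease score with the syllable counter left pluggable. The constants are those of the standard Flesch Reading Ease formula; the function names, the regex tokenization, and the reading of the length heuristic as 1 + (len(word) % 3) are assumptions made for illustration, not the code described above.

```python
import re

def naive_syllables(word):
    # Length-based guess, read here as 1 + (len(word) % 3); always 1, 2 or 3.
    return 1 + len(word) % 3

def flesch_reading_ease(text, count_syllables=naive_syllables):
    # Crude tokenization (an assumption): sentences end at . ! or ?,
    # words are runs of letters and apostrophes.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Passing the counter as a parameter means a dictionary-based or rule-based counter can be swapped in later without touching the formula.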

A quick look around sourceforge revealed a simple Java calculator
which uses some logic based on vowel splitting and a few special
cases.
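
A rough sketch of what vowel splitting with a few special cases might look like. This is not the logic of the Java calculator mentioned above (its rules are not reproduced here); the function name and the two special cases are assumptions chosen for illustration, and the result is only an approximation for English.

```python
import re

def count_vowel_groups(word):
    # Each run of consecutive vowels (y included) is treated as one syllable.
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Special case: a trailing silent "e" ("make"), but not "-le" ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    # Every word has at least one syllable.
    return max(count, 1)
```

Words such as "rhythm" or "whale" still come out wrong under these rules, which is why a pronunciation dictionary is the more reliable route when one is available.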

I wondered if anyone here had tackled syllable-counting.
http://en.wikipedia.org/wiki/Syllabification has little to say on the
topic other than to suggest looking up the answer in a
dictionary :) ... I suppose I could compile such a list using various
online dictionaries, but I haven't looked into that very hard.

http://www.dict.org/ appears to provide an interface into some
reasonable open-access dictionaries. I notice that nltk has an
interface to WordNet, one of the databases which also helps underlie
dict.org, but I didn't see the syllable information in that API.

Am I missing something (i.e. does nltk provide syllable information
through an API)? Otherwise, does anyone know whether using something
like dict.org to count syllables makes sense?

Anyway, hello list, and I thought I'd just share those thoughts. I'll
let you know how I get on.

Cheers,
-Tennessee

Daniel

Sep 4, 2009, 8:01:55 AM
to nltk-users
Does this help?

Book containing the algorithm: http://www.springerlink.com/content/c257122558175559/
Paper describing a Haiku generator: http://www.springerlink.com/content/c257122558175559/


Jordan Boyd-Graber

Sep 4, 2009, 8:18:45 AM
to nltk-...@googlegroups.com
The cmudict should be able to help. Here's a simple function that
returns a list of all the possible syllable lengths of a word (as the
dictionary sometimes has multiple pronunciations). It counts up the
number of pronounced vowels:

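from nltk.corpus import cmudict

# Maps each lowercase word to a list of pronunciations (lists of ARPAbet phones).
# Requires the corpus to be installed: nltk.download('cmudict')
d = cmudict.dict()

def nsyl(word):
    # Vowel phones end in a stress digit (0, 1 or 2); count them per pronunciation.
    # Raises KeyError for words that are not in the dictionary.
    return [sum(1 for phone in pron if phone[-1].isdigit()) for pron in d[word.lower()]]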
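
Since a function like this fails on words the dictionary lacks, here is a hedged sketch of a wrapper that returns a single count and defers to a fallback for unknown words. The name syllable_count, the fallback parameter, and the choice of the shortest pronunciation are assumptions for illustration; the sample dictionary is hand-written in cmudict's format so the snippet runs without the corpus installed.

```python
def syllable_count(word, pronunciations, fallback=None):
    # pronunciations: dict in the shape returned by cmudict.dict(), i.e.
    # lowercase word -> list of pronunciations, each a list of ARPAbet phones.
    prons = pronunciations.get(word.lower())
    if not prons:
        # Word not in the dictionary: defer to the caller's heuristic, if any.
        return fallback(word) if fallback else None
    # Vowel phones carry a trailing stress digit; take the shortest pronunciation.
    return min(sum(1 for p in pron if p[-1].isdigit()) for pron in prons)

# Tiny hand-written sample in cmudict's format (illustrative, not the real corpus).
sample = {
    "syllable": [["S", "IH1", "L", "AH0", "B", "AH0", "L"]],
    "fire": [["F", "AY1", "ER0"], ["F", "AY1", "R"]],
}
```

With the real corpus, the dictionary from cmudict.dict() would be passed in place of sample. Taking the minimum is one arbitrary way to settle multiple pronunciations; the maximum or the first entry would be equally defensible.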

Cheers,

Jordan
--
--------------------
Jordan Boyd-Graber
920.JBG.YING (920.524.9464)

AIM: ezubaric
j...@princeton.edu
http://www.cs.princeton.edu/~jbg
--------------------

"In theory, there is no difference between theory and practice. But,
in practice, there is."
- Jan L.A. van de Snepscheut

Steven Bird

Sep 4, 2009, 8:54:11 AM
to nltk-...@googlegroups.com
See also nltk_contrib/readability, which includes an implementation of
Flesch-Kincaid and an English syllable counting algorithm.

http://code.google.com/p/nltk/source/browse/trunk/nltk_contrib#nltk_contrib/nltk_contrib/readability

-Steven Bird

Tennessee Leeuwenburg

Sep 8, 2009, 2:11:28 AM
to nltk-...@googlegroups.com
Thanks all for your responses. Talk about getting things handed to you on a plate! 

Cheers,
-T
--
--------------------------------------------------
Tennessee Leeuwenburg
http://myownhat.blogspot.com/
"Don't believe everything you think"