I've been getting a lot of queries in response to my presentation so I
thought I'd start a mailing list so we can all discuss questions and
I also plan to make available the code that I'm using to produce the
graded reader. Because it's closely tied to the particular text and
linguistic data I'm currently dealing with, it will take some time to
make generic but I plan to release stuff incrementally based on your
I want to spend some time going through my current approach and
explaining the different components and the ideas behind them. For the
most part, these ideas can be used independently of one another so if
you don't like one aspect of what I've done, you can still make use of
other aspects. Also, I'm still improving things in lots of different
ways and, of course, I look forward to a lot of new ideas coming from
Because the video presentation actually doesn't show much in terms of
results, I've uploaded two files that will give you a flavour of the
current state of my work.
You can get to these files at http://groups.google.com/group/graded-reader
example-reader.html shows the first 50 word forms output by the
current version of my software when run on the Greek text of John's
greek_2.pdf shows lesson 2 of an informal course I'm running for a
couple of friends which uses the graded reader approach.
You'll notice (1) there is a lot of extra information in the lesson
given to students; (2) the order in which words are presented is
There are three reasons for the difference in order:
(1) the ordering in lesson 2 was hand-tweaked from what the software
(2) the lesson 2 ordering was produced by an earlier version of the
ordering algorithm than the one used for example-reader.html
(3) example-reader.html used slightly more linguistic information (in
particular, it knew about some verb endings) in the generation of
Note that the goal is eventually to do no hand tweaking at all, but
rather to capture, in both the software and the input data, the
criteria that motivated the manual reordering in the first place.
I'll send separate posts discussing different aspects of what goes
into producing the automated output.