the "next-best" algorithm

James Tauber

Apr 1, 2008, 8:57:09 AM4/1/08
to graded...@googlegroups.com
In the last few posts, I've mentioned a simple algorithm I've used
(one of a number) for ordering items.


The Input

This algorithm, like all the ordering algorithms I've tried, takes as
input a list of target-item pairs. For example,

T1 I1
T1 I3
T1 I7
T2 I2
T2 I7
T3 I4
...

means that to read T1, you need to know I1, I3 and I7; to read T2, you
need to know I2 and I7; and so on.

The targets and items can be anything. For the various stats I've
posted here I've used verses for the targets and either lemmas or
inflected forms for the items. In the sample reader online, I use
clauses as the targets and a combination of lemmas, inflected forms
and a little bit of morphology (not much yet). If you want to capture
the fact that students can't read a target until they've learnt some
syntactic point or even some cultural point, that can be modeled by
including an appropriate item for it.

I make this point to emphasize that the ordering algorithm is
independent of what we choose as targets and what items we include as
prerequisites to being able to comprehend those targets.
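To make the input format concrete, here's a small sketch (using the placeholder target and item names from the example above) of grouping the target-item pairs into a per-target prerequisite set, which is a convenient shape for the algorithms to work on:

```python
from collections import defaultdict

# The target-item pairs from the example above
pairs = [
    ("T1", "I1"), ("T1", "I3"), ("T1", "I7"),
    ("T2", "I2"), ("T2", "I7"),
    ("T3", "I4"),
]

# Group into a mapping from each target to its prerequisite items
prereqs = defaultdict(set)
for target, item in pairs:
    prereqs[target].add(item)
```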


The Output

What this (and my other algorithms) outputs is what I sometimes, in
comments and elsewhere, refer to as a "learning programme". (Yes, I
tend to use that spelling for any ordered list to be followed that
isn't a computer program.)

Such a programme looks like this:

learn I2
learn I5
learn I7
know T2
learn I1
learn I3
know T1

Note that this algorithm will sometimes (as it does in the example
above) prematurely mention an item that could be delayed (in this case
I5) so the optimize-order code I've mentioned previously and uploaded
to Google Code is useful as a post-processing step.


The Algorithm Itself

The algorithm is very simple and follows an iterative process. At each
step, each item not yet learnt is assigned a score. The item with the
highest score is then learnt and the process repeats (with the scores
being recalculated each time on the remaining items).

The score favours items that are the only remaining unlearnt item (or
one of only a few remaining unlearnt items) in a lot of different
targets.

At each step, each unlearnt item receives, for each target the item is
a prerequisite for, an additional score of 1 /
2^num_unlearnt_items_in_target.

In other words, for each target the item is the only unlearnt item in,
the score goes up by 1/2, for each target the item is one of two
unlearnt items in, the score goes up by 1/4, for each target the item
is one of three unlearnt items in, the score goes up by 1/8 and so on.

I haven't done much experimentation to see if this exponential decay
is optimal, but it seems to give good results.

Because this algorithm is iterative and picks a single item at each
step rather than exploring multiple ordering possibilities, I'm
tentatively calling this algorithm the "next-best" algorithm.

I've checked in the code as http://code.google.com/p/graded-reader/source/browse/trunk/code/next-best.py
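That checked-in version is the reference; the following is just a minimal self-contained sketch of the same scoring loop (Python 3, using the example pairs from earlier) to show the shape of it:

```python
from collections import defaultdict

def next_best_order(pairs):
    """Order items by repeatedly learning the highest-scoring one.

    pairs: iterable of (target, item) tuples, as in the input format above.
    Returns a programme as a list of ("learn", item) / ("know", target) steps.
    """
    # group the pairs into per-target prerequisite sets
    prereqs = defaultdict(set)
    for target, item in pairs:
        prereqs[target].add(item)

    unlearnt = set().union(*prereqs.values())
    known = set()
    programme = []

    while unlearnt:
        # score each unlearnt item: for every target it is a prerequisite
        # for, add 1 / 2^(number of unlearnt items in that target)
        scores = defaultdict(float)
        for items in prereqs.values():
            remaining = items & unlearnt
            for item in remaining:
                scores[item] += 1 / 2 ** len(remaining)

        best = max(scores, key=scores.get)  # ties broken arbitrarily
        programme.append(("learn", best))
        unlearnt.discard(best)

        # any target with no unlearnt prerequisites left is now readable
        for target, items in prereqs.items():
            if target not in known and not (items & unlearnt):
                known.add(target)
                programme.append(("know", target))

    return programme
```

On the example pairs, I4 is learnt first (as the sole unlearnt item in T3 it scores 1/2, beating I7's 1/8 + 1/4), immediately followed by "know T3".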

It is important to note that this algorithm currently considers all
items equally easy (or difficult!) to learn and assumes they are
independent. However, it would be relatively easy to augment the
algorithm with difficulty weightings and I plan to do that soon.
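One way such weightings might be folded in (purely a sketch of my own, not what the checked-in code does): divide each item's raw 1/2^n score by a per-item difficulty, so harder items need a bigger payoff before being picked:

```python
# Hypothetical per-item difficulty weights (higher = harder to learn);
# items not listed default to 1.0
difficulty = {"I4": 2.0, "I7": 1.5}

def weighted_score(raw_score, item):
    # scale the raw 1/2^n score down in proportion to the item's difficulty
    return raw_score / difficulty.get(item, 1.0)
```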

Another feature that I'm considering is being able to "pin down"
certain items as not being available until a particular point. You
may, for example, want to delay the introduction of participles but
otherwise have the algorithm come up with its own ordering.
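A sketch of how that pinning might work (again speculative, not something in the checked-in code): record an earliest release step for each pinned item and exclude it from the candidate scores until that step is reached:

```python
# Hypothetical pinned items: item -> earliest step at which it may be learnt
pinned = {"I_participles": 10}

def eligible(item, step):
    # an unpinned item is always eligible; a pinned one only from its release step
    return step >= pinned.get(item, 0)
```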

James

