1. The "beam search" they describe: is that just a way of determining
maximum liklihood? Where could I find more information on that? Note
I haven't searched for it yet, but sometimes people can point to a
more understandable/level appropriate page than can a search engine.
2. They talk of using a beam size of N = 5. What is the beam size?
Is it the number of tag candidates? So, when you define a feature
on w_i = "about" and t_i = IN, do you then evaluate that feature on
t_i across the top 5 tags? (I've sketched my current guess below,
after these questions.)
3. Are there additional optimizations that can be applied, such as
some sort of pruning? They describe the search as being slow, so one
would expect various optimizations to have been developed.
4. Is there a reference implementation of a POS tagger in Mahout or
in some other library? Especially one similar to the one in the
paper - code might help my understanding :)
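To check my mental model for question 2, here is a rough sketch of what
I think the beam-search decoding loop looks like. Everything in it is
hypothetical - the Scorer interface just stands in for the MaxEnt
model's conditional probability - so this is my reading of the paper,
not code from the paper or any library:

import java.util.*;

// Rough sketch only: beam-search decoding for sequence tagging.
// Scorer stands in for the MaxEnt model's p(tag | history).
public class BeamSketch {

    interface Scorer {
        // log p(tag | words, previous tags, position i)
        double logProb(String[] words, List<String> prevTags, int i, String tag);
    }

    static List<String> decode(String[] words, List<String> tagset,
                               Scorer scorer, int beamSize) {
        // A hypothesis is a partial tag sequence plus its cumulative log-prob.
        List<Map.Entry<List<String>, Double>> beam = new ArrayList<>();
        beam.add(new AbstractMap.SimpleEntry<>(new ArrayList<String>(), 0.0));

        for (int i = 0; i < words.length; i++) {
            List<Map.Entry<List<String>, Double>> next = new ArrayList<>();
            for (Map.Entry<List<String>, Double> hyp : beam) {
                for (String tag : tagset) {            // extend by every tag...
                    List<String> seq = new ArrayList<>(hyp.getKey());
                    seq.add(tag);
                    double score = hyp.getValue()
                            + scorer.logProb(words, hyp.getKey(), i, tag);
                    next.add(new AbstractMap.SimpleEntry<>(seq, score));
                }
            }
            // ...but keep only the beamSize highest-scoring partial sequences.
            next.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
            beam = new ArrayList<>(next.subList(0, Math.min(beamSize, next.size())));
        }
        return beam.get(0).getKey();   // highest-scoring complete sequence
    }
}

If that's right, then the beam size N is the number of partial tag
sequences kept alive at each word (not the number of candidate tags),
and a feature on w_i = "about" / t_i = IN would get evaluated once per
hypothesis on the beam. But please correct me if I've misread it.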
Sorry about the newbie questions!
Tanton
I recently tried OpenNLP for noun phrase identification, which uses
this algorithm. The BeamSearch class can be found in
opennlp.tools.util. OpenNLP uses a beam size of 10; I changed it to 5.
That didn't change the output, but performance improved quite a bit:
the run time came down from 50 minutes to 30 minutes with the reduced
beam size. (This was for text of 10,000-word documents.) The unchanged
output may just be due to the kind of input data. Maybe I should try
it with an even smaller width.
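In case anyone wants to try the same thing, the change is roughly as
below. I am going from memory of the 1.5-era API here - the ChunkerME
constructor that takes a beam size may be deprecated or different in
your version, so treat this as a sketch rather than gospel:

import java.io.FileInputStream;
import java.util.Arrays;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.chunker.ChunkerModel;

public class BeamWidthDemo {
    public static void main(String[] args) throws Exception {
        // Pre-trained chunker model from the OpenNLP models page.
        ChunkerModel model =
                new ChunkerModel(new FileInputStream("en-chunker.bin"));
        // Second argument is the beam size (1.5-era constructor, from
        // memory); the default is 10.
        ChunkerME chunker = new ChunkerME(model, 5);
        String[] tokens  = { "The", "quick", "brown", "fox" };
        String[] posTags = { "DT", "JJ", "JJ", "NN" };
        System.out.println(Arrays.toString(chunker.chunk(tokens, posTags)));
    }
}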
BTW, I must confess that I have read the paper but have very little
understanding of it apart from the general sketch.
--shashi
I believe OpenNLP uses MaxEnt for POS tagging (and other things).
>
> My Questions:
> - What is it about this model that allows it to be classified as a
> Maximum Entropy Model?
> - I got a bit lost in formula (2), where they explain how this can be
> expressed in the Maximum Entropy formalism. I am not sure what the
> advantage of expressing it this way is, and whether that expression is
> used in practice or is just theory to tie the approach back to the
> Maximum Entropy model.
I took it to be the latter.
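For what it's worth, on the first question: the textbook conditional
MaxEnt form looks like this (my own notation, not copied from the
paper, so it may not line up exactly with their formula (2)):

  p(t | h) = exp( \sum_j \lambda_j f_j(h, t) ) / Z(h),
  where Z(h) = \sum_{t'} exp( \sum_j \lambda_j f_j(h, t') )

Here h is the "history" (surrounding words plus the previous tags), t
is the candidate tag, the f_j are binary features like the
w_i = "about" / t_i = IN one discussed above, and the \lambda_j are
learned weights. As I understand it, the model gets the "maximum
entropy" name because, among all distributions whose feature
expectations match the training data, this exponential form is the one
with the highest entropy.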