b = Classifier::Bayes.new
lsi = Classifier::LSI.new
LSI is a Latent Semantic Indexer, which can search, classify, and cluster
data based on underlying semantic relations. It uses more resources
than the Bayesian classifier and even requires an external library, but
it can still be marshalled for Madeleine's or DRb's sake. For more
information on the algorithms used, please consult:
http://en.wikipedia.org/wiki/Latent_Semantic_Indexing
I also added an #untrain method to reverse the effects of training the
Bayesian classifier. LSI can also untrain itself. To upgrade, try:
gem update classifier
Or see this site:
http://rubyforge.org/projects/classifier/
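To make the #untrain idea concrete, here is a minimal sketch of how untraining can reverse training in a word-count based Bayesian classifier. TinyBayes and its method signatures are hypothetical names invented for this illustration. This is not the gem's code or API, only a toy that assumes training amounts to incrementing per-category word counts.

```ruby
# Hypothetical toy classifier for illustration only (not the gem's implementation).
class TinyBayes
  attr_reader :counts, :totals

  def initialize(*categories)
    @counts = {}                 # counts[category][word] => times seen
    @totals = Hash.new(0)        # totals[category] => words seen overall
    categories.each { |c| @counts[c] = Hash.new(0) }
  end

  # Add the word counts of text to the given category.
  def train(category, text)
    tokenize(text).each do |word|
      @counts[category][word] += 1
      @totals[category] += 1
    end
  end

  # Reverse an earlier train call by subtracting the same word counts.
  def untrain(category, text)
    tokenize(text).each do |word|
      next unless @counts[category][word] > 0
      @counts[category][word] -= 1
      @totals[category] -= 1
      @counts[category].delete(word) if @counts[category][word].zero?
    end
  end

  # Return the category with the highest log-score (add-one smoothing).
  def classify(text)
    words = tokenize(text)
    vocab = @counts.values.flat_map(&:keys).uniq.size
    @counts.keys.max_by do |category|
      words.sum do |word|
        Math.log((@counts[category][word] + 1.0) / (@totals[category] + vocab + 1))
      end
    end
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end
end

b = TinyBayes.new(:spam, :ham)
b.train(:spam, "cheap pills cheap offer")
b.train(:ham,  "meeting notes for the ruby project")
b.train(:spam, "ruby project meeting")    # trained into the wrong category
b.untrain(:spam, "ruby project meeting")  # undo the mistake
b.classify("ruby meeting")                # => :ham
```

After the untrain call, the :spam counts are the same as if the mistaken training had never happened.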
Again, all feedback is appreciated.
-Lucas Carlson
http://tech.rufy.com/
George.
This is kind of off topic, but does anyone know if there is an
implementation of principal component analysis (PCA) easily usable from
Ruby? Bayesian is pretty powerful, but you can do some pretty ridiculous
things with PCA. Essentially it's a method of compression on random
data, but it does this by finding correspondences in each matrix row.
Anyhow, it's as useful as Bayesian methods for finding correspondences.
It might even be more useful, given that you can then easily generate
data points that would occur near your input points. Though I suppose
that usefulness depends on what you're using it for. I thought about
implementing it back when I helped out with this project [1]. But given
time constraints, it made more sense to just manually use MATLAB to do
it. It would be awesome to play with in Ruby though.
Charles Comstock
[1] http://www.cs.wustl.edu/~jdt1/vision/final/
[2] http://www.imm.dtu.dk/~aam/
-Lucas Carlson
In Classifier::LSI, I just do SVD on a term-document matrix to reduce
its rank, then break apart the columns and do inner products on the
resultant vectors. I've worked with it quite a bit now and I've
experienced some really amazing results. You can see in the unit tests
that it's pretty smart: it isn't easily fooled by lots of text matches.
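
The steps described above can be sketched with nothing but Ruby's bundled matrix library. This is only an illustration under stated assumptions, not the code in Classifier::LSI: the term counts are made up, the helper names (top_right_singular_vectors, reduce_rank, cosine) are hypothetical, and the SVD is obtained indirectly from the eigenvectors of A^T A rather than from an external library.

```ruby
require 'matrix'

# Toy term-document matrix with made-up counts.
# Rows are terms, columns are documents d0..d3.
a = Matrix[
  [1, 0, 1, 0],  # ruby
  [0, 1, 1, 0],  # gem
  [0, 0, 0, 1],  # dog
  [0, 0, 0, 1]   # cat
].map(&:to_f)

# The right singular vectors of A are the eigenvectors of A^T A.
# Keep the k vectors that belong to the largest eigenvalues.
def top_right_singular_vectors(a, k)
  eig = (a.transpose * a).eigen
  pairs = eig.eigenvalues.zip(eig.eigenvectors)
  top = pairs.sort_by { |value, _| -value }.first(k)
  Matrix.columns(top.map { |_, vec| vec.normalize.to_a })
end

# Rank-k approximation of A: A_k = A * V_k * V_k^T.
def reduce_rank(a, k)
  v_k = top_right_singular_vectors(a, k)
  a * v_k * v_k.transpose
end

# Normalized inner product of two document vectors.
def cosine(u, v)
  u.inner_product(v) / (u.norm * v.norm)
end

a_k = reduce_rank(a, 2)

# d0 ("ruby") and d1 ("gem") share no terms, so their raw similarity is 0.
raw = cosine(a.column(0), a.column(1))

# After rank reduction, d2 ("ruby gem") has pulled them together,
# so their similarity is close to 1.
reduced = cosine(a_k.column(0), a_k.column(1))

puts format("raw: %.3f  reduced: %.3f", raw, reduced)
```

Here the two documents never share a word, yet the reduced-rank columns line up because both terms co-occur in a third document. That is the sense in which the comparison depends on more than literal text matches.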