Comparing documents

Guilherme Silveira

Mar 21, 2012, 10:06:05 AM

to s-spac...@googlegroups.com

Hello,

I was able to compare documents with LSA by calling space.getDocumentVector(...) and comparing the resulting vectors with cosineSimilarity.
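For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below is a minimal standalone version on plain `double[]` arrays. It assumes the document vectors have already been copied into arrays of equal length. It is not S-Space's own `cosineSimilarity` method, and the class name `CosineSketch` and the sample values are hypothetical.

```java
/**
 * Hypothetical sketch: cosine similarity over plain double[] document vectors.
 * Assumes the vectors were already extracted into arrays of the same length;
 * this reimplements the standard formula and is not the S-Space method itself.
 */
final class CosineSketch {

    /** cos(a, b) = (a . b) / (|a| * |b|); returns 0 when either vector is all zeros. */
    static double cosineSimilarity(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0; // undefined for a zero vector; treated as 0 by convention here
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Hypothetical reduced-dimension vectors for two student answers
        double[] answer1 = {0.8, 0.1, 0.3};
        double[] answer2 = {0.7, 0.2, 0.4};
        System.out.println(cosineSimilarity(answer1, answer2));
    }
}
```

Values close to 1 indicate answers whose vectors point in nearly the same direction; the zero-vector convention is a design choice of this sketch.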

I'm sorry if my question is too basic, but is it possible to use any of the other algorithms to generate document vectors too? If you can point me to what I should read or do, I will go ahead.

We are using S-Space to find similar answers among students so we can identify students who need help, and the results so far are really good.

Regards

Guilherme Silveira

David Jurgens

Mar 21, 2012, 3:10:50 PM

to s-spac...@googlegroups.com

Hi Guilherme,

Currently, most of the algorithms we have are based around words rather than documents. However, you could also try the VectorSpaceModel, which is very similar to LSA but doesn't perform the singular value decomposition on the term-document matrix.
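To illustrate the difference, a VectorSpaceModel-style document vector can be thought of as a column of the term-document matrix, kept as-is without the SVD step that LSA adds. The following is a minimal sketch under that assumption, using whitespace tokenization and raw counts with no term weighting. The class `TermDocumentSketch` is hypothetical and does not reproduce S-Space's `VectorSpaceModel` API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: raw term-count document vectors with no SVD step. */
final class TermDocumentSketch {

    /** Returns one term-count vector per document, indexed by a shared vocabulary. */
    static List<double[]> termDocumentVectors(List<String> docs) {
        Map<String, Integer> vocab = new LinkedHashMap<>(); // term -> index, first-seen order
        List<String[]> tokenized = new ArrayList<>();
        for (String doc : docs) {
            String[] tokens = doc.toLowerCase().trim().split("\\s+");
            tokenized.add(tokens);
            for (String t : tokens) {
                if (!t.isEmpty()) {
                    vocab.putIfAbsent(t, vocab.size());
                }
            }
        }
        List<double[]> vectors = new ArrayList<>();
        for (String[] tokens : tokenized) {
            double[] v = new double[vocab.size()]; // all documents share the same dimensions
            for (String t : tokens) {
                if (!t.isEmpty()) {
                    v[vocab.get(t)] += 1.0; // raw count; no weighting, no dimensionality reduction
                }
            }
            vectors.add(v);
        }
        return vectors;
    }

    public static void main(String[] args) {
        List<double[]> vs = termDocumentVectors(List.of("the cell divides", "the cell grows"));
        for (double[] v : vs) {
            System.out.println(Arrays.toString(v));
        }
        // prints [1.0, 1.0, 1.0, 0.0] then [1.0, 1.0, 0.0, 1.0]
    }
}
```

These vectors can then be compared with cosine similarity, just as the LSA document vectors were; the trade-off is that, without the SVD, answers that use different but related words share no dimensions.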

Another alternative is to use any of the algorithms to build a SemanticSpace object and then use the DocumentVectorBuilder class to construct vectors for new documents. This class sums the vectors of the words in the new document. By summing the vectors, the DocumentVectorBuilder uses a weak form of composition: documents that contain the same kinds of terms end up with very similar vectors. However, this approach is very simple and ignores important information such as word order ("john killed the dog" versus "the dog killed john") and negation ("she is pretty" versus "she is not pretty").
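The summing approach described above can be sketched as follows, assuming a hypothetical in-memory `Map` from words to vectors in place of a trained `SemanticSpace`. This is not the `DocumentVectorBuilder` implementation; the class `SumCompositionSketch` and the word vectors are made up for illustration. The example also shows the word-order limitation: both sentences receive exactly the same vector.

```java
import java.util.Arrays;
import java.util.Map;

/** Hypothetical sketch: a document vector as the sum of its word vectors. */
final class SumCompositionSketch {

    /** Sums the vectors of known words; every word vector must have length dims. */
    static double[] documentVector(String doc, Map<String, double[]> wordVectors, int dims) {
        double[] sum = new double[dims];
        for (String token : doc.toLowerCase().trim().split("\\s+")) {
            double[] wv = wordVectors.get(token);
            if (wv == null) {
                continue; // out-of-vocabulary word contributes nothing
            }
            for (int i = 0; i < dims; i++) {
                sum[i] += wv[i];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical 2-dimensional word vectors standing in for a trained space
        Map<String, double[]> space = Map.of(
                "john", new double[] {1.0, 0.0},
                "killed", new double[] {0.0, 1.0},
                "the", new double[] {0.25, 0.25},
                "dog", new double[] {0.5, 0.5});
        double[] a = documentVector("john killed the dog", space, 2);
        double[] b = documentVector("the dog killed john", space, 2);
        // Addition is order-insensitive, so both sentences get the same vector
        System.out.println(Arrays.equals(a, b)); // true
    }
}
```

Negation behaves similarly: "she is not pretty" differs from "she is pretty" only by whatever vector "not" adds, which usually does not reverse the meaning in any useful way.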

My guess is that LSA will give you better document vectors, but you might still see reasonable results with DocumentVectorBuilder, depending on the content of your students' answers. If we can be of any more help, please let us know.

Thanks,

David

wiem lahbib

Mar 25, 2015, 1:23:08 PM

to s-spac...@googlegroups.com

Hi Guilherme,

How did you use cosineSimilarity?

Thanks

wiem
