--
You received this message because you are subscribed to the Google Groups "Semantic Vectors" group.
To post to this group, send email to semanti...@googlegroups.com.
To unsubscribe from this group, send email to semanticvecto...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/semanticvectors?hl=en.
Another question - are you seeing large values in your term vectors,
your document vectors, or both?
In the code at
http://code.google.com/p/semanticvectors/source/browse/trunk/src/pitt/search/semanticvectors/LSA.java,
we explicitly normalize term vectors but take doc vectors straight
from the U matrix of the SVD decomposition.
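A minimal sketch of that asymmetry, assuming vectors are plain lists of floats. The function `normalize` and the input numbers are hypothetical illustrations, not the SemanticVectors API: the term vector is scaled to unit length, while the doc vector keeps whatever scale it had.

```python
import math

def normalize(vec):
    """Scale vec to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]

# Hypothetical raw rows (made-up numbers, for illustration only).
raw_term_vector = [3.0, 4.0]
raw_doc_vector = [30.0, 40.0]

term_vector = normalize(raw_term_vector)  # normalized, as described for term vectors
doc_vector = raw_doc_vector               # taken as-is, so its scale is whatever the SVD produced
```

If doc vectors are compared by cosine similarity the difference in scale doesn't matter, but anything that uses raw dot products or raw coordinates will see the larger values.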
The docvectors part wouldn't surprise me, because large values can
occur in SVD depending on your choice of representation.
In the decomposition A = U * S * V^T, the columns of U are the left
singular vectors, the columns of V are the right singular vectors, and
S is a diagonal matrix of singular values. By moving multiplicative
factors from one matrix to another, you have quite a lot of leeway.
You can choose parameters so that at least one of U and V is a unitary
matrix, possibly both; I'd have to check the maths / literature to
make sure.
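A toy illustration of that leeway, using a 2x2 matrix built from hand-picked rotation matrices rather than a real SVD library (all names and numbers are hypothetical). Grouping S with U or with V^T reconstructs the same A, but the rows of the left factor have unit length in one grouping and values of up to roughly the largest singular value in the other.

```python
import math

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def rotation(theta):
    """2x2 rotation matrix: orthogonal, so its rows and columns have unit length."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def row_norms(X):
    return [math.sqrt(sum(x * x for x in row)) for row in X]

# Build A = U * S * V^T from chosen orthogonal U, V and singular values 10 and 2.
U = rotation(0.3)
V = rotation(1.1)
S = [[10.0, 0.0], [0.0, 2.0]]
A = matmul(matmul(U, S), transpose(V))

# Convention 1: keep U orthogonal and push S to the right-hand factor.
A1 = matmul(U, matmul(S, transpose(V)))

# Convention 2: fold S into the left-hand factor.
left_scaled = matmul(U, S)
A2 = matmul(left_scaled, transpose(V))

print(row_norms(U))            # both rows have length 1
print(row_norms(left_scaled))  # row lengths now depend on the singular values
```

Both groupings are the same matrix product, so which one a library (or the calling code) hands back determines how large the "document vector" entries look.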
I'm not an SVD expert and I haven't actually checked what convention
the library uses. So I can't answer the question "What exactly is
going on?" off the top of my head, but at least I can say that it's
not altogether surprising to find large values in the document
vectors. If you're getting large values in your term vectors, that's
another matter and is definitely weird.
Best wishes,
Dominic