Dear Carmen,
I hope it's okay for me to jump in here. Anyone, feel free to correct me.
1) Negative values in an SVD are common and meaningful. Think of the columns of your document vector matrix as (more or less) independent semantic concepts: e.g., the first dimension (column) stands for nature, the second one for motor sport, etc. A positive value in the first dimension of a document vector can then be read as "this document tends to contain words related to nature" (these values are also called "loadings": how much of a semantic concept is carried by the document). A value of zero means there is no relationship: the document's words have nothing in particular to do with nature, but they could just as well occur in a nature context. A negative value, in turn, means that the document contains especially words that deal with "the opposite" of nature (whatever that is), i.e., words that come up particularly in contexts that do *not* deal with nature and that do *not* show up when the topic is nature. You can interpret the loadings more or less like correlation coefficients (it is of course somewhat more complex than with correlation coefficients, because the SVD also takes indirect relations across contexts into account).
2) Whether negative values affect your clustering depends on the distance metric you use. In LSA you would typically use the cosine (to put it more precisely: the arccosine, i.e., the angle between the vectors), which handles negative values perfectly well. Offhand, I can't think of a distance metric that can't deal with them, apart from metrics for binary variables, which of course are not applicable here anyway. But make sure you know which computations your distance metric carries out; then you can decide whether negative values would be a problem.
3) I am not sure what you mean by VSM, because LSA is also a vector space model. But I guess that in your VSM you "only" use the raw term-document frequencies? If so, and you find no difference between LSA and the VSM, I'd say that is rather a special case, and you should stick to the more parsimonious model without latent concepts. But there might also be ways to set up the LSA better (e.g., a different number of dimensions or a different term weighting) so that the LSA model improves.
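Regarding 1): a small sketch, assuming NumPy and an invented toy term-document matrix (the vocabulary, the counts, and the helper `lsa` are hypothetical, for illustration only). For this non-negative matrix the first right singular vector is proportional to (1, 1, 1, 1), so all documents load on the first dimension with the same sign; the second dimension has to be orthogonal to it and therefore mixes positive and negative values. One caveat: the sign of each SVD dimension is arbitrary (flipping u_i and v_i together leaves the decomposition unchanged), so "positive" and "negative" only mean something relative to each other within one dimension. The sketch fixes the sign with a simple convention of its own.

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents d0..d3).
# Vocabulary and counts are invented for illustration only.
terms = ["tree", "forest", "river", "engine", "race", "car", "weekend"]
X = np.array([
    [2, 1, 0, 0],  # tree
    [1, 2, 0, 0],  # forest
    [1, 1, 0, 0],  # river
    [0, 0, 2, 1],  # engine
    [0, 0, 1, 2],  # race
    [0, 0, 1, 1],  # car
    [1, 1, 1, 1],  # weekend: occurs equally in both topics
], dtype=float)


def lsa(X, k):
    """Hypothetical helper: rank-k truncated SVD of a term-document matrix.

    Returns term loadings U (terms x k), singular values s (k,) and
    document vectors V_k * S_k (documents x k).
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U, s, Vt = U[:, :k], s[:k], Vt[:k, :]
    # Flipping the signs of u_i and v_i together leaves U @ diag(s) @ Vt
    # unchanged, so each dimension's sign is arbitrary. Convention assumed
    # here: document d0 loads non-negatively on every kept dimension.
    signs = np.where(Vt[:, 0] < 0, -1.0, 1.0)
    U = U * signs              # flip columns of U
    Vt = Vt * signs[:, None]   # flip the matching rows of Vt
    return U, s, Vt.T * s


U, s, docs = lsa(X, k=2)

print("document vectors (rows d0..d3, columns = dimensions 1 and 2):")
print(np.round(docs, 3))
print("term loadings on dimension 2:")
for term, loading in zip(terms, U[:, 1]):
    print(f"  {term:8s} {loading:+.3f}")
```

With this data, d0 and d1 (nature) get a positive value on the second dimension and d2 and d3 (motor sport) a negative value of the same size. The term loadings show the same split, and `weekend`, which occurs equally in both topics, gets a loading of (numerically) zero, which matches the "no relationship" reading of a zero value.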
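Regarding 2): a minimal sketch, assuming NumPy and a few invented 2-dimensional document vectors, showing that the cosine and the arccosine are well defined when loadings are negative. The arccosine, i.e., the angle between two vectors in radians, ranges from 0 (same direction) to π (opposite directions); unlike 1 - cosine, it satisfies the triangle inequality, which is one reason to prefer it as a distance.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; entries may be negative."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def angular_distance(a, b):
    """Angle between a and b in radians (the arccosine of the cosine)."""
    # Clipping guards against rounding pushing the cosine slightly past +/-1.
    return float(np.arccos(np.clip(cosine_similarity(a, b), -1.0, 1.0)))


# Invented 2-dimensional LSA document vectors with negative loadings.
docs = np.array([
    [1.9, 1.7],
    [1.9, 1.6],
    [1.9, -1.7],
    [1.9, -1.6],
])

# Pairwise distance matrix, e.g. as input for a clustering algorithm.
D = np.array([[angular_distance(a, b) for b in docs] for a in docs])
print(np.round(D, 3))
```

Every step here (dot products, norms, arccosine) is defined for negative entries. This is the kind of check meant above: a measure that takes logarithms of the entries, such as a Kullback-Leibler divergence between probability distributions, would fail on negative values.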
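Regarding 3): one way to check whether the latent dimensions buy you anything is to compute the same document similarities in both representations and compare them. The sketch below assumes NumPy, takes the VSM to be the raw term-document frequency columns (as guessed above), and uses an invented toy matrix: documents d2 and d3 share no term, so their VSM cosine is 0, but in a 2-dimensional LSA space they end up close because `car` and `automobile` both co-occur with `engine` in other documents.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two document vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Invented term-document counts (rows = terms, columns = documents d0..d5).
terms = ["car", "automobile", "engine", "tree", "forest"]
X = np.array([
    [1, 0, 1, 0, 0, 0],  # car
    [0, 1, 0, 1, 0, 0],  # automobile
    [1, 1, 0, 0, 0, 0],  # engine
    [0, 0, 0, 0, 1, 1],  # tree
    [0, 0, 0, 0, 1, 0],  # forest
], dtype=float)

# VSM: each document is its raw frequency column of X.
vsm_docs = X.T

# LSA: keep k = 2 latent dimensions (document vectors = V_k * S_k).
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
lsa_docs = Vt[:k, :].T * s[:k]

for i, j in [(2, 3), (2, 4)]:
    print(f"d{i} vs d{j}: VSM {cosine(vsm_docs[i], vsm_docs[j]):.3f}, "
          f"LSA {cosine(lsa_docs[i], lsa_docs[j]):.3f}")
```

Here the VSM gives 0 for both pairs, while LSA gives a cosine close to 1 for d2/d3 and close to 0 for d2/d4. If, on your real data, both representations produce essentially the same clusters, that is the special case described in 3), and the simpler VSM is the more parsimonious choice.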
Best regards,
Fabian
-----Original Message-----
From: s-space-re...@googlegroups.com [mailto:s-space-re...@googlegroups.com] On Behalf Of Carmen Torres López
Sent: Wednesday, 25 November 2015 22:31
To: s-space-re...@googlegroups.com
Subject: questions about LSA representation
--
You received this message because you are subscribed to the Google Groups "Semantic Space Research - Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to s-space-research...@googlegroups.com.
To post to this group, send email to s-space-re...@googlegroups.com.
Visit this group at http://groups.google.com/group/s-space-research-dev.
For more options, visit https://groups.google.com/d/optout.