On Jan 24, 8:46 am, PaulR <p...@rudin.co.uk> wrote:
> Both LSI and LDA give a mechanism for determining topics from a
> corpus, and expressing documents in terms of those topics. Both have
> an online implementation (which I'm interested in at the moment).
>
> I wondered if people had any feel for how they differ in terms of the
> results?
Visually: LDA topics typically "look better", hands down. More
coherent and easier to interpret.
Actual quality of doc-doc similarity: notoriously difficult to judge
objectively, but there doesn't seem to be a fundamental difference.
People on this mailing list have mentioned they actually preferred the
LSI results, but that's just anecdotal.
> Also - are there significant performance differences between the
> gensim implementations of the two?
Not "significant", but LSI is faster by a constant factor. LSI also
has the nice property that the topics are nested -- once you compute
400 topics, you get a model for 200 topics for free, with no extra
computation. This is not true for LDA, where you'll have to retrain
whenever you change the dimensionality (number of topics).
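The nesting comes straight from the linear algebra: LSI is a truncated SVD of the term-document matrix, and the top-k singular vectors are the same no matter how many further vectors you compute. Below is a small numpy sketch of that property. It is not gensim code. The matrix size, the synthetic Poisson counts and the helper names (`top_k_left_singular_vectors`, `top_k_via_eigh`) are made up for illustration, and 10/5 topics stand in for 400/200. It checks that the first 5 columns of a 10-topic decomposition match a 5-topic decomposition computed by a different route (eigenvectors of A A^T), up to sign:

```python
import numpy as np

def top_k_left_singular_vectors(A, k):
    """Top-k left singular vectors/values of A from a full SVD (sorted descending)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k]

def top_k_via_eigh(A, k):
    """Independent route: the top-k eigenvectors of A @ A.T are the top-k
    left singular vectors of A, and the eigenvalues are the squared singular values."""
    evals, evecs = np.linalg.eigh(A @ A.T)   # eigenvalues come back in ascending order
    order = np.argsort(evals)[::-1][:k]      # indices of the k largest eigenvalues
    return evecs[:, order], np.sqrt(np.clip(evals[order], 0, None))

rng = np.random.default_rng(0)
A = rng.poisson(1.0, size=(50, 30)).astype(float)   # synthetic 50-term x 30-doc count matrix

U_big, s_big = top_k_left_singular_vectors(A, 10)   # stands in for the 400-topic model
U_small, s_small = top_k_via_eigh(A, 5)             # stands in for a separately computed 200-topic model

# Singular vectors are only defined up to sign, so align signs before comparing.
signs = np.sign(np.sum(U_big[:, :5] * U_small, axis=0))
nested = np.allclose(U_big[:, :5], U_small * signs, atol=1e-6)
print(nested)
```

This uses an exact dense SVD. For large corpora, online or randomized SVD algorithms only approximate the top singular vectors, so a sliced model should be expected to match a separately trained one approximately rather than bit-for-bit.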
HTH,
Radim