> model_lda = models.ldamodel.LdaModel(corpus=mm_tfidf, id2word=id2word,
> num_topics=100, update_every=1, chunksize=10000, passes=1)
i'll take a stab at this
perhaps model_lda did not converge ?
is there a way to test/quantify convergence?
there's convergence -- but i don't know if that's the same thing -- is
it possible to have convergence without having a good 'original
document approximation'?
matrix factorization methods try to minimize a cost function that
quantifies the topic + topic-assignment approximation
is there anything similar for lda?
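to make the matrix-factorization side concrete, here's a minimal numpy sketch (toy matrices with made-up values, not from the thread) of the cost those methods drive down -- the squared Frobenius reconstruction error between the term-document matrix and its topic factorization:

```python
import numpy as np

# Toy term-document matrix V (5 terms x 4 docs) and a rank-2 factorization
# V ~ W @ H, where W holds topics and H holds per-document topic weights.
rng = np.random.default_rng(0)
V = rng.random((5, 4))
W = rng.random((5, 2))
H = rng.random((2, 4))

# The cost NMF-style methods minimize: ||V - WH||_F^2, the squared
# Frobenius norm of the reconstruction error.
cost = np.linalg.norm(V - W @ H, ord="fro") ** 2
print(cost)
```

LDA has no such reconstruction error; its analogue is a likelihood-based quantity (perplexity, or the variational bound discussed below in the thread).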
> - Build the TFIDF, LSA and LDA-online models
> model_tfidf = models.TfidfModel(mm_bow, id2word=id2word, normalize=True)
> model_lsi = models.lsimodel.LsiModel(corpus=mm_tfidf, id2word=id2word,
> num_topics=400)
> model_lda = models.ldamodel.LdaModel(corpus=mm_tfidf, id2word=id2word,
> num_topics=100, update_every=1, chunksize=10000, passes=1)
LDA model works over word counts (integers) -- but here, you run it
over tf-idf (real-valued). The underlying probability model doesn't
make sense this way... although it works numerically and may even give
good (better?) results. Just saying :-)
>>> mm = gensim.corpora.MmCorpus('wiki_en_tfidf.mm')
>>> lda = gensim.models.ldamodel.LdaModel(corpus=mm, id2word=id2word, num_topics=100, update_every=1, chunksize=10000, passes=1)
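to see the counts-vs-weights distinction concretely, here's a small pure-Python sketch (toy documents, no gensim) showing that bag-of-words entries are integers while tf-idf entries are real-valued -- and LDA's multinomial model assumes the former:

```python
import math

# Tiny toy corpus (made up for illustration).
docs = [["army", "gun", "army"], ["chile", "army"], ["gun", "gun", "chile"]]
vocab = sorted({w for d in docs for w in d})

# Bag-of-words: integer counts per document -- what LDA's generative
# (multinomial) model is defined over.
bow = [{w: d.count(w) for w in set(d)} for d in docs]

# tf-idf: real-valued weights -- what the quoted code actually feeds to LdaModel.
n_docs = len(docs)
df = {w: sum(1 for d in docs if w in d) for w in vocab}
tfidf = [{w: c * math.log(n_docs / df[w]) for w, c in doc.items()} for doc in bow]

print(bow[0])    # integer counts
print(tfidf[0])  # real-valued weights
```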
> LSA
> Extract from army article -> 0.842584
> Extract from Chile article -> 0.120402
> Extract from gun article -> 0.253231
>
> LDA
> Extract from army article -> 0.115992
> Extract from Chile article -> 0.046896
> Extract from gun article -> 0.156798
The similarity numbers are not necessarily comparable across methods.
By default, gensim uses cosine similarity (~angle between the topic
vectors), so that the range is guaranteed to be [-1, 1], but that
doesn't mean score 0.1 with one method means the same thing as 0.1
with another. A score of 0.5 could mean "extremely similar" under one
method (with its internal parameters) and "no similarity to speak of"
for another. The absolute scores are only comparable within the same
model.
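a quick numpy sketch of the point (the topic vectors below are invented for illustration, not from the thread): cosine similarity always lands in [-1, 1], but the same document pair can score very differently in two different topic spaces, so only within-model comparisons are meaningful:

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity: cos of the angle between vectors, always in [-1, 1].
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical representations of the same two documents in two topic spaces.
doc_a_lsa, doc_b_lsa = np.array([0.9, 0.1, 0.0]), np.array([0.8, 0.3, 0.1])
doc_a_lda, doc_b_lda = np.array([0.2, 0.5, 0.3]), np.array([0.6, 0.1, 0.3])

print(cosine(doc_a_lsa, doc_b_lsa))  # high in this space
print(cosine(doc_a_lda, doc_b_lda))  # lower in this one, same documents
```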
thank you Radim -- and thank you gensim community for the prior 665
messages -- i couldn't have done it w/out you!
i've been going to therapy -- trying to curb my 'evilishness' --
http://www.youtube.com/watch?v=jMIDpJ8H7H0
> In gensim, you can see the approximate variational error with `LdaModel.bound()`.
Radim -- do you mind helping me understand what this method does --
i'm trying to go through the code, but i'm too new -- a little
guidance would go a long way
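for what it's worth, my understanding (from the variational-Bayes LDA literature, not from reading the gensim source) is that the quantity in question is the evidence lower bound (ELBO) that variational inference maximizes:

```latex
\log p(w \mid \alpha, \beta)
  \;\ge\; \mathbb{E}_{q}\!\left[\log p(\theta, z, w \mid \alpha, \beta)\right]
        - \mathbb{E}_{q}\!\left[\log q(\theta, z)\right]
```

the gap between the two sides is the KL divergence between the variational distribution q and the true posterior, so a higher bound means a better fit -- and tracking it across passes is one way to quantify convergence, which circles back to the original question.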
Can you post your new LDA topics, once the training is done? I'm
interested in seeing the difference.
You're right! It looks better. I still like the LSA results more, but
I guess that's subjective.
> What are you using the topic modelling for?
So far I am doing some exploratory analysis. I want to see how the
links between articles relate to their vector space representations.
Alejandro.