I'm using the LDA implementation from Gensim and want to use my estimated LDA model and corpus in the pyLDAvis tool.
The tutorial on using a Gensim corpus and LDA model with pyLDAvis is really helpful (link: http://nbviewer.ipython.org/github/bmabey/pyLDAvis/blob/master/notebooks/Gensim%20Newsgroup.ipynb#topic=0&lambda=1&term=), but I'm having issues with my implementation.
I use the memory-friendly implementation of my corpus and don't store it in memory, which I think may be the root of my problem. Does anyone know how I can use pyLDAvis.gensim.prepare on a streaming corpus?
When I run:
import pyLDAvis.gensim as gensimvis
vis_data = gensimvis.prepare(ldamodel, mycorpus, mycorpus.dictionary)
Traceback (most recent call last):
File "", line 1, in
vis_data = gensimvis.prepare(ldamod, corpus, corpus.dictionary)
File "//anaconda/lib/python2.7/site-packages/pyLDAvis/gensim.py", line 97, in prepare
opts = fp.merge(_extract_data(topic_model, corpus, dictionary, doc_topic_dist), kwargs)
File "//anaconda/lib/python2.7/site-packages/pyLDAvis/gensim.py", line 33, in _extract_data
assert doc_lengths.shape[0] == len(corpus), 'Document lengths and corpus have different sizes {} != {}'.format(doc_lengths.shape[0], len(corpus))
TypeError: object of type 'MyCorpus' has no len()
Unfortunately, pyLDAvis doesn't support streaming corpora yet. You can request this feature in the project's repository: https://github.com/bmabey/pyldavis
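In the meantime, the traceback shows that the failure comes from the len(corpus) call, so one possible workaround is to give the streaming corpus a length by counting its documents in one extra pass. This is only a sketch, and it assumes that the missing len() is the only obstacle, which has not been checked against pyLDAvis here. SizedCorpus and StreamingCorpus are hypothetical names used for illustration; they are not part of Gensim or pyLDAvis.

```python
class StreamingCorpus(object):
    """Hypothetical stand-in for a streaming corpus such as MyCorpus.

    It yields one bag-of-words document at a time and defines no __len__.
    A real corpus would read from disk; a list is used here only to keep
    the sketch runnable.
    """

    def __init__(self, docs):
        self.docs = docs

    def __iter__(self):
        for doc in self.docs:
            yield doc


class SizedCorpus(object):
    """Wrap a re-iterable corpus and add the __len__ that len(corpus) needs."""

    def __init__(self, corpus):
        self.corpus = corpus
        self._length = None  # computed lazily on the first len() call

    def __iter__(self):
        # Delegate to the wrapped corpus, so documents are still streamed.
        return iter(self.corpus)

    def __len__(self):
        if self._length is None:
            # One extra pass over the corpus; only a counter is kept in memory.
            self._length = sum(1 for _ in self.corpus)
        return self._length


stream = StreamingCorpus([[(0, 1), (1, 2)], [(1, 1)], [(0, 3), (2, 1)]])
sized = SizedCorpus(stream)
```

With a wrapper like this, the call from the question would become gensimvis.prepare(ldamodel, SizedCorpus(mycorpus), mycorpus.dictionary). pyLDAvis may still build a matrix of the whole corpus internally, so memory use can remain a concern for very large corpora. Another option is to write the corpus to disk with gensim.corpora.MmCorpus.serialize and load it back with gensim.corpora.MmCorpus, which streams from disk and supports len().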
Regards
Lev