LDAvis and Streaming Corpus

Francisco Arceo

Nov 24, 2015, 10:32:53 AM

to gensim

I'm using the LDA implementation from Gensim, and I wanted to use my estimated LDA model and corpus in the LDAvis tool.

The tutorial on taking a Gensim corpus and LDA model into pyLDAvis (link: http://nbviewer.ipython.org/github/bmabey/pyLDAvis/blob/master/notebooks/Gensim%20Newsgroup.ipynb#topic=0&lambda=1&term=) is really helpful, but I'm having issues with my implementation.

I use the memory-friendly (streaming) implementation of my corpus and don't store it in memory, which I think may be the root of my problem. Does anyone know how I can use pyLDAvis.gensim.prepare on a streaming corpus?

When I run:

import pyLDAvis.gensim as gensimvis
vis_data = gensimvis.prepare(ldamodel, mycorpus, mycorpus.dictionary)

I get the following error:

Traceback (most recent call last):
  File "", line 1, in <module>
    vis_data = gensimvis.prepare(ldamod, corpus, corpus.dictionary)
  File "//anaconda/lib/python2.7/site-packages/pyLDAvis/gensim.py", line 97, in prepare
    opts = fp.merge(_extract_data(topic_model, corpus, dictionary, doc_topic_dist), kwargs)
  File "//anaconda/lib/python2.7/site-packages/pyLDAvis/gensim.py", line 33, in _extract_data
    assert doc_lengths.shape[0] == len(corpus), 'Document lengths and corpus have different sizes {} != {}'.format(doc_lengths.shape[0], len(corpus))
TypeError: object of type 'MyCorpus' has no len()
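The last line of the traceback is the key: pyLDAvis calls len(corpus), and Python's built-in len() only works on objects whose class defines __len__. A class that only defines __iter__ can be streamed but not measured. A minimal, standard-library-only sketch that reproduces the same error (MyCorpus here is a hypothetical stand-in, not the poster's actual class):

```python
class MyCorpus(object):
    """Hypothetical streaming corpus: yields documents lazily, defines no __len__."""

    def __init__(self, docs):
        self._docs = docs  # stand-in for e.g. a file that is read line by line

    def __iter__(self):
        for doc in self._docs:
            yield doc


corpus = MyCorpus([[(0, 1)], [(1, 2)]])

# Iterating (streaming) over the corpus works fine...
n_docs = sum(1 for _ in corpus)

# ...but len() fails, which is exactly what the assertion in pyLDAvis trips over.
try:
    len(corpus)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(n_docs)         # 2
print(error_message)  # object of type 'MyCorpus' has no len()
```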

Samira

Dec 2, 2015, 7:19:04 AM

to gensim

Hello Francisco,

I would also like to use LDAvis to visualize the output of my LDA model, generated with gensim, but I could not even install the pyLDAvis package.

I receive an error related to C++...

Have you faced any similar problems during installation?

Giancarlo Facoetti

Feb 16, 2017, 8:39:55 AM

to gensim

Hello Francisco,

You should add a __len__ method to the class that represents your streaming corpus:

--------------------

class MyStreamingCorpus(object):
    ...

    def __len__(self):
        # this must return the number of documents in your corpus
        return len(self.documents)

--------------------
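Note that len(self.documents) implies the corpus keeps its documents in a list, which defeats the point of streaming. A minimal sketch of an alternative, assuming one document per line in a plain-text file, whitespace tokenization, and a plain dict token2id standing in for a gensim Dictionary (all of these names and choices are hypothetical): __len__ counts the lines in one pass over the file and caches the result, so no document is ever held in memory.

```python
import os
import tempfile
from collections import Counter


class MyStreamingCorpus(object):
    """Hypothetical streaming corpus: one whitespace-tokenized document per line."""

    def __init__(self, path, token2id):
        self.path = path
        self.token2id = token2id  # hypothetical stand-in for a gensim Dictionary
        self._num_docs = None     # cached document count, filled on first len()

    def __iter__(self):
        # Re-open the file on every pass so the corpus can be iterated many
        # times without ever holding all documents in memory.
        with open(self.path) as handle:
            for line in handle:
                counts = Counter(self.token2id[tok] for tok in line.split()
                                 if tok in self.token2id)
                yield sorted(counts.items())  # bag-of-words: [(token_id, count), ...]

    def __len__(self):
        # One cheap pass over the file counts the documents (one per line);
        # the result is cached so later len() calls are free.
        if self._num_docs is None:
            with open(self.path) as handle:
                self._num_docs = sum(1 for _ in handle)
        return self._num_docs


# Usage with a throwaway three-document file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("apple banana apple\nbanana cherry\ncherry\n")

corpus = MyStreamingCorpus(tmp.name, {"apple": 0, "banana": 1, "cherry": 2})
num_docs = len(corpus)
bows = list(corpus)
os.remove(tmp.name)

print(num_docs)  # 3
print(bows[0])   # [(0, 2), (1, 1)]
```

Caching the count assumes the file does not change between passes; if it might, drop the cache and recount on every call.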

Hope this helps

best

Giancarlo

Lev Konstantinovskiy

Feb 17, 2017, 9:16:08 AM

to gensim

Hi Francisco,

Unfortunately, pyLDAvis doesn't support streaming corpora yet. You can request this feature in the project's repo: https://github.com/bmabey/pyldavis
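Until such a feature exists, one possible workaround (a sketch, not tested against pyLDAvis here) is to wrap the existing corpus object, without editing its class, in a small hypothetical SizedCorpus class that supplies __len__ by counting documents in one extra streaming pass. This only removes the len() error: the traceback shows pyLDAvis building a doc_lengths array with one entry per document, so memory use still grows with the number of documents.

```python
class SizedCorpus(object):
    """Hypothetical wrapper that adds __len__ to an existing streaming corpus.

    The wrapped corpus must be re-iterable (a class with __iter__), not a
    one-shot generator, because counting documents uses up one full pass.
    """

    def __init__(self, corpus, num_docs=None):
        self.corpus = corpus
        self._num_docs = num_docs  # pass the count in if it is already known

    def __iter__(self):
        return iter(self.corpus)

    def __len__(self):
        if self._num_docs is None:
            # One extra streaming pass; documents are counted, not stored.
            self._num_docs = sum(1 for _ in self.corpus)
        return self._num_docs


class ToyStreamingCorpus(object):
    """Hypothetical re-iterable corpus with no __len__, for demonstration."""

    def __iter__(self):
        for bow in ([(0, 1)], [(0, 2), (1, 1)], [(2, 3)]):
            yield bow


sized = SizedCorpus(ToyStreamingCorpus())
print(len(sized))  # 3

# With pyLDAvis this would look like (an untested assumption):
# vis_data = gensimvis.prepare(ldamod, SizedCorpus(corpus), corpus.dictionary)
```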

Regards
Lev
