LDAviz and Streaming Corpus


Francisco Arceo

Nov 24, 2015, 10:32:53 AM
to gensim

I'm using the LDA implementation from Gensim, and I want to use my estimated LDA model and corpus in the pyLDAvis tool.


The tutorial on using a Gensim corpus and LDA model is really helpful (link: http://nbviewer.ipython.org/github/bmabey/pyLDAvis/blob/master/notebooks/Gensim%20Newsgroup.ipynb#topic=0&lambda=1&term=), but I'm having issues with my implementation.


I use a memory-friendly implementation of my corpus and don't store it in memory, which I think may be the root of my problem. Does anyone know how I can use pyLDAvis.gensim.prepare on a streaming corpus?


When I run:


import pyLDAvis.gensim as gensimvis
vis_data = gensimvis.prepare(ldamodel, mycorpus, mycorpus.dictionary)


I get the following error: 

Traceback (most recent call last):
  File "", line 1, in 
    vis_data = gensimvis.prepare(ldamod, corpus, corpus.dictionary)
  File "//anaconda/lib/python2.7/site-packages/pyLDAvis/gensim.py", line 97, in prepare
    opts = fp.merge(_extract_data(topic_model, corpus, dictionary, doc_topic_dist), kwargs)
  File "//anaconda/lib/python2.7/site-packages/pyLDAvis/gensim.py", line 33, in _extract_data
    assert doc_lengths.shape[0] == len(corpus), 'Document lengths and corpus have different sizes {} != {}'.format(doc_lengths.shape[0], len(corpus))
TypeError: object of type 'MyCorpus' has no len()

Samira

Dec 2, 2015, 7:19:04 AM
to gensim
Hello Francisco,
I would also like to use pyLDAvis to visualize the output of my LDA model, generated with Gensim, but I could not even install the pyLDAvis package.
I get an error related to C++ ...
Have you faced any similar problems during installation?

Giancarlo Facoetti

Feb 16, 2017, 8:39:55 AM
to gensim
Hello Francisco,

You should add a __len__ method to the class that represents your streaming corpus, because pyLDAvis calls len(corpus):

--------------------
class MyStreamingCorpus:
    ...

    def __len__(self):
        return len(self.documents) # this must return the number of documents in your corpus
--------------------
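
For a corpus that is never held in memory, there may be no self.documents list to measure. A minimal sketch of one alternative, assuming one document per line of a text file: count the documents with a single streaming pass and cache the result. The names path and token2id and the demo() helper are hypothetical, chosen only for illustration; they are not part of Gensim or pyLDAvis, and I have not verified this against pyLDAvis itself.

```python
import os
import tempfile


class MyStreamingCorpus:
    """Streams one bag-of-words document per line of a text file.

    Hypothetical sketch: `path` and `token2id` are illustrative names,
    not part of gensim or pyLDAvis.
    """

    def __init__(self, path, token2id):
        self.path = path
        self.token2id = token2id  # plain dict mapping token -> integer id
        self._length = None       # filled in lazily by __len__

    def __iter__(self):
        # Re-open the file on every pass so the corpus can be iterated repeatedly.
        with open(self.path, encoding="utf-8") as handle:
            for line in handle:
                counts = {}
                for token in line.lower().split():
                    token_id = self.token2id.get(token)
                    if token_id is not None:  # skip out-of-vocabulary tokens
                        counts[token_id] = counts.get(token_id, 0) + 1
                yield sorted(counts.items())

    def __len__(self):
        # Count documents with one streaming pass, then cache the result.
        if self._length is None:
            self._length = sum(1 for _ in self)
        return self._length


def demo():
    """Build a tiny two-document corpus on disk and return (length, documents)."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write("human computer interface\ncomputer system\n")
    try:
        corpus = MyStreamingCorpus(
            tmp.name, {"human": 0, "computer": 1, "system": 2}
        )
        return len(corpus), list(corpus)
    finally:
        os.remove(tmp.name)
```

The first call to len() costs one full pass over the file; later calls reuse the cached count. If the underlying file changes, the cached value becomes stale, so this only suits a corpus that is fixed while the model is being visualized.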
Hope this helps
best
Giancarlo

Lev Konstantinovskiy

Feb 17, 2017, 9:16:08 AM
to gensim
Hi Francisco,

Unfortunately, pyLDAvis doesn't support streaming corpora yet. You can request this feature in that project's repo: https://github.com/bmabey/pyldavis

Regards
Lev
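
If the corpus (or a sample of it) does fit in memory, one possible workaround is to materialize the stream into a list, which supports len() and repeated iteration. This is only a sketch under that assumption: stream_documents is a hypothetical stand-in for a real streaming corpus, and the document-length computation simply mirrors the kind of size check shown in the traceback above.

```python
def stream_documents():
    """Hypothetical stand-in for a corpus that yields documents lazily.

    Each document is a list of (token_id, count) pairs.
    """
    yield [(0, 1), (1, 2)]
    yield [(1, 1)]
    yield [(0, 3), (2, 1)]


# Materialize the stream: a list has a length and can be iterated many times.
corpus_in_memory = list(stream_documents())

# Total token count per document; there is one entry per document,
# so this list and the corpus have the same size.
doc_lengths = [sum(count for _, count in doc) for doc in corpus_in_memory]
```

This gives up the memory savings of streaming, so it is only reasonable for corpora small enough to load at once.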
