PyLDAVis: TypeError: cannot sort an Index object in-place, use sort_values instead


Amar B.
Oct 10, 2016, 10:20:32 AM
to gensim

I am trying to visualize LDA topics in Python using pyLDAvis, but I can't seem to get it right. My model has a vocabulary of 150K words and was trained on about 16 million tokens.

I am doing this outside of an IPython notebook, and this is the code I wrote:


model_filename = "150k_LdaModel_topics_" + topics + "_passes_" + passes + ".model"

dictionary = gensim.corpora.Dictionary.load('LDADictSpecialRemoved150k.dict')
corpus = gensim.corpora.MmCorpus('LDACorpusSpecialRemoved150k.mm')
ldamodel = gensim.models.ldamodel.LdaModel.load(model_filename)

import pyLDAvis.gensim

vis = pyLDAvis.gensim.prepare(ldamodel, corpus, dictionary)
pyLDAvis.save_html(vis, "topic_viz_" + topics + "_passes_" + passes + ".html")


I get the following error after 2-3 hours of running the code on a fast server with >30 GB of RAM. Can someone point out where I am going wrong?


Traceback (most recent call last):
  File "create_vis.py", line 36, in <module>
    vis = pyLDAvis.gensim.prepare(ldamodel, corpus, dictionary)
  File "/local/lib/python2.7/site-packages/pyLDAvis/gensim.py", line 110, in prepare
    return vis_prepare(**opts)
  File "/local/lib/python2.7/site-packages/pyLDAvis/_prepare.py", line 398, in prepare
    token_table = _token_table(topic_info, term_topic_freq, vocab, term_frequency)
  File "/local/lib/python2.7/site-packages/pyLDAvis/_prepare.py", line 267, in _token_table
    term_ix.sort()
  File "/local/lib/python2.7/site-packages/pandas/indexes/base.py", line 1703, in sort
    raise TypeError("cannot sort an Index object in-place, use "
TypeError: cannot sort an Index object in-place, use sort_values instead



I suspect this is an issue with the gensim dictionary and corpus. I trimmed the dictionary from 2M down to 150K entries, and I think that step is somehow causing the error.


Can someone help?


Thanks,

Amar

Lev Konstantinovskiy
Oct 11, 2016, 1:14:29 AM
to gensim
Hi Amar,

pyLDAvis is a great package that integrates with gensim. However, it is not part of gensim itself. You will have a better chance of getting an answer on the pyLDAvis issue page.

Regards
Lev

Amar B.
Oct 11, 2016, 8:58:48 AM
to gensim
Hi Lev,
I already reported it there. I posted here as well in case someone knows what's going on.

Thanks,
Amar

Kenneth Orton
Oct 14, 2016, 8:54:32 AM
to gensim
I think you posted this issue on GitHub (https://github.com/bmabey/pyLDAvis/issues/74), and it was resolved in an update.
There is also a proper way to 'trim' or filter a saved corpus/dictionary. It was originally answered in #8 here: https://github.com/piskvorky/gensim/wiki/Recipes-&-FAQ
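For reference, the TypeError itself is a pandas API issue, not a gensim one, and can be reproduced in isolation. This is a minimal sketch (the `term_ix` name is borrowed from the traceback) of the change the pyLDAvis update made:

import pandas as pd

term_ix = pd.Index([3, 1, 2])

# Recent pandas versions forbid sorting an Index in place:
#   term_ix.sort()  ->  TypeError: cannot sort an Index object in-place, ...
# The updated pyLDAvis code uses the non-mutating variant instead:
term_ix = term_ix.sort_values()
print(list(term_ix))  # [1, 2, 3]

So upgrading pyLDAvis (or downgrading pandas) avoids the crash regardless of how the dictionary was trimmed.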

After filtering the corpus/dictionary down to fewer tokens, you need to train a new LDA model from the filtered data.
Here is an example (most of this is not my code, but I have included the source):
import copy
import gensim
from gensim.models import VocabTransform
import logging

logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

DEFAULT_DICT_SIZE = 100000

# filter the dictionary
old_dict = gensim.corpora.Dictionary.load('data.new_old/wiki_dict.dict')
new_dict = copy.deepcopy(old_dict)
new_dict.filter_extremes(no_below=20, no_above=0.1, keep_n=DEFAULT_DICT_SIZE)
new_dict.save('data.new_old/filtered.dict')

# transform the corpus
corpus = gensim.corpora.MmCorpus('data.new_old/wiki_corpus.mm')
old2new = {old_dict.token2id[token]:new_id for new_id, token in new_dict.iteritems()}
vt = VocabTransform(old2new)
gensim.corpora.MmCorpus.serialize('data.new_old/filtered_corpus.mm', vt[corpus], id2word=new_dict, progress_cnt=10000)

# create lda model from filtered data
bow_corpus = gensim.corpora.MmCorpus('data.new_old/filtered_corpus.mm')
dictionary = gensim.corpora.Dictionary.load('data.new_old/filtered.dict')
lda = gensim.models.ldamodel.LdaModel(corpus=bow_corpus, id2word=dictionary, num_topics=100, update_every=1, chunksize=10000, passes=1)
lda.save('data.new_old/lda_filtered.model')

