Topic modelling: gensim.models.ldamodel.LdaModel() produces NaN weights and the same words in every topic


khaj...@ualberta.ca

Aug 27, 2018, 12:10:45 PM
to Gensim
Hello there,

I have an issue when calling gensim.models.ldamodel.LdaModel(). Here is the assignment I am trying to solve:

You will use Gensim's LDA (Latent Dirichlet Allocation) model to model topics in `newsgroup_data`. You will first need to finish the code in the cell below by using the gensim.models.ldamodel.LdaModel constructor to estimate LDA model parameters on the corpus, and save the result to the variable `ldamodel`. Extract 10 topics using `corpus` and `id_map`, with `passes=25` and `random_state=34`.



My code is:

import pickle
import gensim
from sklearn.feature_extraction.text import CountVectorizer

# Load the list of documents
with open('newsgroups', 'rb') as f:
    newsgroup_data = pickle.load(f)

# Use CountVectorizer to find tokens of three or more characters, remove stop_words,
# remove tokens that don't appear in at least 20 documents,
# remove tokens that appear in more than 20% of the documents
vect = CountVectorizer(min_df=20, max_df=0.2, stop_words='english', 
                       token_pattern='(?u)\\b\\w\\w\\w+\\b')
# Fit and transform
X = vect.fit_transform(newsgroup_data)

# Convert sparse matrix to gensim corpus.
corpus = gensim.matutils.Sparse2Corpus(X, documents_columns=False)

# Mapping from word IDs to words (To be used in LdaModel's id2word parameter)
id_map = dict((v, k) for k, v in vect.vocabulary_.items())

# Use the gensim.models.ldamodel.LdaModel constructor to estimate 
# LDA model parameters on the corpus, and save to the variable `ldamodel`

ldamodel = gensim.models.ldamodel.LdaModel(corpus, num_topics=10, id2word=id_map,
                                           passes=25, random_state=34)

Then I need to:

Using `ldamodel`, find a list of the 10 topics and the 10 most significant words in each topic.

My code:

ldamodel.print_topics(num_topics=10, num_words=10)

and it returns:

[(0,
  'nan*"prevent" + nan*"pretty" + nan*"pro" + nan*"prices" + nan*"price" + nan*"previous" + nan*"problem" + nan*"practice" + nan*"present" + nan*"problems"'),
 (1,
  'nan*"prevent" + nan*"pretty" + nan*"pro" + nan*"prices" + nan*"price" + nan*"previous" + nan*"problem" + nan*"practice" + nan*"present" + nan*"problems"'),
 (2,
  'nan*"prevent" + nan*"pretty" + nan*"pro" + nan*"prices" + nan*"price" + nan*"previous" + nan*"problem" + nan*"practice" + nan*"present" + nan*"problems"'),
 (3,
  'nan*"prevent" + nan*"pretty" + nan*"pro" + nan*"prices" + nan*"price" + nan*"previous" + nan*"problem" + nan*"practice" + nan*"present" + nan*"problems"'),
 (4,
  'nan*"prevent" + nan*"pretty" + nan*"pro" + nan*"prices" + nan*"price" + nan*"previous" + nan*"problem" + nan*"practice" + nan*"present" + nan*"problems"'),
 (5,
  'nan*"prevent" + nan*"pretty" + nan*"pro" + nan*"prices" + nan*"price" + nan*"previous" + nan*"problem" + nan*"practice" + nan*"present" + nan*"problems"'),
 (6,
  'nan*"prevent" + nan*"pretty" + nan*"pro" + nan*"prices" + nan*"price" + nan*"previous" + nan*"problem" + nan*"practice" + nan*"present" + nan*"problems"'),
 (7,
  'nan*"prevent" + nan*"pretty" + nan*"pro" + nan*"prices" + nan*"price" + nan*"previous" + nan*"problem" + nan*"practice" + nan*"present" + nan*"problems"'),
 (8,
  'nan*"prevent" + nan*"pretty" + nan*"pro" + nan*"prices" + nan*"price" + nan*"previous" + nan*"problem" + nan*"practice" + nan*"present" + nan*"problems"'),
 (9,
  'nan*"prevent" + nan*"pretty" + nan*"pro" + nan*"prices" + nan*"price" + nan*"previous" + nan*"problem" + nan*"practice" + nan*"present" + nan*"problems"')]

Why are all the weights NaN, and why does every topic contain the same words?
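
For anyone who wants to detect this symptom programmatically rather than by eye, here is a minimal sketch. It assumes the topics are in the same list-of-tuples format as the output above, with each term written as weight*"word". The helper name `find_nan_topics` and the regular expression are illustrative and are not part of gensim.

```python
import math
import re

# Matches each weight*"word" term in a topic description string.
# Illustrative pattern, written for the output format shown in this thread.
TERM_RE = re.compile(r'([^\s*+]+)\*"([^"]*)"')


def find_nan_topics(topics):
    """Return the ids of topics whose printed weights include NaN.

    `topics` is a list of (topic_id, description) tuples, for example
    (0, 'nan*"prevent" + nan*"pretty"').
    """
    bad = []
    for topic_id, description in topics:
        # float("nan") parses to NaN, so NaN weights survive the conversion.
        weights = [float(w) for w, _word in TERM_RE.findall(description)]
        if any(math.isnan(w) for w in weights):
            bad.append(topic_id)
    return bad
```

Calling `find_nan_topics(ldamodel.print_topics(num_topics=10, num_words=10))` on the output above would flag all ten topics, which points to a numerical problem in training rather than to the topic-printing step.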

Here are the library versions:

Python      3.6.4
numpy       1.14.0
scipy       1.0.0
six         1.11.0
smart-open  1.6.0
gensim      3.5.0


I would really appreciate any help.


Thanks,

Maryam

 



khaj...@ualberta.ca

Aug 27, 2018, 12:20:14 PM
to Gensim
Sorry, I just uninstalled and reinstalled numpy, and it seems to be working now. I used:

>>pip uninstall numpy
>>pip install -U numpy

numpy is now at version 1.15.1, and that resolved the NaN issue!
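
Since the fix came down to which numpy build was installed, it can help to record the installed versions before and after reinstalling. The following is a minimal sketch using only the standard library. It assumes Python 3.8 or newer (the `importlib.metadata` module is not available on the Python 3.6.4 used above), and the helper name `installed_versions` is illustrative.

```python
from importlib import metadata  # standard library, Python 3.8+ (assumption)


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package names taken from the version list earlier in this thread.
    wanted = ["numpy", "scipy", "six", "smart-open", "gensim"]
    for name, version in installed_versions(wanted).items():
        print(name, version or "not installed")
```

Running this before and after `pip install -U numpy` shows whether the numpy version actually changed.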