Maybe someone can help me out here with the syntax. I'm trying to compare the results of gensim's LDA with scikit-learn's implementation, but I can't figure out how I need to feed in the data.
Gensim works just fine:

```python
from gensim import corpora, models

# longer_docs is a list of tokenized documents (lists of strings)
id2word_l = corpora.Dictionary(longer_docs)
mm_l = [id2word_l.doc2bow(text) for text in longer_docs]
lda_l = models.ldamulticore.LdaMulticore(corpus=mm_l,
                                         id2word=id2word_l,
                                         num_topics=40,
                                         minimum_probability=0,
                                         chunksize=10000,
                                         passes=20,
                                         workers=24)
```
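For reference, `mm_l` is not a matrix. `doc2bow` turns each document into a sparse list of `(token_id, count)` pairs, and different documents produce lists of different lengths. The plain-Python sketch below imitates that structure on two made-up documents. `toy_doc2bow` is a hypothetical stand-in for `Dictionary`/`doc2bow`, and the ids it assigns need not match the ones gensim would pick.

```python
from collections import Counter


def toy_doc2bow(docs):
    """Hypothetical stand-in for gensim's Dictionary + doc2bow.

    Returns (token2id, corpus), where corpus holds one sparse
    bag-of-words vector per document: a list of (token_id, count) pairs.
    """
    token2id = {}
    corpus = []
    for doc in docs:
        # Give every token not seen before the next free integer id.
        for token in doc:
            if token not in token2id:
                token2id[token] = len(token2id)
        counts = Counter(token2id[token] for token in doc)
        # Keep the pairs sorted by token id, as gensim's doc2bow does.
        corpus.append(sorted(counts.items()))
    return token2id, corpus


docs = [["topic", "model", "topic"], ["lda", "model", "gensim", "sklearn"]]
token2id, corpus = toy_doc2bow(docs)
print(corpus)
# [[(0, 2), (1, 1)], [(1, 1), (2, 1), (3, 1), (4, 1)]]
```

Each row is a ragged list of tuples rather than a fixed-length row of numbers, so scikit-learn cannot treat the corpus as the (documents × terms) array it expects.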
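This is where I must be doing something wrong:

```python
from gensim.models import TfidfModel
from sklearn.decomposition import LatentDirichletAllocation

# n_components was called n_topics before scikit-learn 0.19
lda = LatentDirichletAllocation(n_components=40, max_iter=5,
                                learning_method='online',
                                learning_offset=50.,
                                random_state=0)
tfidf = TfidfModel(mm_l)
corpus_tfidf = tfidf[mm_l]  # still a gensim corpus of (token_id, weight) lists
lda.fit(corpus_tfidf)  # fails with "ValueError: Expected 2D array, got 1D array instead"
```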
This gives me the following error:

`ValueError: Expected 2D array, got 1D array instead:`

I'm not sure how to turn this 'list' into the 2D array it wants. Does anyone have experience doing this comparison?
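One way to get there is sketched below in plain Python, assuming the vocabulary size is known (with the data above it would be `len(id2word_l)`). The idea is to expand each sparse `(token_id, value)` list into a fixed-length row, giving one row per document and one column per term. `bow_to_dense` is a hypothetical helper. Gensim itself provides `gensim.matutils.corpus2csc`, which builds a SciPy sparse matrix with terms as rows and documents as columns, so its transpose would have the documents × terms layout that scikit-learn's `fit` expects.

```python
def bow_to_dense(corpus, num_terms):
    """Hypothetical helper: turn a sparse bag-of-words corpus into a
    dense documents x terms matrix (a list of equal-length rows).

    corpus    -- iterable of documents, each a list of (token_id, value) pairs
    num_terms -- vocabulary size; every token_id must be < num_terms
    """
    matrix = []
    for doc in corpus:
        row = [0.0] * num_terms  # one column per vocabulary term
        for token_id, value in doc:
            row[token_id] += value  # accumulate in case an id repeats
        matrix.append(row)
    return matrix


# Made-up corpus in the same (token_id, count) format that doc2bow produces.
corpus = [[(0, 2), (1, 1)], [(1, 1), (2, 1), (3, 1), (4, 1)]]
X = bow_to_dense(corpus, num_terms=5)
print(X)
# [[2.0, 1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 1.0, 1.0, 1.0]]
```

With the real data this would be roughly `X = bow_to_dense(mm_l, len(id2word_l))` followed by `lda.fit(X)`. For a large vocabulary a dense matrix can use a lot of memory, so the sparse `corpus2csc(...).T` route is probably the better fit. Also note that scikit-learn's own topic-extraction example fits `LatentDirichletAllocation` on raw term counts from `CountVectorizer`, not tf-idf weights. Passing the count corpus `mm_l` would therefore keep the comparison with the gensim model above like-for-like.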