Radim Řehůřek <me@...> writes:
>
>
> Hello Artem,
> no, that is not correct. The list returned from `lda[doc_bow]`
> contains 2-tuples (topic, probability). Words don't come into it
> anymore; you cannot convert the topic into a word...
>
> Each topic is itself a probability distribution over words; have a
> look at `lda.print_topics()` and
> http://radimrehurek.com/gensim/tut2.html#transforming-vectors
>
> Best,
> Radim
>
>
> On Sunday, September 22, 2013 1:44:50 AM UTC+2, Artem Yankov wrote:
>
> Answering my own question in case someone runs into the same problem.
>
> It returned probabilities for only the first 5 topics because I
> trained the LDA model with num_topics=5. So apparently, for a new
> document, it just estimates probabilities for those 5 topics.
>
> As for converting the results of inference back to words, it looks
> like there's no built-in solution, but a simple method would do:
>
>     def get_topics(dictionary, topics, prob=0.5):
>         return [dictionary.id2token[topic[0]]
>                 for topic in topics if topic[1] > prob]
>
> where topics is a list of tuples (id, probability).
>
> On Friday, September 20, 2013 9:43:09 PM UTC-7, Artem Yankov wrote:
> I trained LDA on a bunch of text and am now trying to infer topics
> for a new document. I'm doing it like in the tutorial:
>
>     doc_lda = lda[doc_bow]
>
> and that returns the following list:
>
>     [(0, 0.060771759757132261), (1, 0.11843179910545466),
>      (2, 0.34692926700963628), (3, 0.19344052553420571),
>      (4, 0.28042664859357103)]
>
> I would think those tuples are word ids mapped to their
> probabilities, but I always get only ids 0 to 4, while the
> dictionary contains about 44,000 words. Am I misunderstanding
> something?
>
> Another question: is there a simple way to map this result to the
> actual list of inferred topics?
>
> Thanks.
>
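
To illustrate the point made in the quoted reply (the ids returned by `lda[doc_bow]` are topic ids, and each topic is itself a distribution over words), here is a minimal sketch in plain Python. It does not call gensim: `topic_word_probs`, `top_words` and `describe` are hypothetical names, and all numbers are made up. With a real model, the per-topic word lists would come from something like `lda.print_topics()`.

```python
# Toy stand-in for a trained model: each topic is a probability
# distribution over words (values are invented for illustration).
topic_word_probs = {
    0: {"goal": 0.5, "match": 0.3, "team": 0.2},
    1: {"stock": 0.6, "market": 0.3, "bank": 0.1},
}

# Same shape as the result of lda[doc_bow]: (topic_id, probability) pairs.
doc_topics = [(0, 0.25), (1, 0.75)]


def top_words(topic_id, topn=2):
    """Return the topn most probable words of one topic."""
    words = topic_word_probs[topic_id]
    return sorted(words, key=words.get, reverse=True)[:topn]


def describe(doc_topics, min_prob=0.5, topn=2):
    """Map each sufficiently probable topic id to that topic's top words."""
    return {tid: top_words(tid, topn) for tid, p in doc_topics if p >= min_prob}


print(describe(doc_topics))
```

The topic id is used to look up a whole word distribution, never a single dictionary entry, which is why indexing `dictionary.id2token` with a topic id gives unrelated words.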
Hi Radim,

I trained LDA with n_topics = 10 as follows:

    lda = gensim.models.LdaModel(corpus, id2word=dictionary,
                                 num_topics=n_topics)

Now when I apply lda[doc2bow], for some inputs I get a probability
distribution over fewer than 10 topics, and the individual topic
probabilities sum to less than 1.

What could be the reason for this behaviour?

Thanks,
Shubham
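
One possible explanation, offered as an assumption rather than a confirmed answer: gensim omits topics whose probability falls below a small threshold when returning a document's topics (recent releases expose this as `minimum_probability`; whether and under what name it applies to the version used above is not established here). The plain-Python sketch below reproduces the effect on a made-up 10-topic distribution; `filter_topics` is a hypothetical helper, not a gensim function.

```python
def filter_topics(topic_probs, minimum_probability=0.01):
    """Keep only (topic_id, probability) pairs at or above the threshold."""
    return [(tid, p) for tid, p in enumerate(topic_probs)
            if p >= minimum_probability]


# Invented distribution over 10 topics; the full list sums to 1.
full = [0.5, 0.3, 0.18, 0.005, 0.005, 0.004, 0.003, 0.001, 0.001, 0.001]

kept = filter_topics(full)
kept_mass = sum(p for _, p in kept)

# Fewer than 10 topics survive, and their probabilities sum to less than 1.
print(len(kept), kept_mass)
```

If this is indeed the cause, lowering the threshold (for example `lda.get_document_topics(doc_bow, minimum_probability=0)` in recent gensim versions) should return the remaining low-probability topics as well.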