Regarding topic words generated by the Dynamic Topic Model

charu james

Aug 25, 2022, 6:28:13 AM
to Gensim
Hello,

While working with dynamic LDA, I noticed that words from a future timestamp were occurring in the topics of the current timestamp. Is this expected? I have been using the NeurIPS and New York Times datasets for training. I also tried a small sample dataset to see whether words from later timestamps appear in the topics of earlier timestamps; the code for that sample is shown below. I am using the gensim wrapper for the Dynamic Topic Model (https://radimrehurek.com/gensim_3.8.3/models/wrappers/dtmmodel.html). I have also trained with gensim's ldaseqmodel (https://radimrehurek.com/gensim/models/ldaseqmodel.html).

 

In the code, I supply nine documents: five in the first timestamp and four in the second. In the output, the topic words “graph” and “trees” appear in the first timestamp, even though those words occur only in documents from the second timestamp. Is this expected for topics generated on different timestamps? Kindly advise.

 

Code:

from gensim.corpora import Dictionary
from gensim.models.wrappers import DtmModel
import os

os.chdir("../../../../..")
# Path to the DTM binary used by the wrapper
dtm_path = "/scratch/james/large_files/AllfromHome/EMNLP/DLDA_C++/DTM_HOME/dtm-linux64"

# Nine sample documents: the first five form time slice 0, the last four form time slice 1
common_texts = [
    ['human', 'interface', 'computer'],
    ['survey', 'user', 'computer', 'system', 'response', 'time'],
    ['eps', 'user', 'interface', 'system'],
    ['system', 'human', 'system', 'eps'],
    ['user', 'response', 'time'],
    ['trees'],
    ['graph', 'trees'],
    ['graph', 'minors', 'trees'],
    ['graph', 'minors', 'survey']
]
common_dictionary = Dictionary(common_texts)
common_corpus = [common_dictionary.doc2bow(text) for text in common_texts]

# One topic, two time slices containing 5 and 4 documents
model = DtmModel(dtm_path, corpus=common_corpus, id2word=common_dictionary, num_topics=1, time_slices=[5, 4])

# Top five words of topic 0 in each time slice
print(model.show_topic(topicid=0, time=0, num_words=5))
print(model.show_topic(topicid=0, time=1, num_words=5))


Output:

[(0.1394453578916981, 'system'), (0.10686028263957828, 'user'), (0.10641178588011702, 'graph'), (0.10641178588011702, 'trees'), (0.07492795235292479, 'eps')]

[(0.138615046975501, 'system'), (0.10752510958147092, 'graph'), (0.10752510958147092, 'trees'), (0.10637521554211087, 'user'), (0.07522760231339463, 'minors')]
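
As a sanity check that does not depend on the DTM binary, the minimal sketch below counts how often each word occurs in each time slice of the sample data. It uses only the standard library; the helper names `split_by_time_slices` and `slice_word_counts` are my own for illustration and are not part of gensim.

```python
from collections import Counter

# Same sample documents and time slices as in the code above.
common_texts = [
    ['human', 'interface', 'computer'],
    ['survey', 'user', 'computer', 'system', 'response', 'time'],
    ['eps', 'user', 'interface', 'system'],
    ['system', 'human', 'system', 'eps'],
    ['user', 'response', 'time'],
    ['trees'],
    ['graph', 'trees'],
    ['graph', 'minors', 'trees'],
    ['graph', 'minors', 'survey'],
]
time_slices = [5, 4]


def split_by_time_slices(texts, slices):
    """Split chronologically ordered documents into consecutive slices.

    Hypothetical helper: slices[i] is the number of documents in slice i.
    """
    chunks, start = [], 0
    for size in slices:
        chunks.append(texts[start:start + size])
        start += size
    return chunks


def slice_word_counts(texts, slices):
    """Return one Counter of raw word frequencies per time slice."""
    return [
        Counter(word for doc in chunk for word in doc)
        for chunk in split_by_time_slices(texts, slices)
    ]


counts = slice_word_counts(common_texts, time_slices)
for word in ("graph", "trees"):
    # Prints the raw count of the word in slice 0 and slice 1.
    print(word, [c[word] for c in counts])
# graph [0, 3]
# trees [0, 3]
```

This confirms that "graph" and "trees" never occur in the five documents of the first time slice, yet both appear among the top five words reported for time=0.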

 

Thanks & Regards,

Charu Karakkaparambil James
