Hello Gabriel and everyone!
I am new to gensim and topic modeling. Over the weekend, I taught myself LDA and ran it on a corpus of open-ended gender narratives that I collected in a large online study (in which I oversampled for sexual and gender diversity). This morning, I asked ChatGPT to help me build a corpus and run CTM to extract 20 topics from my data. The CTM topics were noticeably more interpretable than the LDA topics, and ChatGPT was surprisingly helpful (at least early in the morning, before demand picked up).
In fact, ChatGPT gave me the Python code it said it had used, starting with the following imports:
===================================================
import pandas as pd
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from gensim.corpora import Dictionary
from gensim.models import ctmodel
===================================================
After it preprocessed the corpus and converted it into a bag-of-words representation (I am not showing that code, to save space), it used the following code to identify the requested topics.
===================================================
# Apply CTM to identify the top 20 themes
num_topics = 20
ctm_model = ctmodel.CtModel(bow_corpus, num_topics=num_topics)
ctm_model.fit(lemmatized_tokens)
topics = ctm_model.get_topics()
===================================================
Does "ctmodel.CtModel" exist somewhere within gensim?
I have spent the entire day trying to reproduce those results in Python on my own, outside of ChatGPT, so that I can verify the stability of the findings, create the necessary tables and figures, and make my de-identified data and syntax available. However, I keep getting an error indicating that gensim does not contain anything by that name.
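For anyone who wants to reproduce the problem, here is a minimal sketch of a check for whether a given module can be found in an installation. The helper name `module_exists` is my own and is not part of gensim. It uses only the Python standard library, and it reports False if gensim itself is not installed.

```python
import importlib.util


def module_exists(dotted_name):
    """Return True if `dotted_name` can be located as an importable module.

    Hypothetical helper (not part of gensim). Looking up a submodule imports
    its parent package, so a missing or broken parent is reported as False.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        return False


# Compare a module I know I have been using (LDA) with the one ChatGPT imported.
for name in ("gensim.models.ldamodel", "gensim.models.ctmodel"):
    print(name, "->", module_exists(name))
```

If the first name comes back True and the second comes back False, then the import in ChatGPT's code cannot work with that installation.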
Did you create a CTM function for gensim? Would you be willing to share it with me? I would be sure to give you the appropriate credit.
Any help would be greatly appreciated. Now that I have familiarized myself with the gensim code, I would rather stick with it if at all possible. Of course, if there is no ctmodel function in gensim, I can use either BERTopic or topicmodels instead. I just did not anticipate going down an 8-hour rabbit hole trying to get some code to run!
Thanks!
Ron Rogge
Associate Professor of Clinical Psychology
University of Rochester