Hello, thank you for your reply.
Could you point me to some resources on coherence-aware topic modeling?
I used n-grams (bigrams and trigrams) in my preprocessing steps. I also extended the stop word list with the high-frequency words in the vocabulary, and I applied id2word.filter_extremes when generating the id2word dictionary.
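To make the two frequency-based steps above concrete, here is a minimal pure-Python sketch. It is only an illustration under assumptions: the function names, the toy documents, and the thresholds are hypothetical, and the document-frequency filter only roughly mirrors the idea behind the no_below / no_above parameters of gensim's Dictionary.filter_extremes (gensim's own rounding and its keep_n limit are not reproduced).

```python
from collections import Counter


def high_frequency_words(docs, top_n):
    """Return the top_n most frequent tokens across all documents.

    These are candidates for extending the stop word list.
    """
    counts = Counter(tok for doc in docs for tok in doc)
    return {word for word, _ in counts.most_common(top_n)}


def filter_by_document_frequency(docs, no_below, no_above):
    """Keep tokens that occur in at least `no_below` documents and in
    at most a fraction `no_above` of all documents (hypothetical helper)."""
    # Count each token once per document (document frequency).
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    n_docs = len(docs)
    keep = {
        tok
        for tok, df in doc_freq.items()
        if df >= no_below and df / n_docs <= no_above
    }
    return [[tok for tok in doc if tok in keep] for doc in docs]


# Toy tokenized documents (made up for illustration).
docs = [
    ["topic", "model", "tweet", "data"],
    ["topic", "model", "coherence"],
    ["topic", "tweet", "score"],
    ["topic", "model", "lda"],
]

extra_stop_words = high_frequency_words(docs, top_n=1)
filtered = filter_by_document_frequency(docs, no_below=2, no_above=0.75)
```

With these toy inputs, "topic" appears in every document, so it is both the top high-frequency word and removed by the no_above threshold, while words that appear in only one document are removed by no_below.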
I am using c_v as the coherence metric.
For the data cleaning, I removed hashtags, URLs, punctuation, RT tags, @ mentions and emojis.
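The cleaning steps listed above could be sketched with regular expressions as follows. This is an assumption-laden illustration, not the exact code I ran: the pattern names and clean_tweet are hypothetical, hashtags are dropped entirely (tag text included), and emojis are removed together with punctuation as "anything that is not a word character or whitespace".

```python
import re

# Hypothetical patterns; adjust to the conventions of the actual data.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
RT_RE = re.compile(r"^\s*RT\b:?\s*")      # leading retweet marker
MENTION_RE = re.compile(r"@\w+:?")        # @user, optionally followed by ':'
HASHTAG_RE = re.compile(r"#\w+")          # removes the whole hashtag
NON_WORD_RE = re.compile(r"[^\w\s]")      # punctuation, emojis, other symbols


def clean_tweet(text):
    """Apply the cleaning steps in order and return lowercased text."""
    # URLs go first so their punctuation does not leave fragments behind.
    text = URL_RE.sub(" ", text)
    text = RT_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text)
    # Collapse the whitespace left by the substitutions.
    return " ".join(text.split()).lower()


example = clean_tweet("RT @user: Loving the new #NLP course!! 😀 https://t.co/abc123")
```

The order matters: if punctuation were stripped before URLs, mentions and hashtags, the "@", "#" and "://" markers would already be gone and the later patterns could no longer recognize them.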
For the data preprocessing, I used tokenization, stop word removal, n-gram detection and lemmatization.
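As an illustration of the n-gram step only, here is a simplified pure-Python sketch that joins adjacent token pairs occurring at least min_count times. It is a stand-in under stated assumptions: the helper names and toy documents are hypothetical, and phrase detectors such as gensim's Phrases score candidate pairs with a threshold rather than using a raw count alone.

```python
from collections import Counter


def frequent_bigrams(docs, min_count):
    """Return adjacent token pairs seen at least `min_count` times."""
    counts = Counter(pair for doc in docs for pair in zip(doc, doc[1:]))
    return {pair for pair, count in counts.items() if count >= min_count}


def merge_bigrams(doc, bigrams):
    """Greedily join known bigrams left to right with an underscore."""
    merged, i = [], 0
    while i < len(doc):
        if i + 1 < len(doc) and (doc[i], doc[i + 1]) in bigrams:
            merged.append(doc[i] + "_" + doc[i + 1])
            i += 2  # skip both tokens of the merged pair
        else:
            merged.append(doc[i])
            i += 1
    return merged


# Toy tokenized documents (made up for illustration).
docs = [
    ["machine", "learning", "is", "fun"],
    ["i", "like", "machine", "learning"],
    ["deep", "learning"],
]

bigrams = frequent_bigrams(docs, min_count=2)
merged_docs = [merge_bigrams(doc, bigrams) for doc in docs]
```

Running the same two functions a second time on merged_docs would be one simple way to pick up trigrams, since a merged bigram followed by a frequent third token becomes a new candidate pair.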
Thank you for your feedback.
Kind regards