Topic Modeling using R-Studio

Neeraj Kaushik

Dec 31, 2023, 6:26:05 PM
to dataanalysistraining
Dear Friends

Text Analysis involves the following dimensions:
1. Importing text
2. Cleaning the text
3. Counting the words and Visualization (Wordcloud, Wordcloud2, Comparison Cloud)
4. Sentiment Analysis
5. Topic Modeling
6. Predictive Analysis

I've explained the nuances of Text Analysis in the following playlists:

In the first playlist, I've explained the nuances of the R package quanteda, a comprehensive package that covers most of the capabilities of other text-analysis packages. Another very powerful package is tidytext (CRAN - Package tidytext), developed by a team consisting of Gabriela De Queiroz, Colin Fay, Emil Hvitfeldt, Os Keyes, Kanishka Misra, Tim Mastny, Jeff Erickson, David Robinson, and Julia Silge.

Julia Silge (https://juliasilge.com/about/) is a prolific author who has written several books and made them freely available online:

1. Text Mining with R: A Tidy Approach by Julia Silge and David Robinson https://www.tidytextmining.com/
2. Tidy Modeling with R by Max Kuhn and Julia Silge https://www.tmwr.org/
3. Supervised Machine Learning for Text Analysis in R by Emil Hvitfeldt and Julia Silge https://smltar.com/

I've explained the working of Tidytext in these videos:

TidyText-1 Introduction to TidyText R-package: https://youtu.be/zURDfHHks2k

Happy Learning
Neeraj

Prof. Arunkumar Dubey

Dec 31, 2023, 8:10:57 PM
to dataanalys...@googlegroups.com
Good one Sir. 

Thank you for the invaluable support to the research community. 

Best wishes 

--
The members of this group are expected to follow the following Protocols:
1. Please search previous posts in the group before posting the question.
2. Don't write the query in someone's post. Always use the option of New topic for the new question. You can do this by writing to dataanaly...@googlegroups.com
3. It’s better to give a proper subject to your post/query. It'll help others while searching.
4. Never write Open-ended queries. This group intends to help research scholars, NOT TO WORK FOR THEM.
5. Never write words like URGENT in your posts. People will help when they are free.
6. Never upload any information about National Seminars/Conferences. Send such information
in personal emails and feel free to share any RESEARCH-related information.
7. No Happy New Year, Happy Diwali, Happy Holi, Happy Birthday, Happy Anniversary, etc. allowed in this group.
8. Asking or sharing Research Papers is NOT ALLOWED.
9. You can share your questionnaire only once.
---
You received this message because you are subscribed to the Google Groups "DataAnalysis" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataanalysistrai...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataanalysistraining/CAAd%3Dc8Nn6e416K%3DtDhtw22sdOWw17wJ60KSGjei4SAYwY03d2Q%40mail.gmail.com.

Neeraj Kaushik

Jan 1, 2024, 6:21:24 PM
to dataanalysistraining
Dear Friends

Just as in data analysis, we can conceptually classify text analysis into:

a) Univariate: handling individual words and their visualization
b) Bivariate: finding the association between words and its visualization
c) Multivariate: topic modeling and predictive analysis

I've explained the nuances of handling the individual words and their Visualization (Wordcloud, Wordcloud2, Comparison cloud) in this video:

TidyText-2 Import, Clean, Count and Visualization: https://youtu.be/HLRLT6xEqec
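
As a rough illustration of the import, clean, and count steps, here is a minimal sketch in plain Python (it is not the R code from the video). The stop-word list and the two reviews are made-up assumptions for the example; a real analysis would use a full stop-word list.

```python
import re
from collections import Counter

# Hypothetical mini stop-word list; real analyses use a much larger one.
STOPWORDS = {"the", "is", "a", "and", "of", "to", "was"}

def count_words(texts):
    """Lower-case, tokenize on letters, drop stop words, count the rest."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts

# Invented example reviews.
reviews = [
    "The room was clean and the staff was friendly.",
    "Friendly staff, clean room, and a great location.",
]
word_counts = count_words(reviews)
print(word_counts.most_common(3))
```

The resulting counts are what a word cloud would scale its words by.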

Happy Learning
Neeraj

Neeraj Kaushik

Jan 2, 2024, 7:05:52 PM
to dataanalysistraining
Dear Friends

Bivariate analysis in the context of text analysis means finding associations between words. It captures the words that are frequently used together, and these associations are represented using a network plot.
I've explained word association in this video:

TidyText-3 Word Association: https://youtu.be/eKn296zwOAA
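
One simple way to think about word association is to count how often two words occur in the same document. The sketch below does that in plain Python (not the R code from the video); the three documents and the threshold of two are assumptions chosen only for illustration.

```python
from collections import Counter
from itertools import combinations

def pair_counts(documents):
    """Count how many documents each unordered word pair appears in."""
    counts = Counter()
    for doc in documents:
        words = sorted(set(doc.lower().split()))
        counts.update(combinations(words, 2))
    return counts

# Invented, already-cleaned documents.
docs = [
    "clean room friendly staff",
    "friendly staff great location",
    "clean room great view",
]
pairs = pair_counts(docs)
# Pairs seen in at least two documents would become edges of a network plot.
edges = {pair: n for pair, n in pairs.items() if n >= 2}
print(edges)
```

Each surviving pair is one edge in the network, and its count can be used as the edge weight.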

Happy Learning
Neeraj

Neeraj Kaushik

Jan 4, 2024, 10:44:01 AM
to dataanalysistraining
Dear Friends

Topic modeling (identifying themes in a given text) is a kind of multivariate technique in the realm of qualitative analysis. The basic postulate is that every document is a mixture of themes, or topics (the per-document topic proportions are denoted by gamma), and every topic is a mixture of words (the per-topic word probabilities are denoted by beta).

I've tried to explain the concept of Topic Modeling in very easy language in this video:

TidyText-4 Introduction to Topic Modeling: https://youtu.be/tZtIJfmxzhQ
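
To make gamma and beta concrete, here is a small sketch in plain Python with made-up numbers (two topics, three words, two documents; none of these values come from the video). It shows how the two tables combine: the probability of a word in a document is the sum, over topics, of gamma times beta.

```python
# Made-up numbers for two topics, three words, and two documents.
beta = {   # P(word | topic): each topic's row sums to 1
    "service": {"staff": 0.6, "clean": 0.3, "price": 0.1},
    "value":   {"staff": 0.1, "clean": 0.2, "price": 0.7},
}
gamma = {  # P(topic | document): each document's row sums to 1
    "review_1": {"service": 0.9, "value": 0.1},
    "review_2": {"service": 0.2, "value": 0.8},
}

def word_probability(doc, word):
    """P(word | document) = sum over topics of gamma * beta."""
    return sum(gamma[doc][t] * beta[t][word] for t in gamma[doc])

for doc in gamma:
    print(doc, {w: round(word_probability(doc, w), 2) for w in beta["service"]})
```

A document dominated by the "service" topic ends up giving most of its probability to that topic's top words.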

Happy Learning
Neeraj

Neeraj Kaushik

Jan 5, 2024, 9:42:04 AM
to dataanalysistraining
Dear Friends

I've explained the practical working of topic modeling using the R package tidytext in this video:

TidyText-5 Working on Topic Modeling: https://youtu.be/RC2jB1ocdx0

Happy Learning
Neeraj

Neeraj Kaushik

Jan 5, 2024, 7:06:09 PM
to dataanalysistraining
Dear Friends

There are three ways to determine the number of topics:
1. A priori method (decided by the user)
2. Mathematical score method (perplexity score or coherence score)
3. Predictive analysis for determining the number of topics

I've explained the second method, the mathematical score method (perplexity score), in this video:

TidyText-6 Working on Topic Modeling (Perplexity score method): https://youtu.be/lRaYdZOKbSE
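
For intuition, perplexity is the exponential of the negative average log-probability that a model assigns to held-out words, so lower is better. The sketch below is plain Python with invented probabilities (the values per K are assumptions, not output from any fitted model); it only shows how the score is computed and how the K with the lowest score is picked.

```python
import math

def perplexity(token_probs):
    """exp of the negative mean log-probability of held-out tokens."""
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

# Hypothetical probabilities that models with K topics assign to the
# same five held-out tokens; the numbers are invented for illustration.
held_out = {
    2: [0.10, 0.05, 0.08, 0.02, 0.04],
    3: [0.20, 0.10, 0.15, 0.05, 0.10],
    4: [0.18, 0.09, 0.12, 0.04, 0.08],
}
scores = {k: perplexity(p) for k, p in held_out.items()}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

In practice the scores are usually plotted against K and read together with how interpretable the topics are, rather than taken purely as a minimum.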

Happy Learning
Neeraj

Purnima Rao

Jan 6, 2024, 1:20:43 AM
to dataanalys...@googlegroups.com
Thanks a lot, sir. Very useful 🙏


Neeraj Kaushik

Jan 7, 2024, 9:30:16 AM
to dataanalysistraining
Dear Friends
There are many R packages that analyze text and cluster the words into topics.
The million-dollar question is how to analyze those topics.
I've provided a method that uses 4 parameters:
1. Beta (the probability of a word within a given topic)
2. Gamma (the proportion of a document attributed to a given topic)
3. KWIC (Key Word In Context)
4. Network plot of word associations

I've explained the same in this video:
TidyText-7 Analysis of Topic Modeling (Beta, Gamma, KWIC and Network plot): https://youtu.be/AVKo__odhnI
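
Of the four parameters, KWIC is the easiest to show in a few lines: it lists every occurrence of a keyword together with a few words on either side. This is a minimal sketch in plain Python (not the R code from the video); the function name, the window size, and the sample text are assumptions for the example.

```python
def kwic(text, keyword, window=3):
    """Return (left context, keyword, right context) for each match."""
    tokens = text.split()
    hits = []
    for i, token in enumerate(tokens):
        # Ignore trailing punctuation and case when matching the keyword.
        if token.lower().strip(".,;:!?") == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits

# Invented sample text.
text = "The staff was friendly. We found the staff helpful and polite."
for left, word, right in kwic(text, "staff", window=2):
    print(f"{left:>20} | {word} | {right}")
```

Reading the top words of a topic in their original context like this helps when naming the topic.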

Happy Learning
Neeraj

samiran

Jan 7, 2024, 6:29:33 PM
to dataanalys...@googlegroups.com
Dear Sir
Just to inform you that my research area is text analysis in R, mainly topic modeling and related methods. If anyone is interested in collaborating, I am open to that. Interestingly, this method can be applied to any kind of social sciences research.

Thank you 
Regards
 Dr Samiran Sur


Zertaj Fatima

Jan 8, 2024, 1:53:46 AM
to dataanalys...@googlegroups.com
Sir,
I am trying topic modeling but not getting results; I am using data from the Dimensions database.

Neeraj Kaushik

Jan 8, 2024, 7:33:22 AM
to dataanalys...@googlegroups.com
Dear Zertaj
Please share what sort of data you have and what problem you are facing.
Best wishes

Neeraj Kaushik

Jan 10, 2024, 6:08:15 PM
to dataanalysistraining
Dear Friends

The term Predictive Text Analysis means predicting a continuous (also called metric) dependent variable (DV) from independent variables (IDV) created from the text (reviews or comments). Two methods for this are:

1. Compute the sentiment of the text and use the sentiment score as the IDV
2. Compute the proportion of every topic in every comment, review, or text and use those proportions as the IDV

I've explained the first method of using sentiment analysis in this video:
TidyText-8 Predictive Analysis using Sentiment score: https://youtu.be/mvkZc6nr_uk
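
The idea of the first method can be sketched as follows in plain Python (not the R code from the video): score each review with a lexicon, then regress the metric DV on that score. The mini lexicon, the reviews, and the ratings are all invented for the example, and the regression is a simple one-predictor least-squares fit.

```python
# Hypothetical mini lexicon and reviews; the ratings are invented.
LEXICON = {"good": 1, "great": 2, "friendly": 1, "bad": -1, "dirty": -2}

def sentiment_score(text):
    """Sum of lexicon values over the words in the text."""
    return sum(LEXICON.get(w.strip(".,!?").lower(), 0) for w in text.split())

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

reviews = ["Great location, friendly staff.", "Good food.",
           "Bad service.", "Dirty room, bad smell."]
ratings = [5, 4, 2, 1]                           # metric DV
scores = [sentiment_score(r) for r in reviews]   # IDV built from the text
intercept, slope = fit_line(scores, ratings)
print(scores, round(intercept, 2), round(slope, 2))
```

A positive slope here means that reviews with a higher sentiment score tend to come with higher ratings.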

Happy Learning
Neeraj

Neeraj Kaushik

Jan 11, 2024, 7:14:15 PM
to dataanalysistraining
Dear Friends

In the second method of predictive modeling, we use the LDA or sLDA function to do 3 things in parallel:

1. Perform topic modeling: certain words are clubbed (grouped) into a topic
2. Find the proportion of each topic in each comment/review/sentence
3. Use this proportion as the IDV to generate a regression model for a given metric DV

So the entire process includes the following 8 steps:
1. Import the text.
2. Split the text into 2 parts: training and testing datasets.
3. Take the training dataset and clean the text (remove stopwords and numbers).
4. Create the inputs the model function needs using the lexicalize function.
5. Decide on the number of topics (K) and run the sLDA function. This gives the regression output.
6. Take the testing data, clean it, lexicalize it, and use it as input to the model created in step 5 to predict Yc (the predicted value of Y).
7. Check the accuracy of Yc (also called Y hat) by comparing it with the actual Y value (the DV of the testing data).
8. If required, go back to step 5, change the value of K, and repeat the process.
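
The split and accuracy-check parts of this workflow (steps 2, 6, and 7) can be sketched in plain Python as below. This is not the sLDA code from the videos: the model itself is replaced by a placeholder that predicts the mean training rating, and the data, the 25% test fraction, the seed, and the choice of RMSE as the accuracy measure are all assumptions for illustration.

```python
import math
import random

def train_test_split(rows, test_fraction=0.25, seed=1):
    """Shuffle a copy of the rows and cut off the last part for testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]

def rmse(actual, predicted):
    """Root mean squared error between actual Y and predicted Yc."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))

# Hypothetical (text, rating) pairs.
data = [("good stay", 4), ("bad food", 2), ("great staff", 5), ("dirty room", 1),
        ("nice view", 4), ("slow service", 2), ("lovely pool", 5), ("noisy street", 2)]
train, test = train_test_split(data)

# Placeholder for steps 3-5: predict the mean training rating for every
# test row. A fitted sLDA model would supply these predictions instead.
mean_rating = sum(y for _, y in train) / len(train)
y_test = [y for _, y in test]
y_hat = [mean_rating] * len(test)
print(len(train), len(test), round(rmse(y_test, y_hat), 2))
```

Repeating the fit for several values of K and comparing the test-set error is what step 8 amounts to.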

I've explained this process and the working in 2 videos.

The first video is as follows:
TidyText-9 Predictive Analysis using sLDA function (Part-1): https://youtu.be/wuNMJu4HgyM

Happy Learning
Neeraj

Zertaj Fatima

Jan 12, 2024, 2:27:17 AM
to dataanalys...@googlegroups.com
Dear Respected Sir,
I don't have access to Scopus data. When I use Dimensions data, it shows as incomplete in biblioshiny as well as in topic modeling in R.
Thanks & Regards,
Zertaj 


Neeraj Kaushik

Jan 12, 2024, 9:15:40 PM
to dataanalysistraining
Dear Friends

I've explained the second part of Predictive modeling in this video:

TidyText-10 Predictive Analysis using sLDA function (Part-2): https://youtu.be/_FhhditcHGs

Happy Learning
Neeraj

samiran

Jan 25, 2024, 11:14:50 AM
to dataanalys...@googlegroups.com
Hello Sir
Hope you are doing well.
I am facing a problem with the Twitter trend location syntax. After authenticating with the API, it shows this error:
Error in twInterfaceObj$doAPICall("trends/available", ...) :
  Forbidden (HTTP 403).
Please give some input to solve this.

Regards
Samiran Sur


Neeraj Kaushik

Jan 25, 2024, 8:43:01 PM
to dataanalys...@googlegroups.com

samiran

Jan 26, 2024, 8:04:46 AM
to dataanalys...@googlegroups.com
Dear Sir

Thank you for your prompt response. 

Regards
Samiran Sur

Neeraj Kaushik

Apr 23, 2024, 10:04:40 PM
to dataanalysistraining
Dear Friends

One of the key decisions in topic modeling is determining how many topics should be extracted from the given text.
Two packages help in this regard:

stm::topicQuality provides a graphical representation of all topics and shows their overlap
topicmodels::LDA fits the model, and the perplexity score of that fit (computed with topicmodels::perplexity) helps determine the optimal number of topics (lower perplexity is better)

I've explained both of them in the video:
TidyText-11 Determining Number of Topics: https://youtu.be/8Vj8EG-XT1g


Happy Learning
Neeraj

Neeraj Kaushik

Apr 24, 2024, 7:22:24 PM
to dataanalysistraining
Dear Friends

Visualization is a great tool for representing the words in each topic.
I've explained the R-code for the same in this video:

TidyText-12 Visualization of words in each Topic: https://youtu.be/b_tkUcD3lYE

You can explore and learn about topic modeling in detail from

Happy Learning
Neeraj

Dr.Anita Tanwar

Apr 24, 2024, 11:14:21 PM
to dataanalys...@googlegroups.com
Respected Neeraj Sir
Greetings
Kindly suggest a video on the ANN model and the PNN model in R that helps to learn from scratch, and one video on merging the Scopus and WoS databases in R for bibliometrix. I will be highly obliged to you.

Regards
Dr. Anita Tanwar
Chitkara Business School, Chitkara University.

Keep learning keep growing


Neeraj Kaushik

Apr 24, 2024, 11:23:15 PM
to dataanalys...@googlegroups.com
Here are my inputs:
1. ANN in R -> Work in progress
2. Merging Scopus and WoS data: How to merge scopus and WOS data - YouTube

Dr.Anita Tanwar

Apr 24, 2024, 11:25:52 PM
to dataanalys...@googlegroups.com