Respected Sir
First of all, thank you for sharing extensive knowledge on various research domains through your YouTube channel. I am currently learning text analysis for teaching and academic purposes. While following your videos, I noticed that a few lines of code for cleaning the corpus are used during the analysis. However, as far as I could find, none of the videos explains this code step by step in detail.
I would be grateful if you could kindly explain the following code:
# Clean corpus created by OCR
t = tokens_select(t,
                  c("[\\d-]", "[[:punct:]]", "^.{1,2}$"),
                  selection = "remove",
                  valuetype = "regex",
                  verbose = TRUE)
Specifically, I would like to know:
1. What does OCR stand for?
2. What does each part of this code (the function and its arguments) mean, and how should it be interpreted?
I tried asking ChatGPT, but I did not receive a satisfactory explanation.
It would be very helpful if you could create a video on this topic or point me to a relevant video by someone else. Although I can run the analysis without fully understanding these commands, I would also like to learn the underlying concepts.
However, if you feel that this topic is not important or not needed for the intended purpose, that is perfectly fine with me.
Thank you for your time and support.
Thanks & Regards
Manoj Kumar Mishra
Assistant Professor | Marwadi University
Scopus ID: https://www.scopus.com/authid/detail.uri?authorId=58202009700
ORCID ID: 0000-0003-1857-9076
Web of Science ResearcherID: AFA-7185-2022
Email ID: manojkum...@marwadieducation.edu.in/misram...@gmail.com
Mobile: 9839661837/8707576779