Query about R code in the Quanteda Package Playlist

Manoj Mishra Study and Motivation

Jun 10, 2026, 11:53:50 PM

to dataanalys...@googlegroups.com, Neeraj Kaushik

Respected Sir

First of all, thank you for sharing such extensive knowledge on various research domains through your YouTube channel. I am currently learning text analysis for teaching and academic purposes. While following your videos, I noticed that a few lines of code for cleaning the corpus are used during the analysis. However, as far as I could find, none of the videos explains this code step by step.

I would be grateful if you could kindly explain the following code:

# Clean corpus created by OCR
t <- tokens_select(t,
  c("[\\d-]", "[[:punct:]]", "^.{1,2}$"),
  selection = "remove",
  valuetype = "regex",
  verbose = TRUE
)
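
For reference, here is a minimal, self-contained version of the same call that can be run on its own. It is only a sketch: the sample sentence and the object names `txt`, `toks`, and `cleaned` are made up for illustration and do not come from the videos, and it assumes the quanteda package is installed.

```r
# Minimal reproducible sketch (sample text and object names are assumptions)
library(quanteda)

# A made-up sentence imitating noisy OCR output
txt <- "The 2nd edi-tion of ths b0ok, scanned in 1998, is a l0ng read."

# Split the text into tokens (words, numbers, punctuation marks)
toks <- tokens(txt)
print(toks)

# Same call as in the playlist, applied to the toy tokens
cleaned <- tokens_select(toks,
  c("[\\d-]", "[[:punct:]]", "^.{1,2}$"),
  selection = "remove",
  valuetype = "regex",
  verbose = TRUE
)
print(cleaned)
```

Printing the tokens before and after the call makes it possible to see which tokens are dropped.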

Specifically, I would like to know the following:

1. What is the full form of OCR?

2. What does each argument in the code mean, and how should it be interpreted?

I tried asking ChatGPT, but I did not receive a satisfactory explanation.

If you could help me by creating a video on this topic, or by referring me to a relevant video created by someone else, it would be very helpful. Although I am able to perform the analysis without a detailed understanding of these commands, I would like to learn the underlying concepts as well.

However, if you feel that this is not an important topic or is not required for the intended purpose, that is perfectly fine with me too.

Thank you for your time and support.

Thanks & Regards

Manoj Kumar Mishra

Assistant Professor | Marwadi University

Scopus ID: https://www.scopus.com/authid/detail.uri?authorId=58202009700

ORCID ID: 0000-0003-1857-9076

Web of Science ResearcherID: AFA-7185-2022

Email ID: manojkum...@marwadieducation.edu.in/misram...@gmail.com

Mobile: 9839661837/8707576779
