Creating a new Corpus using R


Utku Turk

Jul 13, 2020, 2:27:57 AM
to CorpLing with R
Hi everybody,

I am quite new to corpus linguistics, and I am doing it out of curiosity. I have some texts and dictionary material in a language that has not been documented with a corpus before (Pontic Greek). I want to learn how to create an e-lexicon and a corpus. Ideally, I would like to make them available through either a website or a Shiny app, but that is for the future. For now, I want to focus on how to deal with texts and dictionary material from scratch. Most of the explanations online and in the relevant books are about analyzing the data or statistics of an existing corpus. I do not know where or how to start. I would greatly appreciate your help!

Best,
Utku

Stefan Th. Gries

Jul 13, 2020, 10:58:03 PM
to CorpLing with R
Well, this list is less concerned with compiling a new corpus than it
is with processing an existing corpus, but maybe these references
might help:
- <https://www.springer.com/gp/book/9789402408799>
- <http://icar.cnrs.fr/ecole_thematique/contaci/documents/Baude/wynne.pdf>
- <https://www.palgrave.com/gp/book/9781403943668> and its other volumes
- <https://www.oxfordhandbooks.com/view/10.1093/oxfordhb/9780190610029.001.0001/oxfordhb-9780190610029-e-14>
- the forthcoming MIT Open Handbook of Linguistic Data Management.

Utku Turk

Jul 14, 2020, 2:59:29 AM
to CorpLing with R
Thank you for your help! 

On Tuesday, July 14, 2020 at 05:58:03 UTC+3, Stefan Th. Gries wrote:

steffen.schaub

Jul 14, 2020, 4:32:33 AM
to CorpLing with R
Hi Utku,

I'm currently teaching a course on creating and annotating corpora using R. It's primarily geared towards extracting web data (Twitter, YouTube comments, online articles, etc.), but perhaps I can help. Feel free to contact me directly.

Best,
Steffen

Utku Turk

Jul 14, 2020, 4:48:10 AM
to CorpLing with R
Hi Steffen! I contacted you over Twitter. I could not find a way to send you an e-mail using this platform. Sorry for bothering you on another platform.

On Tuesday, July 14, 2020 at 11:32:33 UTC+3, steffen.schaub wrote:

steffen.schaub

Jul 14, 2020, 5:08:27 AM
to CorpLing with R
That's fine with me :o)

Anna Jordanous

Jul 14, 2020, 6:00:41 AM
to CorpLing with R
Dear Utku Turk
In terms of working with Pontic Greek, you may wish to post on the Humanist Discussion Group, which many digital humanities scholars subscribe to. I should imagine several people in that research community are working with Pontic Greek or closely related languages. They would likely have useful comments to guide you, whether from working with Pontic Greek or with other older languages. You will probably also find people interested in following what you are doing.


regards,
Anna