Regarding errors when working with UTF-8 files

RB

Apr 7, 2013, 1:11:00 AM

to corplin...@googlegroups.com

Hi everybody.

I ran into a small problem while working in R with a Telugu text file saved in UTF-8 format. For some of the files, R does not show the expected output when I ask it to read the lines; the file is not loaded correctly. The command I gave, followed by R's response, was:
corpus.file<-scan(select.list(dir()), what="char", sep="\n", encoding="UTF-8")
Read 1 item
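
For comparison with the scan() call above, here is a minimal sketch of reading a UTF-8 file line by line with readLines(). The sample lines and the names sample.lines and sample.path are hypothetical and are not taken from the attached file. Note that "Read 1 item" is simply scan() reporting how many items it read, so it may only mean that the file contains a single line.

```r
# Hypothetical sample: two short Telugu lines, written with \u escapes so
# that the script itself stays plain ASCII.
sample.lines <- c("\u0c24\u0c46\u0c32\u0c41\u0c17\u0c41 \u0c2d\u0c3e\u0c37",
                  "\u0c2a\u0c26\u0c02")

# Write the text out as UTF-8 bytes, without re-encoding to the native locale.
sample.path <- tempfile(fileext = ".txt")
writeLines(enc2utf8(sample.lines), sample.path, useBytes = TRUE)

# Read the file back; encoding = "UTF-8" marks the resulting strings as UTF-8.
corpus.file <- readLines(sample.path, encoding = "UTF-8", warn = FALSE)

length(corpus.file)           # number of lines read
nchar(corpus.file, "chars")   # characters (code points) per line
```

If length(corpus.file) comes back as 1 for a file that should have many lines, the line endings in the file are worth checking before suspecting the encoding.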

Second, what command should I use to find each word boundary in the Telugu text file? When I followed the book and entered its commands, R returned the results in some other format.
Please help me understand this in detail.
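
One possible way to split Telugu text into words is sketched below. It assumes that a "word" is a run of Unicode letters and combining marks, which keeps Telugu vowel signs attached to their consonants. The sample string and the variable names are hypothetical; this is not the method from the book.

```r
# Hypothetical sample sentence: three Telugu words plus punctuation.
x <- "\u0c24\u0c46\u0c32\u0c41\u0c17\u0c41 \u0c2d\u0c3e\u0c37, \u0c2a\u0c26\u0c02."

# Split on every run of characters that are neither letters (\p{L}) nor
# combining marks (\p{M}); perl = TRUE enables the Unicode property classes.
words <- strsplit(x, "[^\\p{L}\\p{M}]+", perl = TRUE)[[1]]

# Drop empty strings that appear when the text starts with a separator.
words <- words[nzchar(words)]

length(words)   # number of word tokens found
```

Leaving \p{M} out of the pattern would cut words apart at the vowel signs, which may be one reason a pattern written for English gives odd-looking results on Telugu.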

I have attached one of my sample texts.
Please find the .txt file attached.

Kindly guide me to go further.

Thanking you.

sample telugu.txt

gasyoun

Jan 4, 2014, 3:49:23 PM

to corplin...@googlegroups.com

For Sanskrit texts in Devanagari I use SLP1. Maybe you want to convert your text to SLP1, which is an ASCII-based transliteration, and then work with it? See https://groups.google.com/forum/#!topic/bvparishat/cNoHQNYriks - Indian scripts are hard to work with and I know - it's