Regarding the errors when working with UTF-8 files


RB

Apr 7, 2013, 1:11:00 AM
to corplin...@googlegroups.com
Hi everybody,


I ran into a small problem while working in R with a Telugu text file saved in UTF-8 format. For some of the files, R does not show the expected output when I read in the lines; the file does not seem to load correctly. The command I gave, and R's response, was:
corpus.file<-scan(select.list(dir()), what="char", sep="\n", encoding="UTF-8")
Read 1 item
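
For context, scan() with sep="\n" treats each line as one item, so "Read 1 item" means R found a single line in the file. A minimal sketch of how the load could be checked is given here, assuming the file really is UTF-8. The temporary file and its two Telugu lines are hypothetical stand-ins for the attached sample, not its actual content.

```r
# Hypothetical stand-in for the attached sample: two Telugu lines.
# \u escapes keep this script ASCII-only.
sample_lines <- c("\u0c24\u0c46\u0c32\u0c41\u0c17\u0c41 \u0c2d\u0c3e\u0c37",
                  "\u0c24\u0c46\u0c32\u0c41\u0c17\u0c41")

# Write the lines as raw UTF-8 bytes with LF line endings.
path <- tempfile(fileext = ".txt")
con <- file(path, open = "wb")
writeLines(enc2utf8(sample_lines), con, useBytes = TRUE)
close(con)

# Read the file line by line, declaring the input as UTF-8.
corpus.file <- readLines(path, encoding = "UTF-8")

length(corpus.file)                 # number of lines found in the file
Encoding(corpus.file)               # how R has marked each string
nchar(corpus.file, type = "chars")  # code points per line
```

If length() returns 1 for a real file, the text is probably stored as one long line, which would be consistent with the "Read 1 item" message rather than a failed load.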

Second, what command should I use to find each word boundary in the Telugu text file? When I followed the book and gave its commands, R returned the results in some other format.
Please help me understand this in detail.
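
One possible way to get word boundaries is sketched here, under the assumption that words in the text are separated by whitespace and ASCII punctuation. The sample sentence is hypothetical. A pattern such as "\\W+" can cut Telugu words apart at vowel signs, because those combining marks may not count as word characters, so this sketch splits on an explicit list of separators instead.

```r
# Hypothetical sample line (two Telugu words followed by a full stop),
# written with \u escapes so the script stays ASCII-only.
line <- "\u0c24\u0c46\u0c32\u0c41\u0c17\u0c41 \u0c2d\u0c3e\u0c37."

# Split on runs of whitespace and common ASCII punctuation only,
# leaving the Telugu letters and vowel signs inside each word untouched.
words <- strsplit(line, "[[:space:].,;:!?()\"'-]+", perl = TRUE)[[1]]

# Drop any empty strings produced by separators at the start of the line.
words <- words[nchar(words) > 0]

length(words)                 # number of word tokens
nchar(words, type = "chars")  # code points per token
```

If the text uses other separators, such as the danda, they would need to be added to the bracketed list.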

I have attached one of my sample texts.
Please find the .txt file attached.


Kindly guide me to go further.


Thanking you.
sample telugu.txt

gasyoun

Jan 4, 2014, 3:49:23 PM
to corplin...@googlegroups.com
For Sanskrit texts in Devanagari I use SLP1. Maybe you could convert your text to SLP1, which is an ASCII-based transliteration, and then work with that? See https://groups.google.com/forum/#!topic/bvparishat/cNoHQNYriks. Indian scripts are hard to work with, and I know: I have been dealing with them since 2002.