encoding problem of read.csv on Windows system

Alvin Chen

Mar 8, 2017, 9:17:21 PM
to CorpLing with R
Dear all,

I have one encoding question and was wondering if anyone could give me more to-the-point advice.

This happens on Windows only, unfortunately.

I have a UTF-8 CSV file (a sample is attached). I tried to read this file on Windows using:

raw.data<- scan("test.csv", what = "c", sep="\n", encoding="UTF-8")

It works fine and all the non-ASCII characters show up perfectly. But this is not what I want, of course: I would like to read this CSV file in as a data frame.

When I use read.csv() to do so:

raw.data <- read.csv("test.csv", header = TRUE, encoding = "UTF-8")

I get this error message:

Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, numerals = numerals,  : 
  invalid multibyte string at '<e6><9e><97>?寡,'
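
One base-R thing that might be worth trying: the error is raised inside type.convert(), and read.csv() only calls type.convert() on columns whose class it has to guess. Declaring every column as character via colClasses should skip that step. A minimal sketch, assuming a small temporary file in place of test.csv; whether this clears the error in a CP950 locale is untested here:

```r
# Sketch only: avoid type.convert() by declaring all columns as character.
# The temporary file and its two lines are stand-ins for test.csv.
path <- tempfile(fileext = ".csv")
sample_lines <- c("Autor,Translator,Title,Issue",
                  "\u6797\u57f9\u82f1,,\u82e5\u304d\u8840\u6f6e,7")
con <- file(path, open = "wb")
writeLines(enc2utf8(sample_lines), con, useBytes = TRUE)  # write UTF-8 bytes as-is
close(con)

# encoding = "UTF-8" marks the strings as UTF-8; colClasses = "character"
# means no column class has to be guessed.
raw.data <- read.csv(path, header = TRUE, encoding = "UTF-8",
                     colClasses = "character")

# Convert numeric columns by hand afterwards.
raw.data$Issue <- as.integer(raw.data$Issue)
str(raw.data)
```

The cost of this design is that numeric and date columns have to be converted explicitly after the read.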

The file contains not only Chinese characters but also Japanese characters and other symbols. Is that the main reason why Windows can't handle this? (read.csv() works fine in R on other OSs, though.)

I googled for solutions quite a bit, and it seems there's a library, "readr", that deals with this problem. But this is really not a good solution. I was wondering if anyone has good suggestions for the large population still kidnapped by Windows....

Here's my sessionInfo()

R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=Chinese (Traditional)_Taiwan.950  LC_CTYPE=Chinese (Traditional)_Taiwan.950   
[3] LC_MONETARY=Chinese (Traditional)_Taiwan.950 LC_NUMERIC=C                                
[5] LC_TIME=Chinese (Traditional)_Taiwan.950    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_3.3.1

Million thanks in advance!

Best,

Alvin Chen
test.csv

Hardie, Andrew

Mar 9, 2017, 4:12:30 AM
to corplin...@googlegroups.com

This problem may be because your test.csv file does not have a UTF-8 BOM at the start. My guess is that this is confusing some Windows library that R is using, leading it to interpret something in your file incorrectly and generate an invalid byte sequence, hence the encoding error.

Suggestion:

· Open the CSV in Notepad++
· Change the encoding from “UTF-8 without BOM” to “UTF-8”
· Save
· Retry the R code

(I got a similar error just opening your file in MS Excel until I added the BOM; the procedure above fixed it for Excel at least.)
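
Whether a file actually starts with a BOM can also be checked from within R by looking at its first three bytes (EF BB BF). A minimal sketch, where has_utf8_bom() is a hypothetical helper and the two temporary files are stand-ins for the real CSV. Note that fileEncoding re-encodes the input to the native encoding, so on a CP950 system it may not help with characters that code page lacks:

```r
# Hypothetical helper: TRUE if the file starts with the UTF-8 BOM (EF BB BF).
has_utf8_bom <- function(path) {
  first <- readBin(path, what = "raw", n = 3L)
  identical(first, as.raw(c(0xEF, 0xBB, 0xBF)))
}

# Demo on two temporary files, one with and one without a BOM.
with_bom <- tempfile(fileext = ".csv")
no_bom   <- tempfile(fileext = ".csv")
body <- charToRaw("Autor,Issue\nA,7\n")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), body), with_bom)
writeBin(body, no_bom)

has_utf8_bom(with_bom)
has_utf8_bom(no_bom)

# fileEncoding = "UTF-8-BOM" asks R to drop the BOM while reading,
# so the first column name is not polluted by it.
d <- read.csv(with_bom, fileEncoding = "UTF-8-BOM")
names(d)
```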

 

best

 

Andrew.

--
You received this message because you are subscribed to the Google Groups "CorpLing with R" group.
To unsubscribe from this group and stop receiving emails from it, send an email to corpling-with...@googlegroups.com.
To post to this group, send email to corplin...@googlegroups.com.
Visit this group at https://groups.google.com/group/corpling-with-r.
For more options, visit https://groups.google.com/d/optout.

Alvin Chen

Mar 9, 2017, 5:29:00 AM
to corplin...@googlegroups.com
Hi Andrew,

Thank you for the suggestion. 

I tried to convert it to "UTF-8" but the problem is the same. R throws the same error message.

Also, what I don't understand is why scan() reads the file perfectly but read.csv() does not.
All the characters show up correctly in the results using scan():

> raw.data<- scan("../data/Tana-2017-02-08/test.csv", what = "c", sep="\n", encoding="UTF-8")
Read 20 items
> raw.data
 [1] "Autor,Translator,Title,Periodical,Issue,Date"                                                                 
 [2] "林培英,,人山人海合歡此日──双十節の盛典を語る(十月十一日上),一陽週報,7,1945-10-27"                               
 [3] "司會者更與,,人山人海合歡此日──双十節の盛典を語る(十月十一日上),一陽週報,7,1945-10-27"                           
 [4] "吳北海,,人山人海合歡此日──双十節の盛典を語る(十月十一日上),一陽週報,7,1945-10-27"                               
 [5] "葉陶,,人山人海合歡此日──双十節の盛典を語る(十月十一日上),一陽週報,7,1945-10-27"                                 
 [6] "許靑鸞,,人山人海合歡此日──双十節の盛典を語る(十月十一日上),一陽週報,7,1945-10-27"                               
 [7] "楊克煌,,人山人海合歡此日──双十節の盛典を語る(十月十一日上),一陽週報,7,1945-10-27"                               
 [8] "林茂雄,,人山人海合歡此日──双十節の盛典を語る(十月十一日上),一陽週報,7,1945-10-27"                               
 [9] "柯萬蛟,,人山人海合歡此日──双十節の盛典を語る(十月十一日上),一陽週報,7,1945-10-27"                               
[10] "吳淸水,,人山人海合歡此日──双十節の盛典を語る(十月十一日上),一陽週報,7,1945-10-27"                               
[11] "林月鏡,,人山人海合歡此日──双十節の盛典を語る(十月十一日上),一陽週報,7,1945-10-27"                               
[12] "鍾逸人,,人山人海合歡此日──双十節の盛典を語る(十月十一日上),一陽週報,7,1945-10-27"                               
[13] "吳北海,,若き血潮,一陽週報,6,1945-10-06"                                                                         
[14] "國父,,中國國民黨黨歌☆,一陽週報,3,1945-09-15"                                                                    
[15] "孫中山,,中國革命史綱要(上)──序文‧第一章革命の主義──一、民族主義‧二、民權主義‧三、民生主義,一陽週報,7,1945-10-27"
[16] "孫中山,,民國敎育家の任務,一陽週報,8,1945-11-03"                                                                 
[17] "孫中山,,中國革命史綱要(中)──第二章革命の方略‧第三章革命運動──一、黨の樹立,一陽週報,8,1945-11-03"                
[18] "孫中山,,中國革命史綱要(下の一)──二、宣傳‧三、義擧,一陽週報,9,1945-11-17"                                        
[19] "孫中山,,中國革命史綱要(下の二)──第四章辛亥の役,一陽週報,10,1945-11-24"                                          
[20] "孫文,,欲改造新國家當實行三民主義☆,一陽週報,7,1945-10-27" 
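
Since scan() returns the lines intact, one possible workaround is to split them into fields by hand and assemble the data frame without read.csv(). A rough sketch, assuming a temporary two-line file in place of test.csv and assuming no field contains a quoted comma (a plain split would break on those); untested in a CP950 locale:

```r
# Stand-in for test.csv: header plus one data row, written as UTF-8 bytes.
path <- tempfile(fileext = ".csv")
sample_lines <- c("Autor,Translator,Title,Periodical,Issue,Date",
                  "\u5433\u5317\u6d77,,\u82e5\u304d\u8840\u6f6e,\u4e00\u967d\u9031\u5831,6,1945-10-06")
con <- file(path, open = "wb")
writeLines(enc2utf8(sample_lines), con, useBytes = TRUE)
close(con)

# Read whole lines, as in the scan() call that works.
lines <- scan(path, what = "character", sep = "\n",
              encoding = "UTF-8", quiet = TRUE)

# Split on commas; pad short rows with NA so every row has one value per column.
fields <- strsplit(lines, ",", fixed = TRUE)
n_col  <- length(fields[[1]])
rows   <- lapply(fields[-1], function(x) { length(x) <- n_col; x })

raw.data <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(raw.data) <- fields[[1]]
raw.data$Issue <- as.integer(raw.data$Issue)
str(raw.data)
```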


Best,

Alvin


Hardie, Andrew

Mar 9, 2017, 8:43:39 AM
to corplin...@googlegroups.com

Clearly, read.csv() must be using different underlying tools to process the character encoding than scan() does, or else you wouldn’t get an error from one but not the other. And the fact that the error only happens on Windows suggests it is a Windows system call that is the ultimate problem (i.e. a utility function used somewhere in the call tree by the Windows version of read.csv() but not by the Unix versions).

How about opening your table in Excel and saving it as plain text, i.e. TSV, rather than CSV? It might then be possible to import the TSV into R, as you would not need the read.csv() function.

best

Andrew.


Alvin Chen

Mar 9, 2017, 9:11:43 AM
to corplin...@googlegroups.com
Dear Andrew,

As you suggested, I saved the file as a TSV using MS Excel. And it is true that I can use read.delim() to read the file into a data frame. No error messages!

BUT some of the characters (i.e. the Japanese characters) do not show up correctly. So this is still not what we want.

(When I open the new TSV file in Notepad++, those Japanese characters do not show up correctly either......)

Alvin

> raw.data <- read.delim("test-tab-delimited.txt",header = TRUE, encoding = "utf-8", sep = "\t")
> raw.data
        Autor Translator                                                                            Title Periodical Issue       Date
1      林培英         NA                                   人山人海合歡此日──?十節?盛典?語?(十月十一日上)   一陽週報     7 1945/10/27
2  司會者更與         NA                                   人山人海合歡此日──?十節?盛典?語?(十月十一日上)   一陽週報     7 1945/10/27
3      吳北海         NA                                   人山人海合歡此日──?十節?盛典?語?(十月十一日上)   一陽週報     7 1945/10/27
4        葉陶         NA                                   人山人海合歡此日──?十節?盛典?語?(十月十一日上)   一陽週報     7 1945/10/27
5       許?鸞         NA                                   人山人海合歡此日──?十節?盛典?語?(十月十一日上)   一陽週報     7 1945/10/27
6      楊克煌         NA                                   人山人海合歡此日──?十節?盛典?語?(十月十一日上)   一陽週報     7 1945/10/27
7      林茂雄         NA                                   人山人海合歡此日──?十節?盛典?語?(十月十一日上)   一陽週報     7 1945/10/27
8      柯萬蛟         NA                                   人山人海合歡此日──?十節?盛典?語?(十月十一日上)   一陽週報     7 1945/10/27
9       吳?水         NA                                   人山人海合歡此日──?十節?盛典?語?(十月十一日上)   一陽週報     7 1945/10/27
10     林月鏡         NA                                   人山人海合歡此日──?十節?盛典?語?(十月十一日上)   一陽週報     7 1945/10/27
11     鍾逸人         NA                                   人山人海合歡此日──?十節?盛典?語?(十月十一日上)   一陽週報     7 1945/10/27
12     吳北海         NA                                                                          若?血潮   一陽週報     6  1945/10/6
13       國父         NA                                                                  中國國民黨黨歌☆   一陽週報     3  1945/9/15
14     孫中山         NA 中國革命史綱要(上)──序文‧第一章革命?主義──一、民族主義‧二、民權主義‧三、民生主義   一陽週報     7 1945/10/27
15     孫中山         NA                                                                   民國?育家?任務   一陽週報     8  1945/11/3
16     孫中山         NA                  中國革命史綱要(中)──第二章革命?方略‧第三章革命運動──一、黨?樹立   一陽週報     8  1945/11/3
17     孫中山         NA                                          中國革命史綱要(下?一)──二、宣傳‧三、義?   一陽週報     9 1945/11/17
18     孫中山         NA                                             中國革命史綱要(下?二)──第四章辛亥?役   一陽週報    10 1945/11/24
19       孫文         NA                                                      欲改造新國家當實行三民主義☆   一陽週報     7 1945/10/27

*****
Alvin C.-H. Chen, Ph.D. (陳正賢)
Assistant Professor
Department of English
National Taiwan Normal University
國立台灣師範大學英語系
Taipei City, 106, Taiwan


Hardie, Andrew

Mar 9, 2017, 9:31:50 AM
to corplin...@googlegroups.com

Strange, I can’t reproduce this. When I save your test CSV file as type “Unicode Text (*.txt)”, the result is a TSV file where all the characters show up fine in Notepad++, including those instances of U+53CC (双) that are wrong in your file.

 

(The “Unicode Text” file is UTF-16 not UTF-8 though, but this is not hard to switch…)
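
The switch from UTF-16 to UTF-8 can also be done inside R on the raw bytes, which avoids a detour through the native code page. A minimal sketch, assuming a little-endian file with a BOM (FF FE); the temporary files stand in for the Excel export, and the byte order of a real export should be checked rather than assumed:

```r
# Build a small UTF-16LE file with a BOM, standing in for a "Unicode Text" export.
txt <- "Autor\tTitle\n\u5433\u5317\u6d77\t\u82e5\u304d\u8840\u6f6e\n"
utf16_path <- tempfile(fileext = ".txt")
payload <- iconv(txt, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(c(as.raw(c(0xFF, 0xFE)), payload), utf16_path)

# Read the file as raw bytes and drop the BOM if present.
bytes <- readBin(utf16_path, what = "raw", n = file.size(utf16_path))
if (length(bytes) >= 2L && identical(bytes[1:2], as.raw(c(0xFF, 0xFE)))) {
  bytes <- bytes[-(1:2)]
}

# iconv() accepts a list of raw vectors; the result is a UTF-8 string.
utf8_text <- iconv(list(bytes), from = "UTF-16LE", to = "UTF-8")

# Write the UTF-8 bytes out unchanged.
utf8_path <- tempfile(fileext = ".txt")
con <- file(utf8_path, open = "wb")
writeLines(utf8_text, con, sep = "", useBytes = TRUE)
close(con)
```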

 

You could try copy-pasting from Excel to a blank text file in N++ to generate TSV, instead of the save-as “.txt” method, and maybe that might generate something with the right characters for use with R?

 

best

 

Andrew.

Alvin Chen

Mar 9, 2017, 9:45:01 AM
to corplin...@googlegroups.com
Dear Andrew,

OK. Now I copied directly from Excel to Notepad++, and indeed those characters show up fine in Notepad++. BUT when I use read.delim() to read the text file, I get the same error message AGAIN........ (I also tried converting the text from "UTF-8 without BOM" to "UTF-8", and it still doesn't work.)

Alvin

> raw.data <- read.delim("test-tab-delimited.txt",header = TRUE, encoding = "utf-8", sep = "\t")
Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, numerals = numerals,  : 
  invalid multibyte string at '<e6><9e><97>?寡 '



Hardie, Andrew

Mar 9, 2017, 9:49:53 AM
to corplin...@googlegroups.com

Hmm, I’m out of ideas I’m afraid…

 

best

 

Andrew.

Alvin Chen

Mar 9, 2017, 4:33:37 PM
to corplin...@googlegroups.com
Hi Andrew,

Still, I really appreciate your time on this.

The encoding problem on Windows has been giving me a lot of headaches, and this one is the most bizarre to me. I think that on Windows, problems may arise if a text file contains non-ASCII characters from two or more languages. Again, this does not happen on Mac or Linux.

I hope that some others can shed more light on the "window".

Alvin



Stefan Th. Gries

Mar 9, 2017, 5:10:25 PM
to CorpLing with R
It is very weird tho that I didn't have real problems with a Win10
virtual box in Linux Mint 18.1. Reading things in with scan was fine,
with read.csv didn't show the characters right but gave unicode code
points, but I didn't get Alvin's error messages in the virtual box ...
Not that helpful, but just FYI ...
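
To tell a display problem from a storage problem, it may help to look at the code points R actually holds rather than at what print() shows. A minimal sketch in base R, assuming a test string typed with \u escapes in place of a value read from the file; if the code points come out right, the data are presumably intact and only the printing is affected:

```r
# Stand-in for one cell of the data frame (the title on row 12 of the sample).
x <- "\u82e5\u304d\u8840\u6f6e"

Encoding(x)    # declared encoding of the string
charToRaw(x)   # the bytes actually stored

# Code points, independent of how the console renders the string.
code_points <- sprintf("U+%04X", utf8ToInt(x))
code_points
```

For a column read from a file, the same calls can be applied to a single element, e.g. the first value of the Title column.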

Alvin Chen

Mar 9, 2017, 5:17:46 PM
to corplin...@googlegroups.com
I wonder whether anyone else who runs Windows (Win7 for me) in a CP950 locale gets a similar message?

As I said earlier, there has been a discussion of using the library "readr" to deal with this problem <http://people.fas.harvard.edu/~izahn/posts/reading-data-with-non-native-encoding-in-r/>. But I still want to fix this with native R functions....

Best,
Alvin


Alvin Chen

Mar 9, 2017, 5:19:52 PM
to corplin...@googlegroups.com
Oh, and I forgot to mention: even when I use read_csv() from the library "readr", the problem is not solved. read_csv() doesn't throw an error message (which is better than read.csv()), BUT the Japanese characters still don't show up correctly in R.

Alvin

