This problem may arise because your test.csv file does not have a UTF-8 BOM at the start. My guess is that the missing BOM confuses some Windows library that R is using, so that it interprets something in your file incorrectly and generates an invalid byte sequence, hence the encoding error.
Suggestion:
· Open the CSV in Notepad++
· Change the encoding from “UTF-8 without BOM” to “UTF-8” (i.e. UTF-8 with a BOM)
· Save
· Retry the R code
(I got a similar error just opening your file in MS Excel until I added the BOM; the procedure above fixed it for Excel at least.)
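For anyone who prefers to script the Notepad++ steps above, here is a minimal sketch in Python (not R). It assumes the file is already valid UTF-8 and simply prepends the three BOM bytes; the function name `add_utf8_bom` is made up for illustration.

```python
import codecs
from pathlib import Path


def add_utf8_bom(path):
    """Prepend a UTF-8 BOM to the file at `path` unless it already has one.

    Returns True if the file was changed, False if a BOM was already present.
    """
    data = Path(path).read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        return False
    # Raise UnicodeDecodeError if the content is not valid UTF-8, rather
    # than stamping a BOM onto a file that is in some other encoding.
    data.decode("utf-8")
    Path(path).write_bytes(codecs.BOM_UTF8 + data)
    return True
```

Calling `add_utf8_bom("test.csv")` should have the same effect on the bytes as the encoding change in Notepad++, though whether that cures the R error is still only a guess.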
best
Andrew.
--
You received this message because you are subscribed to the Google Groups "CorpLing with R" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
corpling-with...@googlegroups.com.
To post to this group, send email to
corplin...@googlegroups.com.
Visit this group at https://groups.google.com/group/corpling-with-r.
For more options, visit https://groups.google.com/d/optout.
Clearly, read.csv() must be using different underlying tools to process the character encoding than scan() does, or else you wouldn’t get an error from one but not the other. And the fact that the error only happens on Windows suggests that a Windows system call is the ultimate problem (i.e. a utility function used somewhere in the call tree by the Windows version of read.csv() but not by the Unix versions).
How about opening your table in Excel and saving it as plain text, i.e. TSV, rather than CSV? It might then be possible to import the TSV into R, as you would not need the read.csv() function.
best
Andrew.
Strange, I can’t reproduce this. When I save your test CSV file as type “Unicode Text (*.txt)”, the result is a TSV file where all the characters show up fine in Notepad++, including those instances of U+53CC (双) that are wrong in your file.
(The “Unicode Text” file is UTF-16 rather than UTF-8, but this is not hard to switch…)
You could try copy-pasting from Excel into a blank text file in Notepad++ to generate the TSV, instead of using the save-as “.txt” method; maybe that will produce something with the right characters for use with R?
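The UTF-16 to UTF-8 switch mentioned above can also be scripted. This is a rough sketch in Python (not R), assuming that the “Unicode Text” file starts with a UTF-16 byte-order mark; the function name `utf16_to_utf8` and its `bom` parameter are invented for illustration.

```python
from pathlib import Path


def utf16_to_utf8(src, dst, bom=True):
    """Re-encode the UTF-16 text file `src` as UTF-8 and write it to `dst`.

    With bom=True the output starts with a UTF-8 BOM (codec "utf-8-sig");
    with bom=False it is plain UTF-8 without a BOM.
    """
    # The "utf-16" codec reads the byte-order mark to choose the endianness
    # and drops it from the decoded text.
    text = Path(src).read_bytes().decode("utf-16")
    # Working on bytes keeps the original line endings untouched.
    Path(dst).write_bytes(text.encode("utf-8-sig" if bom else "utf-8"))
```

Writing both variants (with and without a BOM) makes it easy to test which one, if either, R on Windows accepts.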
best
Andrew.
From: corplin...@googlegroups.com [mailto:corplin...@googlegroups.com]
On Behalf Of Alvin Chen
Sent: 09 March 2017 14:12
To: corplin...@googlegroups.com
Subject: Re: [CorpLing with R] encoding problem of read.csv on Windows system
Dear Andrew,
Hmm, I’m out of ideas I’m afraid…
best
Andrew.
From: corplin...@googlegroups.com [mailto:corplin...@googlegroups.com]
On Behalf Of Alvin Chen
Sent: 09 March 2017 14:45
To: corplin...@googlegroups.com
Subject: Re: [CorpLing with R] encoding problem of read.csv on Windows system
Dear Andrew,
OK. I have now copied directly from Excel to Notepad++, and those characters do indeed show up fine in Notepad++. BUT when I use read.delim() to read the text file, I get the same error message AGAIN... (I also tried converting the text from "UTF-8 without BOM" to "UTF-8", and it still does not work.)
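One way to narrow this down is to check the bytes of the file itself, independently of R. This is a minimal diagnostic sketch in Python; the function name `inspect_utf8` is hypothetical. It reports whether the file starts with a UTF-8 BOM and, if the content is not valid UTF-8, the offset of the first undecodable byte.

```python
import codecs
from pathlib import Path


def inspect_utf8(path):
    """Return (has_bom, bad_offset) for the file at `path`.

    has_bom is True if the file starts with the UTF-8 BOM.
    bad_offset is None if the whole file decodes as UTF-8, otherwise the
    byte offset (counted from the start of the file) of the first byte
    that cannot be decoded.
    """
    data = Path(path).read_bytes()
    has_bom = data.startswith(codecs.BOM_UTF8)
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return has_bom, err.start
    return has_bom, None
```

If this returns `None` as the offset, the file is valid UTF-8 and the invalid byte sequence is presumably being produced while R reads it rather than being present in the file.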
*****
Alvin C.-H. Chen, Ph.D. (陳正賢)
Assistant Professor
Department of English
National Taiwan Normal University
國立台灣師範大學英語系
Taipei City, 106, Taiwan