non-ASCII characters in R


Connor Dibble

Jun 20, 2017, 6:47:44 PM
to Davis R Users' Group
Hi All,

I am getting an error when reading some .txt files because, for some unknown reason, my instruments occasionally record an É (note the accent) instead of a W when recording longitudes. R doesn’t like these characters because they’re non-ASCII, and I get a "multibyte string" type of error. Does anyone know a workaround in R?

Alternatively, I’m working on a Bash/Linux fix that will seek out the non-ASCII characters and replace them. I have figured out the first half of that, but I don’t know how to do the replacement part.

I’ve opened a Stack Overflow thread on this if you care to take a look at the data structure and my bash grep commands.


Connor Dibble
Graduate Group in Ecology 
University of California, Davis




Alex Mandel

Jun 21, 2017, 11:46:29 AM
to davi...@googlegroups.com, Connor Dibble
What's the file encoding?
http://mindspill.net/computing/linux-notes/determine-and-change-file-character-encoding/

I would encourage you to work in UTF-8. R is actually fine with such
characters, but not if the file says it's ASCII encoded. In your case,
though, it's the wrong character anyway.
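
A rough way to check this from inside R is sketched below. It assumes the stray character arrives as the single byte 0xC9 (É in Latin-1), which is only a guess; the sample file contents and the name `path` are made up for illustration.

```r
# Hypothetical sample file: the first line ends in the byte 0xC9 (assumed to be
# a Latin-1 É), the second line is plain ASCII.
path <- tempfile(fileext = ".txt")
writeBin(c(charToRaw("38.5432 N, 121.7405 "), as.raw(0xC9), charToRaw("\n"),
           charToRaw("38.5433 N, 121.7406 W\n")),
         path)

# Read the lines without declaring an encoding, so the bytes are kept as-is.
lines <- readLines(path, warn = FALSE)

# Flag lines that contain any byte above 0x7F, i.e. anything outside ASCII.
has_non_ascii <- vapply(
  lines,
  function(l) any(as.integer(charToRaw(l)) > 127L),
  logical(1),
  USE.NAMES = FALSE
)

# Check whether each line is valid UTF-8 (validUTF8() needs R >= 3.3.0).
# A line that has non-ASCII bytes but is not valid UTF-8 is probably in a
# single-byte encoding such as Latin-1.
is_utf8 <- validUTF8(lines)

print(data.frame(line = seq_along(lines), has_non_ascii, is_utf8))
```

If the flagged lines turn out to be valid UTF-8, the file can simply be read as UTF-8; if not, the encoding has to be stated explicitly when reading.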

Can you provide a small sample file for testing solutions against?

As a poster on Stack Overflow pointed out, iconv in R is usually part of
the solution. See its sub argument.
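
As a minimal sketch of what the sub argument does, assuming the value has already been read in as UTF-8 (the longitude string here is made up, not taken from the real data):

```r
# Hypothetical longitude value; "\u00c9" is É, stored here as UTF-8.
x <- "121.7405 \u00c9"

# With the default sub = NA, a string that cannot be converted to the target
# encoding typically comes back as NA.
# sub = "byte" instead writes each offending byte as a <xx> hex code, which
# shows exactly which bytes are in the data.
shown <- iconv(x, from = "UTF-8", to = "ASCII", sub = "byte")
print(shown)

# Any other string given as sub is used in place of each non-convertible
# byte, so a multibyte character may be replaced more than once.
replaced <- iconv(x, from = "UTF-8", to = "ASCII", sub = "?")
print(replaced)
```

Because the substitution works byte by byte, iconv is handy for finding and neutralizing the bad bytes, while a plain sub() or gsub() call is probably the simpler tool for turning one specific character into a W.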

Thanks,
Alex Mandel, PhD

Center for Spatial Sciences
http://spatial.ucdavis.edu
Geospatial and Farming Systems Research Consortium
http://gfc.ucdavis.edu
University of California, Davis
> https://stackoverflow.com/questions/44663963/bash-linux-find-non-ascii-character-in-a-txt-file-and-replace-it-with-an-ascii?noredirect=1#comment76313604_44663963

Alex Mandel

Jun 21, 2017, 12:04:06 PM
to davi...@googlegroups.com, Connor Dibble
Yup, really need a test file. Making a file with just that character
doesn't cause an issue for R, but my text editor saved the file as UTF-8.

Also R is fine with:
sub("È","W","È")

So you probably just need to force R to read the file in as UTF-8 and
replace the character.
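
Something along these lines, as a sketch only: the sample file is made up and written as UTF-8 for the test, and the character in the pattern has to match whatever is really in your files.

```r
# Hypothetical sample file, written as UTF-8; "\u00c8" is È.
path <- tempfile(fileext = ".txt")
writeBin(charToRaw("38.5432 N, 121.7405 \u00c8\n"), path)

# encoding = "UTF-8" tells R to mark the strings it reads as UTF-8
# instead of guessing from the locale.
lines <- readLines(path, encoding = "UTF-8", warn = FALSE)

# Replace the stray character with W. fixed = TRUE matches the literal
# character rather than treating the pattern as a regular expression.
fixed <- sub("\u00c8", "W", lines, fixed = TRUE)
print(fixed)
```

If the file turns out not to be UTF-8, opening it with file(path, encoding = "latin1") (or whatever the real encoding is) before calling readLines should convert it on the way in.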

Enjoy,
Alex Mandel, PhD


