[R] How to substitute special characters within a data frame?


Yingfu Xie

Aug 15, 2008, 5:54:22 AM
to r-h...@r-project.org, Yingfu Xie
Hello all,

I have a data frame in R, imported from an Excel file in Swedish. The original file contains several columns with special characters, such as \¨{a}, \¨{o}, and so on. After import, these characters are represented in the data frame by "\\345", "\\366", etc. (don't ask me why). For example, the word "Hårkan" becomes ''H\\345rkan".

Now my question is whether it is possible to replace "H\\345rkan" with "Haarkan" or simply "Harkan" in R, ideally by finding each "\\345" and replacing it.

Thanks in advance,
Yingfu

[[alternative HTML version deleted]]

Henrique Dallazuanna

Aug 15, 2008, 7:20:04 AM
to Yingfu Xie, r-h...@r-project.org
Try this:

gsub("\\\\345", "a", "H\\345rkan")

But see:

cat("H\345rkan\n")
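
The difference between the two strings above can be checked directly. This is only a small sketch; the variable names `escaped` and `latin1_bytes` are made up for illustration. In "H\\345rkan" the backslash and the digits 3, 4, 5 are four ordinary characters, whereas "H\345rkan" contains the single byte with octal code 345 (å in Latin-1).

```r
# "escaped" holds a literal backslash followed by the digits 3, 4, 5
escaped <- "H\\345rkan"
n_escaped <- nchar(escaped)                      # H + \ + 345 + rkan

# "latin1_bytes" holds one byte with octal code 345 instead
latin1_bytes <- "H\345rkan"
n_bytes <- nchar(latin1_bytes, type = "bytes")   # count bytes, not characters

# Four backslashes in the pattern match one literal backslash in the data
fixed <- gsub("\\\\345", "a", escaped)

print(n_escaped)
print(n_bytes)
print(fixed)
```

Counting bytes rather than characters for the second string avoids an error in a UTF-8 session, where the lone byte is not a valid multibyte sequence.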

> ______________________________________________
> R-h...@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O


Prof Brian Ripley

Aug 15, 2008, 6:43:36 AM
to Yingfu Xie, r-h...@r-project.org
You've not told us the 'at a minimum' information requested in the posting
guide. What OS? What locale? And how did you 'import'?

But here's a guess. If you change \\345 to \345, it should render
correctly in a Latin-1 locale:

> "H\345rkan"
[1] "Hårkan"

If this is a UTF-8 locale, convert it:

> iconv("H\345rkan", "latin1")
[1] "Hårkan"

and if you have an unsuitable locale, e.g. a Chinese one

> iconv("H\345rkan", "latin1", "ASCII//TRANSLIT")
[1] "Harkan"

or

> gsub("\\\\345", "aa", "H\\345rkan")
[1] "Haarkan"
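
If many different codes are involved, the step "change \\345 to \345" could be done for every three-digit octal code at once. The following is only a rough sketch under assumptions: `unescape_octal` is a hypothetical helper (not part of R), it assumes every escaped code is a Latin-1 byte, and it returns UTF-8 strings.

```r
# Hypothetical helper: turn literal "\345"-style text into the real character,
# assuming each three-digit octal code denotes a Latin-1 byte.
unescape_octal <- function(x) {
  m <- gregexpr("\\\\[0-7]{3}", x)           # a backslash followed by 3 octal digits
  regmatches(x, m) <- lapply(regmatches(x, m), function(codes) {
    vapply(codes, function(code) {
      byte <- strtoi(substr(code, 2, 4), base = 8L)  # "345" -> 229
      ch <- rawToChar(as.raw(byte))
      Encoding(ch) <- "latin1"               # declare the byte as Latin-1
      enc2utf8(ch)                           # then convert to UTF-8
    }, character(1), USE.NAMES = FALSE)
  })
  x
}

res <- unescape_octal(c("H\\345rkan", "no escapes here"))
print(utf8ToInt(res[1]))   # code points of the converted string
```

The result can then be passed to iconv(..., "UTF-8", "ASCII//TRANSLIT") if plain ASCII is wanted, although what the transliteration produces depends on the platform.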


On Fri, 15 Aug 2008, Yingfu Xie wrote:

> Hello all,
>
> I have a data frame in R, imported from an excel file in Swedish. The
> original file contains several columns that have special characters,
> such as \?{a}, \?{o}, and so on. After import such special characters
> are represented in the data frame by "\\345", "\\366" etc (don't ask me
> why). For example, a word "H?rkan" becomes ''H\\345rkan".

That's odd: the quotes do not match.

We do need to ask you 'why', as we have nothing reproducible here.

> Now my question is if it is possible to substitute those "H\\345rkan" by
> "Haarkan" or simply "Harkan" in R, ideally by finding those "\\345" and
> then replacing.
>
> Thanks in advance,
> Yingfu
>
> [[alternative HTML version deleted]]

Please don't (as the posting guide asked). Properly encoded plain text
has a chance of working.


--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595

Yingfu Xie

Aug 15, 2008, 9:57:05 AM
to r-h...@r-project.org, Prof Brian Ripley
Thanks to Prof. Ripley and Henrique, gsub does do the job. In addition, a call like gsub("\\\\345", "aa", x), where x is a column of the data frame, replaces all such characters in that column.
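
For example, on a column this might look as follows. This is a minimal sketch with made-up data: the data frame `rivers`, its column `name`, and the replacement of "\\366" by "o" are assumptions for illustration, not the real data.

```r
# Made-up data frame standing in for the imported one
rivers <- data.frame(name = c("H\\345rkan", "Lule\\345", "Ume"),
                     stringsAsFactors = FALSE)

# One code in one column, as in the thread
rivers$name <- gsub("\\\\345", "aa", rivers$name)

# Several codes: loop over a lookup table. With fixed = TRUE the pattern is
# taken literally, so the backslash needs no extra regex escaping.
lookup <- c("\\345" = "aa", "\\366" = "o")   # assumed replacements
other <- c("Bj\\366rk", "\\345l")
for (code in names(lookup)) {
  other <- gsub(code, lookup[[code]], other, fixed = TRUE)
}

print(rivers$name)
print(other)
```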

By the way, I am using Windows Vista and R 2.6.1, in Sweden. As for the \\345 instead of \345: because of problems such as an incomplete final line and missing data, I first imported the data into S-PLUS using its import utility, then dumped it out and restored it in R.

Thank you,
Yingfu

