I have a data frame in R, imported from an Excel file in Swedish. The original file contains several columns that have special characters, such as \¨{a}, \¨{o}, and so on. After import, such special characters are represented in the data frame by "\\345", "\\366", etc. (don't ask me why). For example, the word "Hårkan" becomes ''H\\345rkan".
Now my question is if it is possible to substitute those "H\\345rkan" by "Haarkan" or simply "Harkan" in R, ideally by finding those "\\345" and then replacing.
Thanks in advance,
Yingfu
gsub("\\\\345", "a", "H\\345rkan")
But see:
cat("H\345rkan\n")
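The two calls above act on different strings, which may be easier to see side by side. A minimal sketch (the variable names `literal` and `escaped` are only for illustration and are not from the original post):

```r
# Literal text: a backslash followed by the digits 3, 4, 5,
# which is what the imported data frame appears to contain.
literal <- "H\\345rkan"
# Octal escape: a single byte (0xE5), which is "å" in Latin-1.
escaped <- "H\345rkan"

nchar(literal, type = "bytes")  # 9 bytes: H \ 3 4 5 r k a n
nchar(escaped, type = "bytes")  # 6 bytes: H <E5> r k a n

# A regular expression needs \\ to match one backslash, and each of those
# backslashes must be escaped again inside an R string: hence "\\\\345".
gsub("\\\\345", "a", literal)

# With fixed = TRUE there is no regex layer, so two backslashes are enough.
gsub("\\345", "a", literal, fixed = TRUE)
```

Both gsub() calls should return "Harkan"; the fixed = TRUE form is often easier to read when the pattern is a plain string.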
> ______________________________________________
> R-h...@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
But here's a guess. If you change \\345 to \345, it should render
correctly in a Latin-1 locale:
> "H\345rkan"
[1] "Hårkan"
If this is a UTF-8 locale, convert it:
> iconv("H\345rkan", "latin1")
[1] "Hårkan"
and if you have an unsuitable locale, e.g. a Chinese one
> iconv("H\345rkan", "latin1", "ASCII//TRANSLIT")
[1] "Harkan"
or
> gsub("\\\\345", "aa", "H\\345rkan")
[1] "Haarkan"
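If more than one code is involved ("\\345", "\\366", ...), the gsub() idea can be generalised to whole columns. The following is only a sketch under some assumptions: the helper name decode_octal() and the sample data frame are made up for illustration, the codes are assumed to be three-digit octal Latin-1 values, and the columns are assumed to be character vectors rather than factors. It relies on the fact that Latin-1 byte values coincide with Unicode code points, so intToUtf8() can build the replacement character.

```r
# Hypothetical helper: replace literal "\ooo" octal sequences (backslash plus
# three octal digits) by the corresponding character, returned as UTF-8.
decode_octal <- function(x) {
  # Collect every distinct "\ooo" sequence that occurs in the input
  hits <- unique(unlist(regmatches(x, gregexpr("\\\\[0-3][0-7]{2}", x))))
  for (h in hits) {
    code <- strtoi(substring(h, 2), base = 8L)       # e.g. "345" -> 229
    x <- gsub(h, intToUtf8(code), x, fixed = TRUE)   # 229 -> "å"
  }
  x
}

# Made-up example data standing in for the imported file
df <- data.frame(place = c("H\\345rkan", "\\326stersund", "Lund"),
                 n = c(1, 2, 3),
                 stringsAsFactors = FALSE)

# Apply the helper to the character columns only
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], decode_octal)

# Optional second step: an explicit ASCII spelling for a given letter
ascii_place <- gsub("\u00e5", "aa", df$place, fixed = TRUE)
```

After this, df$place should hold the properly spelled names and ascii_place[1] should be "Haarkan". If the columns were imported as factors, the same helper could be applied to levels() of each factor instead.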
On Fri, 15 Aug 2008, Yingfu Xie wrote:
> Hello all,
>
> I have a data frame in R, imported from an excel file in Swedish. The
> original file contains several columns that have special characters,
> such as \?{a}, \?{o}, and so on. After import such special characters
> are represented in the data frame by "\\345", "\\366" etc (don't ask me
> why). For example, a word "H?rkan" becomes ''H\\345rkan".
That's odd: the quotes do not match.
We do need to ask you 'why', as we have nothing reproducible here.
> Now my question is if it is possible to substitute those "H\\345rkan" by
> "Haarkan" or simply "Harkan" in R, ideally by finding those "\\345" and
> then replacing.
>
> Thanks in advance,
> Yingfu
>
> [[alternative HTML version deleted]]
Please don't (as the posting guide asked). Properly encoded plain text
has a chance of working.
--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
By the way, I am using Windows Vista and R 2.6.1, in Sweden. As for the \\345 instead of \345: because of problems such as incomplete final lines and missing data, I first imported the data into S-PLUS using its import utility, then dumped it out and restored it in R.
Thank you,
Yingfu