You must *start* with a string in the same encoding that the website uses, whether that's Latin-1, UTF-8 as in this case, or whatever; otherwise the URL-encoding will produce an incorrect address.
Best
Andrew.
> --
> You received this message because you are subscribed to the
> Google Groups "CorpLing with R" group.
> To post to this group, send email to corplin...@googlegroups.com.
> To unsubscribe from this group, send email to
> corpling-with...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/corpling-with-r?hl=en.
>
>
y <- sapply(x[z], function(x) paste0("%", as.character(charToRaw(x)), collapse = ""))
So, I define myURLencode as follows:
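myURLencode <- function(URL, reserved = FALSE) {
  # Pattern matching every character that must be escaped; reserved
  # characters are left alone unless reserved = TRUE
  OK <- paste("[^-ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz0123456789$_.+!*'(),", if (!reserved) ";/?:@=&", "]", sep = "")
  x <- strsplit(URL, "")[[1L]]
  z <- grep(OK, x)
  if (length(z)) {
    # Escape each character as "%" plus its Unicode code point in hex,
    # instead of the bytes that charToRaw() would give
    y <- sapply(x[z], function(x) paste("%", as.hexmode(utf8ToInt(x)), sep = "", collapse = ""))
    x[z] <- y
  }
  paste(x, collapse = "")
}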
Then:
myURLencode("México")
[1] "M%e9xico"
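One caveat with the code-point approach: it only matches the Latin-1 byte for characters up to U+00FF. The euro sign (U+20AC), for instance, would come out as "%20ac". A minimal sketch of a byte-wise alternative follows, assuming a single input string; the name urlencode_bytes and its `to` parameter are made up for illustration and are not part of R. It converts to the target encoding first and then escapes each byte, so it never calls strsplit on a non-native string.

```r
# Hypothetical helper (not part of R): percent-encode the bytes of `url`
# after converting it to the encoding named by `to`.
urlencode_bytes <- function(url, to = "latin1", reserved = FALSE) {
  stopifnot(is.character(url), length(url) == 1L)
  # iconv() argument order is string, from, to
  converted <- iconv(enc2utf8(url), from = "UTF-8", to = to)
  if (is.na(converted)) stop("string cannot be represented in ", to)
  bytes <- charToRaw(converted)
  # Same set of unescaped characters as in myURLencode above
  safe <- charToRaw(paste0("-ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                           "abcdefghijklmnopqrstuvwxyz0123456789$_.+!*'(),",
                           if (!reserved) ";/?:@=&"))
  out <- sprintf("%%%02x", as.integer(bytes))  # escape every byte ...
  keep <- bytes %in% safe
  out[keep] <- rawToChar(bytes[keep], multiple = TRUE)  # ... then restore safe ones
  paste(out, collapse = "")
}

# Build the test string from the code point so the file encoding does not matter
mexico <- paste0("M", intToUtf8(0xE9), "xico")
urlencode_bytes(mexico)                # "M%e9xico"
urlencode_bytes(mexico, to = "UTF-8")  # "M%c3%a9xico"
```

The target encoding has to be one the site actually expects; "latin1" is only the default chosen for this sketch.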
John
--
Hi Earl,
I think you have your iconv arguments the wrong way round.
Your original text is clearly UTF-8 because when you url-encode it without using iconv you get a UTF-8 byte sequence. So, you need to convert from UTF-8 to Win1252 before url-encoding if you want e-acute to come out as just %e9. The order of arguments for iconv is string-from-to.
In other words,
URLencode(iconv("México", "UTF-8", "WINDOWS-1252"))
should do the trick.
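To make the argument order concrete, here is a small sketch that compares the bytes produced by the right and the swapped conversion. It assumes "latin1" as a stand-in for WINDOWS-1252 (the two agree on e-acute, 0xE9), and pct is a hypothetical helper that simply percent-encodes every byte.

```r
# UTF-8 bytes of e-acute (U+00E9), built by hand so the example does not
# depend on how this message or script file is encoded
e_utf8 <- rawToChar(as.raw(c(0xc3, 0xa9)))

# Hypothetical helper: percent-encode every byte of a string
pct <- function(s) paste0("%", as.character(charToRaw(s)), collapse = "")

right <- iconv(e_utf8, from = "UTF-8", to = "latin1")  # string, from, to
wrong <- iconv(e_utf8, from = "latin1", to = "UTF-8")  # arguments swapped

pct(e_utf8)  # "%c3%a9"        no conversion: the UTF-8 byte sequence
pct(right)   # "%e9"           the single Latin-1 byte
pct(wrong)   # "%c3%83%c2%a9"  each UTF-8 byte re-encoded as its own character
```

The swapped call does not fail; it quietly doubles the bytes, which is why the mistake is easy to miss.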
best
Andrew.
--
I’m not a specialist on R internals by any means, but that looks like a deficiency (I shan’t say bug) in URLencode: basically, it is saying that it refuses to URLencode a string in an encoding other than the current locale’s encoding (which is UTF-8). (To be more specific, strsplit refuses to deal with strings that aren’t in the current locale’s encoding, i.e. UTF-8, and URLencode uses strsplit.)
You might be able to get round this by stashing the iconv return value in a variable, setting the locale to a Win-1252 charset, and then calling URLencode on the stored value (and resetting the locale afterwards, of course).
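The set-then-reset part of that idea could be sketched as below. This is only the save/restore pattern: with_ctype is a hypothetical helper, and the demonstration uses the "C" locale because it exists everywhere. The name of a Windows-1252 locale differs between systems, so it is not hard-coded here, and whether URLencode then accepts the converted string is untested.

```r
# Hypothetical helper: evaluate `expr` with LC_CTYPE temporarily set to
# `locale`, restoring the previous setting even if `expr` fails.
with_ctype <- function(locale, expr) {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  # Sys.setlocale() returns "" (with a warning) when the locale is unavailable
  new <- suppressWarnings(Sys.setlocale("LC_CTYPE", locale))
  if (!nzchar(new)) stop("locale not available on this system: ", locale)
  force(expr)  # lazily evaluated, so it runs under the new locale
}

before <- Sys.getlocale("LC_CTYPE")
inside <- with_ctype("C", Sys.getlocale("LC_CTYPE"))
after  <- Sys.getlocale("LC_CTYPE")
```

Here `inside` reports the temporary locale while `after` matches `before`, so nothing leaks out of the call.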
Chalk this up as reason number seventeen or so that I don’t like or trust locales...
best
Andrew.