Yes, we can generally keep up with UTF-8 encoded URLs and we'll
generally show them to users in our search results (but link to your
server with the URLs properly escaped). I would recommend that you
also use escaped URLs in your links, to make sure that your site is
compatible with older browsers that don't understand straight UTF-8
URLs.
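To make the escaping advice concrete, here is a minimal Python sketch (an illustration, not anything Google provides) that turns a "straight" UTF-8 URL into its percent-escaped form. `escape_url`, the `SAFE` character set and the example.com URL are hypothetical. It assumes the host name is plain ASCII, because internationalized domain names are converted with IDNA rather than with percent-escapes.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters that structure a URL and must keep their meaning.
# "%" is included so already-escaped sequences are left alone
# (assumption: a bare "%" never appears unescaped in the input).
SAFE = "/?#[]@!$&'()*+,;=:%~"

def escape_url(url: str) -> str:
    """Percent-encode non-ASCII characters as UTF-8 bytes (ASCII host assumed)."""
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme,
        parts.netloc,  # left untouched: assumed to be ASCII already
        quote(parts.path, safe=SAFE, encoding="utf-8"),
        quote(parts.query, safe=SAFE, encoding="utf-8"),
        quote(parts.fragment, safe=SAFE, encoding="utf-8"),
    ))

print(escape_url("http://www.example.com/tin-tức/sách"))
# http://www.example.com/tin-t%E1%BB%A9c/s%C3%A1ch
```

Because `%` is treated as safe, running the function on an already-escaped URL returns it unchanged.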
The only situations where I have seen issues are those where the URL
uses non-ASCII characters but the server is expecting an encoding
other than UTF-8. However, if you can make sure that you use UTF-8 all
around, you should be fine.
Submit the URLs in your Sitemap (if you want them indexed). If there
is a problem, you should get an error message on the Sitemaps page of
Google Webmaster Tools.
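As I understand the sitemaps.org protocol, the Sitemap file itself should be UTF-8, URLs in it should be escaped, and characters such as `&` must be entity-escaped for XML. A minimal sketch of generating such a file; `BASE`, `sitemap` and the sample path are hypothetical.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

BASE = "http://www.example.com"  # hypothetical site root

def sitemap(paths: list[str]) -> str:
    """Build a minimal sitemap.xml string from UTF-8 paths (query strings allowed)."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for path in paths:
        # Percent-encode non-ASCII characters as UTF-8, keeping the
        # characters that structure the path and query string.
        loc = BASE + quote(path, safe="/?=&", encoding="utf-8")
        # Entity-escape &, < and > so the URL is valid inside XML.
        lines.append(f"  <url><loc>{escape(loc)}</loc></url>")
    lines.append("</urlset>")
    return "\n".join(lines)

if __name__ == "__main__":
    print(sitemap(["/sách?trang=2&loai=moi"]))
```

The `<loc>` for the sample path comes out as `http://www.example.com/s%C3%A1ch?trang=2&amp;loai=moi`; save the result with UTF-8 encoding before submitting it.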
The language is Vietnamese, which mostly uses accented characters like
European languages, apart from a few special cases. You've all given me
good advice; I'll follow up the links you provided and report back here
if anything unusual happens. Server-side we're OK, so I was particularly
interested in whether search spiders would make any sense of the UTF-8
part of the URL.
> I had an idea, but after searching the groups to
> help provide an answer and then reading the replies, I was unsure.
> Thanks for the clarification :-)
> Abracadabra
> On Aug 30, 8:53 am, JohnMu wrote:
> > Hi georgedawes and welcome to the groups!
I see your advice about escaped URLs. So should we replace all
non-standard characters with escape codes?
The server currently produces straight UTF-8 URLs.
Thanks for the help. I searched everywhere on this topic, including
SearchEngineWatch, etc., and couldn't find anyone willing to commit to a
straight answer ;)
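On the question of escaping "all non-standard characters": only the `href` is a URL, so only it needs percent-escaping; the visible link text can stay as normal Vietnamese as long as the page is served as UTF-8. A minimal sketch, where `link` is a hypothetical template helper rather than part of any framework:

```python
import html
from urllib.parse import quote

def link(path: str, text: str) -> str:
    """Render an <a> element with a percent-escaped UTF-8 href and readable text."""
    href = quote(path, safe="/", encoding="utf-8")  # escape non-ASCII bytes only
    # html.escape protects against quotes or "<" in either value.
    return f'<a href="{html.escape(href)}">{html.escape(text)}</a>'

print(link("/sách", "Sách"))
# <a href="/s%C3%A1ch">Sách</a>
```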
For the Googlebot it would generally be OK to use UTF-8 characters in
the URLs and in the links shown on your pages. If everything is UTF-8,
then you should be fine. If you have some things that are not in UTF-8,
then I would suggest escaping these characters in the links (as far as
I know, this will generally be done for the URLs on the server side
anyway).
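One way to see why escaped and straight UTF-8 links can both work is that, decoded as UTF-8, they name the same path. The sketch below assumes the request path has already been read as a Python string (how that happens depends on the server or framework); `decode_path` is hypothetical. Using `errors="strict"` makes a link escaped in some other encoding fail loudly instead of silently turning into U+FFFD.

```python
from urllib.parse import unquote

def decode_path(raw_path: str) -> str:
    """Decode a request path as UTF-8, accepting escaped and straight forms.

    Raises UnicodeDecodeError if the escaped bytes are not valid UTF-8,
    e.g. a link that was escaped as Latin-1.
    """
    return unquote(raw_path, encoding="utf-8", errors="strict")

print(decode_path("/s%C3%A1ch") == decode_path("/sách"))  # True
```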
If your users are not all using modern browsers that can understand
UTF-8 characters in URLs, then I would also suggest escaping them in
links on your pages. Modern browsers will recognize the escaped
characters and display them as UTF-8 characters anyway.
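Displaying an escaped URL as readable text means decoding only its non-ASCII parts. The following is a simplified sketch of that idea, not how any particular browser implements it: `display_url` is hypothetical. It decodes only runs of escaped bytes 0x80 to 0xFF that form valid UTF-8, and leaves ASCII escapes such as `%2F` alone, since decoding those would change what the URL points to.

```python
import re

# Runs of escapes for bytes 0x80-0xFF, i.e. the bytes of non-ASCII UTF-8 characters.
_HIGH_BYTES = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")

def display_url(url: str) -> str:
    """Show percent-escaped UTF-8 characters as text, leaving other escapes as-is."""
    def decode(match: re.Match) -> str:
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            return match.group(0)  # not valid UTF-8: keep it escaped
    return _HIGH_BYTES.sub(decode, url)

print(display_url("http://www.example.com/s%C3%A1ch%2Fm%E1%BB%9Bi"))
# http://www.example.com/sách%2Fmới
```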
So for maximum compatibility (for your users) I would suggest escaping
these characters in your links, but the Googlebot will recognize them
non-escaped as well (provided everything is in UTF-8).