Discussions > Crawling, indexing, and ranking > Rewrite URL in UTF8.... Google compliant?
From: georgedawes
Date: Sat, 30 Aug 2008 00:24:24 -0700 (PDT)
Subject: Rewrite URL in UTF8.... Google compliant?
Does anyone know whether using UTF-8 'clean URLs' is Google compliant? For example:
www.domain.com/mysite/utf8texthere

Most new browsers handle the encoding correctly, but will the Google
spider recognise the URL, follow it, and index the pages?


 
From: Robbo
Date: Sat, 30 Aug 2008 04:03:35 -0700 (PDT)
Subject: Re: Rewrite URL in UTF8.... Google compliant?

Could you clarify your question, please?

Do you mean plain ASCII characters (ABC abc), accented European-language
characters, or non-Latin scripts such as Russian, Chinese, or Arabic?


 
JohnMu Google employee  
From: JohnMu
Date: Sat, 30 Aug 2008 05:53:10 -0700 (PDT)
Subject: Re: Rewrite URL in UTF8.... Google compliant?
Hi georgedawes and welcome to the groups!

Yes, we can generally keep up with UTF-8 encoded URLs and we'll
generally show them to users in our search results (but link to your
server with the URLs properly escaped). I would recommend that you
also use escaped URLs in your links, to make sure that your site is
compatible with older browsers that don't understand straight UTF-8
URLs.
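
To illustrate the escaping recommended above, here is a minimal sketch that assumes Python's standard `urllib.parse` module. The function name `escape_url` and the example.com address are made up for illustration, and the sketch assumes the host name is already plain ASCII.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escape_url(url: str) -> str:
    """Percent-encode the path and query of a URL as UTF-8 bytes."""
    parts = urlsplit(url)
    # quote() encodes text as UTF-8 by default. "%" is listed as safe so
    # that a URL which is already escaped is not escaped a second time.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


# "caf\u00e9" is "café" written with an explicit code point for the accent.
print(escape_url("http://www.example.com/mysite/caf\u00e9"))
# prints http://www.example.com/mysite/caf%C3%A9
```

Running the function on its own output leaves the URL unchanged, so it should be safe to apply to links that are partly escaped already.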

Here's an example search page which shows some of those URLs:
http://www.google.com/search?q=inurl%3A%E9%96%A2%E9%80%A3%E8%A8%98%E4...

The only situations where I have seen issues are those where the URL
uses non-ASCII characters but the server is expecting an encoding
other than UTF-8. However, if you can make sure that you use UTF-8 all
around, you should be fine.
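
The encoding mismatch described above can be reproduced in a few lines. This is only a sketch using Python's standard `urllib.parse`; the path is hypothetical, and Latin-1 stands in for whichever non-UTF-8 encoding a server might expect.

```python
from urllib.parse import quote, unquote

# "caf\u00e9" is "café" written with an explicit code point for the accent.
path = "/mysite/caf\u00e9"

# The link escaped as UTF-8 (the default for quote()).
utf8_link = quote(path)
# The same path escaped as Latin-1, as a non-UTF-8 page or server might do.
latin1_link = quote(path, encoding="latin-1")

print(utf8_link)    # /mysite/caf%C3%A9
print(latin1_link)  # /mysite/caf%E9

# A server that assumes Latin-1 misreads the UTF-8 link as two characters.
print(unquote(utf8_link, encoding="latin-1"))
# A server that assumes UTF-8 cannot decode the Latin-1 link at all and
# substitutes the replacement character U+FFFD.
print(unquote(latin1_link, encoding="utf-8", errors="replace"))
```

Only when both sides agree on UTF-8 does the decoded path match the original text.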

Hope it helps!
John


 
From: cristina
Date: Sat, 30 Aug 2008 05:59:10 -0700 (PDT)
Subject: Re: Rewrite URL in UTF8.... Google compliant?
As Robbo wrote, it depends on the URLs.
It also depends on how (and whether) your server supports UTF-8 encoding
of URLs.

Have a look at
http://www.google.com/support/webmasters/bin/answer.py?answer=35653
which is a Google help page about encoding URLs in a sitemap;
it gives an example of a UTF-8 encoded URL.

Submit the URLs in the sitemap
(if you want them indexed),
and if there is a problem, you should get an error message
in the sitemaps page of Google Webmaster Tools.
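
As a rough sketch of what preparing such a sitemap entry could involve: the URL is percent-encoded as UTF-8 and then XML-escaped, so that `&` becomes `&amp;`. The helper name `sitemap_loc`, the example.com base, and the sample path are all hypothetical, and only Python's standard library is assumed.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape


def sitemap_loc(path: str, base: str = "http://www.example.com") -> str:
    """Build a <url> entry for a sitemap from a path with non-ASCII text."""
    # Step 1: percent-encode non-ASCII characters as UTF-8 bytes, keeping
    # the URL delimiters "/", "?", "=" and "&" as they are.
    encoded = base + quote(path, safe="/?=&")
    # Step 2: escape XML special characters (&, <, >) for the XML file.
    return "<url><loc>" + escape(encoded) + "</loc></url>"


print(sitemap_loc("/\u00fcmlat.html?q=name&lang=vi"))
# prints <url><loc>http://www.example.com/%C3%BCmlat.html?q=name&amp;lang=vi</loc></url>
```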

Cristina.



 
From: cristina
Date: Sat, 30 Aug 2008 06:01:22 -0700 (PDT)
Subject: Re: Rewrite URL in UTF8.... Google compliant?
Sorry John, I posted before I saw your reply.

 
From: Tim Abracadabra
Date: Sat, 30 Aug 2008 06:28:15 -0700 (PDT)
Subject: Re: Rewrite URL in UTF8.... Google compliant?
Thanks John!

I had an idea, but after searching the groups to help provide an answer
and then reading the replies, I was unsure.

Thanks for the clarification :-)
Abracadabra


 
From: georgedawes
Date: Sat, 30 Aug 2008 20:43:55 -0700 (PDT)
Subject: Re: Rewrite URL in UTF8.... Google compliant?
Thanks to all that have replied.

The language is Vietnamese, which mostly uses accented characters like
those in European languages, apart from a few special cases. You've all
given me good advice. I'll follow up on the links you provided and report
back here if anything unusual happens. Server side we're OK, so I was
particularly interested in whether search spiders would make sense of the
UTF-8 part of the URL.

Have a good weekend all!



 
From: georgedawes
Date: Sat, 30 Aug 2008 21:12:24 -0700 (PDT)
Subject: Re: Rewrite URL in UTF8.... Google compliant?
All parts are UTF-8: server, database, and so on.

Can I clarify one part?

I see your advice about escaped URLs. Does that mean we should replace all
non-standard characters with escape codes?
The server currently produces straight UTF-8 URLs.

Thanks for the help. I searched everywhere on this topic, including
SearchEngineWatch, etc., and couldn't find anyone willing to commit to a
straight answer ;)



 
JohnMu Google employee  
From: JohnMu
Date: Sun, 31 Aug 2008 14:26:49 -0700 (PDT)
Subject: Re: Rewrite URL in UTF8.... Google compliant?
Hi georgedawes

For the Googlebot it would generally be ok to use UTF8 characters in
the URLs and in the links shown on your pages. If everything is UTF8
then you should be fine. If you have some things that are not in UTF8,
then I would suggest escaping these characters in the links (as far as
I know, this will generally be done for the URLs on the server side
anyway).

If your users are not all using modern browsers that can understand
UTF8 characters in URLs, then I would also suggest escaping them in
links on your pages. Modern browsers will recognize the escaped
characters and display them as UTF8 characters anyway.

So for maximum compatibility (for your users) I would suggest escaping
these characters in your links, but the Googlebot will recognize them
non-escaped as well (provided everything is in UTF8).
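
For the Vietnamese case, a page template could escape each link while keeping the visible text readable. The sketch below assumes Python's standard library; the helper `link_for` and the sample path are hypothetical. Normalising to NFC first is an extra assumption of this sketch, since Vietnamese letters can be stored either precomposed or as a base letter plus combining marks, and the two forms escape differently.

```python
import unicodedata
from html import escape
from urllib.parse import quote, unquote


def link_for(path: str, label: str) -> str:
    """Return an HTML link with an escaped href and readable link text."""
    # Use one canonical form so the same word always yields the same URL.
    nfc_path = unicodedata.normalize("NFC", path)
    href = quote(nfc_path)  # percent-encodes the UTF-8 bytes, keeps "/"
    return '<a href="{}">{}</a>'.format(escape(href), escape(label))


print(link_for("/mysite/tiếng-việt", "Tiếng Việt"))

# Decoding the escaped path gives back the original (NFC) text, which is
# what a modern browser does when it displays the URL.
escaped = quote(unicodedata.normalize("NFC", "/mysite/tiếng-việt"))
print(escaped)           # /mysite/ti%E1%BA%BFng-vi%E1%BB%87t
print(unquote(escaped))
```

With a helper like this there should be no need to match escape codes by hand, because the library derives them from the UTF-8 bytes of each character.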

Hope it helps!
John


 
From: georgedawes
Date: Sun, 31 Aug 2008 17:57:05 -0700 (PDT)
Subject: Re: Rewrite URL in UTF8.... Google compliant?
Great. Makes sense. Now for a morning of matching the escape codes :)
Thanks for all your help



 