Google webmaster tools: UTF vs US-ASCII reporting - Scary....

From: Burt
Date: Sat, 23 Aug 2008 02:12:01 -0700 (PDT)
Subject: UTF vs US-ASCII reporting - Scary....
I had a thread on this before, but I would like to raise it again...

I got a scare yesterday when I saw in webmaster tools that my
perfectly valid XHTML site, encoded in UTF-8, was listed as having all
its content in US-ASCII.

It seems that the webmaster tools (reports) (bot?) don't react to the
encoding in the meta header of the page itself, but only to the header
the server sends...

Most servers (I think) just send the Content-Type for (PHP) pages as
plain 'text/html' and not as 'Content-Type: text/html; charset=UTF-8',
as it should be.
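For reference, a page can declare its encoding in two places: the charset parameter of the HTTP Content-Type header, and a meta tag in the HTML itself. When both are present, browsers give the HTTP header priority. The Python sketch below reads both declarations so you can see which one your pages actually send. The function names are made up for illustration, and this is not how Webmaster Tools itself decides.

```python
import re
from email.message import Message
from typing import Optional

# Matches <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=..." /> form.
META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9._:-]+)""",
    re.IGNORECASE,
)


def charset_from_header(content_type: str) -> Optional[str]:
    """Hypothetical helper: charset parameter of a Content-Type value, lowercased, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # None when the header has no charset


def charset_from_meta(html: bytes) -> Optional[str]:
    """Hypothetical helper: first charset declared in a <meta> tag, lowercased, or None."""
    # Only look near the top of the document, where the meta declaration belongs.
    match = META_CHARSET.search(html[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def declared_charset(content_type: str, html: bytes) -> Optional[str]:
    """Prefer the HTTP header; fall back to the meta tag."""
    return charset_from_header(content_type) or charset_from_meta(html)
```

With a bare 'text/html' header, as described above, charset_from_header() returns None and only the meta tag carries the encoding.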

As far as I can see, there are no indexing problems...

And the webmaster tools probably have a good reason for not looking in
the meta-header of the page, but still, when you see this, you think
something is really wrong...

I thought the problem was that the encoding in my meta-header was in
lower case ("utf-8"), but it seems not.

Anyway, I added the header('Content-Type: text/html; charset=UTF-8');
to my pages...

And I hope this change won't have a bad effect on rankings or
anything...
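Burt's fix is the PHP header() call above. As an illustration of the same idea outside PHP (this swaps in Python's WSGI interface; the page and its content are made up), the server sends the charset in the Content-Type header and the meta tag agrees with it:

```python
# Illustrative XHTML page; the meta tag repeats what the HTTP header says.
PAGE = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Charset test</title>
</head>
<body><p>Caf\u00e9</p></body>
</html>
"""


def app(environ, start_response):
    """Minimal WSGI app that declares UTF-8 in the HTTP header, not only in the page."""
    body = PAGE.encode("utf-8")
    start_response("200 OK", [
        # The charset parameter is the part the thread is about: without it the
        # server only says "text/html" and clients have to look elsewhere.
        ("Content-Type", "text/html; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally you could serve it with wsgiref.simple_server.make_server('', 8000, app).serve_forever() and inspect the response headers.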


 
From: JohnMu (Google employee)
Date: Sat, 23 Aug 2008 07:16:03 -0700 (PDT)
Subject: Re: UTF vs US-ASCII reporting - Scary....
Hi Burt!
That's an interesting question, but it's hard to say much without
knowing the site that you were looking at. Is it the one in your
profile?

In general, especially for mostly English language content, there is
no big difference since US-ASCII (7-bit ASCII) is a subset of UTF-8.
English language content can be seen as either type, so it doesn't
really matter which one we determine. Also keep in mind that while we
trust that YOU ;) are able to give us the correct content-type, this
is not always the case on the web in general, so sometimes we try to
determine it ourselves. That could mean that we recognize it as being
US-ASCII even though it could also be seen as UTF-8.
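To make the subset point concrete: every 7-bit ASCII character is encoded as the same single byte in UTF-8, so a page made only of such characters is byte-for-byte identical under either label. A quick Python check (the sample strings are arbitrary):

```python
def same_bytes_in_ascii_and_utf8(text: str) -> bool:
    """True if `text` produces identical bytes under US-ASCII and UTF-8."""
    try:
        return text.encode("ascii") == text.encode("utf-8")
    except UnicodeEncodeError:
        # At least one character (e.g. an accented letter) is outside 7-bit ASCII.
        return False


print(same_bytes_in_ascii_and_utf8("Hope it helps!"))  # True
print(same_bytes_in_ascii_and_utf8("café"))            # False
print("é".encode("utf-8"))                             # b'\xc3\xa9' (two bytes)
```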

In the end, as long as you can see that we're listing your keywords
and your site properly in the search results, it's probably ok
regardless of what is shown in the statistics. So far, I have only
seen 2-3 cases where we incorrectly recognized the text encoding --
and in those cases, the pages didn't render properly in my browsers
either, so this is definitely something you would notice.

Hope it helps!
John


 
From: Burt
Date: Sat, 23 Aug 2008 08:52:18 -0700 (PDT)
Subject: Re: UTF vs US-ASCII reporting - Scary....
Hi John,

No, it's not that one...

But it's the one currently in my profile :)

About 4,420 (reported) pages of 100% XHTML US-ASCII ;)

I understand the bot, and its reasons to decide things on its own,
instead of just relying on what it's told ;)

I mean, some individualism is even appreciated in a character, right?

And the poor thing has been around!

Must have been to awful places....

But, nonetheless, while I tried to make life easy for it, it doesn't
recognize me for what I am...

I felt sooo misunderstood, you know....

Maybe you could ask the bot to look in the meta-header as well?

;-)

On Aug 23, 4:16 pm, JohnMu wrote:


 
From: Burt
Date: Sat, 23 Aug 2008 15:14:08 -0700 (PDT)
Subject: Re: UTF vs US-ASCII reporting - Scary....
Anyways.

My provider supplied me this link:

http://www.askapache.com/htaccess/setting-charset-in-htaccess.html

And I have added 'Content-Type: text/html; charset=UTF-8'
to my PHP header file.

And I will be watching the webmaster tools to see the green of
US-ASCII turn into UTF green.

And green is good.

Larry said so ;)


 
From: Alden Bates
Date: Sun, 24 Aug 2008 14:59:25 -0700 (PDT)
Subject: Re: UTF vs US-ASCII reporting - Scary....
On Aug 24, 10:14 am, Burt wrote:

> And I have added the  'Content-Type: text/html; charset=UTF-8'
> to my php header file.

> And I will be watching the webmaster tools, to see the green of US-
> ASCII turn into UTF green.

That won't make any difference. I've had that on one of my sites for
years, and WMT reports it as being in US-ASCII. (I'm actually
considering changing the encoding declared on my pages to US-ASCII,
since I don't use any non-ASCII characters anyway.)

Alden


 
From: webado
Date: Sun, 24 Aug 2008 15:29:43 -0700 (PDT)
Subject: Re: UTF vs US-ASCII reporting - Scary....
US-ASCII is a subset of UTF-8. If all the characters you use fit in
US-ASCII, then regardless of whether you use UTF-8 or ISO-8859-1 (which
is what I use), for instance, those pages will be reported as US-ASCII.
If you have so much as a single accented character on a page, it will
no longer be reported as US-ASCII.

Whether it's good or bad, I don't know. I think it's simply not
relevant.
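The rule described above can be sketched as a toy classifier. This illustrates the rule as stated in this thread, NOT Google's actual detection logic; the function name and the fallback to the declared charset are assumptions.

```python
from typing import Optional


def guess_reported_encoding(body: bytes, declared: Optional[str] = None) -> str:
    """Hypothetical classifier following the rule described in this thread."""
    if body.isascii():
        # Only 7-bit bytes: valid as US-ASCII, UTF-8, ISO-8859-1 and windows-1252
        # alike, so the narrowest label wins regardless of what the page declares.
        return "US-ASCII"
    try:
        body.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        # Not valid UTF-8; assume the declared single-byte charset, if any.
        return declared or "unknown"


print(guess_reported_encoding(b"Hope it helps!", "utf-8"))               # US-ASCII
print(guess_reported_encoding("Café".encode("utf-8"), "utf-8"))          # UTF-8
print(guess_reported_encoding("Café".encode("cp1252"), "windows-1252"))  # windows-1252
```

Under this rule a pure-ASCII page comes out as US-ASCII whatever it declares, which matches what Alden saw. A single "é" changes the result, and the same "é" saved as windows-1252 (one byte, 0xE9) is not valid UTF-8 at all.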

On Aug 24, 5:59 pm, Alden Bates wrote:


 
From: webado
Date: Sun, 24 Aug 2008 15:35:45 -0700 (PDT)
Subject: Re: UTF vs US-ASCII reporting - Scary....
Burt, for the site in your profile, I see the code has this in it:
charset=windows-1252

That's a long way from UTF-8.

You must have changed the site in your profile, because this one isn't
even XHTML.

On Aug 24, 6:29 pm, webado wrote:


 
From: Burt
Date: Sun, 24 Aug 2008 16:02:33 -0700 (PDT)
Subject: Re: UTF vs US-ASCII reporting - Scary....
Changed it.

On Aug 25, 12:35 am, webado wrote:


 
From: keyskastle
Date: Thu, 4 Sep 2008 08:29:27 -0700 (PDT)
Subject: Re: UTF vs US-ASCII reporting - Scary....
On the subject of ASCII files (I did a forum search and this was the
only thread that came up):

This morning, I was 'cleaning out' old files from my host's root
directory: old images no longer in use and a sub-domain that I am
moving to a different host.
I ran across an ASCII file in my root directory named .us_states.dat.
BG (before listing my sites in Google), I had no concern about this
file.

Using the options (edit, show...) in my control panel and opening the
URL to read the source (the source shows a plain Notepad-style text
'doc', no HTML), I am unable to find the date the file was published,
the author, or any other indication of how this file became part of my
root. I did not knowingly produce said file.

My first two websites, built last year, were done in a Soho 'template'
program. I converted one to Dreamweaver and am now using a WYSIWYG
program (which also allows HTML coding). The new program will
'auto' generate a sitemap if I direct it to. I mention that because
this particular ASCII file appears to me to be a "site map" of the US
states/abbreviations, nothing more, nothing less.

Now that I am listed with Google, I wonder: is this type of file in my
root a concern?
OR is this a question for my host?
THANKS!

On Aug 24, 7:02 pm, Burt wrote:


 
From: Pete Bardo
Date: Fri, 5 Sep 2008 11:07:22 -0700 (PDT)
Subject: Re: UTF vs US-ASCII reporting - Scary....
As long as there are no direct links to .us_states.dat, Google won't be
able to find it, unless it gets listed in your site map. That depends
on how your new program generates a site map.

Your state .dat file may be used by one of your CMS tools to generate a
state drop-down menu on forms. It's a bit strange to find a file of that
nature in your web site path, much less in the root, but it's entirely
possible. You might change the file name and see what happens to your
site. If you have a dynamic page with a state drop-down menu, try that
page first. If it's fine, or if your page is static, try regenerating
the page from your CMS. If you get no errors, it's probably OK to
remove the file, or move it to another folder in case you need to
find it later. Your host may be able to help answer your question, too.
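Before renaming the file, it can help to check whether anything on the site mentions it by name (a generated sitemap.xml that lists it would show up too). A minimal Python sketch, assuming you have a local copy of the site; the function name and the site path are hypothetical. It only finds literal mentions in files on disk, so a CMS that builds the filename at runtime or stores it in a database would not show up, and the rename-and-test approach above is still the real check.

```python
from pathlib import Path
from typing import List


def find_references(site_root: str, needle: str = "us_states.dat") -> List[Path]:
    """Hypothetical helper: files under site_root whose contents mention `needle`."""
    hits = []
    for path in Path(site_root).rglob("*"):
        # Skip directories and the data file itself.
        if not path.is_file() or path.name.endswith(needle):
            continue
        try:
            if needle.encode() in path.read_bytes():
                hits.append(path)
        except OSError:
            continue  # unreadable file (permissions, broken link, ...)
    return sorted(hits)


# Example (hypothetical path to a local copy of the site):
# for p in find_references("/path/to/site-copy"):
#     print(p)
```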


 