I had a thread on this before, but I would like to raise it again...
I got a scare yesterday when I saw in webmaster tools that my
perfectly valid XHTML site, encoded in UTF-8, was listed as having all
its content in US-ASCII.
It seems that the webmaster tools reports (or the bot?) don't react to the
encoding in the meta header of the page itself, but only to the header
the server sends...
Most servers (I think) just report the content type of the (PHP) output as
'Content-Type: text/html' and not as 'Content-Type: text/html; charset=UTF-8' as
it should be.
As far as I can see, there are no indexing problems...
And the webmaster tools probably have a good reason for not looking in
the meta-header of the page, but still, when you see this, you think
something is really wrong...
I thought the problem was that the encoding in my meta header was in lower case
("utf-8"), but it seems not.
Anyway, I added header('Content-Type: text/html; charset=UTF-8');
to my pages...
And I hope these changes will not have a bad effect on rankings, or
something...
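To illustrate the difference between the two headers discussed above, here is a minimal sketch in Python (not the PHP the site itself uses) that extracts the charset from a Content-Type header value. The function name charset_from_content_type is made up for this example. It relies only on the standard library's email.message.Message, which lower-cases the charset it returns; charset names are case-insensitive, so "utf-8" and "UTF-8" mean the same thing.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the charset named in a Content-Type header value, or None.

    Hypothetical helper for illustration only.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lower-cases the value and returns None if absent.
    return msg.get_content_charset()


# A bare header declares no charset at all.
print(charset_from_content_type("text/html"))
# The header with an explicit charset, as sent by the header() call above.
print(charset_from_content_type("text/html; charset=UTF-8"))
```

If the first call describes what your server sends, the only charset declaration left is the one in the page's meta header.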
Hi Burt!
That's an interesting question, but it's hard to say much without
knowing the site that you were looking at. Is it the one in your
profile?
In general, especially for mostly English language content, there is
no big difference since US-ASCII (7-bit ASCII) is a subset of UTF-8.
English language content can be seen as either type, so it doesn't
really matter which one we determine. Also keep in mind that while we
trust that YOU ;) are able to give us the correct content-type, this
is not always the case on the web in general, so sometimes we try to
determine it ourselves. That could mean that we recognize it as being
US-ASCII even though it could also be seen as UTF-8.
In the end, as long as you can see that we're listing your keywords
and your site properly in the search results, it's probably ok
regardless of what is shown in the statistics. So far, I have only
seen 2-3 cases where we incorrectly recognized the text encoding --
and in those cases, the pages didn't render properly in my browsers
either, so this is definitely something you would notice.
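As a rough illustration of why a wrongly recognized encoding is something you would notice, the Python sketch below decodes the same UTF-8 bytes under two declared encodings. The sample word and the helper name decode_as are assumptions made for this example; nothing here reflects how any crawler or browser actually works internally.

```python
def decode_as(data, declared):
    """Decode raw bytes using the encoding a server or page declared."""
    return data.decode(declared, errors="replace")


# The bytes a UTF-8 page would contain for a word with one accented letter.
utf8_bytes = "café".encode("utf-8")

right = decode_as(utf8_bytes, "utf-8")         # declared correctly
wrong = decode_as(utf8_bytes, "windows-1252")  # declared incorrectly

print(right)
print(wrong)  # the accented letter turns into two unrelated characters

# Pure ASCII content decodes identically under either declaration.
print(decode_as(b"Hello", "utf-8") == decode_as(b"Hello", "windows-1252"))
```

Only the non-ASCII character is affected, which is why a mostly English page can be labelled either way without visible harm.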
> Hi Burt!
> That's an interesting question, but it's hard to say much without
> knowing the site that you were looking at. Is it the one in your
> profile?
> And I have added the 'Content-Type: text/html; charset=UTF-8'
> to my php header file.
> And I will be watching the webmaster tools, to see the green of US-
> ASCII turn into UTF green.
That won't make any difference. I've had that on one of my sites for
years, and WMT reports it as being in US-ASCII. (I'm actually
considering changing the encoding reported on my pages to US-ASCII,
since I don't use any UTF characters anyway.)
US-ASCII is a subset of UTF-8. If all the characters you use fit in
US-ASCII, then regardless of whether you declare utf-8 or iso-8859-1 (what I
use), for instance, those pages will be reported as US-ASCII. If you
have so much as a single accented character on a page, then it will no
longer be reported as US-ASCII.
Whether it's good or bad, I don't know. I think it's simply not
relevant.
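The subset relationship can be checked directly. The following Python sketch is only an assumption about how a report might pick the narrowest label that fits the bytes; the function narrowest_label and its labels are invented for this example and are not Google's actual logic.

```python
def narrowest_label(data):
    """Guess a label for raw bytes (hypothetical, for illustration only)."""
    # Every byte below 0x80 means the content fits in 7-bit US-ASCII.
    if all(byte < 0x80 for byte in data):
        return "US-ASCII"
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        return "unknown 8-bit encoding"


plain = "Hello, world"
accented = "Hello, café"

# Pure ASCII text produces identical bytes in all three encodings.
same_bytes = (
    plain.encode("ascii") == plain.encode("utf-8") == plain.encode("iso-8859-1")
)
print(same_bytes)

print(narrowest_label(plain.encode("utf-8")))
print(narrowest_label(plain.encode("iso-8859-1")))
# A single accented character is enough to change the label.
print(narrowest_label(accented.encode("utf-8")))
```

Under this assumption, the declared charset never enters the picture for ASCII-only pages, which matches what is described above.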
> On Aug 24, 5:59 pm, Alden Bates wrote:
> > On Aug 24, 10:14 am, Burt wrote:
> > > And I have added the 'Content-Type: text/html; charset=UTF-8'
> > > to my php header file.
> > > And I will be watching the webmaster tools, to see the green of US-
> > > ASCII turn into UTF green.
> > That won't make any difference. I've had that on one of my sites for
> > years, and WMT reports it as being in US-ASCII. (I'm actually
> > considering changing the encoding reported on my pages to US-ASCII,
> > since I don't use any UTF characters anyway.)
> Burt, for the site in your profile I see the code has this in it:
> charset=windows-1252
> That is a long way from UTF-8.
> You must have changed the site in your profile, because this one isn't
> even XHTML.
> On Aug 24, 6:29 pm, webado wrote:
> > US-ASCII is a subset of UTF-8. If all the characters you use fit in
> > US-ASCII, then regardless of whether you declare utf-8 or iso-8859-1 (what I
> > use), for instance, those pages will be reported as US-ASCII. If you
> > have so much as a single accented character on a page, then it will no
> > longer be reported as US-ASCII.
> > Whether it's good or bad, I don't know. I think it's simply not
> > relevant.
On the subject of ASCII files (I did a forum search and this was the
only thread that came up):
This morning, I was 'cleaning out' old files from my host root
directory - old images no longer in use and a sub-domain that I am
moving to a different host...
I ran across an ASCII file named .us_states.dat in my root directory.
BG (before listing my sites in Google) I had no concern with this
file.
Viewing the options (edit, show...) in my control panel and opening the
URL to read the source (the source shows a Notepad 'doc', no HTML), I am
unable to find the date the file was published, the author, or any other
indication as to how this file became part of my root. I did not
knowingly produce said file.
My first two websites, built last year, were done in a Soho 'template'
program. I converted one to Dreamweaver and am now using a WYSIWYG
program (which also allows for HTML coding). The new program will
'auto' generate a sitemap, if I direct it to. I mention that because
this particular ASCII file appears to me to be a "site map" of the US
states and their abbreviations, nothing more, nothing less.
Now that I am listed with Google, I wonder if this type of file in my
root is a concern?
OR - is this a question for my host?
THANKS!
As long as there are no direct links to .us_states.dat Google won't be
able to find it, unless it gets listed in your site map. That depends
on how your new program generates a site map.
Your state .dat file may be used by your CMS to generate a state
drop-down menu on forms. It's a bit strange to find a file of that
nature in your web site path, much less in the root, but it's entirely
possible. You might change the file name and see what happens to your
site. If you have a dynamic page with a state drop-down menu, try that
page first. If it's fine, or if your page is static, try regenerating
the page from your CMS. If you get no errors, it's probably OK to
remove the file, or to move it to another folder in case you need to
find it later. Your host may be able to help answer your question, too.
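If you want to check whether a generated site map lists the file, something along these lines could work. This is a minimal Python sketch that assumes the site map follows the standard sitemaps.org XML format; the function name sitemap_lists_file, the sample site map, and the example.com URLs are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemaps.org-style site maps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_lists_file(sitemap_xml, filename):
    """Return True if any <loc> URL in the site map ends in the given file name."""
    root = ET.fromstring(sitemap_xml)
    for loc in root.findall(".//sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        # Compare only the last path segment of each listed URL.
        if url.rsplit("/", 1)[-1] == filename:
            return True
    return False


# Made-up site map for illustration; a real one would be read from disk.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/.us_states.dat</loc></url>
</urlset>"""

print(sitemap_lists_file(SAMPLE, ".us_states.dat"))
print(sitemap_lists_file(SAMPLE, "missing.dat"))
```

If the file does show up, excluding it in the site map generator's settings (if it offers that) or moving the file out of the web path would keep it out of the listing.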