Page not being indexed


tapirgal

Mar 5, 2008, 11:10:59 AM
to Only Validation + Navigation = Crawlability
Hello,

This is Sheryl in Oregon (was oregontapir). I changed my Google
account, so I hope this works.

I have no complaints about how Google indexes most of my site, but I
can't seem to get it to find this page:

http://www.tapirback.com/images/animals/mammals/aquatic/pinnipeds/sea-lions-01.htm

I even linked it from one of my blogs, which get indexed really fast
and seem to bring up related pages well.

Any ideas? The page has been up since last October. Thanks.

Sheryl

webado

Mar 5, 2008, 11:38:10 AM
to Only Validation + Navigation = Crawlability
That page is not indexed at all. A site: query does not show it.
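
(By a site: query I mean something along the lines of

  site:tapirback.com sea-lions

in the Google search box. The keyword there is only an illustration;
any form of the site: operator that should match that URL is a quick
way to check whether Google has the page.)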

The highest level indexed is http://www.tapirback.com/images/animals/
which links to your particular page.

You have errors on that page and probably similar errors on all pages
of your site:
http://validator.w3.org/check?verbose=1&uri=http://www.tapirback.com/images/animals/

If your page does NOT use an XHTML doctype, you must not close any
tags with /> at all.

In particular, that means the meta and link tags in the head of the
page. This results in a premature end of the head, and of the page,
with the content not being processed further.
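
For example (the attribute values below are made up for illustration,
not copied from your actual pages), XHTML-style tags like these break
an HTML 4.01 head:

  <meta name="description" content="Sea lion photos" />
  <link rel="stylesheet" type="text/css" href="style.css" />

In HTML 4.01 the same tags should end with a plain closing bracket:

  <meta name="description" content="Sea lion photos">
  <link rel="stylesheet" type="text/css" href="style.css">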

I recommend you validate all the pages on your site, starting with
the homepage.

It could be a lot, I agree.

This can validate the entire site, but watch out for a lot of output:
http://htmlhelp.com/tools/validator/


You may be able to fix things easily if you are using any kind of
templates, or if you are able to use search and replace across all
the pages you have on your PC.







tapirgal

Mar 5, 2008, 1:00:40 PM
to Only Validation + Navigation = Crawlability
Thanks, Webado. It would have been a while before I found that
problem. Actually, many, many of my 1300+ pages do validate; the sea
lion page itself validates, but I wouldn't have realized that one or
several pages upstream had cut off the crawl. At least it validates
in CSE v8.04.

The gift shop validates; it's the recreational areas (fun for me, less
time to play with) and my old tapir pages that I didn't follow up on. :/

Sheryl

tapirgal

Mar 6, 2008, 2:40:18 AM
to Only Validation + Navigation = Crawlability
Well, it seems that my main mistake on these pages was using a meta
tag that closes with " />", which should be reserved for XHTML; I
should have used only the closing bracket, without the slash and
space, in my HTML 4.01 Transitional doc.

What I don't understand is why these pages validated cleanly with CSE
Validator Pro v8.04 with the latest updates. I thought that was a good
piece of software. Any comments on this software? I've been using it
on my whole site. I found it easier to work with than any of the
online tools I found (such as the W3C page), and easier for
understanding the corrections, when I had so many pages with so many
mistakes a couple of years ago, and I kept using the program, which
obviously didn't consider the current mistake a mistake. <br /> is
also not seen by CSE as a mistake in HTML 4.01 Transitional.

Sheryl

webado

Mar 6, 2008, 10:05:56 AM
to Only Validation + Navigation = Crawlability
So the conclusion is CSE is not as good as you think it is.

You cannot replace the real validator.

Tidal_Wave_One

Mar 6, 2008, 2:47:49 PM
to Only Validation + Navigation = Crawlability

Hi Sheryl,

CSE HTML Validator is good software. Those issues you mention are
extremely unlikely to affect search engine rankings. Also, remember
that CSE HTML Validator is designed for practical use, not simply to
check for strict compliance like the W3C validator. This makes CSE
HTML Validator more useful for most people.

If you want CSE HTML Validator to report errors for "/>" in HTML, then
you can uncheck the "XML compatibility" option in the Validator Engine
Options. This option is checked by default because it's not a problem
with today's browsers/user agents.

If you want results more like the W3C validator, please see this page:
http://www.htmlvalidator.com/htmlval/v80/docs/validate_to_w3c_standards.htm

Also see:
http://www.htmlvalidator.com/htmlval/whycseisbetter.html

I hope this helps.

--
Albert Wiersch
AI Internet Solutions
sup...@htmlvalidator.com
http://www.htmlvalidator.com/



webado

Mar 6, 2008, 4:11:59 PM
to Only Validation + Navigation = Crawlability
Albert, it is VERY important that the head of a page NOT be broken.
Using XHTML syntax in a non-XHTML document (i.e. one without an XHTML
doctype) breaks the head in the worst way, resulting in the page
content not being parsed well, if at all.


Tidal_Wave_One

Mar 6, 2008, 4:26:23 PM
to Only Validation + Navigation = Crawlability

It sounds like you're talking about strict standards compliance, but
what modern search engine or web browser do you know of that will not
be able to parse or handle "/>" at the end of a meta tag? I'm not
aware of any but would like to know if there is indeed one.

Thanks,
Albert

webado

Mar 6, 2008, 4:56:23 PM
to only-va...@googlegroups.com
Googlebot.
 
It won't complain. But the robot simply appears to skip over everything. I've seen lots of sites doing poorly, pages dropping out of the index, or going supplemental (when that still existed officially). They often have this in common: a broken head.

Tidal_Wave_One

Mar 6, 2008, 5:52:41 PM
to Only Validation + Navigation = Crawlability

But a "broken head" to googlebot is probably something other
(something much more serious) than simply using "/>" to end a meta
tag. Do you have any supporting links that say that "/>" is
responsible for googlebot's skipping of info? I seriously doubt that
is the problem. I think it is far too speculative to simply blame it
on a "broken head" because of "/>". It is much more likely that it is
caused by more serious issues or just changes in how or why pages are
indexed.

Also, I think many pages have "broken heads" since many pages are
poorly written. But many of these "broken pages" still do well in the
search engines.

I'm not saying that validation is not important, of course. It is
important for many reasons, but I think that messing up a search
engine ranking due to an "invalid" document requires a serious
validation or document-structure issue, not simply using "/>".

webado

Mar 6, 2008, 6:17:53 PM
to Only Validation + Navigation = Crawlability
I use my logic.

For a robot to parse pages semantically, all markup has to be good,
unbroken.

Modern Googlebots parse semantically.

Then there's anecdotal evidence of the broken meta tag causing pages
to go supplemental (back when that still existed). Hence the urban
legend that "submitting to sitemaps makes the site tank".

tapirgal

Mar 6, 2008, 6:36:40 PM
to Only Validation + Navigation = Crawlability
In a few days I'll let you know whether my sea lion page has been
botted or not. The only change I made was to correct the code we've
been talking about, in the meta tags and in one (?) break tag. I made
the corrections to it and to the pages above it in the hierarchy on
the site, and there's also a direct link to it from one of my blogs. I
only had these code issues in a few pages, and I only had a few (the
same) pages not indexed at all (as far as I know). Even my funky funky
12-year-old tapir pages with no doctypes come up well in Google (not
that I shouldn't fix them; I've simply not taken the time since they
are being found).

Anyway, it will be interesting to see if the sea lion page fixes put
that page on the map. Have to run now. Will check tonight if I
remember.

Sheryl

Tidal_Wave_One

Mar 7, 2008, 10:14:23 AM
to Only Validation + Navigation = Crawlability

I use my logic as well, and it tells me that search engines are able
to parse markup that is not strictly correct. Just look at all the
indexed pages that would fail a strict validation by the W3C validator
or CSE HTML Validator. Of course I'm not against validation, but I
never tell anyone that a simple "error" according to strict standards
would cause a search engine issue. There are minor issues that should
pose no problem to a modern search engine robot. I'm convinced that
using "/>" is one of them. I'm just looking at this from a practical
standpoint, not a strict validation sense.

--
Albert Wiersch
AI Internet Solutions
sup...@htmlvalidator.com
http://www.htmlvalidator.com/

webado

Mar 7, 2008, 12:36:26 PM
to Only Validation + Navigation = Crawlability


On Mar 7, 10:14 am, Tidal_Wave_One <goo...@wiersch.com> wrote:
> I never tell anyone that a simple "error" according to strict
> standards would cause a search engine issue.


It depends on the error. Breaking the head that way is a costly
whopper. Of course it doesn't preclude the existence of OTHER
problems elsewhere. In their absence this would be it.

If you are the one who programmed the CSE validator, please do
yourself and everybody else a favor and provide full validation by
default.

If you want to be lenient, make that an option.

Tidal_Wave_One

Mar 7, 2008, 1:02:20 PM
to Only Validation + Navigation = Crawlability
>
> It depends on the error. Breaking the head that way is a costly
> whopper. Of course it doesn't preclude the existence of OTHER
> problems elsewhere. In their absence this would be it.

I think we'll just have to agree to disagree about this issue.

> If you are the one who programmed the CSE validator, please do
> yourself and everybody else a favor and provide full validation by
> default.

By default, we provide what we think will be most useful in a
practical sense, not a strict standards sense. If CSE HTML Validator
only checked to strict standards, then many of the advantages of CSE
HTML Validator would be lost because it wouldn't be able to check for
nearly as many potential real-world issues as it does now.

webado

Mar 7, 2008, 1:57:34 PM
to only-va...@googlegroups.com
You can disagree with me. I certainly disagree with you.

"most useful in a practical sense, not a strict standards sense."

What precisely are "real-world issues" in the context of an HTML
validator?

If yours is an HTML code validator, those statements are meaningless.

I simply cannot endorse use of this product to anybody asking for my
opinion.

Albert Wiersch

Mar 7, 2008, 3:25:22 PM
to Only Validation + Navigation = Crawlability

> What precisely are "real-world issues" in the context of an HTML
> validator?

I've replied to you privately. I wanted to make the reply public but
it looks like I clicked on the wrong link.

--
Albert Wiersch
AI Internet Solutions
supp...@htmlvalidator.com
http://www.htmlvalidator.com/

webado

Mar 7, 2008, 10:24:11 PM
to Only Validation + Navigation = Crawlability
I got your email.

I emailed you back.

Robbo - W3C Rocks!

Mar 10, 2008, 6:18:21 PM
to Only Validation + Navigation = Crawlability
Albert,

I believe that the job of a validation diagnostic tool is to report
ALL deviations from the correct standard.

The tool may PRIORITIZE results by grading warnings, e.g. from SEVERE
to TRIVIAL.

But I think that it is wrong for any validation diagnostic tool to
simply ignore or conceal the error.

Whether the page author wishes to spend time fixing that error is
their own choice.

Whether any given error is "significant" depends not only on the error
itself but on the context, the purpose and priorities decided by the
user - not by the validation tool.

Something which is "unimportant" one day may be very significant on
another day or from a different perspective.

But to claim that the error is not reported because (it is claimed
that) it will not affect search engine rankings is, in my opinion, a
spurious argument.

Incidentally, I believe that your validation tool reports many other
things including things which are not likely to affect search
engines. So why treat this one differently?

Robbo

Albert Wiersch

Mar 10, 2008, 10:47:47 PM
to Only Validation + Navigation = Crawlability

> I believe that the job of a validation diagnostic tool is to report
> ALL deviations from the correct standard.

That's fine, but I've found that the vast majority of people just want
their pages to display properly and they want to address real issues
that affect real-world browsers. They don't really care if there is a
"non-issue" that a strict DTD validator doesn't like.

But you can configure CSE HTML Validator to be more
standards-compliant, or use the included DTD-based validator.

But if all one cares about is complying strictly with the standard,
and not finding other problems (which the W3C validator can't find)
that can seriously affect someone's experience, then CSE HTML
Validator is not for them. It is designed to go beyond the standards,
so if one doesn't want that, it may not be the right tool.

> The tool may PRIORITIZE results by grading warnings, e.g. from
> SEVERE to TRIVIAL.

It does that.

> But I think that it is wrong for any validation diagnostic tool to
> simply ignore or conceal the error.

CSE HTML Validator defines an error as something that is serious and
should be given attention.

> Whether the page author wishes to spend time fixing that error is
> their own choice.

Which is why CSE HTML Validator is very configurable. The user can
configure it to their liking.

> Whether any given error is "significant" depends not only on the error
> itself but on the context, the purpose and priorities decided by the
> user - not by the validation tool.

Again, that is why CSE HTML Validator is configurable... but by
default it is configured for what most people are interested in, not
for the select few who want strict compliance.

> Something which is "unimportant" one day may be very significant on
> another day or from a different perspective.

And it takes that into account by enforcing better style where it
makes the most sense. For example, by default it requires end tags for
elements whose end tags are optional in HTML, like "td", "tr", and
"head". The W3C validator would let you get by without them, even
though that's not a good idea.
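
To illustrate (a contrived snippet, not taken from any real page),
this table is valid HTML 4.01 even though every </td> and </tr> end
tag is omitted:

  <table>
    <tr>
      <td>cell one
      <td>cell two
    <tr>
      <td>cell three
      <td>cell four
  </table>

The W3C validator accepts it as written; CSE HTML Validator's default
configuration asks for the missing end tags, because relying on
implied closure is easy to get wrong.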

> But to claim that the error is not reported because (it is claimed
> that) it will not affect search engine rankings is, in my opinion, a
> spurious argument.

The issue in question is not reported by default because it is not a
known issue with any modern search engines or user agents/browsers.

> Incidentally, I believe that your validation tool reports many other
> things including things which are not likely to affect search
> engines. So why treat this one differently?

Our tool is not just for search engines; it's an overall quality tool
as well. The same philosophy applies in everything it does: it tries
to report issues that are real-world problems and to ignore or
downplay issues that aren't, but it can be configured to work
differently if the author chooses.

If someone brings an issue to my attention, then I will research it
and if it is indeed an issue, I will try to address it. So if you have
any evidence that using "/>" in an HTML document has any detrimental
effect in modern search engines or user agents then I'd be happy to
reconsider the default of allowing it. But so far I have seen nothing
to show that that is the case.

Christina S

Mar 10, 2008, 11:40:22 PM
to only-va...@googlegroups.com
> If someone brings an issue to my attention, then I will research it
> and if it is indeed an issue, I will try to address it. So if you have
> any evidence that using "/>" in an HTML document has any detrimental
> effect in modern search engines or user agents then I'd be happy to
> reconsider the default of allowing it. But so far I have seen nothing
> to show that that is the case.


Well, I am bringing it to your attention now. I have been looking at
the effect of this particular meta tag for 2-3 years now. At the time
when Google was reporting supplemental pages, the effect almost
immediately after people added this uncorrected meta tag to their
homepages was to make all or most other pages go supplemental very
quickly, with a marked drop in expected SERPs for their typical
queries. It got to be extremely predictable: as soon as someone
complained of their pages going supplemental, I'd go and see exactly
this meta tag with the wrong closure used in non-XHTML documents. Most
of the time those are web pages which don't have any doctype at all.
And even if they also have tons of other validation errors, this
particular one broke the camel's back.

Now there is no more mention of supplemental pages. But the effect is
the same: the pages might remain indexed (or occasionally disappear),
but they are cached less frequently, even while the homepage is
freshly cached. Yes, cached, but their content isn't actually indexed.

You've not been following the endless heated discussions we've been
having in the Webmaster groups on this. It's true that no Googlers
have stated that this is a factor, nor have they denied it. I could
only speculate as to why. My theory is that admitting that breaking
the head in this way gets the content of the page ignored would
attract the ire of the many webmasters who've been having problems
ever since they jumped on the sitemaps bandwagon. Even if they were to
point out that a simple run through the validator would have flagged
the problem, somebody is always bound to say that google.com isn't
valid, and to bring up the old chestnut that amazon.com is way
invalid. Besides, those sites typically were invalid before anyway.
But this is what triggered the main problem with the new smart robots
that parse semantically.

I can only maintain a firm stance on this.

Christina
www.webado.net


Albert Wiersch

Mar 11, 2008, 10:13:06 AM
to Only Validation + Navigation = Crawlability
> Well, I am bringing it to your attention now. I have been looking at
> the effect of this particular meta tag for 2-3 years now. At the time
> when Google was reporting supplemental pages, the effect almost
> immediately after people added this uncorrected meta tag to their
> homepages was to make all or most other pages go supplemental very
> quickly, with a marked drop in expected SERPs for their typical
> queries. It got to be extremely predictable: as soon as someone
> complained of their pages going supplemental, I'd go and see exactly
> this meta tag with the wrong closure used in non-XHTML documents. Most
> of the time those are web pages which don't have any doctype at all.
> And even if they also have tons of other validation errors, this
> particular one broke the camel's back.

Thanks for the info, but this isn't conclusive. And since I suspect
those pages may have had other issues, there's really no way to know
what caused it unless an engineer from Google makes it clear. It may
not even be the errors in the pages but a change in how Google
processes rankings. You just can't know.

You may believe that's the case, but others like me don't believe
that closing a meta tag with "/>" is a problem. For all those who want
to address this with CSE HTML Validator, though, simply disabling the
"XML Compatibility" option will make these issues appear as errors.
One can also use the included DTD validator to find this problem
(Tools->Validate->Nsgmls messages only).

And as others reason, and as you mentioned, the high rankings of
pages that have many errors also raise questions as to how important
"perfect HTML" is to rankings.

Of course I think everyone should use a good checker to check their
HTML, not only for possible search engine issues but for other
potential problems as well. I don't think the W3C validator is the
best choice if you have to use only one. I think it is better to use
something that goes beyond the standards, because DTD checkers are
very limited in what they can check. If you also want to check for
strict compliance, then that's fine, but one shouldn't limit oneself
only to strict compliance.

--
Albert Wiersch
AI Internet Solutions
sup...@htmlvalidator.com
http://www.htmlvalidator.com/

webado

Mar 11, 2008, 10:35:12 AM
to Only Validation + Navigation = Crawlability
Albert, I've said my piece and I stand by my opinions.

It's up to you whether to use prudence as the default or not.

There's nothing more to be said, I'm afraid.