Discussions > Crawling, indexing, and ranking > 6 months since we mostly dropped out of the search index
From: Autocrat
Date: Sun, 22 Jun 2008 15:17:46 -0700 (PDT)
Local: Sun, Jun 22 2008 6:17 pm
Subject: Re: 6 months since we mostly dropped out of the search index
::: Cross Thread :::
http://groups.google.com/group/Google_Webmaster_Help-Indexing/browse_thread/thread/43a75cd572b9d90/24bdd27f0b763685#24bdd27f0b763685

- - - - - - -

Please try to use the same Topic if it is the same issue.
Simply view your profile or search on your profile name.

If your Topic has not received a 'timely' reply (that being within 24
hours), then please feel free to reply to your own post with
:bump:
or similar.

This helps avoid filling the topic list up with 'dead' posts.


From: Autocrat
Date: Sun, 22 Jun 2008 15:42:54 -0700 (PDT)
Local: Sun, Jun 22 2008 6:42 pm
Subject: Re: 6 months since we mostly dropped out of the search index
::: Canonical Domain issues :::
I can access your site with and without www in the URL...

http://www.gadgetguy.com.au/
http://gadgetguy.com.au/

This could be perceived as Internal Duplication.
This could also be a possible issue for Link Loss (the PR value of
inbound links is shared across different URLs rather than consolidated).

Employ a server-based 301 redirect from the unwanted format to the
preferred format.
Select the preferred format in the GWMT.
Ensure any/all links use the same format.
Ensure that the search engine sitemap uses the preferred format.
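
The redirect rule itself is small. Below is a minimal sketch in Python, only to show the logic; the real site runs ASP, so this is not a drop-in. It assumes the www form is the preferred one, and the names PREFERRED_HOST, ALTERNATE_HOSTS and canonical_redirect are made up for the illustration. Any port in the URL is ignored.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the www form is the preferred one; swap the values if not.
PREFERRED_HOST = "www.gadgetguy.com.au"
ALTERNATE_HOSTS = {"gadgetguy.com.au"}


def canonical_redirect(url):
    """Return (301, target) if url uses an unwanted host, else None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host in ALTERNATE_HOSTS:
        # Keep scheme, path, query and fragment; only the host changes.
        target = urlunsplit(
            (parts.scheme, PREFERRED_HOST, parts.path or "/",
             parts.query, parts.fragment)
        )
        return 301, target
    return None


print(canonical_redirect("http://gadgetguy.com.au/article.asp?m_article=3052"))
print(canonical_redirect("http://www.gadgetguy.com.au/"))
```

The point is that the path and query string survive the redirect, so an inbound link to a deep page lands on the same deep page under the preferred host.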

.

::: Internal Duplication / Multiple URL issues :::
I seem to be able to reach certain content through different URLs....

http://gadgetguy.com.au/
http://gadgetguy.com.au/index.asp
http://www.gadgetguy.com.au/article.asp?m_article=3052
http://www.gadgetguy.com.au/a-guide-to-the-apple-store-sydney-iwent,-...

This could be perceived as Internal Duplication.
This could also be a possible issue for Link Loss (the PR value of
inbound links is shared across different URLs rather than consolidated).

Try to ensure only 1 URL is used with any/all links.
If absolutely necessary (such as after fixing this issue and to cover
older items), employ a server-based 301 permanent redirect to the
preferred item URL.
Include the preferred URL in the search engine sitemap (and exclude
the unwanted one).

.

::: Strange URLs :::
Some URLs contain unusual characters...

http://www.gadgetguy.com.au/a-guide-to-the-apple-store-sydney-iwent,-...

I'm seeing commas in that URL.

No idea if that could cause a problem... but I don't think it should
be happening.

.

::: Long/Large URLs / Possibly 'stuffed' URLs :::
Some URLs seem excessively long...

http://gadgetguy.com.au/computing-computer-laptop-notebook-software-w...
http://gadgetguy.com.au/home-appliances-electrical-whitegoods-aircond...

These URLs also seem to have an assortment of words in them that don't
seem very 'focused'... instead they look 'stuffed' with possible
keywords/terms.

.

::: Invalid Code :::
You seem to be mixing HTML and XHTML formatting on some markup... a
prime example is in the Head section...

<meta name="robots" content="all" />
<meta name="description" content="GadgetGuy.com.au - Reviews, news,
comparisons and buyers guides on the latest gadgets, computers, home
theatre, phones, games and cool stuff!" />

Your DocType says HTML ... so those items should not be ending with a
/>
only with a
>

.

::: You MAY have hidden content :::
Looking through the source code, I see this...

<div id="nav0" style="display:none"></div>
<script language="JavaScript" type="text/javascript">
document.getElementById("nav0").innerHTML =
document.getElementById("subnav").innerHTML;
</script>

This means that with JS Off and CSS On, there is content that is
hidden from view, and cannot be made visible.

For best practice, ALL content should be 'visible'... and made hidden
by employing JS to change the style/class.

.

::: Malformed robots.txt :::
Your robots file may be acceptable to SOME bots, but possibly not all.

Allow: /

Is not 'standard' ... it may be worth looking up the robots standard.
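
For reference, the original robots exclusion convention only defines User-agent and Disallow; Allow is a later extension that Google understands but not every bot does. An empty Disallow value means nothing is blocked, so it says the same thing as "Allow: /" using only the original directives. A rough sketch of checking that with Python's standard-library parser (the ROBOTS_TXT content and the allows() helper are illustrative, not the site's actual file):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that permits everything using only the
# original directives (an empty Disallow means "nothing is blocked").
ROBOTS_TXT = """\
User-agent: *
Disallow:
"""


def allows(robots_txt, agent, url):
    """Parse robots.txt text and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(allows(ROBOTS_TXT, "Googlebot", "http://www.gadgetguy.com.au/index.asp"))
```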

.

::: Incorrect Sitemap :::
It's possible that your SE Sitemap is incorrectly formed... as well as
possibly 'incorrect' in its information.

http://gadgetguy.com.au/sitemap.xml

1) scroll right to the bottom... and see a bunch of entries that
really don't look right

2) Please look at the URL information you are supplying...
<changefreq>daily</changefreq>
<priority>0.50</priority>
Is that really true?

.
.
.

Hope all that helps.


From: kscope
Date: Sun, 22 Jun 2008 20:38:10 -0700 (PDT)
Local: Sun, Jun 22 2008 11:38 pm
Subject: Re: 6 months since we mostly dropped out of the search index
Thanks for the reply Autocrat, I'll speak to our developer to go over
the points you've raised. Sorry about the re-posting, I didn't know
about the 'bump' option.

From: kscope
Date: Sun, 22 Jun 2008 22:03:23 -0700 (PDT)
Local: Mon, Jun 23 2008 1:03 am
Subject: Re: 6 months since we mostly dropped out of the search index
And here's the feedback I received on your comments:

1. No external links, internal links, or sitemap links use http://gadgetguy.com.au
so this would not be an issue.
2. Internal Duplication is a very recent issue; however again the
Sitemap and all the Index and Section home pages use the latest links
which are being crawled, so this should not be a significant issue.
Duplication does not explain why neither page would be indexed, or why
deeper indexing other than Homepage and Section Home is not occurring.
3. Commas in URLs we will investigate.
4. Long/keyword stuffed URLs. Have edited to fix issues of length and
keyword stuffing.
5. Invalid Code - mmm... I very much doubt this is an issue. The
DocType is used by the browser to work out how to interpret the HTML
at render time. Google has no interest in this, so I see no reason why
this could cause a problem.
6. We don't have hidden content. The DIV layer being hidden is an
empty formatting control.
7. Google is happy the sitemap is valid. We also validated the sitemap
independently before submitting. However, as this is dynamically
generated there could be new content or very old content in it that is
appearing for the first time; nonetheless if there was a problem the
Google Webmaster report would report it - and there are no issues
showing at the moment.


From: Autocrat
Date: Mon, 23 Jun 2008 01:35:41 -0700 (PDT)
Local: Mon, Jun 23 2008 4:35 am
Subject: Re: 6 months since we mostly dropped out of the search index
?

Well, not a lot can be said really.
.

How about Hidden links/text?
Viewing the page without images shows the top nav bar becomes
'invisible' ....
Viewing the page without images shows a lot of text that is hard to
read.

(hell, even if it isn't an SE issue, should be fixed for certain user-
types.)

.

How about applying rel="nofollow" to the links in noscript tags?

.

Links to non-content pages?

Let's go here...
http://gadgetguy.com.au/home-appliances-whitegoods-air-conditioning-h...
first thing on the left is the Russell Hobbs rice cooker...
(with 2 links to 2 urls! We'll take the top one)
http://gadgetguy.com.au/russell-hobbs-reflections-rice-cooker-review-...

now I see 2 more links...
Review | Features

Review...
http://gadgetguy.com.au/product.asp?id=14&m_review=0
Erm... not really seeing anything important on here...
nothing specific to Reviewing the Russell Hobbs rice cooker anyway
(inc. irrelevant title, h1 etc.)

Features...
http://gadgetguy.com.au/features.asp?id=14&m_review=0
This one has no real content at all?

So what are the bots meant to be indexing?
.

Weak Image Alt attributes...
Why does it just say 'Product' ?
Not exactly useful to blind people (including bots ;))...
so missing an opportunity to not only be helpful, but possibly get an
extra point or two with the SE's.
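
A quick way to find these across a page is to scan the markup for images whose alt text is missing or generic. This is a rough sketch using Python's standard-library HTML parser; the GENERIC_ALTS list, the WeakAltFinder class and the sample image paths are all invented for the illustration.

```python
from html.parser import HTMLParser

# Assumption: these placeholder values count as "weak"; extend as needed.
GENERIC_ALTS = {"", "product", "image", "photo"}


class WeakAltFinder(HTMLParser):
    """Collect the src of every <img> whose alt is missing or generic."""

    def __init__(self):
        super().__init__()
        self.weak = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # A missing alt attribute is treated the same as an empty one.
        alt = (attributes.get("alt") or "").strip().lower()
        if alt in GENERIC_ALTS:
            self.weak.append(attributes.get("src"))


def find_weak_alts(html):
    finder = WeakAltFinder()
    finder.feed(html)
    return finder.weak


# Hypothetical markup: one generic alt, one descriptive alt.
sample = (
    '<img src="/img/14.jpg" alt="Product">'
    '<img src="/img/15.jpg" alt="Russell Hobbs Reflections rice cooker">'
)
print(find_weak_alts(sample))
```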

.

Any external duplicate content...
Is the content 'yours' or is it from other sources?
Is it being 'fed' to other sites?
Is it being scraped/copied by other sites?

.
.
.

Hope all that helps.
(May be worth getting the developer in as well, saves you running back
and forth ;))


JohnMu Google employee  
From: JohnMu
Date: Mon, 23 Jun 2008 01:53:22 -0700 (PDT)
Local: Mon, Jun 23 2008 4:53 am
Subject: Re: 6 months since we mostly dropped out of the search index
Hi kscope and welcome to the groups!

Looking at the pages indexed, I see two issues which Autocrat has
already mentioned (thanks, Autocrat!) that I'd like to expand on:

1. Long & crazy URLs
I ran into this URL while looking at [site:gadgetguy.com.au] :
http://www.gadgetguy.com.au/small-kitchen-appliances-toaster-kettle-c...

Now I'm all for having descriptive URLs, but .... this seems to be
taking it a bit too far and I have a bit of trouble identifying
anything that matches in the content of your page.

The problem with URLs like this is that they almost appear to be
random and in fact I can get exactly the same page by using something
like: http://www.gadgetguy.com.au/xyzzy-42.html . In general, you
should make sure that you have only one URL that leads to your content
-- all others should either redirect to the proper URL or return HTTP
result code 404 to signal that the URL is invalid. Without that, your
site is leading us (and all other crawlers) on a wild goose chase.

If your CMS is not able to handle this properly (one URL per piece of
content), I would recommend not using rewritten URLs so that we can
recognize and skip over unimportant parameters in your URL query
string.
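
To make the "one URL per piece of content" rule concrete, here is a minimal sketch of the decision in Python. It assumes item URLs have the shape /<slug>-<item number>.html (as in the xyzzy-42.html example) and that the CMS can look up the one current slug for an item number; CANONICAL_SLUGS, PATH_PATTERN and resolve() are hypothetical names, and only item URLs are handled here.

```python
import re

# Hypothetical lookup of item number -> the one slug the CMS considers current.
CANONICAL_SLUGS = {42: "russell-hobbs-reflections-rice-cooker-review"}

# Assumed URL shape: /<slug>-<item number>.html
PATH_PATTERN = re.compile(r"^/(?P<slug>[a-z0-9-]+)-(?P<id>\d+)\.html$")


def resolve(path):
    """Return (status, location) for a requested item path."""
    match = PATH_PATTERN.match(path)
    if match is None:
        return 404, None  # not an item URL at all
    item_id = int(match.group("id"))
    slug = CANONICAL_SLUGS.get(item_id)
    if slug is None:
        return 404, None  # unknown item: say so instead of serving a page
    canonical = f"/{slug}-{item_id}.html"
    if path == canonical:
        return 200, None  # the one proper URL for this content
    return 301, canonical  # right item, wrong slug: redirect to the proper URL


print(resolve("/russell-hobbs-reflections-rice-cooker-review-42.html"))
print(resolve("/xyzzy-42.html"))
print(resolve("/xyzzy-99.html"))
```

With a rule like this, an arbitrary slug in front of a valid item number gets redirected rather than served, and an unknown item number gets a 404.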

2. Broken HTML code
In general, we try to get it right regardless of what a webmaster uses
on his page. However, there are limits to what we can guess at.
Although this is definitely not as important as the first point, you
can see this happening when you search for something like:
http://www.google.com/search?q=site:www.gadgetguy.com.au+intitle:shor...

Hope it helps!
John


From: kscope
Date: Mon, 23 Jun 2008 04:14:24 -0700 (PDT)
Local: Mon, Jun 23 2008 7:14 am
Subject: Re: 6 months since we mostly dropped out of the search index
Hello Autocrat and John,

Thanks for the welcome, and for the feedback - I'm sure to be a
regular here to read up on the finer points of SE stuff.

OK, I have almost finished paring back the URLs of all the site
sections, the pages that have actually been indexed. Of the pages that
aren't being indexed, the URLs are only the article title or the
product name.

I'm told that via Webmaster Tools it appears that Google has indexed
about 70% of our sitemap. Can you tell me what the relationship is, or
lag is, between a sitemap being indexed and the pages making it into
the search index?

I have also passed along the URL of this thread to our developer, so
either he will chime in or I'll continue to be the point man to
hopefully getting us more SE-friendly.


From: Autocrat
Date: Mon, 23 Jun 2008 05:53:31 -0700 (PDT)
Local: Mon, Jun 23 2008 8:53 am
Subject: Re: 6 months since we mostly dropped out of the search index
Don't worry too much about what appears in the SiteMap indexation
information...
it tends to be a little 'off target' (putting it nicely  ;)).

.

For a more accurate idea, you may want to use the site: operator.
Enter this in the Google Search...
site:http://gadgetguy.com.au

Then browse to the very last page of results.
The figure in the top right is a pretty good idea of the number of
pages indexed.

You may also want to try some variants...
site:gadgetguy.com.au
site:gadgetguy.com.au/*

Also, if you see the paragraph about 'omitted results', do try that...
and again go to the end.

.

Please be aware of Google DataCenters...
Google has info on numerous computers/networks.
Some of these are a little less up-to-date than others... and you may
end up connecting to one of those, and get different results.

The DC contacted can be random... can be influenced by ISP, Location,
whether using local Google or .com Google, possible Browser Googles
(like in MFF) etc.

Always perform the same search several times...
And don't panic if you see changes in the results.


From: Andy@netAttention
Date: Tue, 24 Jun 2008 16:57:20 -0700 (PDT)
Local: Tues, Jun 24 2008 7:57 pm
Subject: Re: 6 months since we mostly dropped out of the search index
Thanks for the comments John.

I work at the company which developed the GadgetGuy web site.

We would really appreciate advice on the real issue we've been
battling with here.

There are over 2,000 links to this website around the Internet, many
from respectable sites with good pagerank.

We have two sitemap feeds to Google. Both report about 75% of the
pages as Indexed, however we are only seeing pages from one sitemap
showing on Google. This is despite the fact the Googlebot is also
constantly crawling the site.

This has been the case for some time and some of the issues now of
multiple URLs pointing to one page have come about due to attempts to
fix this issue - such as making URLs shorter (which is controlled by
content authors using the CMS by the way).

It seems many of our pages are on the Google supplementary index.

Interestingly, on Google's new website trends page, our plot seems to
have tanked to ZERO even though the site still gets Google traffic,
it's like we're not measured anymore, or we're on some kind of black
list we can't explain.

see http://trends.google.com/websites?q=www.gadgetguy.com.au&geo=all&date...

Has anyone else experienced this and resolved the cause?

We're not interested in hearing about a whole lot of incremental
tweaks. We're quite well versed in best practice for SEO. What we're
trying to uncover here is a fundamental issue that is stopping pages
getting indexed.

Thanks!

Andy Farrell
netAttention



From: Autocrat
Date: Wed, 25 Jun 2008 02:16:02 -0700 (PDT)
Local: Wed, Jun 25 2008 5:16 am
Subject: Re: 6 months since we mostly dropped out of the search index
Not being funny... but don't you think someone (such as JM) would have
at least 'hinted' at a serious or concerning issue?

Not saying that there isn't a major problem, and that we've all missed
it so far...
but the algo changes... and sometimes some of those changes hit hard.
Every so often, there seem to be some tweaks to the algo, and all of a
sudden a huge rise in 'issues' walks in here... and there are no major
problems... it's all 'little' things that add up!

So, not wanting to sound harsh... but you've been given some issues
already... maybe make sure those are resolved and see how things go?
The least that does is ensure that it's not those things causing the
issue.
Whilst making those changes, it also gives you the chance to look at
the code and see if anything else is showing up.

.

Your URLs all seem to include an item number at the end?
If so, you could set up a simple script to check whether the URL
matches the preferred format.
If not, send a 301 response, and redirect to the shorter
URL.

Of course, it would greatly help if that was done 'after' ensuring all
links point to just one format (which currently they don't?).

Ensure the same format is in the sitemap, so the bots get an idea of
the preferred format.

Also ensure those URLs are 'cleaner'... none of the 'possibly looking
stuffed' urls,
and that they are 'relevant' ... containing something to do with the
page title/h1 etc.
(I think this could be the biggest issue?)

.

I'd ensure the links are all working too (seems to irritate the GBot
no end!).
So that's all links go somewhere....
then once that's resolved...
make sure that the 'somewhere' has some content (as pointed out above,
some of those pages are basically 'empty').

After that... sort out the server response for non-existent URLs...
a 302 to the homepage is not exactly the 'best' approach!
It should respond with a proper 404 or 410.
Provide either a custom error page... or then set a delayed redirect.
So long as the bots get that 404/410, so they know not to index it.
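
As a rough illustration of what to look for when testing a made-up URL, the check boils down to the status code the server sends back. The function below is a sketch in Python with an invented name; it only classifies a status and optional Location value you have already obtained, it does not fetch anything. A redirect or a 200 for a page that does not exist is what is often called a 'soft 404'.

```python
def classify_missing_page_response(status, location=None):
    """Judge how a server answered a request for a URL that does not exist."""
    if status in (404, 410):
        return "ok"  # bots are told plainly that the page is gone
    if status in (301, 302, 303, 307, 308):
        return f"soft 404: redirects to {location}"
    if status == 200:
        return "soft 404: serves a page for a non-existent URL"
    return f"unexpected status {status}"


print(classify_missing_page_response(302, "http://www.gadgetguy.com.au/"))
print(classify_missing_page_response(404))
```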

.

Try fixing those...
see how things go...
failing that, ask again (preferably in this topic/thread... or at
least link to it so we know where to look ;))


From: Phil Payne
Date: Wed, 25 Jun 2008 02:34:42 -0700 (PDT)
Local: Wed, Jun 25 2008 5:34 am
Subject: Re: 6 months since we mostly dropped out of the search index

> 7. Google is happy the sitemap is valid. We also validated the sitemap
> independently before submitting. However, as this is dynamically
> generated there could be new content or very old content in it that is
> appearing for the first time; nonetheless if there was a problem the
> Google Webmaster report would report it - and there are no issues
> showing at the moment.

It may be 'valid' but it's not useful.  Every priority but one set to
the default, and every lastmod set to the same time.
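
One way to spot this in a large sitemap is to count how many distinct lastmod and priority values it contains. The sketch below uses Python's standard library and the sitemaps.org 0.9 namespace; the summarize() helper and the SAMPLE document (with example.com URLs) are made up for the illustration and are not taken from the site's actual sitemap.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap: identical lastmod everywhere, one non-default priority.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc>
       <lastmod>2008-06-22</lastmod><priority>1.0</priority></url>
  <url><loc>http://www.example.com/a.html</loc>
       <lastmod>2008-06-22</lastmod><priority>0.50</priority></url>
  <url><loc>http://www.example.com/b.html</loc>
       <lastmod>2008-06-22</lastmod><priority>0.50</priority></url>
</urlset>"""


def summarize(sitemap_xml):
    """Count URLs and the distinct lastmod/priority values in a sitemap."""
    root = ET.fromstring(sitemap_xml)
    lastmods, priorities = Counter(), Counter()
    urls = root.findall("sm:url", NS)
    for url in urls:
        lastmods[url.findtext("sm:lastmod", default="(missing)", namespaces=NS)] += 1
        priorities[url.findtext("sm:priority", default="(missing)", namespaces=NS)] += 1
    return {
        "urls": len(urls),
        "distinct_lastmod": len(lastmods),
        "distinct_priority": len(priorities),
    }


print(summarize(SAMPLE))
```

If distinct_lastmod comes back as 1 for a site with years of content, the lastmod values are probably the generation time rather than the real modification dates.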

From: Phil Payne
Date: Wed, 25 Jun 2008 02:50:05 -0700 (PDT)
Local: Wed, Jun 25 2008 5:50 am
Subject: Re: 6 months since we mostly dropped out of the search index

> ::: Canonical Domain issues :::
> I can access your site with and without www in the URL...

> http://www.gadgetguy.com.au/
> http://gadgetguy.com.au/

Duplicate content and/or a domain farm penalty:

http://gadget.netattention.com.au/

