Discussions > Crawling, indexing, and ranking > 6 months since we mostly dropped out of the search index
From: Autocrat
Date: Sun, 22 Jun 2008 15:17:46 -0700 (PDT)
Local: Sun, Jun 22 2008 6:17 pm
Subject: Re: 6 months since we mostly dropped out of the search index
::: Cross Thread :::
http://groups.google.com/group/Google_Webmaster_Help-Indexing/browse_thread/thread/43a75cd572b9d90/24bdd27f0b763685#24bdd27f0b763685

- - - - - - -

Please try to use the same Topic if it is the same issue.
Simply view your profile or search on your profile name to find it.

If your Topic has not received a 'timely' reply (that being within 24
hours), then please feel free to reply to your own post with
:bump:
or similar.

This helps avoid filling the topic list up with 'dead' posts.


 
From: Autocrat
Date: Sun, 22 Jun 2008 15:42:54 -0700 (PDT)
Local: Sun, Jun 22 2008 6:42 pm
Subject: Re: 6 months since we mostly dropped out of the search index
::: Canonical Domain issues :::
I can access your site with and without www in the URL...

http://www.gadgetguy.com.au/
http://gadgetguy.com.au/

This could be perceived as Internal Duplication.
This could also be a possible issue for Link Loss (the PR value of
inbound links is split across different URLs rather than consolidated).

Employ a server-based 301 redirect from the unwanted format to the
preferred format.
Select the preferred format in the GWMT.
Ensure any/all links use the same format.
Ensure that the search engine sitemap uses the preferred format.
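
A minimal sketch of the 301 step above, in Python. It assumes the www form is the preferred one (the thread does not say which was chosen); PREFERRED_HOST, ALTERNATE_HOSTS and canonical_redirect are hypothetical names for illustration, not part of the site's code.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the www form is the preferred format.
PREFERRED_HOST = "www.gadgetguy.com.au"
ALTERNATE_HOSTS = {"gadgetguy.com.au"}


def canonical_redirect(url):
    """Return (301, target) if the URL uses an unwanted host, else None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host in ALTERNATE_HOSTS:
        # Keep path and query intact so each page redirects to its own twin,
        # not to the homepage.
        target = urlunsplit(
            (parts.scheme, PREFERRED_HOST, parts.path or "/", parts.query, parts.fragment)
        )
        return 301, target
    return None
```

In practice this kind of rule usually lives in the web server configuration; the function only illustrates the decision being made.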

.

::: Internal Duplication / Multiple URL issues :::
I seem to be able to reach certain content through different URLs....

http://gadgetguy.com.au/
http://gadgetguy.com.au/index.asp
http://www.gadgetguy.com.au/article.asp?m_article=3052
http://www.gadgetguy.com.au/a-guide-to-the-apple-store-sydney-iwent,-...

This could be perceived as Internal Duplication.
This could also be a possible issue for Link Loss (the PR value of
inbound links is split across different URLs rather than consolidated).

Try to ensure only 1 URL is used with any/all links.
If absolutely necessary (such as after fixing this issue and to cover
older items), employ a server-based 301 permanent redirect to the
preferred item URL.
Include the preferred URL in the search engine sitemap (and exclude
the unwanted one).

.

::: Strange URLs :::
Some URLs contain unusual characters...

http://www.gadgetguy.com.au/a-guide-to-the-apple-store-sydney-iwent,-...

I'm seeing commas in that URL.

No idea if that could cause a problem... but I don't think it should
be happening.

.

::: Long/Large URLs / Possibly 'stuffed' URLs :::
Some URLs seem excessively long...

http://gadgetguy.com.au/computing-computer-laptop-notebook-software-w...
http://gadgetguy.com.au/home-appliances-electrical-whitegoods-aircond...

These URLs also seem to have an assortment of words in them that don't
seem very 'focused'... instead they look 'stuffed' with possible
keywords/terms.

.

::: Invalid Code :::
You seem to be mixing HTML and XHTML formatting on some markup... a
prime example is in the Head section...

<meta name="robots" content="all" />
<meta name="description" content="GadgetGuy.com.au - Reviews, news,
comparisons and buyers guides on the latest gadgets, computers, home
theatre, phones, games and cool stuff!" />

Your DocType says HTML ... so those items should not be ending with a
/>
only with a
>

.

::: You MAY have hidden content :::
Looking through the source code, I see this...

<div id="nav0" style="display:none"></div>
<script language="JavaScript" type="text/javascript">
document.getElementById("nav0").innerHTML =
document.getElementById("subnav").innerHTML;
</script>

This means that with JS Off and CSS On, there is content that is
hidden from view, and cannot be made visible.

For best practice, ALL content should be 'visible'... and made hidden
by employing JS to change the style/class.

.

::: Malformed robots.txt :::
Your robots file may be acceptable to SOME bots, but possibly not all.

Allow: /

Is not 'standard' ... it may be worth looking up the robots standard.
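
As a rough illustration of the point above: the original robots exclusion standard defines only the User-agent and Disallow fields, and Allow is an extension that some crawlers (Google among them) understand. The Python sketch below lists any directive outside those two fields. The function name and the sample file are hypothetical; the site's actual robots.txt is not quoted in this thread beyond the Allow line.

```python
# Fields defined by the original robots exclusion standard.
ORIGINAL_STANDARD_FIELDS = {"user-agent", "disallow"}


def nonstandard_directives(robots_txt):
    """Return (line_number, line) pairs for directives outside the original standard."""
    found = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line or ":" not in line:
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in ORIGINAL_STANDARD_FIELDS:
            found.append((number, line))
    return found


# Hypothetical sample, not the site's real file.
SAMPLE_ROBOTS = "User-agent: *\nAllow: /\n"
```

Under the original standard, the portable way to allow everything is an empty Disallow line ("Disallow:") rather than "Allow: /".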

.

::: Incorrect Sitemap :::
It's possible that your SE Sitemap is incorrectly formed... as well as
possibly 'incorrect' in its information.

http://gadgetguy.com.au/sitemap.xml

1) scroll right to the bottom... and see a bunch of entries that
really don't look right

2) Please look at the URL information you are supplying...
<changefreq>daily</changefreq>
<priority>0.50</priority>
Is that really true?

.
.
.

Hope all that helps.


 
From: kscope
Date: Sun, 22 Jun 2008 20:38:10 -0700 (PDT)
Local: Sun, Jun 22 2008 11:38 pm
Subject: Re: 6 months since we mostly dropped out of the search index
Thanks for the reply Autocrat, I'll speak to our developer to go over
the points you've raised. Sorry about the re-posting, I didn't know
about the 'bump' option.

 
From: kscope
Date: Sun, 22 Jun 2008 22:03:23 -0700 (PDT)
Local: Mon, Jun 23 2008 1:03 am
Subject: Re: 6 months since we mostly dropped out of the search index
And here's the feedback I received on your comments:

1. No external links, internal links, or sitemap links use http://gadgetguy.com.au
so this would not be an issue.
2. Internal Duplication is a very recent issue; however again the
Sitemap and all the Index and Section home pages use the latest links
which are being crawled, so this should not be a significant issue.
Duplication does not explain why neither page would be indexed, or why
deeper indexing other than Homepage and Section Home is not occurring.
3. Commas in URLs we will investigate.
4. Long/keyword stuffed URLs. Have edited to fix issues of length and
keyword stuffing.
5. Invalid Code - mmm... I very much doubt this is an issue. The
DocType is used by the browser to work out how to interpret the HTML
at render time. Google has no interest in this, so I see no reason why
this could cause a problem.
6. We don't have hidden content. The DIV layer being hidden is an
empty formatting control.
7. Google is happy the sitemap is valid. We also validated the sitemap
independently before submitting. However, as this is dynamically
generated there could be new content or very old content in it that is
appearing for the first time; nonetheless if there was a problem the
Google Webmaster report would report it - and there are no issues
showing at the moment.


 
From: Autocrat
Date: Mon, 23 Jun 2008 01:35:41 -0700 (PDT)
Local: Mon, Jun 23 2008 4:35 am
Subject: Re: 6 months since we mostly dropped out of the search index
?

Well, not a lot can be said really.
.

How about Hidden links/text?
Viewing the page without images shows the top nav bar becomes
'invisible' ....
Viewing the page without images shows a lot of text that is hard to
read.

(hell, even if it isn't an SE issue, should be fixed for certain user-
types.)

.

How about applying rel="nofollow" to the links in noscript tags?

.

Links to non-content pages?

Let's go here...
http://gadgetguy.com.au/home-appliances-whitegoods-air-conditioning-h...
first thing on the left is the Russell Hobbs rice cooker...
(with 2 links to 2 urls! We'll take the top one)
http://gadgetguy.com.au/russell-hobbs-reflections-rice-cooker-review-...

now I see 2 more links...
Review | Features

Review...
http://gadgetguy.com.au/product.asp?id=14&m_review=0
Erm... not really seeing anything important on here...
nothing specific to Reviewing the Russell Hobbs rice cooker anyway
(inc. irrelevant title, h1 etc.)

Features...
http://gadgetguy.com.au/features.asp?id=14&m_review=0
This one has no real content at all?

So what are the bots meant to be indexing?
.

Weak Image Alt attributes...
Why does it just say 'Product' ?
Not exactly useful to blind people (including bots ;))...
so missing an opportunity to not only be helpful, but possibly get an
extra point or two with the SE's.

.

Any external duplicate content...
Is the content 'yours' or is it from other sources?
Is it being 'fed' to other sites?
Is it being scraped/copied by other sites?

.
.
.

Hope all that helps.
(May be worth getting the developer in as well, saves you running back
and forth ;))


 
From: JohnMu (Google employee)
Date: Mon, 23 Jun 2008 01:53:22 -0700 (PDT)
Local: Mon, Jun 23 2008 4:53 am
Subject: Re: 6 months since we mostly dropped out of the search index
Hi kscope and welcome to the groups!

Looking at the pages indexed, I see two issues which Autocrat has
already mentioned (thanks, Autocrat!) that I'd like to expand on:

1. Long & crazy URLs
I ran into this URL while looking at [site:gadgetguy.com.au] :
http://www.gadgetguy.com.au/small-kitchen-appliances-toaster-kettle-c...

Now I'm all for having descriptive URLs, but .... this seems to be
taking it a bit too far and I have a bit of trouble identifying
anything that matches in the content of your page.

The problem with URLs like this is that they almost appear to be
random and in fact I can get exactly the same page by using something
like: http://www.gadgetguy.com.au/xyzzy-42.html . In general, you
should make sure that you have only one URL that leads to your content
-- all others should either redirect to the proper URL or return HTTP
result code 404 to signal that the URL is invalid. Without that, your
site is leading us (and all other crawlers) on a wild goose chase.

If your CMS is not able to handle this properly (one URL per piece of
content), I would recommend not using rewritten URLs so that we can
recognize and skip over unimportant parameters in your URL query
string.

2. Broken HTML code
In general, we try to get it right regardless of what a webmaster uses
on his page. However, there are limits to what we can guess at.
Although this is definitely not as important as the first point, you
can see this happening when you search for something like:
http://www.google.com/search?q=site:www.gadgetguy.com.au+intitle:shor...

Hope it helps!
John


 
From: kscope
Date: Mon, 23 Jun 2008 04:14:24 -0700 (PDT)
Local: Mon, Jun 23 2008 7:14 am
Subject: Re: 6 months since we mostly dropped out of the search index
Hello Autocrat and John,

Thanks for the welcome, and for the feedback - I'm sure to be a
regular here to read up on the finer points of SE stuff.

OK, I have almost finished paring back the URLs of all the site
sections, the pages that have actually been indexed. Of the pages that
aren't being indexed, the URLs are only the article title or the
product name.

I'm told that via Webmaster Tools it appears that Google has indexed
about 70% of our sitemap. Can you tell me what the relationship is, or
lag is, between a sitemap being indexed and the pages making it into
the search index.

I have also passed along the URL of this thread to our developer, so
either he will chime in or I'll continue to be the point man to
hopefully get us more SE-friendly.


 
From: Autocrat
Date: Mon, 23 Jun 2008 05:53:31 -0700 (PDT)
Local: Mon, Jun 23 2008 8:53 am
Subject: Re: 6 months since we mostly dropped out of the search index
Don't worry too much about what appears in the SiteMap indexation
information...
it tends to be a little 'off target' (putting it nicely  ;)).

.

For a more accurate idea, you may want to use the site: operator.
Enter this in the Google Search...
site:http://gadgetguy.com.au

Then browse to the very last page of results.
The figure in the top right is a pretty good idea of the number of
pages indexed.

You may also want to try some variants...
site:gadgetguy.com.au
site:gadgetguy.com.au/*

Also, if you see the paragraph about 'omitted results', do try that...
and again go to the end.

.

Please be aware of Google DataCenters...
Google has info on numerous computers/networks.
Some of these are a little less up-to-date than others... and you may
end up connecting to one of those, and get different results.

The DC contacted can be random... can be influenced by ISP, Location,
whether using local Google or .com Google, possible Browser Googles
(like in MFF) etc.

Always perform the same search several times...
And don't panic if you see changes in the results.


 
From: Andy@netAttention
Date: Tue, 24 Jun 2008 16:57:20 -0700 (PDT)
Local: Tues, Jun 24 2008 7:57 pm
Subject: Re: 6 months since we mostly dropped out of the search index
Thanks for the comments John.

I work at the company which developed the GadgetGuy web site.

We would really appreciate advice on the real issue we've been
battling with here.

There are over 2,000 links to this website around the Internet, many
from respectable sites with good pagerank.

We have two sitemap feeds to Google. Both report about 75% of the
pages as Indexed, however we are only seeing pages from one sitemap
showing on Google. This is despite the fact the Googlebot is also
constantly crawling the site.

This has been the case for some time and some of the issues now of
multiple URLs pointing to one page have come about due to attempts to
fix this issue - such as making URLs shorter (which is controlled by
content authors using the CMS by the way).

It seems many of our pages are on the Google supplementary index.

Interestingly, on Google's new website trends page, our plot seems to
have tanked to ZERO even though the site still gets Google traffic,
it's like we're not measured anymore, or we're on some kind of black
list we can't explain.

see http://trends.google.com/websites?q=www.gadgetguy.com.au&geo=all&date...

Has anyone else experienced this and resolved the cause?

We're not interested in hearing about a whole lot of incremental
tweaks. We're quite well versed in best practice for SEO. What we're
trying to uncover here is a fundamental issue that is stopping pages
getting indexed.

Thanks!

Andy Farrell
netAttention

On Jun 23, 6:53 pm, JohnMu wrote:


 
From: Autocrat
Date: Wed, 25 Jun 2008 02:16:02 -0700 (PDT)
Local: Wed, Jun 25 2008 5:16 am
Subject: Re: 6 months since we mostly dropped out of the search index
Not being funny... but don't you think someone (such as JM) would have
at least 'hinted' at a serious or concerning issue?

Not saying that there isn't a major problem, and that we've all missed
it so far...
but the algo changes... and sometimes some of those changes hit hard.
Every so often, there seem to be some tweaks to the algo, and all of a
sudden a huge rise in 'issues' walks in here... and there are no major
problems... it's all 'little' things that add up!

So, not wanting to sound harsh... but you've been given some issues
already... maybe make sure those are resolved and see how things go?
The least that does is ensure that it's not those things causing the
issue.
Whilst making those changes, it also gives you the chance to look at
the code and see if anything else is showing up.

.

Your URLs all seem to include an item number at the end?
If so, you could set up a simple script to check whether the URL
matches the preferred format.
If not, send an HTTP response with a 301, and redirect to the shorter
URL.

Of course, it would greatly help if that was done 'after' ensuring all
links point to just one format (which currently they don't?).

Ensure the same format is in the sitemap, so the bots get an idea of
the preferred format.

Also ensure those URLs are 'cleaner'... none of the 'possibly looking
stuffed' urls,
and that they are 'relevant' ... containing something to do with the
page title/h1 etc.
(I think this could be the biggest issue?)
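
The URL check suggested above (compare the requested URL against the one preferred format for its item number, 301 if it differs) might be sketched like this in Python. The "name-number.html" shape is taken from the URLs quoted in this thread; PREFERRED_SLUGS, its entries, and resolve are hypothetical and stand in for whatever the CMS actually stores.

```python
import re

# Hypothetical table: item number -> the one preferred slug for that item.
PREFERRED_SLUGS = {5: "photo-and-video", 42: "rice-cooker-review"}

# Matches paths shaped like /some-words-42.html
URL_PATTERN = re.compile(r"^/(?P<slug>.+)-(?P<item>\d+)\.html$")


def resolve(path):
    """Return (status, redirect_target) for a requested path."""
    match = URL_PATTERN.match(path)
    if match is None:
        return 404, None  # this sketch only knows about item pages
    item = int(match.group("item"))
    preferred = PREFERRED_SLUGS.get(item)
    if preferred is None:
        return 404, None  # unknown item: a real 404, not a redirect home
    canonical = f"/{preferred}-{item}.html"
    if path == canonical:
        return 200, None  # already the preferred URL, serve the page
    return 301, canonical  # any other wording redirects to the one true URL
```

With this in place a request such as /xyzzy-42.html no longer serves a duplicate page; it is redirected to the single preferred URL for item 42.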

.

I'd ensure the links are all working too (seems to irritate the GBot
no end!).
So that's all links go somewhere....
then once that's resolved...
make sure that the 'somewhere' has some content (as pointed out above,
some of those pages are basically 'empty').

After that... sort out the server response for non-existent URLs...
a 302 to the homepage is not exactly the 'best' approach!
It should respond with a proper 404 or 410.
Provide either a custom error page... or then set a delayed redirect.
So long as the bots get that 404/410, so they know not to index it.

.

Try fixing those...
see how things go...
failing that, ask again (preferably in this topic/thread... or at
least link to it so we know where to look ;))


 
From: Phil Payne
Date: Wed, 25 Jun 2008 02:34:42 -0700 (PDT)
Local: Wed, Jun 25 2008 5:34 am
Subject: Re: 6 months since we mostly dropped out of the search index

> 7. Google is happy the sitemap is valid. We also validated the sitemap
> independantly before submitting. However, as this is dynamically
> generated there could be new content or very old content in it that is
> appearing for the first time; nonetheless if there was a problem the
> Google Webmaster report would report it - and there are no issues
> showing at the moment.

It may be 'valid' but it's not useful.  Every priority but one set to
the default, and every lastmod set to the same time.
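
One way to spot this kind of uniform sitemap is to count the distinct values per field. The Python sketch below does that with the standard library; summarize and the two-entry SAMPLE are hypothetical (the sample is not the site's real sitemap), and the namespace is the one used by the sitemaps.org 0.9 schema.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def summarize(sitemap_xml):
    """Count how often each priority, lastmod and changefreq value occurs."""
    root = ET.fromstring(sitemap_xml)
    summary = {"priority": Counter(), "lastmod": Counter(), "changefreq": Counter()}
    for url in root.findall("sm:url", NS):
        for field, counter in summary.items():
            value = url.findtext(f"sm:{field}", default="(missing)", namespaces=NS)
            counter[value.strip()] += 1
    return summary


# Hypothetical two-entry sitemap for illustration only.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/a-1.html</loc><lastmod>2008-06-22</lastmod>
       <changefreq>daily</changefreq><priority>0.50</priority></url>
  <url><loc>http://www.example.com/b-2.html</loc><lastmod>2008-06-22</lastmod>
       <changefreq>daily</changefreq><priority>0.50</priority></url>
</urlset>"""
```

If every counter comes back with a single key, the sitemap is telling crawlers nothing about which pages matter more or changed more recently.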

 
From: Phil Payne
Date: Wed, 25 Jun 2008 02:50:05 -0700 (PDT)
Local: Wed, Jun 25 2008 5:50 am
Subject: Re: 6 months since we mostly dropped out of the search index

> ::: Canonical Domain issues :::
> I can access your site with and without www in the URL...

> http://www.gadgetguy.com.au/
> http://gadgetguy.com.au/

Duplicate content and/or a domain farm penalty:

http://gadget.netattention.com.au/


 
From: JohnMu (Google employee)
Date: Wed, 25 Jun 2008 06:03:25 -0700 (PDT)
Local: Wed, Jun 25 2008 9:03 am
Subject: Re: 6 months since we mostly dropped out of the search index
Hi Andy and welcome to the groups!

It seems that someone did a complete redesign around mid January. You
can see some of the old design at http://web.archive.org/web/20070823193356/http://www.gadgetguy.com.au/
. In general, the best practice would be to have all the old URLs 301
redirect to the appropriate new ones. However, in this case, there
were a few things done in a suboptimal way:

1. Old URLs are 302 redirected to the homepage

2. For a period of about 3 weeks, it looks like you had robots meta
tags with a value of "none" across the site.

3. Until recently, many URLs had many keywords and spaces in them. For
example, you can see this on
http://74.125.39.104/search?q=cache:xWJcPWZfH8MJ:www.gadgetguy.com.au...

It contains a link like this:
<a href="photo-and-video-photography digital camera camcorder
videocamera handycam canon sony panasonic nikon channel 7 sunrise
australia-5.html">Photo and Video</a>

This ties in with my previous comment on long, difficult to understand
URLs. There are a lot of URLs that could end up showing that
content... this means we might spend a lot of time crawling through
URLs that are really just duplicates. URL length is not an issue
(apart from making it close to impossible for users to link to your
pages without copy&pasting the URL).

At this time, I would work on designing a very simple URL structure
that allows you to use relevant keywords in your URL so that the user
can understand what might be shown on the page. Also, you would want
to make sure that all canonical versions of your existing URLs
(including the old style ones) are 301 redirected to the new & simple
URL structure.
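
A small Python sketch of that last step, mapping each old-style URL to its own new URL instead of sending everything to the homepage. The article.asp?m_article=... shape comes from a URL quoted earlier in the thread; ARTICLE_REDIRECTS, its single entry, and redirect_for_old_url are hypothetical placeholders.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical mapping: old article id -> path of the matching new page.
ARTICLE_REDIRECTS = {"1001": "/example-article-1001.html"}


def redirect_for_old_url(url):
    """Return (301, new_path) for a known old URL, otherwise (404, None)."""
    parts = urlsplit(url)
    if parts.path == "/article.asp":
        ids = parse_qs(parts.query).get("m_article", [])
        target = ARTICLE_REDIRECTS.get(ids[0]) if ids else None
        if target:
            return 301, target  # permanent redirect to the equivalent page
    # Unknown old URLs get a real 404 rather than a 302 to the homepage.
    return 404, None
```

The design choice is that a redirect is only issued when an equivalent new page is known, so the context of the old URL carries over to its replacement.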

Hope it helps!
John


 
From: Andy@netAttention
Date: Wed, 25 Jun 2008 16:05:59 -0700 (PDT)
Local: Wed, Jun 25 2008 7:05 pm
Subject: Re: 6 months since we mostly dropped out of the search index
Hi John,

Thanks - that's a big help.

1. We've removed any remaining 302s and changed them to 301s.
2. The robots none tag has us stumped. Can you see when this was?
3. We're already well on top of the URL issue too and have redirected
(301) the canonical domains to be sure.

So, hopefully, things will improve soon?

Thanks,
Andy

On Jun 25, 11:03 pm, JohnMu wrote:


 
From: JohnMu (Google employee)
Date: Wed, 25 Jun 2008 16:28:44 -0700 (PDT)
Local: Wed, Jun 25 2008 7:28 pm
Subject: Re: 6 months since we mostly dropped out of the search index
Hi Andy
Just a few short remarks:

1. The old URLs should redirect to the appropriate new URLs for
maximum effect. By redirecting it to the root URL, we tend to lose the
context provided by the old URL.

2. The robots "none" appears to have started when the new structure
was put up (around mid January). However, since that has long been
resolved, there's not much that can be done at this point to change it
-- except making sure that things work better from now on :)
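
To help make sure a robots meta value like "none" does not slip back in with a future release, a template check along these lines could be run before deploying. This is only a sketch using Python's standard html.parser; RobotsMetaParser and blocks_indexing are hypothetical names. It relies on "none" being shorthand for "noindex, nofollow".

```python
from html.parser import HTMLParser

# Robots meta values that keep a page out of the index.
BLOCKING_VALUES = {"none", "noindex"}


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for part in attributes.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def blocks_indexing(html):
    """Return True if the page carries a robots meta tag that blocks indexing."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return bool(parser.directives & BLOCKING_VALUES)
```

Running blocks_indexing over the rendered homepage and a few section pages would flag the problem before crawlers see it.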

Cheers
John


 