I was testing something recently on my blog, and I went to see if some
of the images in my post had been indexed yet... and discovered that
*no* images from my site were indexed in Google Image search, period.
I have nothing blocking the images in robots.txt. I went through the
logs and did not see any requests from Google-Images this month, but
am not positive that is the correct UA for that bot (I know it is what
I would disallow if I did *not* want them indexed).
The site in question is on a subdomain (it is my Smackdown blog), and
in checking I found that the Bad Neighborhood subdomain is also not
indexed. I am not sure why being on a subdomain would matter, but my
friend Donna thought that might be the case.
Anybody have any clues on this one...? It's not a new site by any
means, no idea why nothing would be coming up.
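For anyone wanting to repeat the log check described above: the user-agent token Google documents for its image crawler is "Googlebot-Image" (the same token you would name in robots.txt). Below is a minimal sketch that filters access-log lines on that token. It assumes the common "combined" log format, where the user agent is the last quoted field; the sample lines, IPs and dates are made up for illustration.

```python
import re

# "Googlebot-Image" is the user-agent token Google documents for its image crawler.
IMAGE_BOT_TOKEN = "Googlebot-Image"

# Assumption: combined log format, where the user agent is the last quoted field.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def image_bot_hits(log_lines):
    """Return the log lines whose user-agent field mentions Googlebot-Image."""
    hits = []
    for line in log_lines:
        match = USER_AGENT_RE.search(line)
        if match and IMAGE_BOT_TOKEN.lower() in match.group(1).lower():
            hits.append(line)
    return hits


# Hypothetical sample lines, invented for this example only.
SAMPLE_LOG = [
    '192.0.2.1 - - [01/May/2000:10:00:00 +0000] "GET /images/a.jpg HTTP/1.1" '
    '200 5120 "-" "Googlebot-Image/1.0"',
    '192.0.2.2 - - [01/May/2000:10:01:00 +0000] "GET / HTTP/1.1" '
    '200 900 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.3 - - [01/May/2000:10:02:00 +0000] "GET /images/b.jpg HTTP/1.1" '
    '200 2048 "-" "Mozilla/5.0 (Windows)"',
]

if __name__ == "__main__":
    for hit in image_bot_hits(SAMPLE_LOG):
        print(hit)
```

An empty result over a full month of real logs would back up the "I don't see them coming at all" observation, provided the log format matches the assumption above.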
Your website didn't have any images that would be indexed by Google
Images at first look...
Are your images close to a 16x9 through 9x16 aspect ratio? Google
seems to give preference to images around that size... I could be
wrong... but a lot of Google Images results are about that size... you
only have one extremely wide and short image on your blog as far as I
can see...
> Your website didn't have any images that would be indexed by Google
> Images at first look...
> Are your images close to a 16x9 through 9x16 aspect ratio? Google
> seems to give preference to images around that size... I could be
> wrong... but a lot of Google Images results are about that size... you
> only have one extremely wide and short image on your blog as far as I
> can see...
Not sure what image you are seeing... I have a bunch of images on my
site. MSN has 40 images indexed:
Yes, I know Blogger and WordPress subdomains show images, as do
About.com and a few others I saw. Like I said, I can't see why that
would be an issue... but I can't see any other issues either. It's not
an issue of Google Images coming to my site and deciding not to index
them... I don't see them coming at all.
As mentioned there, some of the important elements we look at with
regards to indexing images are:
- Descriptive text in alt-attributes
- Descriptive URLs
- Relevant text surrounding the images
- And of course the quality of the site & page and of the images
themselves
One thing you can do in addition is to opt-in to "enhanced image
search" in your Webmaster Tools account (in the "Tools" subsection).
There's no guaranteed way to get your images into Google Image search,
but attention to some of these details and continued work on your
website is never a bad idea :-).
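The first item in the checklist above (descriptive alt text) is easy to audit mechanically. Here is a rough sketch using only the Python standard library; the class and function names and the sample markup are assumptions made for illustration, and it only checks that alt text exists, not whether it is actually descriptive.

```python
from html.parser import HTMLParser


class ImageAudit(HTMLParser):
    """Collect every <img> tag together with its (possibly empty) alt text."""

    def __init__(self):
        super().__init__()
        self.images = []  # list of (src, alt) tuples

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attr_map = dict(attrs)
            src = attr_map.get("src") or ""
            alt = (attr_map.get("alt") or "").strip()
            self.images.append((src, alt))


def images_missing_alt(html):
    """Return the src of each image whose alt attribute is missing or blank."""
    audit = ImageAudit()
    audit.feed(html)
    return [src for src, alt in audit.images if not alt]


# Hypothetical page fragment, invented for this example.
SAMPLE_HTML = """
<p>Chart of indexed pages over time:</p>
<img src="/images/indexed-pages-chart.png" alt="Chart of indexed pages by month">
<img src="/images/img001.jpg">
<img src="/images/spacer.gif" alt="">
"""

if __name__ == "__main__":
    for src in images_missing_alt(SAMPLE_HTML):
        print("missing alt text:", src)
```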
> As mentioned there, some of the important elements we look at with
> regards to indexing images are:
> - Descriptive text in alt-attributes
> - Descriptive URLs
> - Relevant text surrounding the images
> - And of course the quality of the site & page and of the images
> themselves
Thank you John. :) According to Matt Cutts, my site falls into the
category of being one of "a small number of high-quality sites", so
hopefully that meets that last criterion. :D
I'm pretty sure I meet the criteria for at least some of the images on
my site, but as I said none at all are currently indexed. Thing is
there is no "Image Troubleshooting" section in GWT, and not a ton of
information about it out there (thank you for those links btw, I had
seen the one from GWC a while ago but had forgotten about it).
The thing that concerns me is that there are none indexed in Yahoo
images either, although MSN does find them. I'm trying to figure out
if it is something beyond normal optimization at this point, but they
are all returning 200 status codes, and nothing is blocking them in
robots.txt.
I mean, I have some non-images in the same directory, html or text
files that *used* to be indexed... and now none of them are either:
Just makes me concerned that something else odd may be going on,
something that I won't be able to diagnose myself.
> One thing you can do in addition is to opt-in to "enhanced image
> search" in your Webmaster Tools account (in the "Tools" subsection).
Thanks, I added the blog to GWT the day before I posted this, I think,
and opted in... but again, as far as I know that just has to do with
the community "labeling", not with indexing itself. I guess we'll
see. :)
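The "nothing is blocking them in robots.txt" part of the check mentioned above can be verified offline with Python's standard `urllib.robotparser`. This is only a sketch: the robots.txt content and the paths below are invented, and Python's parser may not interpret every rule exactly as Google's crawler does.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt, user_agent, paths):
    """Return the paths that robots_txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(user_agent, path)]


# Hypothetical robots.txt, invented for this example.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /wp-admin/

User-agent: Googlebot-Image
Disallow: /private-images/
"""

if __name__ == "__main__":
    paths = ["/images/a.jpg", "/private-images/x.jpg"]
    print(blocked_paths(SAMPLE_ROBOTS, "Googlebot-Image", paths))
```

Running it with both "Googlebot" and "Googlebot-Image" as the user agent is worthwhile, since a group aimed at one does not necessarily apply to the other.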
> One thing you can do in addition is to opt-in to "enhanced image
> search" in your Webmaster Tools account (in the "Tools" subsection).
Just a quick update... 1 month later, and still nothing in that
subdirectory, images or otherwise, is indexed. I mean, one of the
pages comes up in the top 5 for [wordpress exploit], which I believe
is a fairly competitive phrase, yet the support files for that post (2
images plus a cached search page) are not indexed at all. I would have
at least expected them to be indexed, even if they didn't rank highly
for any phrases.
Is it possible that my pages have been blocked or throttled from
passing PageRank, even to other internal pages, for some odd reason? I
just realized that this might be related... I have 66 posts, and 73
pages indexed in Google altogether, but only 30 of them are indexed in
a non-supplemental way, despite the fact that I have over 4,000 links
pointed at my site coming from over 550 unique domains, most of which
are deep links.
I mean, for instance... not that this is a page I need indexed to be
happy, but look at the example of my About page. It is linked to from
every single page on my blog (aside from the login page), and
therefore should be getting secondary benefit from every one of the
over 4k links pointed at my site. Yet when I search on an exact quote
from that page, but without using quotation marks, it does not show,
despite it being a non-competitive phrase:
I'm not talking about it not ranking well, either. I went through all
9 pages at 100 results per page, which came out to 888 results when I
searched, and it simply is not there, despite the fact that it is
indexed. I know that they did away with the supplemental label, and
changed things so that the supplemental index could still be searched
under certain conditions, but there is still a threshold in play, and
as far as I
can tell no reason why that page (or many others on my site, including
the pages in the /images/ directory) would be beneath it.
So I took your post as an excuse to take a better look at your
site :-) and found a few things which I'd like to share with you. In
particular regarding your /images/ subdirectory I noticed that there
are some things which could be somewhat problematic. These are just
two examples:
- You appear to have copies of other people's sites, e.g.
/images/viewgcache-getafreelinkfromwired.htm
- You appear to have copies of search results in an indexable way, e.g.
/images/viewgcache-bortlebotts.htm
I'm not sure why you would have content like that hosted on your site
in an indexable way, perhaps it was just accidentally placed there or
meant to be blocked from indexing. I trust you wouldn't do that on
purpose, right? At any rate, I'm sure you're aware that this kind of
content is not something that we would like to include in our index
and which is also mentioned in our Webmaster Guidelines.
I think it would be a good idea to clean this up, either by removing
it, moving it somewhere else (where it can't be indexed) or making sure
it can't be indexed where it is at the moment. Once you have done
that, I'd recommend submitting a reconsideration request detailing the
changes that you have made (and perhaps linking to this thread).
One of the topics I deal with on the site is SEO, and at times that
means referencing specific search results. Since those can change from
the time at which I blog about them, I started taking snapshots. They
are no different from jpeg'd snapshots, aside from the fact that they
are interactive, and much easier to manage (I know of no software that
will take a complete snapshot beneath the fold). I do clearly mark
each of them as being cached pages, so I am not sure why they would
cause an issue. Honestly though, those are not the pages I am worried
about, and have no problem nofollowing those particular links. Those
pages fit the very definition of being built for users (support files
for the posts, for clarity) and not for search engines.
However, unless you know something I do not (ie. if you looked at the
behind the scenes parameters), this does nothing to explain why none
of the other files in that directory, or my About page, are indexed.
At the very least something appears to be going on in that arena, and
if at all possible I would love to figure out what is going on there, and
if possible not just fix but be able to provide help to others who
might have a similar situation happen.
> So I took your post as an excuse to take a better look at your
> site :-) and found a few things which I'd like to share with you. In
> particular regarding your /images/ subdirectory I noticed that there
> are some things which could be somewhat problematic. These are just
> two examples:
> - You appear to have copies of other people's sites, e.g.
> /images/viewgcache-getafreelinkfromwired.htm
> - You appear to have copies of search results in an indexable way, e.g.
> /images/viewgcache-bortlebotts.htm
> I'm not sure why you would have content like that hosted on your site
> in an indexable way, perhaps it was just accidentally placed there or
> meant to be blocked from indexing. I trust you wouldn't do that on
> purpose, right? At any rate, I'm sure you're aware that this kind of
> content is not something that we would like to include in our index
> and which is also mentioned in our Webmaster Guidelines.
> I think it would be a good idea to clean this up, either by removing
> it, moving it somewhere else (where it can't be indexed) or making sure
> it can't be indexed where it is at the moment. Once you have done
> that, I'd recommend submitting a reconsideration request detailing the
> changes that you have made (and perhaps linking to this thread).
Also, quick side note, Google has upped the number of search pages
that they index on other sites lately, so I really don't see how that
is related or a no-no:
etc. If I were some sort of scraper site, relying on stolen content,
then of course I could understand that viewpoint. As it is it really
doesn't make sense.
Content like that would generally not be a problem if it were blocked
from being indexed. As it is (or was -- if you have already changed
something :-)), those pages are free to be crawled and indexed, which
is not really a good thing for the reasons already mentioned.
For what it's worth, I use the "Abduction!" Firefox add-on for
screenshots, it seems to work pretty good for me & captures the whole
web page.
Regarding your /about/ page -- I couldn't find any links to it from
your root page nor from the articles I checked. Are you sure that it's
linked everywhere? If we can't crawl your pages to find it, chances
are we might assume that it isn't as important as the rest of your
site. (and wouldn't it be neat to get some more content on it :-)?)
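One way to keep snapshot pages like the ones discussed above reachable for readers but out of the index is to serve them with an `X-Robots-Tag: noindex` response header (a robots meta tag in the page, or a robots.txt Disallow, are alternatives). The sketch below does this as WSGI middleware. The `/images/viewgcache-` prefix comes from the file names quoted in this thread; the middleware, the demo app and the helper are hypothetical and say nothing about how the site is actually served.

```python
# Prefix taken from the file names quoted in the thread.
NOINDEX_PREFIX = "/images/viewgcache-"


def noindex_middleware(app, prefix=NOINDEX_PREFIX):
    """Wrap a WSGI app so responses under `prefix` carry X-Robots-Tag: noindex."""

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def patched_start_response(status, headers, exc_info=None):
            if path.startswith(prefix):
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return wrapped


def demo_app(environ, start_response):
    """Hypothetical stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<html>snapshot</html>"]


def headers_for(path, app):
    """Demo helper: call the app for `path` and return its response headers."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured.update(headers)

    list(app({"PATH_INFO": path, "REQUEST_METHOD": "GET"}, start_response))
    return captured


if __name__ == "__main__":
    site = noindex_middleware(demo_app)
    print(headers_for("/images/viewgcache-bortlebotts.htm", site))
    print(headers_for("/images/photo.jpg", site))
```

Note that a noindex header only takes effect if the crawler is allowed to fetch the page, so it should not be combined with a robots.txt Disallow for the same URLs.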
> Content like that would generally not be a problem if it were blocked
> from being indexed. As it is (or was -- if you have already changed
> something :-)), those pages are free to be crawled and indexed, which
> is not really a good thing for the reasons already mentioned.
Thanks, although you didn't actually mention the reasons. I assume you
meant the reference in Webmaster Guidelines to original content... but
that would explain an algo removal of a couple of pages, not the
deindexing of the entire directory.
> For what it's worth, I use the "Abduction!" Firefox add-on for
> screenshots, it seems to work pretty good for me & captures the whole
> web page.
Thanks, will check it out.
> Regarding your /about/ page -- I couldn't find any links to it from
> your root page nor from the articles I checked. Are you sure that it's
> linked everywhere? If we can't crawl your pages to find it, chances
> are we might assume that it isn't as important as the rest of your
> site. (and wouldn't it be neat to get some more content on it :-)?)
Well, it's minimalistic on purpose, and it *used* to be linked from
everywhere. I didn't notice that it was not included in my new theme,
that's my bad, thanks. :)
So, out of curiosity, are you saying you think those pages are the
reason that the directory was deindexed? Or more of a "reinclusion
might help, but you should block those first" kind of thing?
John, I sure do hope you're wrong. If Google doesn't want to index a
certain type of image or file, fine, no problem, but why in the world
would that then extend to an entire directory of images? It makes no
logical sense to penalize someone for something that is absolutely not
"wrong". It may not be something Google wants to index, but it was a
perfectly legitimate use of something that clarified a post and was
good for the users. In no way, shape or form, does it go against any
Google guideline that I know of. And then to expect someone to have
to do Google's job for them AND ask for forgiveness for something that
isn't wrong to begin with...wow, that would be just unreal. I'm going
to go out on a limb here and say that you must be wrong, John. As
much as I like and respect you, I can only hope that you're misreading
this one. Because if you are right, then Googlebot should be
considered certifiably insane and locked away. Give that algo some
strong medication and help him to understand that a folder-wide
penalty is a ridiculous way to react to a type of file it doesn't want
in its index.
And now, for no apparent reason I can see, my latest blog post has
been deindexed. Currently linked to from the front page of Sphinn, was
in the top 10 for [googlebot crawls forms] pretty much since it went
live, now gone altogether.
It sure seems like this is out of whack. Could this be related...?
And now 3 of my last 4 posts have been deindexed. Not that they aren't
ranking, but they have been completely removed. They show in Blog Search,
but not the regular index.
> And now, for no apparent reason I can see, my latest blog post has
> been deindexed. Currently linked to from the front page of Sphinn, was
> in the top 10 for [googlebot crawls forms] pretty much since it went
> live, now gone altogether.
> It sure seems like this is out of whack. Could this be related...?
2 things spring to mind:
1. The new post may reappear shortly. I've encountered this behaviour
previously for new posts.
2. #1 above wouldn't make sense for older posts, so if Google is
deindexing your content the question then becomes why. The fact that
John mentions a reinclusion request raises a big red flag to me...
Enjoyed reading some of your experiments BTW :)
Rgds
Richard
> And now 3 of my last 4 posts have been deindexed. Not that they aren't
> ranking, but have been completely removed. They show in Blog Search,
> but not the regular index.
> On May 25, 11:24 pm, mvandemar wrote:
> > And now, for no apparent reason I can see, my latest blog post has
> > been deindexed. Currently linked to from the front page of Sphinn, was
> > in the top 10 for [googlebot crawls forms] pretty much since it went
> > live, now gone altogether.
> > It sure seems like this is out of whack. Could this be related...?
Ok, in case you were right I went ahead and tracked down the links to
those cached files in question and nofollowed each and every one. I'm
still having random posts disappearing though, and again I don't see
what those files would have to do with the text files or images in
that directory, but I'm giving it a shot anyway. I submitted the
reconsideration request, and pointed them to this thread, as you
suggested. We'll see what happens.
On a side note, I did get a message in the dashboard acknowledging the
request, which is great, didn't know you guys were doing that these
days, but I noticed something in the wording:
> We'll review the site. If we find that it's no longer in
> violation of our Webmaster Guidelines, we'll reconsider
> our indexing of the site.
The wording that suggests that the site had to have been in violation
of the Guidelines in order for them to reconsider it is in conflict
with the verbiage changes that were made to the reconsideration form
back in February:
Maybe something along the lines of "As long as we find that it is not
in violation of our Webmaster Guidelines," would match up better. Just
a thought. :)
> Ok, in case you were right I went ahead and tracked down the links to
> those cached files in question and nofollowed each and every one. I'm
> still having random posts disappearing though, and again I don't see
> what those files would have to do with the text files or images in
> that directory, but I'm giving it a shot anyway. I submitted the
> reconsideration request, and pointed them to this thread, as you
> suggested. We'll see what happens.
> On a side note, I did get a message in the dashboard acknowledging the
> request, which is great, didn't know you guys were doing that these
> days, but I noticed something in the wording:
> > We'll review the site. If we find that it's no longer in
> > violation of our Webmaster Guidelines, we'll reconsider
> > our indexing of the site.
> The wording that suggests that the site had to have been in violation
> of the Guidelines in order for them to reconsider it is in conflict
> with the verbiage changes that were made to the reconsideration form
> back in February:
> Maybe something along the lines of "As long as we find that it is not
> in violation of our Webmaster Guidelines," would match up better. Just
> a thought. :)
> On May 25, 5:02 pm, JohnMu wrote:
> > Hi Michael
> > Content like that would generally not be a problem if it were blocked
> > from being indexed. As it is (or was -- if you have already changed
> > something :-)), those pages are free to be crawled and indexed, which
> > is not really a good thing for the reasons already mentioned.
> > For what it's worth, I use the "Abduction!" Firefox add-on for
> > screenshots, it seems to work pretty good for me & captures the whole
> > web page.
> > Regarding your /about/ page -- I couldn't find any links to it from
> > your root page nor from the articles I checked. Are you sure that it's
> > linked everywhere? If we can't crawl your pages to find it, chances
> > are we might assume that it isn't as important as the rest of your
> > site. (and wouldn't it be neat to get some more content on it :-)?)
Thank you, the images have started to return to the index, although
none of the other files (the ones that are neither images nor cached
search pages) have as of yet. I'm guessing they just haven't been
re-spidered yet though.
We have recently implemented URL rewriting on our images: they have
all disappeared from the index, and the new URLs have not been indexed
either. Previously we had img.php?id=123 and now it is something like
word1,word2,word3-123.jpeg
Do you think it is the comma separator that's blocking it?
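Nothing in this thread settles whether the comma itself is a problem. Two things are usually worth ruling out after a rewrite like the one described above, though: hyphens are the more conventional word separator in URLs, and each old URL should answer with a permanent (301) redirect to its new address so crawlers can connect the two. A minimal sketch of that mapping follows; the function names and the keyword lookup table are hypothetical, and only the `img.php?id=123` pattern comes from the post.

```python
import re
from urllib.parse import parse_qs, urlparse


def slug_path(words, image_id, ext="jpeg"):
    """Build a path like /word1-word2-word3-123.jpeg from keywords and an id."""
    cleaned = [re.sub(r"[^a-z0-9]+", "-", word.lower()).strip("-") for word in words]
    cleaned = [word for word in cleaned if word]
    return "/" + "-".join(cleaned + [str(image_id)]) + "." + ext


def redirect_for(old_url, keywords_by_id):
    """Return (301, new_path) for an old img.php?id=N URL, or None if unknown."""
    parsed = urlparse(old_url)
    if not parsed.path.endswith("img.php"):
        return None
    ids = parse_qs(parsed.query).get("id", [])
    if not ids or not ids[0].isdigit():
        return None
    image_id = int(ids[0])
    words = keywords_by_id.get(image_id)
    if words is None:
        return None
    return 301, slug_path(words, image_id)


# Hypothetical lookup table: image id -> descriptive keywords.
KEYWORDS = {123: ["word1", "word2", "word3"]}

if __name__ == "__main__":
    print(redirect_for("/img.php?id=123", KEYWORDS))
```

Even with redirects in place, the new URLs only show up once the pages embedding them have been recrawled, so some delay after a rewrite would not be surprising.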