Discussions > Random chit-chat > Anybody using the sexy X-Robots-Tags yet?
From: Sebastian
Date: Wed, 01 Aug 2007 12:15:52 -0000
Local: Wed, Aug 1 2007 8:15 am
Subject: Anybody using the sexy X-Robots-Tags yet?
Fun to play with:
http://sebastianx.blogspot.com/2007/07/handling-googles-neat-x-robots...

PDF files with index,nofollow REP header tags :)

Sebastian


From: JohnMu
Date: Wed, 01 Aug 2007 13:21:24 -0000
Local: Wed, Aug 1 2007 9:21 am
Subject: Re: Anybody using the sexy X-Robots-Tags yet?
Great idea! The things I was looking at were not that neat and I was
wondering what you couldn't do with a wildcards-enabled
robots.txt :-).

Will the other engines adopt it as well?

John
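For readers unfamiliar with the wildcard syntax John mentions, here is a minimal Python sketch of how a pattern such as `/*.pdf$` could be matched against URL paths. It assumes the commonly documented semantics (`*` matches any run of characters, a trailing `$` anchors the end of the URL); the function names are made up for illustration and are not part of any crawler's API.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a wildcard robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any sequence of characters and a
    trailing '$' anchors the match at the end of the URL path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and rejoin them with '.*' for each '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def path_matches(pattern, path):
    """Return True if the URL path is covered by the robots.txt pattern."""
    return robots_pattern_to_regex(pattern).search(path) is not None
```

With this sketch, `path_matches("/*.pdf$", "/docs/study.pdf")` is true while `path_matches("/*.pdf$", "/docs/study.pdf?x=1")` is not. Keep in mind that a robots.txt rule only controls crawling, whereas the header discussed in this thread carries indexing directives.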


From: Sebastian
Date: Wed, 01 Aug 2007 18:07:46 -0000
Local: Wed, Aug 1 2007 2:07 pm
Subject: Re: Anybody using the sexy X-Robots-Tags yet?
I hope so. At least they've finally implemented sitemaps, somewhat ...
would make sense for all of them.
Sebastian

On Aug 1, 3:21 pm, JohnMu wrote:


From: cristina
Date: Wed, 01 Aug 2007 12:38:10 -0700
Local: Wed, Aug 1 2007 3:38 pm
Subject: Re: Anybody using the sexy X-Robots-Tags yet?
It is great to be able to set meta tags via the HTTP header!

But Sebastian,
what does index,nofollow mean for a PDF file?
Which links would be collected from a PDF file if you had
set index,follow for it in the HTTP header?

On Aug 1, 1:15 pm, Sebastian wrote:


From: Sebastian
Date: Wed, 01 Aug 2007 21:14:46 -0000
Local: Wed, Aug 1 2007 5:14 pm
Subject: Re: Anybody using the sexy X-Robots-Tags yet?
I knew someone would point that out. It's unusual at least. If that
PDF is a marketing study linking out to the competition, it probably
even makes sense, hehe. Good fetch :)
Actually, you can use noindex,noarchive,nosnippet too, making it an
even more dangling node. Everything valid in a robots meta tag can be
stuffed into an X-Robots-Tag.
Sebastian
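To make the idea concrete, here is a rough Python sketch of sending robots directives in an HTTP header for PDF files, using only the standard library. The handler class, the helper name, and the directive subset are assumptions for illustration; this is not the setup from the linked blog post.

```python
from http.server import SimpleHTTPRequestHandler

# Illustrative subset of robots meta tag directives, not an exhaustive list.
KNOWN_DIRECTIVES = {"index", "noindex", "follow", "nofollow", "noarchive", "nosnippet"}


def x_robots_tag(*directives):
    """Join directives into an X-Robots-Tag header value, e.g. 'index, nofollow'."""
    cleaned = [d.strip().lower() for d in directives]
    unknown = [d for d in cleaned if d not in KNOWN_DIRECTIVES]
    if unknown:
        raise ValueError(f"unknown directive(s): {unknown}")
    return ", ".join(cleaned)


class PdfRobotsHandler(SimpleHTTPRequestHandler):
    """Hypothetical handler: serves files and tags responses for .pdf paths."""

    pdf_directives = ("index", "nofollow")

    def end_headers(self):
        # Ignore any query string when checking the file extension.
        path = self.path.split("?", 1)[0]
        if path.lower().endswith(".pdf"):
            self.send_header("X-Robots-Tag", x_robots_tag(*self.pdf_directives))
        super().end_headers()
```

To try it locally, something like `HTTPServer(("localhost", 8000), PdfRobotsHandler).serve_forever()` (with `HTTPServer` imported from `http.server`) serves the current directory, and any `.pdf` response should then carry `X-Robots-Tag: index, nofollow`.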

On Aug 1, 9:38 pm, cristina wrote:


From: JohnMu
Date: Thu, 02 Aug 2007 07:26:45 -0000
Local: Thurs, Aug 2 2007 3:26 am
Subject: Re: Anybody using the sexy X-Robots-Tags yet?
I've been wondering ... would "nosnippet" remove the HTML version? or
would you use "noarchive" for that? I could see a few applications
where it would be great to be able to suppress the converted version.

John


From: cristina
Date: Thu, 02 Aug 2007 00:32:01 -0700
Local: Thurs, Aug 2 2007 3:32 am
Subject: Re: Anybody using the sexy X-Robots-Tags yet?
Hi Sebastian,
It is very interesting that you mentioned
the nofollow for PDF files, this new possibility
created by the meta tags via HTTP headers.

Some PDF files do not have well designed
hyperlinks in them, some have URLs
without hyperlinks, and your posting
shows that it would be a good idea
to re-consider the way some PDF files
are done for better embedded links,
maybe even to have some sort of site navigation.
I assume that the links
appearing in 'view as html'
in the search results for a PDF file
are the links collected
by Googlebot like from any HTML file.

On Aug 1, 10:14 pm, Sebastian wrote:


From: Sebastian
Date: Thu, 02 Aug 2007 12:33:47 -0000
Local: Thurs, Aug 2 2007 8:33 am
Subject: Re: Anybody using the sexy X-Robots-Tags yet?
My choice would be nosnippet, it should remove the snippet and its
extension, the view-as HTML link. However, using both crawler
directives should certainly remove it. I really want to know it for
sure ...
Sebastian

On Aug 2, 9:26 am, JohnMu wrote:


From: cristina
Date: Thu, 02 Aug 2007 12:49:02 -0000
Local: Thurs, Aug 2 2007 8:49 am
Subject: Re: Anybody using the sexy X-Robots-Tags yet?
But isn't the 'view as HTML' link
a feature for all PDF files
in search results?
As far as I know nosnippet prevents
the display of the snippet in search results,
not the crawling and conversion to HTML
of PDF files.

On Aug 2, 1:33 pm, Sebastian wrote:


From: Sebastian
Date: Thu, 02 Aug 2007 13:38:16 -0000
Local: Thurs, Aug 2 2007 9:38 am
Subject: Re: Anybody using the sexy X-Robots-Tags yet?
Yep, that's why we aren't sure. The HTML version is something like a
huge snippet, both snippets and HTML extract are previews. Noarchive
OTOH just makes Google's fetched copy of the file unviewable.
Technically one could argue that Google can't/shouldn't/won't
transform unviewable contents into HTML previews, but from a searcher's
perspective grouping the snippet and the HTML version under preview
makes more sense. That's why I vote for nosnippet, unless Google
invents NOPREVIEW or NOTRANSFORM to suppress HTML versions of PDFs,
transcripts from vids, text/link excerpts from flash, additional info
from jpegs ...

Sebastian

On Aug 2, 2:49 pm, cristina wrote:


From: cristina
Date: Thu, 02 Aug 2007 14:22:43 -0000
Local: Thurs, Aug 2 2007 10:22 am
Subject: Re: Anybody using the sexy X-Robots-Tags yet?
When you look at the 'view as HTML'
page for a PDF file you see the message that
Googlebot automatically generates html versions of documents
as it crawls the web.
I presume that this means that a PDF document
is automatically transformed into an HTML document
to be crawled and content extracted from it
for the search result page,
like title, links collected, etc.
as for any HTML document,
even when nosnippet or noarchive are specified
????

On Aug 2, 2:38 pm, Sebastian wrote:


From: Sebastian
Date: Thu, 02 Aug 2007 14:36:31 -0000
Local: Thurs, Aug 2 2007 10:36 am
Subject: Re: Anybody using the sexy X-Robots-Tags yet?
Maybe, would make sense to store contents in a somewhat unified
format, but we're talking about the *linked* HTML version available
from the SERP. When you choose noarchive for an HTML page it removes the
"cached" link and the call from the toolbar as well. That does not
mean that Google didn't keep a copy ;)  BTW looking at the HTML
version of a PDF might lead to ideas on optimizing the original for
the engines ...
Sebastian
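As a small illustration of the two directives being compared, the Python sketch below parses a header value and reports only the effects the thread agrees on: nosnippet hides the snippet and noarchive hides the "cached" link. The function names are hypothetical, and the sketch deliberately says nothing about the view-as-HTML link, since nobody here knows which directive (if any) controls it.

```python
def parse_x_robots_tag(value):
    """Split a value such as 'noarchive, nosnippet' into a set of lowercase directives."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


def serp_features(value):
    """Report which result features remain allowed under the given directives.

    Only the two undisputed effects are modeled; the fate of the
    'view as HTML' link is left out because it is an open question.
    """
    directives = parse_x_robots_tag(value)
    return {
        "snippet": "nosnippet" not in directives,
        "cached_link": "noarchive" not in directives,
    }
```

For example, `serp_features("noarchive")` leaves the snippet allowed but drops the cached link, and passing both directives drops both.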

On Aug 2, 4:22 pm, cristina wrote:


From: JohnMu
Date: Fri, 03 Aug 2007 07:19:58 -0000
Local: Fri, Aug 3 2007 3:19 am
Subject: Re: Anybody using the sexy X-Robots-Tags yet?
Some of the search results shown at http://blogsci.com/randoms/academic-publishers-as-spammers
show a full snippet but suppress the HTML version of the PDF. I wonder
how they're doing that (besides the fact that they're cloaking to
Google and trying to take out visitors hoping to view the PDFs)...

John

