Message from discussion Anybody using the sexy X-Robots-Tags yet?
From: cristina
Date: Thu, 02 Aug 2007 14:22:43 -0000
Local: Thurs, Aug 2 2007 10:22 am
Subject: Re: Anybody using the sexy X-Robots-Tags yet?
When you look at the 'View as HTML' page for a PDF file, you see the
message that Googlebot automatically generates HTML versions of
documents as it crawls the web.
I presume this means that a PDF document is automatically transformed
into an HTML document so that it can be crawled and its content
(title, collected links, etc.) extracted for the search result page,
as for any HTML document, even when nosnippet or noarchive is
specified?

On Aug 2, 2:38 pm, Sebastian wrote:

> Yep, that's why we aren't sure. The HTML version is something like a
> huge snippet, both snippets and HTML extract are previews. Noarchive
> OTOH just makes Google's fetched copy of the file unviewable.
> Technically one could argue that Google can't/shouldn't/won't
> transform unviewable contents into HTML previews, but from a searcher's
> perspective grouping the snippet and the HTML version under preview
> makes more sense. That's why I vote for nosnippet, unless Google
> invents NOPREVIEW or NOTRANSFORM to suppress HTML versions of PDFs,
> transcripts from vids, text/link excerpts from flash, additional info
> from jpegs ...

> Sebastian

> On Aug 2, 2:49 pm, cristina wrote:

> > But isn't the 'view as HTML' link
> > a feature for all PDF files
> > in search results?
> > As far as I know nosnippet prevents
> > the display of the snippet in search results,
> > not the crawling and conversion to HTML
> > of PDF files.

> > On Aug 2, 1:33 pm, Sebastian wrote:

> > > My choice would be nosnippet, it should remove the snippet and its
> > > extension, the view-as HTML link. However, using both crawler
> > > directives should certainly remove it. I really want to know it for
> > > sure ...
> > > Sebastian

> > > On Aug 2, 9:26 am, JohnMu wrote:

> > > > I've been wondering ... would "nosnippet" remove the HTML version? Or
> > > > would you use "noarchive" for that? I could see a few applications
> > > > where it would be great to be able to suppress the converted version.

> > > > John
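
For anyone who wants to try Sebastian's "use both directives" suggestion, a minimal sketch of attaching the header to PDF responses might look like the following. The helper name `robots_headers` and the example paths are made up for illustration, and it assumes your server lets you add response headers per file. Whether nosnippet, noarchive, or both actually suppress the 'View as HTML' link is exactly the open question in this thread, so treat this as a way to test it, not as a confirmed fix.

```python
# Hypothetical helper (not from any library): decide which extra HTTP
# response headers to send for a given request path.
def robots_headers(path, directives=("nosnippet", "noarchive")):
    """Return a list of (name, value) header tuples for `path`.

    Only PDF files get an X-Robots-Tag header here; everything else is
    left alone, since HTML pages can carry a robots meta tag instead.
    """
    headers = []
    if path.lower().endswith(".pdf"):
        # Multiple directives are sent as one comma-separated value.
        headers.append(("X-Robots-Tag", ", ".join(directives)))
    return headers


if __name__ == "__main__":
    # Example paths are illustrative only.
    for example in ("/docs/report.pdf", "/index.html"):
        print(example, robots_headers(example))
```

Passing a single directive, e.g. `robots_headers(path, ("nosnippet",))`, would let you check each one separately and see which, if either, removes the converted version.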

