From: cristina
Date: Thu, 02 Aug 2007 14:22:43 -0000
Local: Thurs, Aug 2 2007 10:22 am
Subject: Re: Anybody using the sexy X-Robots-Tags yet?
When you look at the 'View as HTML' page for a PDF file, you see a message saying that Googlebot automatically generates HTML versions of documents as it crawls the web. I presume this means a PDF document is automatically transformed into an HTML document, which is then crawled and its content extracted for the search results page (title, collected links, etc.) just as for any other HTML document, even when nosnippet or noarchive is specified?

On Aug 2, 2:38 pm, Sebastian wrote:
> Yep, that's why we aren't sure. The HTML version is something like a
> huge snippet; both snippets and HTML extracts are previews. Noarchive,
> OTOH, just makes Google's fetched copy of the file unviewable.
> Technically one could argue that Google can't/shouldn't/won't
> transform unviewable contents into HTML previews, but from a searcher's
> perspective, grouping the snippet and the HTML version under "preview"
> makes more sense. That's why I vote for nosnippet, unless Google
> invents NOPREVIEW or NOTRANSFORM to suppress HTML versions of PDFs,
> transcripts from vids, text/link excerpts from Flash, additional info
> from JPEGs ...
> Sebastian
> On Aug 2, 2:49 pm, cristina wrote:
> > But isn't the 'view as HTML' link
> > On Aug 2, 1:33 pm, Sebastian wrote:
> > > My choice would be nosnippet, it should remove the snippet and its
> > > On Aug 2, 9:26 am, JohnMu wrote:
> > > > I've been wondering ... would "nosnippet" remove the HTML version? or
> > > > John
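For anyone wanting to try Sebastian's suggestion, here is a minimal sketch of sending the directive as an HTTP header on PDF responses, since a PDF has no HTML `<meta name="robots">` tag to carry it. It assumes a small Python WSGI app; the function names `robots_directives` and `app` and the "PDFs only" policy are hypothetical. It only shows how the header gets sent. Whether Google actually drops the 'View as HTML' version for `nosnippet` is exactly the open question in this thread.

```python
def robots_directives(path):
    """Return the X-Robots-Tag value for a request path, or None.

    Hypothetical policy: only PDFs get a directive. Several directives
    can be combined with commas, e.g. "noarchive, nosnippet".
    """
    if path.lower().endswith(".pdf"):
        return "nosnippet"
    return None


def app(environ, start_response):
    """Minimal WSGI app that adds an X-Robots-Tag header to PDF responses."""
    path = environ.get("PATH_INFO", "/")
    is_pdf = path.lower().endswith(".pdf")
    # Placeholder bodies; a real server would read the requested file from disk.
    body = b"%PDF-1.4\n" if is_pdf else b"<html><body>page</body></html>"
    headers = [
        ("Content-Type", "application/pdf" if is_pdf else "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    directives = robots_directives(path)
    if directives:
        # The header travels with the file itself, so it also works for non-HTML content.
        headers.append(("X-Robots-Tag", directives))
    start_response("200 OK", headers)
    return [body]
```

To try it locally, serve it with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` and check the response headers with `curl -I http://localhost:8000/report.pdf`. On Apache or similar servers, the same header is usually set in the server configuration instead of in application code.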