
(News) Googlebot Stops Lynxing, Starts Using Mozilla


Roy Schestowitz

Feb 23, 2006, 12:43:42 PM
http://www.adsensebits.com/node/24

Google will no longer view our sites as textual fragments, but rather render
the pages and interpret them in a richer context (including JS and CSS).

John Bokma

Feb 23, 2006, 1:06:52 PM
Roy Schestowitz <newsg...@schestowitz.com> wrote:

Yeah, the author is doing a lot of wild guessing based on what files are
fetched.

render pages - doubtful, unless for providing thumbnails

checking spam hiding - maybe, although CSS is complex enough to make this
hard, or very hard.

bypassing some stupid checks - yes, some sites block everything that doesn't
say Mozilla.
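
A minimal sketch of that kind of naive check: a server that refuses any request
whose User-Agent header doesn't mention "Mozilla". Purely illustrative Python
using the standard library; the handler name and port are made up for the
example.

# Refuse anything that does not announce itself as "Mozilla"; the old plain
# "Googlebot/2.1 (+http://www.google.com/bot.html)" agent string (as commonly
# reported) would be turned away by a rule like this.
from http.server import BaseHTTPRequestHandler, HTTPServer

class NaiveUAFilter(BaseHTTPRequestHandler):
    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        if "Mozilla" not in agent:
            self.send_error(403, "Browsers only, please")
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"Welcome, Mozilla-compatible visitor.\n")

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), NaiveUAFilter).serve_forever()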

--
John Experienced Perl programmer: http://castleamber.com/

Fart Free Fox: http://johnbokma.com/firefox/find-in-page-sound.html

canadafred

Feb 23, 2006, 1:09:38 PM
"Roy Schestowitz" <newsg...@schestowitz.com> wrote in message
news:dtksb9$2v8g$1...@godfrey.mcc.ac.uk...

That's cool Roy,

I checked out the article. I have a question maybe you could answer: for
those of us who are less programming-savvy, can you tell us what the
implications of this development would be? What does this mean for our web
sites? How will this affect the average "back of the room" code cruncher?

Thanks

--

Fred canadi...@hotmail.com
Ethical SEO Tips, Tools and Resources
www.rezultz-web-site-promotion.com


Roy Schestowitz

Feb 23, 2006, 1:28:05 PM
__/ [ John Bokma ] on Thursday 23 February 2006 18:06 \__

> Roy Schestowitz <newsg...@schestowitz.com> wrote:
>
>> http://www.adsensebits.com/node/24
>>
>> Google will no longer view our sites as textual fragments, but rather
>> render the pages and interpret them in a richer context (including JS
>> and CSS).
>
> Yeah, the author is doing a lot of wild guessing based on what files are
> fetched.
>
> render pages - doubtful, unless for providing thumbnails
>
> checking spam hiding - maybe, although CSS is complex enough to make this
> hard, or very hard.
>
> bypassing some stupid checks - yes, some sites block everything that
> doesn't say Mozilla.

Interesting that you mention all of this, as I probably wasn't critical
enough. I had posted this to the group before I had the chance to read it
properly rather than just skim it.

Roy

--
Roy S. Schestowitz | Y |-(1^2)|^(1/2)+1 K
http://Schestowitz.com | SuSE Linux | PGP-Key: 0x74572E8E
6:25pm up 6 days 6:44, 8 users, load average: 0.36, 0.89, 0.91
http://iuron.com - next generation of search paradigms

Roy Schestowitz

Feb 23, 2006, 1:48:44 PM
__/ [ canadafred ] on Thursday 23 February 2006 18:09 \__

> "Roy Schestowitz" <newsg...@schestowitz.com> wrote in message
> news:dtksb9$2v8g$1...@godfrey.mcc.ac.uk...
>> http://www.adsensebits.com/node/24
>>
>> Google will no longer view our sites as textual fragments, but rather
>> render
>> the pages and interpret them in a richer context (including JS and CSS).
>
> That's cool Roy,
>
> I checked out the article. I have a question maybe you could answer: for
> those of us who are less programming-savvy, can you tell us what the
> implications of this development would be? What does this mean for our web
> sites? How will this affect the average "back of the room" code cruncher?
>
> Thanks

I am flattered that you consider me "programming-savvy", but I am not
particularly proficient in this domain, namely browser rendering engines. I
know a fair bit about KHTML and Gecko, yet I have never used Lynx, only seen
screenshots of it.

If Google decided to render pages graphically, which requires some extra
computational labour, they could initially choose to do it with just a
subsample of popular pages. Alternatively, they could employ Lynx first and
bring in Mozilla /only/ if the scanned page appears to have changed, based on
a shallow analysis by Lynx.
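
Something along these lines, to make the idea concrete -- a rough,
hypothetical sketch in Python, where the function names, the hashing
heuristic and the "deep" pass are all mine and not anything Google has
documented:

# Cheap text-only pass decides whether the page changed; only then is the
# expensive, Mozilla-like rendering pass triggered. Everything here is a
# hypothetical illustration of the two-pass idea above.
import hashlib
import urllib.request

def shallow_fingerprint(url: str) -> str:
    """Lynx-style pass: fetch the raw HTML and hash it."""
    req = urllib.request.Request(url, headers={"User-Agent": "shallow-bot/0.1"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return hashlib.sha1(resp.read()).hexdigest()

def deep_fetch(url: str) -> None:
    # Placeholder for the heavyweight pass that would run JS/CSS rendering.
    print(f"Would re-render {url} with a full browser engine")

def crawl(url: str, last_seen: dict) -> None:
    digest = shallow_fingerprint(url)
    if last_seen.get(url) == digest:
        return                    # unchanged: skip the expensive pass
    last_seen[url] = digest
    deep_fetch(url)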

So what could they do with the product (assuming that John Bokma's statement
is inaccurate and things work as the blog suggests)? With tools that are
equivalent to http://khtml2png.sourceforge.net/ , they could make use of
pattern recognition and image analysis programs (also see
http://browsershots.org/). Interpretation of images actually happens to be my
field of research. All sorts of tests could then be run and lead to a reward
or penalty. Running a series of such tests gives a figure of merit, which may
depend on the surfer in question (use parameters like browser, O/S, screen
size, known accessibility issues, etc.). Tests I can think of are listed
below, followed by a rough sketch of how such scores might be combined:

* Does the page get rendered gracefully in all browser rendering engines?

* Does the page redirect using JavaScript (like richo.de and bmw.de)?

* How 'pretty' is the page? Cosmetics are a matter of taste and a 'fluffy'
notion, so I doubt it's of any fair or valued use.

* How heavy is the page (including images) and does the size justify the
visual gains and enhancements? Is there a 'nice' combination of colours used
(which would discern pricey design work from amateurish DIY)?

* Is anything pornographic possibly contained in the page (use a *rough*
scoring mechanism)?

* What screen sizes are properly supported? Should the search engines deliver
different results depending on the perceived support for screen sizes and
other pre-requisites like JavaScript? Could PDA users get different results
pages altogether?

* Arguments similar to the above, but in reference to the visually impaired
and the astigmatic. This can currently be done by looking at colour contrast
(in the CSS/source) and font sizes. What about a special option for the
colour-blind, e.g. "give me no pages with yellow and green on the same
page"?

* Okay, enough for now... *smile*
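
As promised above, a rough sketch of how a handful of such tests might be
folded into a single figure of merit. The test functions, weights, thresholds
and the page dictionary are all hypothetical, and the rendering step
(khtml2png or similar) is assumed to have happened elsewhere:

# Illustrative only: combine per-page test scores, each in [0, 1], into a
# weighted figure of merit. None of this reflects anything Google actually does.
def renders_gracefully(page: dict) -> float:
    return 1.0 if not page.get("render_errors") else 0.0

def avoids_js_redirect(page: dict) -> float:
    return 0.0 if page.get("js_redirect") else 1.0

def reasonable_weight(page: dict) -> float:
    kb = page.get("total_kb", 0)
    return 1.0 if kb < 150 else max(0.0, 1.0 - (kb - 150) / 1000)

TESTS = [
    (renders_gracefully, 0.5),
    (avoids_js_redirect, 0.3),
    (reasonable_weight, 0.2),
]

def figure_of_merit(page: dict) -> float:
    """Weighted sum of individual test scores."""
    return sum(weight * test(page) for test, weight in TESTS)

example = {"render_errors": [], "js_redirect": False, "total_kb": 420}
print(round(figure_of_merit(example), 3))   # 0.946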

To Webmasters, this also means that more bandwidth will be consumed by the
crawler in question, if it truly exists or ever *will* exist.

With kind regards,

canadafred

Feb 23, 2006, 2:13:49 PM
"Roy Schestowitz" <newsg...@schestowitz.com> wrote in message
news:dtl057$30an$1...@godfrey.mcc.ac.uk...

Wow, what a great answer.

I get it, I think. The search engine could deliver results based on the
visitor's browsing preferences and physical and hardware limitations. Site
owners and developers would be required to adhere to stricter design
standards in order to deliver web content that satisfies a broader range of
browser criteria. Hence the need for the crawler to analyze the .js and .css
in order to create a "snapshot". This "snapshot" could be used to determine
compatibility with a searcher's limitations, both physical and electronic.
The search engine could then deliver results based on what it perceives to be
acceptable web sites that are both relevant to the search query and satisfy
the searcher's additional requirements.

www.1-script.com

Feb 26, 2006, 10:13:02 PM
Roy Schestowitz wrote:

> Google will no longer view our sites as textual fragments, but rather
> render
> the pages and interpret them in a richer context (including JS and
> CSS).

I doubt the part about rendering very much. It's the content they are after,
not the presentation. Also, this article reads as if it was written a couple
of years back: Googlebot is at version 2.1, not 2.0.

Also, Google's Deepbot has been signing as Mozilla-compatible for a couple of
years already, maybe as far back as 2003. Freshbot still signs as Googlebot,
not Mozilla-compatible. Makes sense: it only looks for links and nothing
else.
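
For anyone who wants to check their own raw logs, here is a quick-and-dirty
sketch of how to tally these agents in an Apache-style access log. The exact
user-agent strings are the commonly reported forms of that period and should
be treated as assumptions rather than gospel:

# Classify hits by user-agent: plain "Googlebot/2.1 ..." versus the
# "Mozilla/5.0 (compatible; Googlebot/2.1; ...)" form. Reads a combined-format
# log from stdin; the patterns are assumptions based on commonly seen strings.
import re
import sys

PLAIN_GOOGLEBOT = re.compile(r"^Googlebot/2\.1")
MOZ_GOOGLEBOT = re.compile(r"^Mozilla/5\.0 \(compatible; Googlebot/2\.1")
UA_FIELD = re.compile(r'"([^"]*)"\s*$')   # last quoted field = user agent

def classify(line: str) -> str:
    match = UA_FIELD.search(line)
    agent = match.group(1) if match else ""
    if MOZ_GOOGLEBOT.search(agent):
        return "mozilla-compatible googlebot"
    if PLAIN_GOOGLEBOT.search(agent):
        return "plain googlebot"
    return "other"

if __name__ == "__main__":
    counts = {}
    for line in sys.stdin:
        kind = classify(line)
        counts[kind] = counts.get(kind, 0) + 1
    print(counts)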

--
Cheers,
Dmitri
See Site Sig Below
-------------------------------------

--
##-----------------------------------------------##
Article posted with Web Developer's USENET Archive
http://www.1-script.com/forums
Web and RSS gateway to your favorite newsgroup -
alt.internet.search-engines - 24762 messages and counting!
##-----------------------------------------------##

Roy Schestowitz

Feb 26, 2006, 10:21:03 PM
__/ [ www.1-script.com ] on Monday 27 February 2006 03:13 \__

> Roy Schestowitz wrote:
>
>> Google will no longer view our sites as textual fragments, but rather
>> render
>> the pages and interpret them in a richer context (including JS and
>> CSS).
>
> I doubt the part about rendering very much. It's the content they are
> after, not the presentation. Also, this article reads as if it was written
> a couple of years back: Googlebot is at version 2.1, not 2.0.
>
> Also, Google's Deepbot has been signing as Mozilla-compatible for a couple
> of years already, maybe as far back as 2003. Freshbot still signs as
> Googlebot, not Mozilla-compatible. Makes sense: it only looks for links and
> nothing else.

These are all interesting observations; I don't have much confidence in the
article at hand either. As regards Googlebot versions, I suppose it's
possible that a certain proportion of Google's hardware still runs
older-yet-fully-compatible software. I guess you know better, though. I
rarely look at the raw logs and, even when I do, I don't descend to lower
levels of granularity.

I don't believe that Google cares much about presentation either. If they
did, on the other hand, there would be much to gain for all parties, i.e. the
surfers and the crawlers, and sometimes even the Webmasters.

Best wishes,

Roy
