seo and google web toolkit

Raphael André Bauer

Sep 7, 2009, 5:04:24 AM
to google-we...@googlegroups.com
hey everybody,


like every friendly web inhabitant, i want google to know my website
so that people who are interested in my stuff can find it easily.
however, as my first experiments suggest, the google bot does not even
try to analyze (execute) gwt code (a working test of my concept is at
[1]).

this can -- on the one hand -- be explained by the very nature of gwt:
it is javascript, much like an application, and by nature not something
a search bot would index. but -- on the other hand -- hey! it
is so simple. execute the js, see if it generates a more or less
stable DOM, parse the dom and you are done. and both are from google?
seems that GWT hits the same indexing hell flash did.

ok. maybe i am wrong here. in my opinion it is really bad news that
GWT stuff is not at all analyzed by the google search bot.


coming up with a solution would mean sacrificing a lot of GWT
coolness, mainly because i would have to generate a lot of HTML myself
that can be analyzed by search robots. i also wrote about that at [2].
it is especially interesting how [3] "solved" the "problem".


do the experts have any recommendations?


thanks!


ra!
[1] http://scisurfer.com/news
[2] http://blog.scisurfer.com/2009/09/gwt-and-seo-concerns.html
[3] http://examples.roughian.com

Alexander Cherednichenko

Sep 8, 2009, 4:45:10 AM
to Google Web Toolkit
I am not an expert, though I have some recommendations.

1. Don't even try to generate _really_ different content for the search
bot and for the end user. Sometimes Google sends requests with a 'normal'
user-agent, and if you have different content for users and bots,
you'll be banned (because of cloaking :) ).
There are some ways to do so, though I'm not sure /me has enough time
to work with Google customer support to clear up such an issue (if they
have customer support; I have never heard of it :) ).

2. You may try creating a normal static application with unique URLs, and
just wrap some page elements with JavaScript to provide rich
functionality. UiBinder is one of the new things there.

3. You may try searching here (in this group) for such problems.
People resolve them in different ways. I've heard of an interesting way:
generating the HTML content with HtmlUnit and feeding it to the bot.
Simply put, if your application sees that the visitor is a bot, it runs
the GWT (JavaScript) code in a headless browser on the server side and
flushes a momentary DOM snapshot to the response.

4. Look here - http://lexaux.blogspot.com/2009/03/afraid-of-being-banned-by-google-for.html
I was going to make a little demo some time ago, but had no time to
complete it :( sorry. Though, there may still be some interesting
facts.
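
A rough sketch of the dispatch step from point 3, in plain Java. Everything named here is hypothetical and only for illustration: the `SnapshotDispatcher` class, the user-agent markers, and the `renderer` function. In a real setup the renderer would be backed by a headless browser such as HtmlUnit, which is deliberately left out so the example stays self-contained. Point 1 still applies: the snapshot should carry the same content a user ends up seeing.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical helper: decides whether a request gets the normal GWT host
// page or a pre-rendered HTML snapshot of the same page.
final class SnapshotDispatcher {

    // Assumed crawler markers; crawlers do not always announce themselves.
    private static final List<String> BOT_MARKERS =
            Arrays.asList("googlebot", "msnbot", "slurp");

    // Maps a URL to rendered HTML (e.g. backed by a headless browser).
    private final Function<String, String> renderer;

    SnapshotDispatcher(Function<String, String> renderer) {
        this.renderer = renderer;
    }

    // Case-insensitive substring check against the assumed marker list.
    static boolean looksLikeBot(String userAgent) {
        if (userAgent == null) {
            return false;
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        return BOT_MARKERS.stream().anyMatch(ua::contains);
    }

    // Returns the snapshot for crawlers, otherwise the unmodified host page.
    String respond(String userAgent, String url, String hostPageHtml) {
        return looksLikeBot(userAgent) ? renderer.apply(url) : hostPageHtml;
    }
}
```

Passing the renderer in as a function keeps the bot check testable without a browser. Since a crawler may also arrive with a 'normal' user-agent, this check cannot be relied on to catch every bot.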


Sri

Sep 8, 2009, 7:57:20 AM
to Google Web Toolkit
Just a small correction about UiBinder: it is not going to help with
SEO in any way.
UiBinder allows you to put arbitrary HTML content in an XML file --
but that is processed at compile time only. The browser never sees
that arbitrary HTML content directly. It's all encoded as JavaScript,
so you are back to the old problem.



Raphael André Bauer

Sep 8, 2009, 12:53:43 PM
to google-we...@googlegroups.com
@all: thanks for the ideas.

htmlunit is a really cool idea. however, i am not sure if it will fail
with adsense and related concepts. so my solution is roughly like
this:
- generate regular html output
- enrich / regroup html output via gwt

so - there is no duplicate content (i hope at least). google is happy,
because parsing my page is really simple. and the user is happy -
because he can use nice web 2.0 goodies.

ok. this gives up some of the benefits of gwt (i was hoping that i could
forget almost everything about html, css and cross-browser compatibility
issues), but it works.
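
the server half of this approach could look roughly like the sketch below. it is plain java with made-up names (`CrawlablePage`, the `news` element id, the `news/news.nocache.js` script path are all assumptions for illustration, not taken from the actual site): the page is complete without javascript, and the gwt selection script is only included so the client can enrich the markup that is already there.

```java
import java.util.List;

// Hypothetical server-side renderer: emits plain, crawlable HTML and adds the
// GWT bootstrap script so the client can enrich the existing markup.
final class CrawlablePage {

    // Escapes the characters that are special in HTML text and attributes.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // moduleScript is the path of the compiled GWT selection script,
    // e.g. "news/news.nocache.js" (an assumed name for this sketch).
    static String render(String title, List<String> headlines, String moduleScript) {
        StringBuilder html = new StringBuilder();
        html.append("<html><head><title>").append(escape(title)).append("</title>\n");
        html.append("<script type=\"text/javascript\" src=\"")
            .append(escape(moduleScript)).append("\"></script>\n");
        html.append("</head><body>\n");
        // The list is readable without JavaScript; GWT code can look it up
        // by id and wrap or regroup it after the module has loaded.
        html.append("<ul id=\"news\">\n");
        for (String headline : headlines) {
            html.append("  <li>").append(escape(headline)).append("</li>\n");
        }
        html.append("</ul>\n</body></html>\n");
        return html.toString();
    }
}
```

on the client, the gwt entry point could then attach to the existing element (for instance via `RootPanel.get("news")`) instead of building the content from scratch, so the bot and the user start from the same html.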


btw. in my opinion, google should run a 1sec htmlunit against my
webpage, not me. but - ok, that takes way more CPU and the result is
not predictable (what is this funny js generating?). so somehow i can
understand google. however - that puts gwt applications in the same
bucket as flash applications - a bucket called "not properly indexed
by google".



thanks again,

r

Alexander Cherednichenko

Sep 9, 2009, 3:37:27 AM
to Google Web Toolkit
I think they would never execute your code (even JavaScript) on
their crawlers :)
The electricity would cost too much.

As for HtmlUnit and Google AdSense - no problem with that; I believe you
could easily leave the AdSense blocks as externals even in the HtmlUnit
output. Or find another solution that would do this.

The problem with having static content and enriching it with GWT is that
you'll need to create an application for each page (if they are
different), and app creation in GWT carries a fair amount of overhead.
Also, the GWT app will have to bootstrap on each page load, which is no
good.


Ian Bambury

Sep 9, 2009, 7:44:02 AM
to google-we...@googlegroups.com

2009/9/9 Alexander Cherednichenko <lex...@gmail.com>


> Problem in having static content and enriching it with gwt is that
> you'll need to create application for each page (if they are
> different) and app creation in GWT is fairly overheaded. Also, you'll
> need to have gwt app bootstrap on each page load, which is no good.

Not true. You can keep your content in HTML pages on the server and fetch it as needed. You then link these pages to create a non-JS site which is crawlable. If you name the pages after the history token you would use in the GWT site, then a) you can write a generic function to get the page and b) you can write a JS script to redirect JS-enabled visitors who click on a search link to the right place in the GWT app.

E.g. if you search Google for 'GWT DockPanel', my site's link is 'examples.roughian.com/Panels__DockPanel.htm', but if you click on the link, you end up at 'http://examples.roughian.com/index.htm#Panels~DockPanel'.

The initial payload for the site is about 30% of what it would be if the text were included, and I can update it without a recompile.
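
A minimal sketch of the naming scheme described above, in plain Java. The exact rule used on the site is not spelled out in this thread; the `~` to `__` mapping and the `.htm` suffix are inferred from the single DockPanel example, and the `GhostPages` class and its method names are hypothetical.

```java
// Hypothetical naming helper: each GWT history token has a static HTML page
// with a matching name, so one generic function covers every page.
final class GhostPages {

    private static final String SUFFIX = ".htm";

    // Assumed convention, inferred from one example:
    // token "Panels~DockPanel" <-> page "Panels__DockPanel.htm".
    static String pageForToken(String historyToken) {
        return historyToken.replace("~", "__") + SUFFIX;
    }

    // Reverse mapping, used when a visitor lands on a static page.
    static String tokenForPage(String pageName) {
        if (!pageName.endsWith(SUFFIX)) {
            throw new IllegalArgumentException("not a static content page: " + pageName);
        }
        String base = pageName.substring(0, pageName.length() - SUFFIX.length());
        return base.replace("__", "~");
    }

    // URL inside the GWT app that a JS-enabled visitor would be sent to.
    static String appUrlForPage(String siteRoot, String pageName) {
        return siteRoot + "/index.htm#" + tokenForPage(pageName);
    }
}
```

With this, `pageForToken` tells the GWT app which HTML file to fetch for a given history token, and `appUrlForPage` gives the target for the redirect script on the static page.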

Alexander Cherednichenko

Sep 10, 2009, 3:45:41 AM
to Google Web Toolkit
That's true; I was also thinking of a redirect.

Also, this is good for non-JS browsers. Links users would see it OK,
which is really valuable for me.

Although, doesn't Google ban sites for
<body onload="window.location='http://newsite?aaa'"> ?

This sounds pretty much like a doorway page.
I'm really interested in how search engines treat this.

Here is what they say about cloaking in an official post:
"
So what's an honest web designer to do? The only hard and fast rule
is to show Googlebot the exact same thing as your users. If you
don't, your site risks appearing suspicious to our search algorithms.
This simple rule covers a lot of cases including cloaking, JavaScript
redirects, hidden text, and doorway pages. And our engineers have
gathered a few more practical suggestions:
"
(taken from http://googlewebmastercentral.blogspot.com/2007/07/best-uses-of-flash.html,
though it's old)

What is unclear to me is what they mean by 'exact same thing as your
users'. If it means the same content (as compared by a user), your
method would work, as you load the SAME page as the one shown to the
bot.

But textually, the contents are different. The bot may think that
you're blindly redirecting the user to a strange page with only one
JavaScript file inclusion and no content at all.

Maybe something new has happened which allows more SEO methods and
I missed it?

Thanks for the point with redirect,
Alex.

Raphael André Bauer

Sep 10, 2009, 5:49:49 AM
to google-we...@googlegroups.com
hmm.

seems that search engines are important and that it is important for
many gwt-enriched apps to be found by search engines.

i think it would be cool to have some guidelines, "dos and don'ts for
gwt to be found in search engines", maybe in a wiki and with an
official approval from (at least) the google search engine team (what
is cloaking and what is not).

is there any interest from the official gwt team? any connections
between the official gwt team and the official google search engine
team?

if there is any support from the googlers i could write an article
(together with anybody that wants to participate - this thread might
be a good starting point) - or start writing a wiki page that can be
easily changed according to new developments...

just an idea..
r

Ian Bambury

Sep 10, 2009, 7:29:30 AM
to google-we...@googlegroups.com
2009/9/10 Alexander Cherednichenko <lex...@gmail.com>


> That's true; I was also thinking of redirect.
>
> Also, this is good for non-js browsers. Links users would see it OK,
> which is really valuable for me.
>
> Although, does not google ban for <body onload='javascript:
> window.location=http://newsite?aaa'/> ?

Maybe - I don't use that. It would seem a bit harsh if they do. But Google is following IBM's FUD route on things like that. 

> This sounds pretty much like a doorway page.
> I'm really interested in how searchers treat this.
>
> In official release what's they say about cloaking:
> "
> So what's an honest web designer to do? The only hard and fast rule
> is to show Googlebot the exact same thing as your users. If you
> don't, your site risks appearing suspicious to our search algorithms.
> This simple rule covers a lot of cases including cloaking, JavaScript
> redirects, hidden text, and doorway pages.

My site does show the non-JS Googlebot exactly what a non-JS user would see. If Google decide to develop a JS-enabled Googlebot, it will see what a JS-enabled user will see. It's not my fault if they are not technically savvy enough to do it. Not only that, JS-users get *exactly* the same content as non-JS-users (albeit with menus and demos and stuff)

If they *don't* allow me to do this, then effectively they are banning anyone who wants to be listed from using JavaScript, and how many sites does that leave them?
 
> And our engineers have
> gathered a few more practical suggestions:
> "
> (taken from
> http://googlewebmastercentral.blogspot.com/2007/07/best-uses-of-flash.html,
> though it's old)
>
> Unclear moment for me is that what do they mean saying 'exact same
> thing as your users. '. If it is same content (like compared by user)
> - your method would work, as you load the SAME page as the one shown
> to bot.

Cloaking is giving a different page to a bot from the one you give to a user. They cannot expect their non-enabled bot to get the same as an enabled user or, for example, using Flash and Java applets and images of text and video and audio would earn you a ban as well.

> But textually - the contents are different.

No they are not

> Bot may think that you're
> blindly redirecting the user to strange page with the only 1
> javascript file inclusion and no content at all.

But I don't. My index page is used to pick up the home page text.  
 

> Maybe, something new has happened which allows more SEO methodics and
> i missed this?
>
> Thanks for the point with redirect,
> Alex.


From a purely practical point of view, if I can't do this and they do ban me and my site isn't listed, then how much worse off am I than if my site is not being listed because I don't have any content that the bots can see?

Less than 5% of my traffic comes from search engines anyway - about 15% is from referring sites, and over 80% is direct. It wouldn't be a great loss if I did get banned. In fact, the amount of free advertising and links I'd get from the news articles I could generate and the resulting knock-on news coverage, social network and blog activity and links ('Google Bans Site For Using Google Web Toolkit' - that would get picked up) it would probably make it a positive blessing if all I wanted was traffic.

My site has been doing this for nearly 3 years and I described the whole setup here on this forum over 2 years ago. No-one official said it *wasn't* OK.

Because Google use IBM's FUD approach, you won't see anyone from Google say that this is OK to do (AFAIK they still haven't publicly said that underscores are used as work delimiters in file names, and how uncontentious and long-running is that?) but if you are doing something that is unequivocally wrong, they usually tell you.

Watch this space.

Ian Bambury

Sep 10, 2009, 7:34:09 AM
to google-we...@googlegroups.com
'word delimiter' not 'work delimiter' - I think I was subconsciously longing for a coffee break.

Alexander Cherednichenko

Sep 10, 2009, 8:49:46 AM
to Google Web Toolkit
nice answer. big thanks :)


Ian Bambury

Sep 10, 2009, 11:10:09 AM
to google-we...@googlegroups.com
At the end of the day, I'm not trying to con them and get them to see something different from my ordinary visitors - actually, the opposite - I'm trying to get them to see what my ordinary visitors see, rather than a blank page.

Even if they banned me, I'd still do it for accessibility reasons, the reduced initial download size, and the advantage of not having to recompile if I make a correction or addition to a page. 

Download size and ease of writing copy in HTML (rather than Java strings) was the original reason - lack of recompile was a bonus, and after that, it occurred to me I could come up with a 'ghost' site which would allow non-JS users, SEO and screen readers to access the site.

Ian

http://examples.roughian.com

