SEO issues with GWT webapps - how have people addressed this issue?


joster

Jul 3, 2007, 12:44:07 PM
to Google Web Toolkit
Hello All-

I searched the user forum and looked at several posts on the SEO (search
engine optimization) issue with GWT apps. Essentially, GWT-based
applications are more like web "apps" than traditional web "sites",
which are much friendlier to search engine crawlers and
spiders.

Now, GWT is very productive for creating dynamic web applications. My
questions:
* How have people addressed the SEO issue with GWT apps?
* I have seen many people create very powerful web apps with GWT. How do you
provide access to search engine crawlers?

Joster

Reinier Zwitserloot

Jul 3, 2007, 5:27:23 PM
to Google Web Toolkit
Create a view-only version of your webapp built on vanilla HTML+CSS,
no javascript, no GWT. This has two advantages:

1. You got the SEO angle covered (as long as you serve up the view-
only version if the User-Agent contains 'bot' of some sort), and
2. You got non-compatible browsers mostly covered. Make your target
REALLY SMALL displays because the browsers that will be getting your
view-only mode are mobile phones and stuff like netscape v1.0 on a
640x480 screen, or w3m/links/lynx. Even if a mobile phone could
somehow run your AJAX app (e.g., iPhone), you can't design an
interface to be well designed both for 400x300 and 1280x1024 screens,
it just can't be done.

As far as I know (but I don't work for Google and I'm not a lawyer),
showing a differently built/styled version of the exact same content
is not a violation of Google's search spidering rules. I believe an
analytics account can help you in that it informs you of any
spidering problems.

It's unfortunate that this isn't exactly easy.

joster

Jul 3, 2007, 6:44:49 PM
to Google Web Toolkit
Hello Reinier-

This is an interesting concept. A follow-up question: how do I go
about creating a view-only version of my web app? And how do I serve
the view-only version to 'bot' user-agents?

Please advise.

Joster

Reinier Zwitserloot

Jul 4, 2007, 2:48:28 AM
to Google Web Toolkit
Basically, you use some combination of User-Agent parsing, adding the
hook that GWT has for javascript-supporting browsers that nevertheless
aren't GWT compatible (early versions of opera, IE 5.5, a number of
mobile phone browsers but not all of them), and use of the <noscript>
tag, to differentiate between versions.

e.g: The hook redirects to the view only version using
document.location =.

in the <noscript> tag, you add a link to the view-only version as a
failsafe, and finally:

When serving up the MyProject.html file, use a servlet (or PHP script,
or whatever you like) to inspect the User-Agent string, and depending
on it, serve up the GWT-activating MyProject.html file, or the view
only version.

The User-Agent parser should only redirect to 'view-only' mode for
obvious User-Agents, like anything containing 'BOT' in any
capitalization, links, lynx, w3m.
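
A minimal sketch of the User-Agent check described above, written as plain Java so it can be tried outside a servlet container. The class and method names are made up for illustration, and the marker list is only the one suggested in this post; in a servlet you would pass in the value of the request's User-Agent header. Whether serving bots a separate version is acceptable to search engines is disputed further down the thread.

```java
import java.util.Locale;

// Hypothetical helper: decides whether a request should get the view-only pages.
class ViewOnlyAgents {
    // Substrings named in the post; matching ignores capitalization.
    private static final String[] MARKERS = {"bot", "links", "lynx", "w3m"};

    static boolean wantsViewOnly(String userAgent) {
        if (userAgent == null) {
            return false; // no header: not an "obvious" case, serve the GWT page
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        for (String marker : MARKERS) {
            if (ua.contains(marker)) {
                return true;
            }
        }
        return false;
    }
}
```

A true result would mean serving the view-only page; anything unrecognised falls through to the normal GWT host page, in line with the advice to redirect only obvious User-Agents.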

The read-only version is just something you cook up completely outside
of GWT. Use whatever people have been using to build websites before
AJAX and JS stuff became popular. This isn't as hard as it sounds, as
you only need to generate the data with a few links to other read-only
pages, no edit capability. Servlets, JSPs, PHP scripts, whatever
strikes your fancy.


mumuri

Jul 4, 2007, 3:06:15 AM
to Google Web Toolkit
No, this trick is forbidden, and you will have problems later.
You can be blacklisted by Google.

http://google.about.com/od/searchengineoptimization/tp/badseo.htm

joster

Jul 4, 2007, 3:27:16 AM
to Google Web Toolkit
Thanks. What other techniques have GWT users employed to enable SEO
friendly web-apps/web-sites?

Any official word from the GWT developers? In the current age, where
the market is flooded with ad-driven web apps and web sites, I am sure this
aspect of GWT is very important.

Joster

Ian Bambury

Jul 4, 2007, 3:36:52 AM
to Google-We...@googlegroups.com
Yep, they banned BMW for doing this. They may not get around to you, but it is against the rules.
 
My solution is to hold a lot of the text in hidden divs in the main HTML index page. You can organise it so it formats well on text-only, non-JavaScript browsers and is therefore also available for web crawlers. It also means that you can change text without a recompile, and it is accessibility-friendly. I find it easier to write in plain text rather than in strings in Java. Internationalisation works, and you can palm off writing the content to people who aren't Java programmers.
 
Ian

 

mumuri

Jul 4, 2007, 5:51:25 AM
to Google Web Toolkit
I don't think it's a good idea (even if bots don't read CSS).

Another solution is to develop your site in a static way (no Ajax), and
to wrap existing divs in widgets (see the GWT Widget Library) in order to
enhance them with GWT.

Another thing: do not use Ajax to display the content on the main
page or to do the navigation; use Ajax only to modify or update data.

You can test your code with tools like a "spider simulator" to see what
will be seen by bots.


Sanjiv Jivan

Jul 4, 2007, 6:44:01 AM
to Google Web Toolkit
As Reinier stated, there is a way to support two versions of the site,
static and dynamic, to make your site SEO friendly without Google
considering it cloaking. The guideline is that the content served
by both sites should be the same. The problem is that there aren't any
clear guidelines on what makes the Google search engine recognize this
versus cloaking, which is slightly different, both in terms of intent
and content. So while it is absolutely possible to build a site with
static / dynamic versions, unless you know what you're doing and have
the time and resources to make sure that the Google search engine is
indexing your site as desired, you might run the risk of being
blacklisted if you don't follow all the rules as Google sees them.

Those articles from about.com are pretty superficial (most of them
are) and I would suggest you look for more detailed articles on the
subject.

One of the best articles that I've read on the subject is this
whitepaper from backbase :
http://www.backbase.com/download/articles/DesigningRIAsForSearchEngineAccessibility.pdf
which goes over the concept and implementation details in some depth.

This author seems to think that GWT's hashed fragments in the URL for
history management are the solution for SEO, but I think not.
http://seoblog.intrapromote.com/2006/05/seo_considerati.html

Sanjiv

Reinier Zwitserloot

Jul 4, 2007, 8:39:21 AM
to Google Web Toolkit
mumuri, stop naysaying. The one solution you did offer (wrap all
elements in GWT) is practically unworkable due to how GWT works - you
don't get full widgets out of 're-wrapping' like that and you
completely cannot use any more involved widget.

The about.com article is hardly canon. As long as your INTENT is in
the right place, and the content is exactly the same, there should be
no problem.

joster

Jul 4, 2007, 2:01:42 PM
to Google Web Toolkit
Thanks everyone for valuable feedback and pointers.

A follow-up question: has someone already implemented such a strategy
for GWT app and has been successful in getting their site/content
indexed by Google search engine?

Joster

Ian Bambury

Jul 5, 2007, 3:31:28 AM
to Google-We...@googlegroups.com
If you do a search for
 
"ian bambury" trek 2008
 
you'll find a site that has done this and is indexed by Google and others

 
--
Ian
http://roughian.com

joster

Jul 6, 2007, 9:31:41 PM
to Google Web Toolkit
Thanks Ian. As I understand there are 3 primary strategies for making
SPIs search engine accessible:
1) Lightweight indexing
2) Extra link strategy
3) Secondary site strategy

Which ones of these did you adopt, could you please share your
learning, any BKMs?

Joster


Sanjiv Jivan

Jul 7, 2007, 9:09:41 AM
to Google Web Toolkit

Ian Bambury

Jul 7, 2007, 10:24:07 AM
to Google-We...@googlegroups.com
I didn't use any of those. Like I said in my previous post, I just stuck the text in a hidden div. I've said a bit more about it in a thread "Static text/HTML [Was: HTML templating vs Java Coding]"

joster

Jul 7, 2007, 4:29:17 PM
to Google Web Toolkit
Thanks Sanjiv and Ian.

I am most interested in the "Secondary Site Strategy" as it best suits
our application's SEO needs.

Has anyone successfully implemented the "Secondary Site Strategy" for
making a Single Page Interface (SPI) search engine accessible?
Is it possible to come up with a generic library that would be
applicable to any GWT app to help with this issue? I am sure
this would benefit the entire GWT community.
By any chance, is this something on the roadmap for future GWT
releases?

Joster



Ian Bambury

Jul 24, 2007, 4:43:44 PM
to Google-We...@googlegroups.com
Hi Joster,
 
A followup... 
 
I put up my replacement examples site 9 days ago. Google has found most of the new pages now, and I thought you might be interested in some checks I've just made.
 
The new site gets text from static html pages and puts it in the GWT app simply because a lot of text would slow down the initial load and I don't want casual visitors to give up :-)
 
These other HTML pages are browsable by text-only, non-JavaScript browsers, and screen-readers. If you go there with a JavaScript-enabled browser you get redirected to the correct page in the GWT app. These pages are linked to from the main index page.
 
Anyway, the reason for writing is that you asked at one point "has someone already implemented such a strategy for GWT app and has been successful in getting their site/content indexed by Google search engine?"
 
These are the rankings for http://examples.roughian.com if you search for "gwt examples [widgetname]" (no quotes)
 
Ian
 
 
Search Text                         Ranking
gwt examples FocusPanel                1
gwt examples HTMLPanel                 1
gwt examples VerticalPanel             1
gwt examples FlowPanel                 1
gwt examples DisclosurePanel           1
gwt examples DeckPanel                 1

gwt examples StackPanel                2
gwt examples DockPanel                 2
gwt examples CellPanel                 2
gwt examples Grid                      2

gwt examples AbsolutePanel             3
gwt examples CheckBox                  3

gwt examples FlexTable                 5
gwt examples Composite                 5
gwt examples ScrollPanel               5
gwt examples DialogBox                 5
gwt examples Hidden                    5

gwt examples FileUpload                6
gwt examples Frame                     6
gwt examples HorizontalPanel           6

gwt examples FormPanel                 7
gwt examples TabPanel                  7

gwt examples Button                   10
gwt examples HTML                     14


 



joster

Jul 24, 2007, 8:11:59 PM
to Google Web Toolkit
Hello Ian-

Thanks for follow-up. This looks very interesting and indeed great
work!

Do you detect whether the user-agent is a search bot and direct it to the
text/HTML pages? And if it is not a search bot, do you direct it to your
main GWT app? Sorry, I was unable to understand your approach exactly;
could you please elaborate?

I am very interested in how you implemented this as well.

Joster



Ian Bambury

Jul 24, 2007, 9:54:42 PM
to Google-We...@googlegroups.com
Hi Joster,
 
As simply as I can, it goes like this.
 
==========
 
There's the GWT app index page as per usual except that there is a hidden div with a link in it.
 
JS browsers get directed to the GWT app in the normal way
 
Non-JS browsers see the link <a href="http://examples.roughian.com/Main_Home.htm">Text Version Here</a>
 
so real people get the GWT app and bots will follow the link to Main_Home.htm
 
==========
 
In Main_Home.htm, there is a hidden div full of links (more of which later) and another div (id="page") which contains HTML content
 
The non-JS browsers and the bots see this page 'as is' - links to other pages and the content in the 'page' div
 
The GWT app makes an RPC for Main_Home.htm, extracts the contents of the 'page' div and puts it in an HTMLPanel and puts that panel on the screen
 
So the bots see the page just as an ordinary web page, people with JS-enabled browsers get *exactly* the same content, but in a lazy-loaded widget (really lazy-loaded from the server, as opposed to lazy instantiated on the client).
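
To make the "extract the contents of the 'page' div" step concrete, here is a rough plain-Java sketch that pulls one div out of a fetched HTML string by counting nested div tags. It is not the GWT code used on the site (the code posted later in the thread walks the browser DOM instead); the class and method names are hypothetical, and it assumes the id is the first attribute of the div and is written with double quotes.

```java
// Hypothetical helper: returns the full <div id="...">...</div> block, or null.
class PageDivExtractor {
    static String extractDivById(String html, String id) {
        int start = html.indexOf("<div id=\"" + id + "\"");
        if (start < 0) {
            return null; // no such div in this page
        }
        int depth = 0;
        int pos = start;
        while (pos < html.length()) {
            int open = html.indexOf("<div", pos);
            int close = html.indexOf("</div>", pos);
            if (close < 0) {
                return null; // unbalanced markup, give up
            }
            if (open >= 0 && open < close) {
                depth++;                 // entering a (possibly nested) div
                pos = open + "<div".length();
            } else {
                depth--;                 // leaving a div
                pos = close + "</div>".length();
                if (depth == 0) {
                    return html.substring(start, pos);
                }
            }
        }
        return null;
    }
}
```

The nesting count matters because the 'page' div can itself contain slot divs for widgets, so stopping at the first closing tag would cut the content short.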
 
==========
 
All the other pages on the site are created the same way - the GWT app gets the HTML source page, extracts the content and puts it on the screen. Non-JS things can follow a great heap of links in Main_Home.htm to all the other HTML source pages
 
==========
 
Two other points:
 
1) The HTML in the id='page' divs of all the HTML pages will have identified slot divs for GWT widgets if required. For example, the pages with demos in have a <div id="demo"></div> and the GWT app puts the demo widget in the slot (you don't need unique ids between pages, just within the page)
 
2) The search engines will follow the links in Main_Home.htm and index the pages on the content, and you will find that it is this page (the HTML source page) that is displayed in the listings. In the normal way, anyone clicking on the link in the Google listings would get the non-GWT page that the bot sees. But...
 
All the names match up, so the class that shows, say, the DeckPanel description is called ClientUI_DeckPanel.java. The base class for pages looks for ClientUI_DeckPanel.htm to get the HTML to display, and the history token is "ClientUI_DeckPanel", so in the HTML source files there is a redirect JS script which will send anyone with a JS-enabled browser going to ClientUI_DeckPanel.htm to http://www.examples.roughian.com/#ClientUI_DeckPanel
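
The name matching can be illustrated with a small plain-Java sketch that maps a source page's path to its history token and to the GWT URL. On the real site this mapping is done in the browser by a JavaScript redirect script, so the class and method names below are hypothetical and only show the string handling; it assumes the token is everything in the file name before the first dot.

```java
// Hypothetical helper mirroring the page-name / history-token convention.
class GhostPageRedirect {
    // "/ClientUI_DeckPanel.htm" -> "ClientUI_DeckPanel"
    static String historyToken(String path) {
        String file = path.substring(path.lastIndexOf('/') + 1);
        int dot = file.indexOf('.');
        return dot < 0 ? file : file.substring(0, dot);
    }

    // Builds the URL of the GWT app with the token as the URL fragment.
    static String gwtUrlFor(String host, String path) {
        return "http://" + host + "/#" + historyToken(path);
    }
}
```

Because the token is derived purely from the file name, adding a new page needs no extra routing table: the file name, the class name and the history token stay in step by convention.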
 
==========
 
The effect, if you have managed to follow me this far (or even if you haven't), is that
  • Non-JS things see the underlying set of web pages - i.e. the content without the GWT widgets
  • Bots collect the underlying set of web pages
  • JS-enabled things going to http://www.examples.roughian.com get the GWT site and see the content PLUS any GWT widgets
  • JS-enabled things going to one of the underlying set of web pages (from, say a search engine) get the right page of the GWT site
It probably sounds terribly complicated, but in practice once the framework is in place
  • You write an HTML page with the non-GWT HTML content and slots for GWT widgets (if any)
  • You add a link to it in the home page
  • You create a Java class to add any GWT widgets you want to the slots
  • You stick an entry in the GWT menu
You only need to speed things up if you have casual visitors
It is only any good for speeding up loading, if you have great swathes of STATIC text
It's only any good for search engines if the content is relevant
 
But also, on the plus side,
  • you don't have problems keeping two sets of pages in sync since they are one and the same
  • you could add content for non-JS visitors which isn't displayed to GWT visitors to make the SEO better
  • you are not detecting bots and serving different pages so you are in the clear with the rules

If you actually read this far and have any questions, let me know

Ian

Dennis

Aug 19, 2007, 11:56:00 PM
to Google Web Toolkit
Hi Ian,

That is a very ingenious way to solve this issue. I've been keeping my
large static blocks of HTML in hidden divs on the home page and
displaying them (in a different div) as needed. Your idea makes much
more sense though, as it is easier to maintain and organize, and it
solves the SEO issue. Would you be willing to share your redirect.js
script? I'm sure for a JavaScript programmer it would be easy to
create it, but I'm not a JavaScript programmer.

Thanks.

Dennis


Ian Bambury

Aug 20, 2007, 1:54:53 PM
to Google-We...@googlegroups.com
Hi Dennis,
 
Yep, no problem.

===========================================================================================
Each of the 'ghost' HTML pages looks something like this
===========================================================================================


    <body>
        <script language="javascript" src="bin/redirect.js"></script>
        <!-- ############################ LINKS ####################################### -->
        <div id="links" style="text-align:left">
            <a href="Main_Home.en.htm">- Up -</a>
           
            <!-- Anything just for non-JavaScript browsers goes here -->
           
        </div>
        <!-- ############################# PAGE TEXT HERE ############################# -->
        <div id="page">
       
         <!-- Static HTML goes here -->
       
            <br />
            <h2>AbsolutePanel</h2>
        </div>
    </body>
   
       
===========================================================================================
The call to get the page. This is done on my sites with an abstract base class which extends VerticalPanel.
buildPage() is required in each real page if you need to add widgets  to div id's.
urchinTracker(getHistoryToken()) is a JSNI call for Google Analytics
This routine also extracts the title from the HTML page and sets the GWT app's title to be the same
===========================================================================================

 String url;
 // Fetch this page's static HTML source and show its 'page' div in the GWT app
 public void create()
 {
  url = Utilities.downloadURL(this);
  try
  {
   clear();
   // Placeholder text while the request is in flight
   add(panel = new HTMLPanel(msgs.msgContactingServer()));
   RequestBuilder builder = new RequestBuilder(RequestBuilder.GET, url);
   builder.setHeader("Content-Type", "application/x-www-form-urlencoded");
   builder.sendRequest(null, callback);
  }
  catch (RequestException e)
  {
   Window.alert("Fetch of '" + url + "' - Failed with " + e.getMessage());
  }
 }
 HTMLPanel panel;
 RequestCallback callback = new RequestCallback()
 {
  public void onError(Request request, Throwable e)
  {
   Window.alert("Error: " + e.getMessage());
  }
  public void onResponseReceived(Request request, Response response)
  {
   // A response without a <title> is treated as a failed fetch
   String titleArray[] = response.getText().split("<title>|</title>");
   if (titleArray.length < 2)
   {
    clear();
    add(panel = new HTMLPanel(msgs.msgCannotContactServer()));
    setPageNeedsCreating(true);
    return;
   }
   // Copy the HTML page's title to the GWT app's window
   title = titleArray[1];
   Window.setTitle(title);
   // Let the browser parse the page, then walk its top-level elements
   // until we reach <div id="page">
   HTML bodyhtml = new HTML(response.getText());
   Element bodycontents = bodyhtml.getElement();
   Element elem = DOM.getFirstChild(bodycontents);
   while (elem != null && !"page".equals(DOM.getElementProperty(elem, "id")))
    elem = DOM.getNextSibling(elem);
   clear();
   if (elem == null)
   {
    // No 'page' div found: handle it like a failed fetch instead of looping forever
    add(panel = new HTMLPanel(msgs.msgCannotContactServer()));
    setPageNeedsCreating(true);
    return;
   }
   add(panel = new HTMLPanel(elem.toString()));
   buildPage(); // subclasses add their widgets to the slot divs here
   add(new HTML(msgs.pageFooter()));
   urchinTracker(getHistoryToken());
  }
 };
 
===========================================================================================
 Utilities.downloadURL() looks like this
===========================================================================================


 public static String downloadURL(Object o)
 {
  String url = getBaseURL();
  String[] classpath = GWT.getTypeName(o).split("[.]");
  url += classpath[classpath.length - 1] + "." + msgs.infoLocale() + ".htm";
  return url;
 }
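
For anyone trying the naming rule outside GWT, here is a plain-Java approximation of the file name that downloadURL() builds. GWT.getTypeName() is GWT-specific, so this sketch uses getClass().getSimpleName() in its place; PageFiles and pageFileFor are hypothetical names, the empty ClientUI_DeckPanel class is only a stand-in for a real page class, and the locale is passed in directly rather than read from a Messages bundle.

```java
// Stand-in for one of the GWT page classes; only its name matters here.
class ClientUI_DeckPanel { }

// Hypothetical helper: class name + locale -> name of the HTML source file.
class PageFiles {
    static String pageFileFor(Object page, String locale) {
        // Simple class name without the package, e.g. "ClientUI_DeckPanel"
        String name = page.getClass().getSimpleName();
        return name + "." + locale + ".htm";
    }
}
```

With this rule, an instance of ClientUI_DeckPanel and the locale "en" map to ClientUI_DeckPanel.en.htm, which is the file the page fetches for its content.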


===========================================================================================
 getBaseURL() looks like this (it's just a way of choosing a particular web server
 on my local machine so I can have a number of project all working at the same time)
===========================================================================================


 public final static String getBaseURL()
 {
  String url = GWT.getModuleBaseURL();
  String bits[] = url.split("/");
  if (bits[2].equals("localhost:8888")) url = "http://localhost:1080";
  return url + "/";
 }
 
 
===========================================================================================
 redirect.js looks like this
===========================================================================================

var filebits = location.pathname.split("/");
var filename = filebits[filebits.length - 1];
filebits = filename.split(".");
var token = filebits[0];
var locale = "_" + filebits[1];
if(locale == "_en")locale = "";
var hostname = location.host;
var url = "http://" + hostname + "/index" + locale + ".htm" + location.search + "#" + token;
if(location.search != '?text') location.replace(url);


===========================================================================================

I simplified what I do slightly - Main_Home.htm is really Main_Home.en.htm so I can add
translations of the text at some point, and all messages are held as
com.google.gwt.i18n.client.Messages, so you can strip that bit out if you don't need it.
Also, the last line means you can append ?text to the page's URL and you won't get redirected.
This is useful when developing (in Windows and IE, anyway) because you have to restart
hosted mode before changes to the page will appear - with ?text, you can see it in IE
straight away without getting redirected.


HTH, if you need any more info or explanation, drop me a line

Ian



--------------------------------------------

Dennis

Aug 20, 2007, 5:30:29 PM
to Google Web Toolkit
Thanks, Ian. I'll let you know how it goes.

Dennis

