Making GWT application crawlable by a search engine.

PhilBeaudoin

Mar 12, 2010, 4:13:19 AM
to Google Web Toolkit
I want to use the #! token to make my GWT application crawlable, as
described here:
http://code.google.com/web/ajaxcrawling/

The GWT showcase app available online uses this, for example:
http://gwt.google.com/samples/Showcase/Showcase.html#!CwRadioButton
For that URL, the following static page is served to Googlebot:
http://gwt.google.com/samples/Showcase/Showcase.html?_escaped_fragment_=CwRadioButton
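
To make the mapping between the two URLs concrete, here is a minimal sketch of the rewrite a crawler performs, based on my reading of the AJAX crawling scheme linked above. The class name `CrawlUrls` and its method are made up for illustration, and the set of characters that get percent-escaped is my understanding of the scheme, so it is worth checking against the spec.

```java
import java.nio.charset.StandardCharsets;

public class CrawlUrls {
    /** Maps a "#!" URL to the "_escaped_fragment_" form a crawler would request. */
    public static String toEscapedFragmentUrl(String url) {
        int bang = url.indexOf("#!");
        if (bang < 0) {
            return url; // no "#!" token, so there is nothing to rewrite
        }
        String base = url.substring(0, bang);
        String token = url.substring(bang + 2);
        // Append to an existing query string if the URL already has one.
        String separator = base.contains("?") ? "&" : "?";
        return base + separator + "_escaped_fragment_=" + escape(token);
    }

    // Assumption: control characters, space, '#', '%', '&', '+' and
    // non-ASCII bytes are percent-escaped; everything else is kept as-is.
    private static String escape(String token) {
        StringBuilder out = new StringBuilder();
        for (byte b : token.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            if (c <= 0x20 || c >= 0x7F || c == '#' || c == '%' || c == '&' || c == '+') {
                out.append(String.format("%%%02X", c));
            } else {
                out.append((char) c);
            }
        }
        return out.toString();
    }
}
```

Calling `CrawlUrls.toEscapedFragmentUrl` on the Showcase `#!CwRadioButton` URL yields the `?_escaped_fragment_=CwRadioButton` URL shown above.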

I want my GWT app to do something similar. In short, I'd like to serve
a different flavor of the page whenever the `_escaped_fragment_`
parameter is found in the URL.

What should I modify in order for the server to serve something else
(a static page, or a page dynamically generated through a headless
browser like HtmlUnit)? I'm guessing it could be the `web.xml` file,
but I'm not sure.

Note: I thought of checking the source of the Showcase app provided
with the GWT SDK, but unfortunately this version doesn't seem to
support serving static files on `_escaped_fragment_` and it doesn't
use the #! token...

Thomas Broyer

Mar 12, 2010, 9:27:35 AM
to Google Web Toolkit

There's work underway to make it "just work": you'd use a
CrawlableHyperlink instead of Hyperlink, and on the server-side it'd
use HtmlUnit as a "browser simulator" to "run your GWT app" just as if
a "true" browser would have loaded it and serialize the resulting DOM
into HTML.
http://code.google.com/p/google-web-toolkit/source/browse/branches/crawlability/

It hasn't been updated for a while, though there's a pending review to
add the CrawlableHyperlink widget and update the Showcase sample to
use it: http://groups.google.com/group/google-web-toolkit-contributors/t/88d4983324d328c5

For the server-side part, I think you'd have to either serve your HTML
host page from a servlet or JSP, so you can change the output depending
on the presence and value of the `_escaped_fragment_` query-string
parameter, or maybe use a <filter/> in your `web.xml`.
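
As a rough illustration of the check such a servlet or filter would make, here is a minimal sketch in plain Java. The class `EscapedFragment` and the method `tokenFrom` are hypothetical names, not part of GWT or the servlet API. The idea is that a filter's `doFilter` would pass in `HttpServletRequest.getQueryString()` and serve the crawler snapshot only when a token comes back, otherwise continuing down the filter chain.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

public class EscapedFragment {
    private static final String PARAM = "_escaped_fragment_";

    /**
     * Returns the decoded history token if the query string carries
     * _escaped_fragment_, or empty for an ordinary browser request.
     */
    public static Optional<String> tokenFrom(String queryString) {
        if (queryString == null) {
            return Optional.empty(); // request had no query string at all
        }
        for (String pair : queryString.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            if (name.equals(PARAM)) {
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                // Undo the percent-escaping applied by the crawler.
                // Note: URLDecoder also turns '+' into a space.
                return Optional.of(URLDecoder.decode(value, StandardCharsets.UTF_8));
            }
        }
        return Optional.empty();
    }
}
```

The returned token is what the app would otherwise have read from the `#!` part of the URL, so it can be used to pick a static page or to drive a headless browser to the matching history state.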

Chris Lercher

Mar 12, 2010, 10:36:41 AM
to Google Web Toolkit
Hi,

please see my answer to your question on stackoverflow.com:
http://stackoverflow.com/questions/2430244/making-gwt-application-crawlable-by-a-search-engine/2432953

Chris


PhilBeaudoin

Mar 12, 2010, 11:54:15 AM
to Google Web Toolkit
Thanks Chris. I'll continue the discussion over there, if needed.

PhilBeaudoin

Mar 12, 2010, 11:58:26 AM
to Google Web Toolkit
It seems that CrawlableHyperlink will not make it into GWT. To quote
Kathrin from that code review:

"after a lengthy discussion with Joel, we decided to get rid of the
CrawlableHyperlink widget. The issue is that it doesn't add enough
useful functionality, because the app writer still needs to handle the
"!" when actually "navigating" the app to a history state. For this
reason, we will recommend that people do this process manually, which
is the same amount of work."

Still, the Showcase sample in that branch might be exactly what I
need. I'll take a look and post back on whether it solves my problem.
Thanks a lot!

PhilBeaudoin

Mar 12, 2010, 3:07:50 PM
to Google Web Toolkit
It almost works... The only problem left is that development mode
serves the default HTML file right away if it is present, so the
filters defined in `web.xml` are never called. This happens even with
the Showcase application that you linked.

Is there any way to force requests to go through the filters defined
in `web.xml`, even if the requested .html file exists?


PhilBeaudoin

Mar 12, 2010, 5:51:55 PM
to Google Web Toolkit
Great! Everything seems to work! (Although there seem to be some bugs
in that updated Showcase sample. Should I report them somehow?)

My only problem now... I can't run HtmlUnit on App Engine, which is
where I host my app. :(

Fortunately, they seem to be working on it:
https://sourceforge.net/tracker/index.php?func=detail&aid=2962074&group_id=47038&atid=448269#

Thanks all for your help!
