A couple of (probably) simple questions about using _escaped_fragment_ / AJAX crawlable pages.


darkflame

Jul 10, 2010, 7:07:18 PM
to Google Web Toolkit
a) As my server doesn't support server-side Java, I'll be using PHP to
generate the static/snapshot pages. How close do they have to be to
the proper/GWT ones? Is it good enough if the text and links are
exactly the same, but not the images/layout? I don't want to be accused
of spoofing, but replicating the layout exactly would be a lot of
work.

b) Is there an easy way to parse the new URLs? I used to just use $_GET
in PHP to retrieve each expected key/value pair, but this doesn't work now
that _escaped_fragment_ has been added at the start. Is there a
recommended method, or do I just code my own parser?

Maile Ohye

Jul 13, 2010, 5:56:40 PM
to Google Web Toolkit
Hi darkflame,

> a) As my server doesn't support server-side Java, I'll be using PHP to
> generate the static/snapshot pages. How close do they have to be to
> the proper/GWT ones? Is it good enough if the text and links are
> exactly the same, but not the images/layout? I don't want to be accused
> of spoofing, but replicating the layout exactly would be a lot of
> work.

Thanks for your concern about cloaking. It's most important for text,
links, and images to be the same. Layout is less important (i.e. it's
fine to have differences between versions).

Sorry, I'm not an authority on your second question. Hopefully someone
else can help. :)

RPB

Jul 14, 2010, 4:05:06 AM
to Google Web Toolkit
Hi darkflame,

Not sure I fully understand your second question, but you should be
able to just use $param = $_GET['_escaped_fragment_']; and then process
it as normal.
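
For what it's worth, here's a minimal sketch of that idea in PHP (the
key names "page" and "id" are just made-up examples):

    <?php
    // A fragment like "#!page=about&id=3" reaches the server as
    // ?_escaped_fragment_=page%3Dabout%26id%3D3; PHP URL-decodes it into $_GET.
    if (isset($_GET['_escaped_fragment_'])) {
        // Split the escaped fragment into key/value pairs.
        parse_str($_GET['_escaped_fragment_'], $params);
        // e.g. $params == array('page' => 'about', 'id' => '3')
        // ...render the static snapshot for $params here...
    } else {
        // Serve the normal GWT/JavaScript page.
    }
    ?>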
Also, the 'Fetch as Googlebot' tool in Webmaster Tools is very
helpful, showing you exactly what Google will actually be crawling.

Cheers,
Rob
www.yournextread.com

darkflame

Jul 17, 2010, 3:12:21 PM
to Google Web Toolkit
Thanks to you both for your help; I'm almost there now.
The $param = $_GET['_escaped_fragment_']; line worked fine, and now the rest
of my PHP works.

One other query, however: what should the links generated by this PHP
file contain?
If my normal code sets the history token to something like "#meep" (which
would now be "#!meep"), should the static page have that link set to "#meep",
"#!meep", "?meep", or even "?_escaped_fragment_"?

I assume I could use "#!", which Google's crawler would automatically
change to "?_escaped_fragment_", but wouldn't it be better to give it
that directly? Or would that not associate the links correctly? Of course,
if I stuck to just using "?" then it would make the site browsable for
people with JavaScript turned off too.

-Thomas

RPB

Jul 19, 2010, 4:55:23 AM
to Google Web Toolkit
I always use the "#!meep" syntax, which the Google crawler interprets as
?_escaped_fragment_=meep when it sees it. I seem to recall reading in the
Google documentation that this is the correct way to do it.

-Rob

Katharina Probst

Jul 19, 2010, 8:51:27 AM
to google-we...@googlegroups.com
That's right - you'll want #!.

You don't want to use _escaped_fragment_ because that's just meant as a temporary URL between the crawler and your site, not for the user (remember that if your user clicks on an _escaped_fragment_ URL, they'll get a rendered snapshot, not a functioning JavaScript page, so none of the buttons etc. would be enabled).

If you use ?, you'll always have to do a full reload of the page, whereas with # or #! you can use XHRs (maybe GWT RPC in your case) to reload only part of the page, which can make for a much better user experience.

If you use #, then the crawler won't interpret it as a JavaScript URL in the scheme, which means it won't ask you for the _escaped_fragment_ version and thus won't index your content (all it'll see is some <script src ...> tag, not very useful). You also don't want # and #! versions of the same URL with the same content floating around - unless you make sure the crawler knows they're the same, the crawler could treat them as separate, which can't be good for your search results.
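
To make that concrete, here's a minimal sketch of what the links inside the
PHP snapshot might look like (the helper name and the "meep" token are just
illustrative), so the snapshot and the live GWT app point at the same "#!" URLs:

    <?php
    // Hypothetical helper: emit snapshot links in the "#!" form so they map to
    // the same AJAX URLs the live GWT application uses.
    function snapshot_link($token, $label) {
        // e.g. snapshot_link('meep', 'Meep') -> <a href="/#!meep">Meep</a>
        return '<a href="/#!' . htmlspecialchars($token) . '">'
             . htmlspecialchars($label) . '</a>';
    }
    echo snapshot_link('meep', 'Meep');
    ?>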

kathrin


darkflame

Jul 19, 2010, 10:50:32 AM
to Google Web Toolkit
Thanks, that clears things up a lot.
I'll change them all to #!.

I knew that if I used ? the user would need page refreshes, but this
would only be seen if the user has JavaScript disabled. I
figured I could possibly use the search-engine-crawlable version as a
"JavaScript-less" version of the site as well.
I guess I can still do that... maybe if I use "#!" for the links when
"_escaped_fragment_" is in the URL, and "?" if it isn't but
JavaScript is disabled.
(If JavaScript is enabled, "?"s can be changed to "#"s as long as
_escaped_fragment_ isn't there.)
I think that will work; roughly the sketch below.
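
Something like this is what I have in mind, in rough PHP (the "page"
parameter name and the helper are just placeholders):

    <?php
    // If the crawler is asking for the snapshot, emit "#!" links; otherwise
    // emit plain "?" links as a no-JavaScript fallback.
    $isCrawler = isset($_GET['_escaped_fragment_']);

    function nav_link($token, $label, $isCrawler) {
        $href = $isCrawler
            ? '/#!' . $token        // crawler snapshot: same token the GWT app uses
            : '/?page=' . $token;   // no-JS fallback: full page reload
        return '<a href="' . htmlspecialchars($href) . '">'
             . htmlspecialchars($label) . '</a>';
    }

    echo nav_link('meep', 'Meep', $isCrawler);
    ?>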

I certainly don't want any crawlers treating the two versions as separate, as
page-for-page the content will be the same.