problem with crawler


ale

Jun 8, 2011, 1:30:55 PM
to google-we...@googlegroups.com
Hello everyone,
I'm trying to make my GWT web app visible to crawlers, following the excellent guide http://code.google.com/intl/it-IT/web/ajaxcrawling/docs/getting-started.html
I have done everything it describes (meta tag in the head, fragments starting with !, servlet filter), but I have this problem:
in the servlet, the queryString is always null, even with the simplest URL.
The only way to get a non-null queryString is to invoke a specific servlet other than the default one.
Let me explain with an example (from my site):

URL that doesn't work:
http://www.youtrail.com/#!home
I try
http://www.youtrail.com/?_escaped_fragment_=home
but from the log I see that queryString is null
(and in the browser I land on http://www.youtrail.com/?_escaped_fragment_=home#!home)


What is wrong? Why do I always see a null queryString in my filter?

Thanks a lot, and sorry for my bad English
Alessandro
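
For reference, here is a minimal sketch of the URL mapping the post relies on: a pretty `#!` URL and the `?_escaped_fragment_=` URL a crawler requests in its place. The class and method names are hypothetical, and the set of escaped characters (control characters and space, `#`, `%`, `&`, `+`, non-ASCII bytes) is an assumption based on how the AJAX crawling guide describes the scheme, so treat it as an illustration rather than a reference implementation.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of GWT or the servlet API.
final class EscapedFragmentUrls {
    private static final String MARKER = "#!";
    private static final String PARAM = "_escaped_fragment_=";

    // Percent-escapes the bytes assumed to be reserved by the scheme:
    // 0x00-0x20, '#', '%', '&', '+', and 0x7F-0xFF.
    static String escapeFragment(String fragment) {
        StringBuilder out = new StringBuilder();
        for (byte b : fragment.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            if (c <= 0x20 || c == '#' || c == '%' || c == '&' || c == '+' || c >= 0x7F) {
                out.append(String.format("%%%02X", c));
            } else {
                out.append((char) c);
            }
        }
        return out.toString();
    }

    // Rewrites "base#!fragment" as "base?_escaped_fragment_=fragment".
    // URLs without "#!" are returned unchanged.
    static String toCrawlerUrl(String prettyUrl) {
        int i = prettyUrl.indexOf(MARKER);
        if (i < 0) {
            return prettyUrl;
        }
        String base = prettyUrl.substring(0, i);
        String fragment = prettyUrl.substring(i + MARKER.length());
        String separator = base.contains("?") ? "&" : "?";
        return base + separator + PARAM + escapeFragment(fragment);
    }

    public static void main(String[] args) {
        System.out.println(toCrawlerUrl("http://www.youtrail.com/#!home"));
        // prints http://www.youtrail.com/?_escaped_fragment_=home
    }
}
```

With this mapping, `http://www.youtrail.com/#!home` corresponds to the `?_escaped_fragment_=home` URL tried in the post, so the test URL itself looks right and the null queryString has to come from somewhere else in the setup.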

Qiang Ma

Jun 8, 2011, 4:53:40 PM
to google-we...@googlegroups.com
Hi,

I am scratching my head with the same issue. (Haven't decided what to do yet.)

Just want to learn what you have done.
So your servlet routes the regular (pretty) URL to your GWT app, and the escaped one to some static page? (Are you using HtmlUnit?)
I have a question on the "static page": can I change the title to be more specific to the content (e.g. some search result)? But would that be considered bad because the user would not see it in the real application?

I hope you can figure it all out!

-maq

--
You received this message because you are subscribed to the Google Groups "Google Web Toolkit" group.
To view this discussion on the web visit https://groups.google.com/d/msg/google-web-toolkit/-/SW1MY1JiaUpXdk1K.
To post to this group, send email to google-we...@googlegroups.com.
To unsubscribe from this group, send email to google-web-tool...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-web-toolkit?hl=en.

ale

Jun 9, 2011, 8:49:22 AM
to google-we...@googlegroups.com
Hi,
yes, I use HtmlUnit, but at the moment I don't know whether you can change the title, because I got stuck earlier on the other problem...


>But would that be considered bad because user would not see it in the real application?
My personal opinion: yes... when I search for a word and then don't find that word on the site I land on, I get angry... but often I blame the search engine...

When I have solved my problem I will post the solution...
Alessandro

Qiang Ma

Jun 9, 2011, 11:50:08 PM
to google-we...@googlegroups.com
I am trying to do the same setup and "successfully" reproduced the problem you have.
It clearly works fine when the URL doesn't point directly at the domain root:
                  http://<domain_name>/?_escaped_fragment_=XXX    doesn't work
                  http://<domain_name>/<warfile_name>/?_escaped_fragment_=XXX    works fine

Keep working :)


ale

Jun 10, 2011, 6:53:40 AM
to google-we...@googlegroups.com
Thank you maq!
I hadn't thought about the war name in the URL...
So I tried adding the war name to my URL:
1) if I use the crawler URL

h*tp://www.youtrail.com/youtrail/?_escaped_fragment_=trail&entityId=579101

in my servlet I see a non-null queryString (now there is another problem, but that is another story: I simply forgot to include httpclient-4.1.1.jar... this evening I will deploy a new version)

2) if I use the normal URL with the war name:

http://www.youtrail.com/youtrail#trail&entityId=579101

I receive a FORBIDDEN error...

Is this strange? (It is strange to me, OK... but is it really strange?)

For now I think I will add some hidden links dedicated to crawlers, but I don't know if that is a good idea...


I'll keep working too... thanks again!

Ale
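
One detail worth checking in the crawler URL from point 1: the `&` before `entityId` is not escaped. Assuming the scheme sends a `&` inside the fragment as `%26`, an unescaped `&` makes `entityId` a separate query parameter instead of part of the fragment. The sketch below uses a hypothetical stand-alone parser (in a real servlet, `HttpServletRequest.getParameter` does this splitting and decoding) to show the difference.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the servlet API.
final class QueryStringParams {

    // Splits a raw query string on '&' and URL-decodes each name and value.
    // A null or empty query string yields an empty map.
    static Map<String, String> parse(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name), decode(value));
        }
        return params;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Unescaped '&': the fragment is cut short and entityId is a second parameter.
        System.out.println(parse("_escaped_fragment_=trail&entityId=579101"));
        // prints {_escaped_fragment_=trail, entityId=579101}

        // Escaped '&': the whole fragment arrives in one parameter.
        System.out.println(parse("_escaped_fragment_=trail%26entityId=579101"));
        // prints {_escaped_fragment_=trail&entityId=579101}
    }
}
```

Either form can be made to work on the server side, as long as the filter rebuilds the fragment the same way the links were written.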




Qiang Ma

Jun 12, 2011, 6:43:46 PM
to google-we...@googlegroups.com
Hi, Ale,

How is your progress? I got further on the topic...

Now I deploy the war file and manually copy the app.war file under ROOT (I had to change the path to app.nocache.js), and the filter seems to receive the query string just fine. (However, I don't remember changing anything in the servlet code, so it must be some configuration change since last time.)
For some reason, if I run HtmlUnit offline it can snapshot the content, but it doesn't work inside the servlet. So I had to run it offline and save the contents into files. In the filter servlet I just read the content back from the file.
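
A rough sketch of that file-based lookup, under assumed conventions: one saved HTML file per fragment, named after the fragment with unsafe characters replaced, and `index.html` for the empty fragment. The class name, the naming rule, and the directory layout are all hypothetical; only the idea of reading pre-rendered snapshots from disk comes from the description above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical snapshot lookup; a filter would call load() with the
// decoded value of the _escaped_fragment_ parameter.
final class SnapshotStore {
    private final Path root;

    SnapshotStore(Path root) {
        this.root = root;
    }

    // Maps a fragment to a file name. Anything outside [A-Za-z0-9._-] is
    // replaced so a fragment cannot point outside the snapshot directory.
    static String fileNameFor(String fragment) {
        String safe = fragment.replaceAll("[^A-Za-z0-9._-]", "_");
        return (safe.isEmpty() ? "index" : safe) + ".html";
    }

    // Returns the saved HTML, or null when no snapshot exists for the fragment.
    String load(String fragment) throws IOException {
        Path file = root.resolve(fileNameFor(fragment));
        if (!Files.isRegularFile(file)) {
            return null;
        }
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("snapshots");
        Files.write(dir.resolve("info.about.html"),
                "<html>about</html>".getBytes(StandardCharsets.UTF_8));
        SnapshotStore store = new SnapshotStore(dir);
        System.out.println(store.load("info.about")); // the saved snapshot
        System.out.println(store.load("missing"));    // null, nothing saved
    }
}
```

When `load` returns null, the filter can fall through to the normal request chain so the regular GWT host page is served.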

You can check this:
http://goscopia.com/?_escaped_fragment_=
http://goscopia.com/

http://goscopia.com/?_escaped_fragment_=info.about
http://goscopia.com/#!info.about


Right now I only have a few pages to crawl, so I plan to update the sitemap links file so that the Google crawler will crawl them individually.
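
For a handful of pages the sitemap file can be written by hand; with more pages it could be generated from the list of history tokens. The sketch below assumes the sitemap should list the pretty `#!` form of each URL and uses the standard sitemaps.org `urlset` format. The class name and the token list are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical generator; the tokens would come from wherever the
// application keeps its history tokens.
final class SitemapWriter {

    // Escapes the five characters that must be entity-encoded in XML text.
    static String xmlEscape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    // Builds a sitemap with one <url> entry per token, as baseUrl + "#!" + token.
    static String build(String baseUrl, List<String> tokens) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (String token : tokens) {
            sb.append("  <url><loc>")
              .append(xmlEscape(baseUrl + "#!" + token))
              .append("</loc></url>\n");
        }
        sb.append("</urlset>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(build("http://goscopia.com/", Arrays.asList("info.about")));
    }
}
```

Whether a sitemap alone is enough for a site that generates many pages is a separate question; this only shows how the file could be produced automatically.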

QUESTION: is there a correct or better way than updating the sitemap? If the application can generate a lot of pages, how do I make these pages known to the crawler?

Any suggestions are appreciated.

-maq



ale

Jun 13, 2011, 9:14:03 AM
to google-we...@googlegroups.com
Hi maq,
my progress is very slow...
I did something different: to solve the forbidden error when I rewrite the URL, I changed the servlet to set requestURI to an empty string instead of getting it from the request

final String requestURI = ""; // before was req.getRequestURI();

and in the home page I added some hidden links to the URL with the name of the war
(<a style="display: none" href="/youtrail#!searchTrail">search trail</a>)
but I'm not sure that it works...

For the site map, I create a hidden block with all the history items.

    // Builds a hidden block of "#!" links, one per history token,
    // so that crawlers can discover every page from the host page.
    private void createSiteMap() {
        SafeHtmlBuilder sb = new SafeHtmlBuilder();
        for (String token : HistoryConstants.getAll()) {
            sb.append(SafeHtmlUtils.fromTrustedString("<a href=\"#!" + token + "\">" + token + "</a>"));
        }

        // Add the site map to the page.
        HTML siteMap = new HTML(sb.toSafeHtml());
        siteMap.setVisible(false);
        RootPanel.get().add(siteMap, 0, 0);
    }

I copied this from some GWT example (I don't remember which)...


I like your site, the marker and the color are very beautiful!

Ale

