Cannot fetch dynamic content from the same app (URL Fetch API)

PhilBeaudoin

Jun 21, 2010, 2:01:36 PM
to Google App Engine
I need to use the URL Fetch API to load dynamic content (a JSP or
servlet-generated content) from the same application that uses URL
Fetch. It looks like this is not allowed on AppEngine -- and it
doesn't seem related to the documented limitation that URL Fetch
cannot get its own URL. (The URL is different, even though it has the
same domain.) Another point to note is that this works if I'm using
URL Fetch to get static content (an HTML page).

* Background *

I need to do this to serve content generated with HTMLUnit to make my
app crawlable by search engines, as described in:
http://code.google.com/web/ajaxcrawling/
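
For reference, the scheme linked above pairs each "#!" URL with a
crawler-facing "_escaped_fragment_" URL. Below is a minimal Java sketch of
the conversion that shows up in the log analysis. It assumes simple
percent-decoding of the fragment value, and the class and method names are
hypothetical (they are not part of HTMLUnit or of CrawlFilter):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class EscapedFragment {
    private static final String PARAM = "_escaped_fragment_=";

    /**
     * Converts "...?_escaped_fragment_=main" into "...#!main".
     * Returns null when the query string has no _escaped_fragment_ parameter.
     */
    public static String toHashBangUrl(String crawlerUrl) {
        int q = crawlerUrl.indexOf('?');
        if (q < 0) {
            return null;
        }
        String base = crawlerUrl.substring(0, q);
        StringBuilder kept = new StringBuilder(); // other query parameters, unchanged
        String fragment = null;
        for (String pair : crawlerUrl.substring(q + 1).split("&")) {
            if (pair.startsWith(PARAM)) {
                fragment = decode(pair.substring(PARAM.length()));
            } else if (!pair.isEmpty()) {
                if (kept.length() > 0) {
                    kept.append('&');
                }
                kept.append(pair);
            }
        }
        if (fragment == null) {
            return null;
        }
        return base + (kept.length() > 0 ? "?" + kept : "") + "#!" + fragment;
    }

    // Assumption: only %XX sequences need decoding. A literal '+' is protected
    // first, because URLDecoder would otherwise turn it into a space.
    private static String decode(String value) {
        try {
            return URLDecoder.decode(value.replace("+", "%2B"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

With this sketch, "http://puzzlebazaar.appspot.com?_escaped_fragment_=main"
maps to "http://puzzlebazaar.appspot.com#!main".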

* AppEngine log analysis *

From my browser I request
http://puzzlebazaar.appspot.com?_escaped_fragment_=main

From the AppEngine log I see that the request for
http://puzzlebazaar.appspot.com?_escaped_fragment_=main starts and
goes through HTMLUnit, which then uses URL Fetch to get
http://puzzlebazaar.appspot.com#!main. Following this I get an
IOException:
com.philbeaudoin.gwtp.crawler.server.CrawlFilter logStackTrace:
java.util.concurrent.ExecutionException: java.io.IOException: Timeout
while fetching:
http://puzzlebazaar.appspot.com#!main

The page content is empty. This exception is caught, the servlet
continues and terminates normally.

Then the request for http://puzzlebazaar.appspot.com#!main starts.
From the timings in the AppEngine logs it's clear that this request
starts too late, that is, after the original request has terminated.

* What I tried *

1) To make sure it wasn't a problem with HTMLUnit, I used URL Fetch
directly, but got the same behavior.
2) I tried fetching dynamic content from the same domain but with more
differences in the URL. (Not just the parameters.) It fails.
3) I tried fetching a statically-served HTML page from the same
domain. This works.

* Questions *

1) Is this documented behavior from AppEngine?
2) Is there a way to work around this?
3) Are there any plans to allow this in the future?

Nick Johnson (Google)

Jun 22, 2010, 9:56:25 AM
to google-a...@googlegroups.com
Hi Phil,

It looks like App Engine is neglecting to start up a new instance of your app just to serve your self-request, and there are no spare instances already running.
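
A rough analogy in plain Java may help picture this. It is not App Engine's actual scheduler, only a sketch under the assumption that "instances" behave like worker threads in a fixed-size pool; the class name and event labels are made up for illustration. With one worker, the outer task waits on an inner task that cannot start until the outer one finishes, so the wait times out and the inner task runs too late:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SelfRequestAnalogy {

    /** Runs an "outer request" that waits on an "inner request" in the same pool. */
    public static List<String> run(int instances) {
        final List<String> events = Collections.synchronizedList(new ArrayList<String>());
        final ExecutorService pool = Executors.newFixedThreadPool(instances);
        try {
            Future<?> outer = pool.submit(() -> {
                events.add("outer start");
                // The self-request: queued on the same pool that is running us.
                Future<?> inner = pool.submit(() -> { events.add("inner start"); });
                try {
                    inner.get(500, TimeUnit.MILLISECONDS); // stand-in for the fetch deadline
                    events.add("inner done in time");
                } catch (TimeoutException e) {
                    events.add("timeout");
                } catch (Exception e) {
                    events.add("error: " + e);
                }
                events.add("outer end");
            });
            outer.get();
            pool.shutdown(); // queued tasks still run after shutdown()
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdownNow();
        }
        return new ArrayList<String>(events);
    }
}
```

`SelfRequestAnalogy.run(1)` returns [outer start, timeout, outer end, inner start], which matches the ordering seen in the logs. `SelfRequestAnalogy.run(2)` returns [outer start, inner start, inner done in time, outer end], because a spare worker is available.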

Why are you URLFetching yourself, though? I'm no Java expert, but I'm fairly certain there are accepted ways to run a servlet yourself, without having to make an HTTP request.
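
One way to read this suggestion, sketched in plain Java without the servlet API: send same-host URLs to in-process handlers and use HTTP only for everything else. All names here are hypothetical, and the remote fetch is passed in as a function so that the sketch does not assume any particular fetch API:

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

public class LocalFirstFetcher {
    private final String ownHost;
    private final Function<URI, String> remoteFetch;
    private final Map<String, Supplier<String>> localHandlers = new HashMap<>();

    public LocalFirstFetcher(String ownHost, Function<URI, String> remoteFetch) {
        this.ownHost = ownHost;
        this.remoteFetch = remoteFetch;
    }

    /** Registers an in-process renderer for a path on this application's own host. */
    public void register(String path, Supplier<String> handler) {
        localHandlers.put(path, handler);
    }

    /** Renders in-process when the URL targets a registered path on our own host. */
    public String fetch(String url) {
        URI uri = URI.create(url);
        String path = (uri.getPath() == null || uri.getPath().isEmpty()) ? "/" : uri.getPath();
        Supplier<String> handler = localHandlers.get(path);
        if (ownHost.equalsIgnoreCase(uri.getHost()) && handler != null) {
            return handler.get(); // no HTTP request, so no second instance is needed
        }
        return remoteFetch.apply(uri);
    }
}
```

In a real servlet container the in-process branch would more likely go through the container's own dispatching; the sketch only shows the routing decision.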

-Nick Johnson


--
Nick Johnson, Developer Programs Engineer, App Engine
Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number: 368047

Alon Carmel

Jun 22, 2010, 10:03:17 AM
to google-a...@googlegroups.com
If I wanted to hack around this (setting aside Nick's question), I'd drop URL Fetch and just output a JavaScript jQuery Ajax call to that URL. That might work, but it's a terrible hack.

PhilBeaudoin

Jun 22, 2010, 3:56:23 PM
to Google App Engine
Thanks for your comments.

HTMLUnit (a headless browser) uses URLFetch to get the content of any
URL it needs to fetch when parsing a webpage or running javascript
code. I think it would be really tricky to modify it in order to use a
different mechanism when fetching from the same app.

@Nick, is there a way to make sure a new instance starts up? Or to
check if I can get a spare instance? Does it depend on my level of
subscription? (I'm using the free quota for now.)

Philippe


PhilBeaudoin

Jun 28, 2010, 3:12:45 PM
to Google App Engine
Bumping this.

Is it a good idea to put an issue up for this, as it seems deeply
rooted in AppEngine's internals?


Jaroslav Záruba

Jul 3, 2010, 8:33:07 PM
to Google App Engine
+1
I'm having the same issue. :(

My goal is to store the generated markup in blob.

Krishna

Jul 12, 2010, 1:26:24 PM
to Google App Engine
+1

I'd love support for search engine crawling as well.

Matt H

Aug 19, 2010, 6:44:22 AM
to Google App Engine
This really NEEDS to be supported.

PhilBeaudoin

Aug 19, 2010, 2:09:00 PM
to Google App Engine
I looked for an issue in the tracker for this and didn't find anything,
so I created one. Please star it:
http://code.google.com/p/googleappengine/issues/detail?id=3602

Cheers,

Philippe

brucko

Oct 14, 2010, 12:06:56 AM
to Google App Engine