Loading up data in the init() method


Syed Rizvi

Feb 19, 2015, 9:00:12 AM
to google-a...@googlegroups.com
Hi,
I have a RESTful service that gets XML data from an external URL, does some work on it and serves it to its clients. The dataset is initialized in the init() method, which takes between one and two minutes, and it naturally has to be there before the first client request comes in. This works properly on my local machine, but not when deployed to GAE.

I get the DeadlineExceededException even though I have warmup requests enabled: 
<warmup-requests-enabled>true</warmup-requests-enabled>
The URL fetch is not the culprit here, since the fetch takes about 10 seconds on average to complete.

What exactly am I missing? What would be the right way to do it? Here's what the log says:

javax.servlet.ServletContext log: unavailable com.google.apphosting.api.DeadlineExceededException: This request (4865075fa7d7c4db) started at 2015/02/19 13:38:24.019 UTC and was still executing at 2015/02/19 13:39:23.816 UTC.

Vinny P

Feb 19, 2015, 9:47:14 AM
to google-a...@googlegroups.com
On Thu, Feb 19, 2015 at 8:00 AM, Syed Rizvi <amma...@gmail.com> wrote:
I have a RESTful service that gets XML data from an external URL, does some work on it and serves it to its clients. The dataset is initialized in the init() method, which takes between one and two minutes, and it naturally has to be there before the first client request comes in. This works properly on my local machine, but not when deployed to GAE.
What exactly am I missing? What would be the right way to do it? Here's what the log says:

javax.servlet.ServletContext log: unavailable com.google.apphosting.api.DeadlineExceededException: This request (4865075fa7d7c4db) started at 2015/02/19 13:38:24.019 UTC and was still executing at 2015/02/19 13:39:23.816 UTC.


Can you look in your logs and see if there are any requests to the path _ah/startup? If so, can you post what their HTTP status code response is (202, 500, 404, etc)? Also, are you getting DeadlineExceededException on every instance startup (meaning that your app can't run at all on production App Engine) or only intermittently?

Just because you have warmup requests enabled doesn't necessarily mean all requests will be handed off to a warm instance. Yes, many requests will be handled by warm instances, but traffic can still be handed off to cold-started instances in certain situations. Those cold-started instances are still subject to a 60-second deadline to start up. Your log sample seems to be from a cold-start instance.

There are several ways of handling this. If you want to try solving it through raw processing power, you can increase the number of idle instances or increase the size of your instances (for example, going from an F1 to F2). Another way is to move the dataset initialization logic somewhere else; for instance you could move it out of init and into a request handler, then hit that handler with a cron request on a regular basis and cache the information in memcache/datastore/cloud storage. 
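A minimal sketch of that second approach, written in plain Java with no App Engine classes so it stands on its own. `DatasetHolder` and its `Supplier` loader are hypothetical names for illustration, not anything from the original app. In a real deployment, `refresh()` would be called from the handler that cron hits, and the result would presumably also be written to memcache/datastore/cloud storage so that other instances can read it.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical holder: init() stays cheap, the slow load lives in refresh().
final class DatasetHolder<T> {
    private final AtomicReference<T> current = new AtomicReference<>();
    private final Supplier<T> loader; // assumption: wraps the URL fetch plus processing

    DatasetHolder(Supplier<T> loader) {
        this.loader = loader;
    }

    // Call this from the cron-triggered handler, not from init().
    void refresh() {
        T fresh = loader.get(); // the slow part runs outside instance startup
        current.set(fresh);     // publish only once the dataset is fully built
    }

    // Call this from client-facing handlers; it never waits on the load.
    Optional<T> get() {
        return Optional.ofNullable(current.get());
    }
}
```

A client-facing handler could answer with a 503 and a Retry-After header while `get()` is still empty, rather than blocking until the dataset arrives.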
 
 
-----------------
-Vinny P
Technology & Media Consultant
Chicago, IL

App Engine Code Samples: http://www.learntogoogleit.com
 

Syed Rizvi

Feb 19, 2015, 10:11:47 AM
to google-a...@googlegroups.com
Thanks for your response Vinny!
  1. I made a deliberate call to _ah/warmup which gives: "GET /_ah/warmup HTTP/1.1" 200 0
  2. To _ah/startup which gives "GET /_ah/startup HTTP/1.1" 404 204
This is my first GAE project and I'm not a paid user yet, so upgrading isn't currently an option; neither is increasing idle instances, since not even a single instance is up yet. Cron isn't an option right now either, since the dataset has to be there to fulfill even the first request that comes in (though it could be used later when, say, I have to update the dataset regularly from the source).

I don't think the warmup request solution will work for me, because it says here that: "Currently, the deadline for requests to frontend instances is 60 seconds. (Backend instances have no corresponding limit.) Every request, including warmup (request to /_ah/warmup) and loading requests ("loading_request=1" log header), is subject to this restriction." Also see the GAE issue reported here.

Do you have some suggestions? I'd be really glad to hear from you!

Vinny P

Feb 20, 2015, 2:52:44 AM
to google-a...@googlegroups.com
On Thu, Feb 19, 2015 at 9:11 AM, Syed Rizvi <amma...@gmail.com> wrote:
This is my first GAE project and I'm not a paid user yet, so upgrading isn't currently an option; neither is increasing idle instances, since not even a single instance is up yet. Cron isn't an option right now either, since the dataset has to be there to fulfill even the first request that comes in (though it could be used later when, say, I have to update the dataset regularly from the source).

I don't think that the warmup request solution may work for me


Yes, the warmup request is still subject to the same limit; I just wanted to make sure that part of your application was working and didn't have a confounding issue.

The bottom line is that you need to move the dataset initialization logic out of init, and/or find a way to cut down execution time to under 1 minute (either speeding up the logic or optimizing it). If you don't want to pay for a larger instance to speed up the dataset processing, you'll need to modify your application logic. I would suggest (in no particular order) the following:

1. Reconfigure your application's code to separate out parts of the processing. For example, perhaps your init could urlfetch your dataset, and nothing more. When a request comes in, process only the part of the data relevant to the request. Obviously this option might not work if you can't subdivide your data easily. 

2. Profile your application's calls using AppStats: https://cloud.google.com/appengine/docs/java/tools/appstats . If your dataset initialization process involves several RPC calls, you might be able to batch calls together and save some time. For example, instead of using multiple memcache.put() calls you can try moving to AsyncMemcacheService.putAll().
 
3. You can also cut down on startup time by optimizing your entire app. If you're including many libraries in your application, you can use ProGuard to help cut down on unneeded files (assuming you're using Java).

4. There are a number of other performance tips in this article: https://cloud.google.com/appengine/articles/deadlineexceedederrors and a handful of others in this Groups thread: https://groups.google.com/d/msg/google-appengine/sA3o-PTAckc/vsS976sdINMJ
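As a rough illustration of suggestion 1, here is a plain-Java sketch in which the raw data is fetched up front but each piece is only processed the first time a request asks for it. `LazyDataset`, the string keys, and the `processor` function are all hypothetical; it assumes the dataset can be split into independently processable sections, which may not hold for every dataset.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical: raw sections are fetched once, each is processed on first use.
final class LazyDataset<R, P> {
    private final Map<String, R> rawSections;  // assumption: filled by the URL fetch
    private final Function<R, P> processor;    // assumption: the expensive per-section work
    private final Map<String, P> processed = new ConcurrentHashMap<>();

    LazyDataset(Map<String, R> rawSections, Function<R, P> processor) {
        this.rawSections = rawSections;
        this.processor = processor;
    }

    // Returns the processed section, or null if the key is unknown.
    P get(String key) {
        R raw = rawSections.get(key);
        if (raw == null) {
            return null;
        }
        // Process at most once per key; later requests reuse the stored result.
        return processed.computeIfAbsent(key, k -> processor.apply(raw));
    }
}
```

With this shape, startup only pays for the fetch, and the processing cost is spread across the requests that actually need each section.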

Syed Rizvi

Mar 3, 2015, 8:06:58 AM
to google-a...@googlegroups.com
Hi Vinny,

Sorry for the late response. I followed some of your suggestions and have now decided to initialize the dataset in a cron job instead, since that allows more time for the job to complete. Plus, the dataset needs to be continuously updated from the source URL, so I can schedule the cron to run every X minutes or so.
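For anyone landing here later, the cron-driven refresh might look roughly like the plain-Java sketch below. `CronRefresh`, `handle`, and the `Supplier` loader are hypothetical names, and the `Map` of headers stands in for the servlet request. The one App Engine detail it relies on is the `X-Appengine-Cron: true` header that App Engine adds to cron requests.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical core of a cron-triggered refresh handler.
final class CronRefresh {
    // Header App Engine adds to requests issued by the cron service.
    static final String CRON_HEADER = "X-Appengine-Cron";

    private final AtomicReference<String> dataset = new AtomicReference<>();
    private final Supplier<String> loader; // assumption: fetch plus processing

    CronRefresh(Supplier<String> loader) {
        this.loader = loader;
    }

    // Returns the HTTP status the handler would send back.
    // Simplification: a real servlet would use the case-insensitive getHeader().
    int handle(Map<String, String> headers) {
        if (!"true".equals(headers.get(CRON_HEADER))) {
            return 403; // not a cron request
        }
        try {
            dataset.set(loader.get());
            return 200;
        } catch (RuntimeException e) {
            return 500; // keep the previous dataset; the next scheduled run retries
        }
    }

    // Latest successfully loaded dataset, or null before the first refresh.
    String current() {
        return dataset.get();
    }
}
```

Keeping the last good dataset on failure means a temporary outage at the source URL does not leave clients with nothing to read.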

Thanks!

