backend termination question


Rishi Arora

Apr 29, 2012, 9:16:54 AM
to google-a...@googlegroups.com
Does anyone know what this means:
"Process terminated because the backend took too long to shutdown"

My backend is woken up every 90 minutes by a cron job and typically takes 30 minutes to complete, although sometimes it can take up to 60 minutes.  The error above has occurred multiple times, sometimes 10-15 minutes into the processing.  Any ideas?

jon

Aug 7, 2012, 5:11:04 AM
to google-a...@googlegroups.com
Can Google please respond to this? We also got "Process terminated because the backend took too long to shutdown." at around the 17 minute mark.

Takashi Matsuo

Aug 9, 2012, 1:17:41 AM
to google-a...@googlegroups.com
Hi everyone,

Have you guys read through this part?
https://developers.google.com/appengine/docs/python/backends/overview#Shutdown

App Engine sometimes needs to stop backend instances for various
reasons other than your manual intervention. You can write a shutdown
hook to checkpoint your in-progress work within the 30-second window.
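
For reference, a minimal sketch of such a hook, assuming the Python
backends runtime API described on that page (the checkpoint and
work-loop functions here are illustrative placeholders, not a real API):

import logging

from google.appengine.api import backends
from google.appengine.api import runtime

def shutdown_hook():
    # Called when App Engine decides to stop this backend instance;
    # there is roughly a 30-second window to save state.
    logging.info('Backend %s is shutting down, checkpointing work.',
                 backends.get_backend())
    save_progress()  # illustrative placeholder for your own checkpoint code

runtime.set_shutdown_hook(shutdown_hook)

# Long-running loops can also poll is_shutting_down() and exit cleanly.
while have_more_work() and not runtime.is_shutting_down():
    process_next_batch()  # illustrative placeholder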

-- Takashi

--
Takashi Matsuo | Developer Advocate | tma...@google.com

jon

Aug 9, 2012, 8:44:08 PM
to google-a...@googlegroups.com
Thanks for the reply, Takashi.


App Engine sometimes needs to stop Backend instances for various
reasons other than your manual intervention.

May I ask if this is the definite cause of "Process terminated because the backend took too long to shutdown.", or is it something that you thought might explain it?

I'm asking to determine whether this is something that I can fix and forget, or something that can happen every now and then even when the shutdown case is handled properly.
 
You can write a shutdown
hook to do something for your halfway job within the 30 seconds
window.

We completely forgot about the shutdown case. Will handle it properly.

Amy Unruh

Sep 10, 2012, 7:41:11 PM
to google-a...@googlegroups.com
Can those on this thread let me know if you've seen *unusually many* of these "Process terminated because the backend took too long to shutdown" incidents recently?
If so, can you let me know your app id and runtime?

As Takashi noted, it is normal for this to happen sometimes (and the shutdown hook is the way to handle it). 

 -Amy


c h

Oct 3, 2012, 7:28:49 PM
to google-a...@googlegroups.com, amyu+...@google.com
I am trying to run a process that analyzes many rows of data. I'm querying 50 rows (using cursors), doing work, then writing data to a Blobstore file.  I can't get it to run for more than a few minutes on the backend before it is shut down. As far as I know I am not hitting any memory limits, and we have a crap-ton of budget available.

I'd prefer to share my app ID privately. Let me know if you would like it.

christian

c h

Oct 3, 2012, 7:29:25 PM
to google-a...@googlegroups.com, amyu+...@google.com
Oh, and: Python 2.7, single-threaded, using B1 instances. Tried both resident and dynamic instances.

Lucas

Sep 23, 2013, 9:03:57 AM
to google-a...@googlegroups.com, amyu+...@google.com, how...@umich.edu

Hi Amy,
   I also faced a similar issue. I tested with:

1. A task queue sending a request to the backend; it was terminated within 15 minutes.
2. A task queue sending a request to the backend, which then spawns a thread to run the task; this was also terminated.

I will send you my app ID privately by email.

wi...@smallwhere.com

Nov 7, 2013, 12:06:36 PM
to google-a...@googlegroups.com, amyu+...@google.com, how...@umich.edu
I'm faced with this currently. I get roughly 50% of my jobs killed with this error on one of my modules (converted from a backend). I'm using the python27 runtime and can provide my app IDs.

I believe it's hanging when I'm calling the AdWords APIs (I have tried breaking these calls up a bit, waiting a little between batches, and not caching the urlfetch), but I lose some of the logging when I get this error and can't seem to figure out whether there are local variables eating up all the memory or whether I need to continue to break up the API calls. I used to have a large variable in memory that I've since moved to a database. I've spent the better part of the last few weeks trying to fix this.

Is there any way you could help me diagnose this? Perhaps confirm whether I'm getting shut down due to memory issues (and, if so, which are the key offenders)? I know it could potentially be Python-specific. If these are just scheduled, non-memory-related shutdowns then there is no problem; I'm happy to set up a handler to check and rerun the script. I just didn't want to do that if there was an existing problem with my application.

Thanks for any help.

Will

Vinny P

Nov 7, 2013, 5:57:07 PM
to google-a...@googlegroups.com
On Thu, Nov 7, 2013 at 11:06 AM, <wi...@smallwhere.com> wrote:
I believe it's hanging when I'm calling the AdWords APIs (I have tried breaking these calls up a bit, waiting a little between batches, and not caching the urlfetch), but I lose some of the logging when I get this error and can't seem to figure out whether there are local variables eating up all the memory or whether I need to continue to break up the API calls. [...] Is there any way you could help me diagnose this?


To inspect the urlfetch calls to AdWords, you can install AppStats ( https://developers.google.com/appengine/docs/python/tools/appstats ). If those calls are hanging, you'll see it in the performance report.
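
The wiring is just the documented appengine_config.py middleware; a minimal sketch, assuming the standard Python 2.7 setup:

# appengine_config.py -- minimal sketch for turning on the AppStats recorder
# on a Python 2.7 module, per the documentation linked above.
def webapp_add_wsgi_middleware(app):
    from google.appengine.ext.appstats import recorder
    app = recorder.appstats_wsgi_middleware(app)
    return app

With the recorder on, the AppStats UI (by default at /_ah/stats) shows a per-RPC timeline for each request.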

As for the memory issues, try increasing your application's instance class; i.e. if you're currently using a B1 sized module, try going up to B2 for a day and see if it fixes the problem. Of course, this is just a temporary band-aid until the memory-hog can be found.
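
For a module, bumping the class is a small change in the module's yaml; the relevant lines look something like this (the module name and scaling mode here are illustrative, and the rest of the file - runtime, handlers, etc. - stays unchanged):

# mybackend.yaml (illustrative)
module: mybackend
instance_class: B2
basic_scaling:
  max_instances: 1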
 
 
-----------------
-Vinny P
Technology & Media Advisor
Chicago, IL

App Engine Code Samples: http://www.learntogoogleit.com

wi...@smallwhere.com

Nov 8, 2013, 11:34:34 AM
to google-a...@googlegroups.com
Thanks for the response. I'm running a B8 and I've run AppStats on these calls in the past. I'll run it again and post the log to see if there's anything interesting there.

Thanks for the help!

Will

wi...@smallwhere.com

Nov 8, 2013, 6:06:15 PM
to google-a...@googlegroups.com
OK, here's what I see. The script ran two out of two times successfully with AppStats enabled (of course). There are many, many rdbms and AdWords API calls. However, the longest real time was spent on a single AdWords API call that took 60 seconds. Here's some of the readout:
service.call           #RPCs   real time
rdbms.Exec             1004    78364ms
urlfetch.Fetch         721     266851ms
rdbms.CloseConnection  497     7027ms
rdbms.OpenConnection   497     7599ms
rdbms.ExecOp           491     48623ms
logservice.Flush       10      41ms
memcache.Set           1       6ms
memcache.Get           1       7ms

And here was the log info from AppStats on the long AdWords API call:
@576702ms urlfetch.Fetch real=59837ms api=0ms cost=0 billed_ops=[]
Request: URLFetchRequest<Method=2, Url='https://adwords.google.com/api/a.../v201306/AdGroupCriterionService', ...>
Response: URLFetchResponse<>
 
On a side note, it was a little interesting getting AppStats to work on my module. At first I got a 404 error when attempting to access the admin panel. My working setup included:
  1. A custom handler for the AppStats UI in my (backend/non-default) module .yaml, with a URL that matched the regex pattern sending URLs to my module in the dispatch file (a sketch of the handler follows below).
  2. Renaming my WSGI application to "app" from "application". I had named my backend WSGI application "application" and noticed AppStats wasn't being picked up when running.
  3. Adding the appengine_config.py per the documentation.
  4. The first time I successfully ran the script with AppStats I got an "Invalid or stale record" error when trying to view the timeline. I made no changes, waited about an hour, then reran the script and it worked...
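For anyone hitting the same 404, a rough sketch of the handler from item 1, assuming the stock AppStats UI application (the URL pattern is illustrative; it just has to match whatever your dispatch file routes to the module):

handlers:
- url: /backend/_ah/stats.*
  script: google.appengine.ext.appstats.ui.app
  login: admin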
Will

Vinny P

Nov 8, 2013, 11:34:03 PM
to google-a...@googlegroups.com
On Fri, Nov 8, 2013 at 5:06 PM, <wi...@smallwhere.com> wrote:
OK - here's what I see. The script ran two out of two times successfully with AppStats (of course). There are many, many rdbc and AdWords API calls. However, the longest real time spent was on a single AdWords API call for 60 seconds.


If you repeat this AdWords API call from your local machine, does it take the same amount of time or less?


On Fri, Nov 8, 2013 at 5:06 PM, <wi...@smallwhere.com> wrote:
  1. The first time I successfully ran the script with appstats I got an "Invalid or stale record" error when trying to view the timeline. I made no changes, waited about an hour, then reran the script and it worked...


Yes, AppStats occasionally needs some time to initialize. Feel free to leave the AppStats recorder on - it doesn't take up much memory and it might catch some interesting info sooner or later.

wi...@smallwhere.com

Nov 12, 2013, 5:14:27 PM
to google-a...@googlegroups.com
Sorry for the temporary radio silence. I was trying to get an AppStats log for my local machine to compare. Absent that, qualitatively from running this many times I'd say the times are pretty close between local and GAE-hosted (even with pretty substantial debug logging locally). The GAE stats came in at 37ms on average, and that's got to be about right.

I'm going to attempt to:
  1. Break this script down into smaller components and run them sequentially. The hope here is that the garbage collector helps with any monster memory variables.
  2. Lower my urlfetch default deadline and explicitly handle URL deadline-exceeded errors within my AdWords batches (see the sketch below). The theory here is that GAE is killing jobs that consistently exceed some arbitrary urlfetch deadline, independent of what is being set by an individual application. Not necessarily a bad thing.
  3. Move these pieces from a module to a Google Compute Engine instance.
While it's not ideal working without an explicitly identified issue, I've got to do what I can to get this consistently running. I'm still 50% at best.
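
For item 2, a rough sketch of what I mean, assuming the standard urlfetch API (the retry helper and its names are illustrative, not my actual code):

from google.appengine.api import urlfetch
from google.appengine.api import urlfetch_errors

# Lower the default deadline for every urlfetch call made by this module.
urlfetch.set_default_fetch_deadline(30)

def fetch_with_retry(url, payload, retries=3):
    # Retry an AdWords batch call if it exceeds the urlfetch deadline.
    for attempt in range(retries):
        try:
            return urlfetch.fetch(url, payload=payload,
                                  method=urlfetch.POST, deadline=30)
        except urlfetch_errors.DeadlineExceededError:
            if attempt == retries - 1:
                raise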

Thanks again for all your help.

Will

wi...@smallwhere.com

Nov 19, 2013, 6:40:13 PM
to google-a...@googlegroups.com
OK! I did not solve this issue for GAE. However, I moved the code from this module to a GCE instance and it now runs successfully (3 for 3 so far). I did the following:

1. Copied all code (including dependencies) to a Google Cloud Storage bucket.
2. Created a bash shell script to be run as a startup script by the GCE instance; it downloads all required libraries and the code from the bucket, and copies the syslog to another bucket upon completion. I put this script in its own bucket on GCS. Side note: the adspygoogle API code logs a lot at the info level if you make a lot of calls, so I forked a copy and changed the log level to debug.
Example library download (using the default debian-7-wheezy-v20130926 image):
apt-get -y install python-mysqldb
Example copy from the GCS bucket via metadata. The cs-bucket field is just the string value of the name of your bucket. Since gsutil copies the files into a folder with the name of the bucket, I then had to copy the files into my working directory in order to access them while running my main code. I very much like working with the preinstalled gsutil; I would love to see a similar utility for Cloud SQL that does more than administration (e.g. one that can query the databases).
# Working directory for the code.
DEST_DIR=$(pwd)
# Read the bucket name from the instance's custom metadata.
CS_BUCKET=$(curl http://metadata/computeMetadata/v1beta1/instance/attributes/cs-bucket)
# Pull the whole bucket down, then flatten its contents into the working directory.
gsutil cp -R gs://$CS_BUCKET $DEST_DIR
cp -r $CS_BUCKET/. $DEST_DIR
3. Reserved an external IP on Compute Engine. I need the external IP to support Cloud SQL access. I authorized this IP explicitly per the Cloud SQL instructions here: https://developers.google.com/cloud-sql/docs/external.
4. Modified the module code on GAE to just start a GCE instance. Useful information can be found at https://github.com/GoogleCloudPlatform/compute-getting-started-python. I add metadata for the GCS bucket locations (startup script and code), I added the external IP (natIP) to the instance, and I toned down some of the logging (the startup script takes a while to run, so I increased the buffer time between attempts to shut down the instance).
If using the gce file from the information above, then adding an external IP looks something like this, where external_ip is the string of the address:
instance['networkInterfaces'] = [{
    # One-to-one NAT using the reserved static address, so Cloud SQL can authorize it.
    'accessConfigs': [{'type': 'ONE_TO_ONE_NAT',
                       'name': 'External NAT',
                       'natIP': external_ip}],
    'network': '%s/global/networks/%s' % (self.project_url, network)}]
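
For context, a rough sketch of how an instance body like this can be submitted from the GAE module, assuming the Google API Python client with the app's service account (PROJECT_ID, ZONE, and the instance dict are placeholders; the getting-started repo above wraps this a bit differently, and the Compute API version may differ for your setup):

import httplib2
from apiclient.discovery import build
from oauth2client.appengine import AppAssertionCredentials

# Authenticate as the app's service account with the Compute scope.
credentials = AppAssertionCredentials(
    scope='https://www.googleapis.com/auth/compute')
compute = build('compute', 'v1', http=credentials.authorize(httplib2.Http()))

# 'instance' is the request body built above, including networkInterfaces.
operation = compute.instances().insert(
    project=PROJECT_ID, zone=ZONE, body=instance).execute()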
 
Wish I knew more about what caused this to fail on GAE. It may have been the volume of calls to either the database or the AdWords API, but my guess would be some variable(s) hogging up more and more memory as the script ran.

Thanks again for the help.

Will