How to run a long task until it ends?


nilsl...@gmail.com

Jun 23, 2015, 2:31:59 PM
to google-a...@googlegroups.com
Hi everybody,
I have a long-running task in my App Engine application that does a lot of computing. After running for a while (~2 minutes), the task stops and simply retries. Obviously, it restarts from the beginning each time and can never finish.
I would like to configure App Engine to run the task until it is finished. I've heard about a 10-minute limit for tasks; I guess that would be enough if the task didn't retry every time.

My source code is (hopefully) freely available here: https://github.com/gilbsgilbs/freemobilenetstat-gae . The two tasks under src/main/java/org/pixmob/freemobile/netstat/gae/web/task are the problematic ones (they compute a lot of data). They are pushed to the queue by a cron under src/main/java/org/pixmob/freemobile/netstat/gae/web/cron/CronServlet.java.

Note that I'm really new to App Engine, so my code might not be optimal. I'm not sure of the exact issue; maybe it is just a memory overflow (which I wouldn't know how to fix).

Thank you !
G.

Chad Vincent

Jun 23, 2015, 7:00:49 PM
to google-a...@googlegroups.com
What are you getting in your logs?  Any uncaught exceptions?

You might want to add some debug logging statements and lower the logging level for a while to see what is happening. You do get 10 minutes for requests coming from the task queue, so it isn't a timeout issue.
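For example, a minimal sketch with java.util.logging (the servlet and message names are placeholders for your own task servlets; on App Engine Java the threshold is typically lowered in WEB-INF/logging.properties, e.g. ".level = FINE"):

    import java.util.logging.Logger;

    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class UpdateKnownDevicesTask extends HttpServlet {
        private static final Logger log =
                Logger.getLogger(UpdateKnownDevicesTask.class.getName());

        @Override
        protected void doPost(HttpServletRequest req, HttpServletResponse resp) {
            log.fine("update-known-devices: starting pass");
            // ... log each significant step, so the last line before the
            // crash tells you how far the task got ...
            log.fine("update-known-devices: finished pass");
        }
    }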

nilsl...@gmail.com

Jun 24, 2015, 2:48:07 AM
to google-a...@googlegroups.com
Thank you for answering me.

Here is the only log I'm getting:

    /task/update-known-devices 500 151600ms 0kb AppEngine-Google; (+http://code.google.com/appengine) module=default version=dev
    0.1.0.2 - - [23/Jun/2015:23:23:07 -0700] "POST /task/update-known-devices HTTP/1.1" 500 0 "http://freemobile-netstat.appspot.com/cron/update-known-devices" "AppEngine-Google; (+http://code.google.com/appengine)" "freemobile-netstat.appspot.com" ms=151600 cpu_ms=3530 queue_name=update-queue task_name=81805458203132447311 exit_code=202 app_engine_release=1.9.22 trace_id=b86eb2915714ad73aae5f54bfbef746d instance=00c61b117c5b3b509e154cd4930e3af3083a8a68
    E 2015-06-24 08:23:07.492
    A problem was encountered with the process that handled this request, causing it to exit. This is likely to cause a new process to be used for the next request to your application. (Error code 202)

This log appears for each retry (around 3 times per task).

G.

Chad Vincent

Jun 24, 2015, 4:30:38 PM
to google-a...@googlegroups.com
Then I would definitely add more logging so you can see what is and is not getting done before the instance bails.

nilsl...@gmail.com

Jun 24, 2015, 4:54:19 PM
to google-a...@googlegroups.com
Thank you, Chad Vincent, for your answer.

I added some logging, but it didn't give me any more relevant information. Everything runs fine until the "unexplainable" failure. So I guess it is related to a memory issue: the task retrieves data and caches it until memory is full, then crashes.

I spent some time tonight reading the Objectify documentation and came across this: https://code.google.com/p/objectify-appengine/wiki/BasicOperations#The_Session_Cache . It could really be related to my issue. Unfortunately, calling ofy().clear() from time to time did not solve anything. Moreover, I read that holding onto the result of ObjectifyService.ofy() was a bad idea, but requesting ofy() directly each time I needed it didn't help either.

Any other idea or suggestion is welcome. Thank you :) .
G.

Nick (Cloud Platform Support)

Jun 25, 2015, 7:08:31 PM
to google-a...@googlegroups.com, nilsl...@gmail.com
As mentioned, this can sometimes occur if you begin exhausting instance memory. You could try one of the following actions to alleviate such a problem:
  • If your task processing contains a loop, then on each iteration (or every N iterations, or even on a time basis if you'd like), you can check the amount of memory in use and back off on processing, or shift a portion of the processing to another instance, if you're getting too close to the limit (see the sketch after this list). On manual-scaling instances, you could also use a background thread to monitor this.
  • Break the task into smaller portions and distribute them among instances which can handle the load.
  • Determine whether it truly is memory exhaustion causing the instances to spin down, by combining your own profiling with the information available in the Developers Console logs view.
  • Inspect your code for spots that could be "painful" for memory, where you're perhaps loading the entire contents of a large file into memory.
With these methods, you'll be able to either narrow down how to fix your memory issue, or determine that the issue isn't memory exhaustion.
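A minimal sketch of that in-loop check (the 0.85 threshold and the checkpoint() hook are hypothetical; note that Runtime reports the JVM heap, which only approximates the memory limit App Engine enforces on the instance):

    // Returns true when the JVM heap is more than `threshold` full.
    static boolean nearMemoryLimit(double threshold) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return (double) used / rt.maxMemory() > threshold;
    }

    // Inside the processing loop:
    if (nearMemoryLimit(0.85)) {
        checkpoint();   // save the cursor / progress somewhere durable
        return;         // let a freshly enqueued task pick up from there
    }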

I'm curious as to what you mean by "calling ofy().clear() sometimes did not solve anything". Does this mean that it did seem to, at least somewhat, alleviate the problem?

Also, yes: you should be using the statically imported ofy() method as the docs describe, not holding onto any objects of type Objectify.
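For illustration, the difference is roughly this (Device stands in for one of your entity classes):

    import static com.googlecode.objectify.ObjectifyService.ofy;

    // Good: call ofy() each time; it returns the Objectify instance
    // bound to the current request/transaction context.
    Device d = ofy().load().type(Device.class).id(deviceId).now();

    // Risky: a stored reference pins one session cache (and every
    // entity it has seen) for as long as you hold it:
    // private final Objectify ofy = ObjectifyService.ofy();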

Best wishes,

Nick

husayt

Jun 28, 2015, 5:53:45 AM
to google-a...@googlegroups.com
Since you mentioned ofy, here are some useful tips which helped us solve similar issues:

1) If you are making lots of queries and don't need caching, it's best to disable the cache per query, with ofy().cache(false).
2) Use an iterator to process big data sets in chunks.

Here is sample code which works very robustly for us. It processes chunks for 8 minutes, then hands the rest over to the next task. It uses deferred tasks.

The chunk size is defined by the 'limit' variable, the entity type by 'modelClass', and the cursor by 'startCursor', which is null to start with. There are some other variables you can figure out yourself.
Hope this helps.

    try {
        do {
            // Load one chunk of entities, bypassing the cache.
            Query<T> query = ofy().cache(false).load()
                    .type(this.modelClass)
                    .limit(this.limit)
                    .chunkAll();

            if (this.startCursor != null) {
                query = query.startAt(this.startCursor);
            }

            QueryResultIterator<T> iterator = null;
            try {
                iterator = query.iterator();
                if (!iterator.hasNext()) {
                    stillNotFinished = false;
                }

                while (iterator.hasNext()) {
                    T entity = iterator.next();
                    process(entity);
                    this.countSoFar++;
                }
            } catch (Exception dxe) {
                log.error("Exception, should stop from {}", this.countSoFar, dxe);
                isException = true;
            } finally {
                ofy().flush();
                if (iterator != null) {
                    // Remember where we stopped and drop the session
                    // cache so memory doesn't grow between chunks.
                    this.startCursor = iterator.getCursor();
                    ofy().clear();
                }
            }
            log.info("Processed so far {} items.", this.countSoFar);

        } while (stillNotFinished && sw.elapsed(TimeUnit.MINUTES) < 8);
    } finally {
        sw.stop();
        ModelProcessor.log.info("Finished reading {} items in total for {}", this.countSoFar, sw.toString());
        if (stillNotFinished) {
            if (isException || !this.spawnNewProcessOnDisruption) {
                log.warn("Has stopped on item {}", this.countSoFar);
            } else {
                log.info("Rerunning from {}", this.countSoFar);
                // Re-enqueue this task to continue from the saved cursor.
                TaskController.addToTaskQueue(this);
            }
        }
    }
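TaskController.addToTaskQueue isn't shown above; a plausible sketch, assuming the processor implements DeferredTask (and is therefore Serializable, so the saved cursor and counters travel with the re-enqueued task):

    import com.google.appengine.api.taskqueue.DeferredTask;
    import com.google.appengine.api.taskqueue.Queue;
    import com.google.appengine.api.taskqueue.QueueFactory;
    import com.google.appengine.api.taskqueue.TaskOptions;

    public final class TaskController {
        public static void addToTaskQueue(DeferredTask task) {
            // The task object is serialized into the queue payload, so the
            // next run resumes from the saved startCursor and countSoFar.
            Queue queue = QueueFactory.getDefaultQueue();
            queue.add(TaskOptions.Builder.withPayload(task));
        }
    }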

nilsl...@gmail.com

Jun 30, 2015, 3:45:13 PM
to google-a...@googlegroups.com
Hi everybody and thank you for your help.

I finally managed to work around my issues. I implemented a solution similar to the one described in the Objectify documentation: https://code.google.com/p/objectify-appengine/wiki/Queries#Cursor_Example . It's also really close to the solution suggested by husayt, except that it enqueues a new task to avoid the 10-minute timeout. I compute only 300 entities per task, then enqueue a new task (saving the state in query parameters or in the blobstore).
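Roughly, the pattern inside the task servlet looks like this (Device and process() are placeholders; the URL and queue name are the ones from my logs above):

    import static com.googlecode.objectify.ObjectifyService.ofy;

    import com.google.appengine.api.datastore.Cursor;
    import com.google.appengine.api.datastore.QueryResultIterator;
    import com.google.appengine.api.taskqueue.QueueFactory;
    import com.google.appengine.api.taskqueue.TaskOptions;
    import com.googlecode.objectify.cmd.Query;

    // Process up to 300 entities, then chain a new task carrying the cursor.
    String cursorParam = request.getParameter("cursor");
    Query<Device> query = ofy().load().type(Device.class).limit(300);
    if (cursorParam != null) {
        query = query.startAt(Cursor.fromWebSafeString(cursorParam));
    }

    QueryResultIterator<Device> it = query.iterator();
    boolean processedAny = false;
    while (it.hasNext()) {
        process(it.next());
        processedAny = true;
    }

    if (processedAny) {
        // More entities may remain: enqueue the next slice from the cursor.
        QueueFactory.getQueue("update-queue").add(
                TaskOptions.Builder.withUrl("/task/update-known-devices")
                        .param("cursor", it.getCursor().toWebSafeString()));
    }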

This works, but I still can't explain why it failed after ~3 minutes. I'd really like to be able to increase the number of entities computed in one pass. Disabling the global cache (ofy().cache(false)) and clearing the session cache (ofy().clear()) didn't help, and I couldn't see any peak in my dashboard that could explain the failure. Still a mystery to me :( .

Thank you.
G.