MapReduce Failures

Ranjit Chacko

May 24, 2013, 11:20:01 AM

to google-a...@googlegroups.com

I'm seeing shards in my MR jobs fail abruptly, for no apparent reason and without being retried:

task_name=appengine-mrshard-1581047187783C3601732-14-2-retry-0 app_engine_release=1.8.0 instance=00c61b117c53a40e120ac864168a3fe51c2ce

Shard 1581047187783C3601732-14 failed permanently.

Is there some adjustment I can make to my queue parameters to avoid or reduce these issues?

Recently I had been having problems with MR jobs throwing "UnknownError" and "ApplicationError" exceptions followed by "RetrySliceError" exceptions. Setting min_backoff_seconds to 1 seemed to reduce the retry errors.
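
To see what min_backoff_seconds actually changes, here is a minimal sketch that computes the delay before each retry of a push-queue task. It assumes the retry_parameters semantics described in the App Engine queue.yaml documentation: the interval starts at min_backoff_seconds, doubles max_doublings times, then grows linearly by 2**(max_doublings - 1) * min_backoff_seconds, and is capped at max_backoff_seconds. The function name backoff_schedule is hypothetical; this is an approximation of the queue's behaviour, not App Engine code.

```python
def backoff_schedule(min_backoff, max_backoff, max_doublings, retries):
    """Return the delay (seconds) before each of the first `retries` retries.

    Approximates queue.yaml retry_parameters (an assumption, not App Engine code):
    start at min_backoff, double max_doublings times, then add a constant
    2**(max_doublings - 1) * min_backoff per retry, never exceeding max_backoff.
    """
    if max_doublings < 1:
        raise ValueError("this sketch only models max_doublings >= 1")
    step = (2 ** (max_doublings - 1)) * min_backoff  # linear increment after doubling stops
    delays = []
    delay = min_backoff
    for attempt in range(retries):
        delays.append(min(delay, max_backoff))  # the cap applies to every retry
        if attempt < max_doublings:
            delay *= 2      # exponential phase
        else:
            delay += step   # linear phase
    return delays


if __name__ == "__main__":
    # Example values only; compare a 1-second and a 10-second minimum backoff.
    print(backoff_schedule(1, 3600, 3, 6))
    print(backoff_schedule(10, 3600, 3, 6))
```

With these example values a 1-second minimum produces retries after 1, 2, 4, 8, 12 and 16 seconds, while a 10-second minimum waits 10, 20, 40, 80, 120 and 160 seconds. A smaller minimum therefore lets a slice get through more retries in the same wall-clock time, which may be why lowering it seemed to help.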

Tom Kaitchuck

May 30, 2013, 5:35:47 PM

to google-a...@googlegroups.com

A "RetrySliceError" should result in a retry. If it doesn't, the maximum number of retry attempts on your task queue may be set too low. (Because MapReduce manages retries based on its own configuration, it is safe to set this to unlimited.) You may also want to look at shard retry: https://code.google.com/p/appengine-mapreduce/wiki/PythonShardRetry, a new feature designed to make Python MapReduce more reliable.
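
The interaction described above can be illustrated with a toy model: a slice that fails transiently a few times is re-run until it succeeds or the queue's retry limit is exhausted, at which point the shard is reported as failed permanently. Everything here is hypothetical: RetrySliceError is a local stand-in for the library's exception, and run_shard and flaky_slice are illustrative helpers. The model ignores MapReduce's own internal retry accounting and shard retry.

```python
class RetrySliceError(Exception):
    """Local stand-in for the MapReduce library's exception of the same name."""


def run_shard(slice_fn, task_retry_limit):
    """Toy model: re-run a slice until it succeeds or the queue's limit is hit.

    task_retry_limit=None models an unlimited queue. Returns a tuple
    (outcome, attempts), where outcome is "succeeded" or "failed permanently".
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            slice_fn()
            return ("succeeded", attempts)
        except RetrySliceError:
            retries_used = attempts - 1  # the first attempt is not a retry
            if task_retry_limit is not None and retries_used >= task_retry_limit:
                return ("failed permanently", attempts)


def flaky_slice(failures):
    """Hypothetical slice that raises RetrySliceError `failures` times, then succeeds."""
    state = {"left": failures}

    def slice_fn():
        if state["left"] > 0:
            state["left"] -= 1
            raise RetrySliceError("transient error")

    return slice_fn


if __name__ == "__main__":
    print(run_shard(flaky_slice(3), task_retry_limit=2))
    print(run_shard(flaky_slice(3), task_retry_limit=None))
```

In this model a slice that fails three times before succeeding makes the shard fail permanently when the queue allows only two retries, but it completes on the fourth attempt when retries are unlimited. This matches the advice to raise the queue's retry limit and let MapReduce's own configuration decide when to give up.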
