I'm seeing shards abruptly fail in my MR jobs for no apparent reason and without retrying:
task_name=appengine-mrshard-1581047187783C3601732-14-2-retry-0 app_engine_release=1.8.0 instance=00c61b117c53a40e120ac864168a3fe51c2ce
Shard 1581047187783C3601732-14 failed permanently.
Is there some adjustment I can make to my queue parameters to avoid or reduce these issues?
Recently I had been having problems with MR jobs throwing "UnknownErrors" and "ApplicationError" followed by "RetrySliceErrors", and setting min_backoff_seconds to 1 seemed to help reduce the retry errors.
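
As a rough sketch of the setting mentioned above, assuming a queue.yaml-style configuration (the queue name and rate are placeholders, not actual values from the app):

```yaml
# queue.yaml (illustrative sketch only)
queue:
- name: mapreduce-queue        # hypothetical queue name
  rate: 10/s                   # placeholder rate
  retry_parameters:
    min_backoff_seconds: 1     # the change that seemed to reduce retry errors
```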