I have a MapReduce job that is obviously run using the pipeline API. A number of tasks in my queue have been retried 20 to 40+ times. When I examine one of these tasks under the "Previous Run" tab, the HTTP response is always 429 and the reason is "App Error". Can anyone help me track down what is really going on here? I don't see any exceptions in my logs, so I don't think it's a real error, but rather some quota or threshold I'm hitting.
The other weird thing is that each of these tasks takes 25s to 35s to process, yet the logs report only 5,000ms for the request that returns the 429.
How can I get better debug information to find out which quota or threshold I'm hitting? What knob should I turn to make this run faster, i.e., so the tasks don't have to be retried that many times?
Thanks!
Bill-