Hi all!
Recently we have run into a problem where our batch jobs are canceled with INTERNAL_ERROR. Some operations execute successfully, some fail for understandable reasons, and the rest are simply lost: the response contains no information about them at all.
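To list exactly which operations were lost, one can compare the number of operations submitted with the results that actually came back. The following is a minimal sketch. It assumes that each returned result can be matched to the zero-based position of the operation it belongs to (for example via an index field on the result). The helper name `find_lost_operations` is hypothetical.

```python
from typing import Iterable, List


def find_lost_operations(submitted_count: int, returned_indices: Iterable[int]) -> List[int]:
    """Return indices of submitted operations that got no result at all.

    Hypothetical helper: assumes every returned result carries the
    zero-based index of the operation it corresponds to.
    """
    seen = set(returned_indices)  # duplicates are harmless
    return [i for i in range(submitted_count) if i not in seen]


# Five operations submitted, results came back only for 0, 1 and 3.
print(find_lost_operations(5, [0, 1, 3]))
```

Running it on the example prints `[2, 4]`, i.e. the operations that have neither a success nor an error entry.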
We follow all the recommendations from Best Practices (operations are grouped by type, the batch job is uploaded in portions of 1000 operations, and so on).
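The grouping and chunking described above can be sketched as follows. This is a minimal illustration only: operations are assumed to be `(op_type, payload)` pairs, the helper names `group_by_type` and `chunk` are hypothetical, and the actual upload call is left out because it depends on the client library in use.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple

CHUNK_SIZE = 1000  # portion size mentioned in the post


def group_by_type(operations: Sequence[Tuple[str, object]]) -> Dict[str, List[object]]:
    """Group (op_type, payload) pairs by type, keeping the original order within each type."""
    groups: Dict[str, List[object]] = defaultdict(list)
    for op_type, payload in operations:
        groups[op_type].append(payload)
    return dict(groups)  # insertion order of types is preserved


def chunk(items: Sequence[object], size: int = CHUNK_SIZE) -> Iterator[List[object]]:
    """Yield consecutive portions of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Hypothetical input: 2500 operations of one type and 3 of another.
ops = [("A", i) for i in range(2500)] + [("B", i) for i in range(3)]
for op_type, payloads in group_by_type(ops).items():
    print(op_type, [len(portion) for portion in chunk(payloads)])
```

For this input the portions are `[1000, 1000, 500]` for type `A` and `[3]` for type `B`. Keeping each portion within a single type mirrors the practice described above; it is not claimed to prevent the INTERNAL_ERROR itself.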
IDs of some affected batch jobs:
3562299563
3557826257
3553022861
Please take a look at this problem and explain what actually went wrong and why. How can we prevent such errors in the future?
Thanks!