Sorry for hijacking this thread, but since yesterday morning we have also been having significant trouble with the batch job service: jobs remain active for a very long time. The following jobs were started yesterday morning, German time (the start time is noted next to each job), and were still active this morning:
Job 445516765 | 2017-02-14T03:34
Job 445553000 | 2017-02-14T03:20
Job 445532407 | 2017-02-14T03:58
Job 445532515 | 2017-02-14T03:58
Job 445693959 | 2017-02-14T03:58
Job 445694061 | 2017-02-14T03:59
Job 445694595 | 2017-02-14T03:59
We canceled these jobs this morning because they were still reported as active (how can that be?), but they have now been in the canceling state for about an hour at the time of writing. Furthermore, the situation got worse this morning: 30 batch jobs have now been active for a couple of hours. The IDs of these jobs are listed at the end of this post. We will cancel these jobs as well because, based on yesterday's experience, we do not expect them to finish.
As additional information: the problems seem to affect accounts whose campaigns have a large to huge number of product partitions / keywords. The problematic jobs were spread over three AdWords accounts yesterday and three accounts today; two of the accounts affected yesterday are also affected today. We use these batch jobs to set CPC bids for keywords and product partitions, with a maximum of 40,000 operations per job. We experimented with this number in the past and found it to be a suitable balance between performance and reliability of batch jobs. All jobs contain only one type of operation. While multiple jobs might affect the same campaigns / ad groups, they do not affect the same keywords / product partitions. I was told at a past API workshop that this is OK and will not result in concurrency issues. In addition to concrete information on the issue, we would be happy to know whether it makes sense to experiment with the limit of 40,000 operations per job again.
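To make the splitting described above concrete, here is a minimal Python sketch of that kind of job planning: one operation type per job and at most 40,000 operations per job. It is an illustration only, not our production code, and it makes no AdWords API calls. The function name `plan_jobs`, the plain-dict operations and the `"type"` key are hypothetical names chosen for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# The per-job limit mentioned in the post; treat it as a tunable value.
MAX_OPERATIONS_PER_JOB = 40_000


def plan_jobs(operations: Iterable[dict],
              max_per_job: int = MAX_OPERATIONS_PER_JOB) -> List[List[dict]]:
    """Group operations by type, then cut each group into chunks
    of at most max_per_job operations (one chunk = one batch job)."""
    if max_per_job < 1:
        raise ValueError("max_per_job must be at least 1")

    # "type" is a hypothetical key used only to separate operation kinds.
    by_type: Dict[str, List[dict]] = defaultdict(list)
    for op in operations:
        by_type[op["type"]].append(op)

    jobs: List[List[dict]] = []
    for ops in by_type.values():
        for start in range(0, len(ops), max_per_job):
            jobs.append(ops[start:start + max_per_job])
    return jobs
```

With this shape, trying a different limit only means passing another `max_per_job` value, for example `plan_jobs(ops, max_per_job=10_000)`.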
As a general note, batch jobs have continuously kept us busy since we switched to them last year. The runtime performance seems rather unpredictable. Especially on test accounts, batch jobs can run for a very long time (up to 30 minutes), which is painful if you want to run automated tests. Combined with the unclear advice on the number of operations to put in a batch job, and on best practices in general, that leaves us with a very uncertain feeling. Given the reliability issues we have experienced over the past half year, we are currently thinking about ditching batch jobs entirely and implementing batching on our side, which would at least give us more control. We understand that the intent of batch jobs is something like "fire and forget" (well, not really forget, since you still check results etc.): we offload operations to AdWords and it takes care of them. Currently it's more like "fire and pray". We do not want to bring back the past, but we never had such troubles with the mutate job service. I apologize for this bit of ranting, but this has been keeping us busy for a while now.
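As an illustration of the kind of control meant here, a client-side deadline around job polling could look roughly like the sketch below. It is a sketch under assumptions, not a description of any AdWords client library: `get_status` is a hypothetical callable that returns the job's current status as a string, the status strings are placeholders, and `"TIMED_OUT"` is a value made up for this example.

```python
import time
from typing import Callable

# Placeholder terminal states; the real status names are an assumption here.
TERMINAL_STATES = {"DONE", "CANCELED"}


def wait_for_job(get_status: Callable[[], str],
                 timeout_s: float,
                 poll_interval_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_status until it reports a terminal state or the deadline passes.

    Returns the terminal status, or the made-up value "TIMED_OUT" if the job
    is still not finished once timeout_s seconds have elapsed.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            return "TIMED_OUT"
        sleep(poll_interval_s)
```

The `clock` and `sleep` parameters exist only so that automated tests can inject fakes and do not have to wait in real time.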
Finally, here are the job IDs of the still-active (soon to be canceled) jobs from this morning:
445954525
445954540
445989185
446102085
446102088
445952002
445986527
445986596
445986662
445986857
445987025
445987106
445987562
445987577
445987715
446107815
446107821
446108097
446108442
446108472
446108553
446108598
446108796
446108910
446108958
45948708
445983389
445983824
446105442
446105460
Thanks for taking care of this and regards,
Stefan