On Thu, Apr 28, 2011 at 6:26 PM, Aaron <aaron.t...@gmail.com> wrote:
> Hi,
>
> I posted this question a few days ago in the general app engine group,
> but felt this would be more appropriate here.
Ah, sorry I missed that; I monitor this list much more closely than the other ones.
> I'm getting a TaskTooLargeError when starting a pipeline, but I've
> logged the arguments passed in, and the sizes of the parameters are
> all well within the limit. I feel like maybe the pipeline API
> stores all child tasks within some larger task, but I can't find
> anything in the documentation related to this.
This is a known limitation and I'm sorry for the trouble. The task
payload limit is 10KB right now, which is what's hurting you. Everyone
would like that limit to be higher and I believe that's in the cards.
What your log tells me is your pipeline is getting scheduled just fine
(so job.start() runs okay), but when it runs and tries to schedule
its child jobs it fails. The reason is that the "fanout handler" task's
payload contains the keys of all of the child pipelines. I've described an
alternative approach in this issue which I think will solve your
problem:
http://code.google.com/p/appengine-pipeline/issues/detail?id=26
Not sure when I'll be able to get a patch out to fix this but hopefully soon.
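For intuition on why ~100 children already overflow the payload, here's a rough back-of-the-envelope estimate. The per-key byte count and fixed overhead below are my guesses, not measured values from the library:

```python
# Rough estimate only: key_bytes and overhead are assumptions, not
# figures measured from the Pipeline library's fanout task payload.
def fanout_payload_estimate(num_children, key_bytes=130, overhead=500):
    """Approximate fanout-task payload size for num_children child keys."""
    return overhead + num_children * key_bytes

# ~100 children already pushes past the 10KB task payload limit:
print(fanout_payload_estimate(100))  # 13500 bytes, in line with ~13KB
```

Under these assumptions, staying below roughly 75 children per fanout task would keep the payload under 10KB.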
In the meantime, the trivial solution is to split your iteration up
into two child pipelines, one issuing children 1-50, the other 51-100.
The task payload of each of those should then be about half your
current ~13KB, putting it back under the 10KB limit, so each should
get scheduled fine. You'll just need to tweak your parameters and
introduce a new parent pipeline to coordinate this. The parent
pipeline can also send off the 101st pipeline that relies on the other
100.
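A minimal sketch of that coordinator structure. The class names here are made up, and a tiny stand-in base class replaces pipeline.Pipeline so the sketch runs outside App Engine; in your real code each class would subclass pipeline.Pipeline and the framework would drive the yields:

```python
# Sketch only: FakePipeline is a stand-in for pipeline.Pipeline so this
# runs outside App Engine. Class names are illustrative assumptions.
class FakePipeline:
    """Minimal stand-in: eagerly collects the children run() yields."""
    def __init__(self, *args):
        self.args = args
        self.children = list(self.run(*args) or [])

class Work(FakePipeline):
    def run(self, item):
        return None  # per-item work would go here

class BatchChildren(FakePipeline):
    def run(self, items):
        # Each batch schedules only half the children, so its fanout
        # task payload holds roughly half as many child pipeline keys.
        for item in items:
            yield Work(item)

class Coordinator(FakePipeline):
    def run(self, items):
        half = len(items) // 2
        yield BatchChildren(items[:half])
        yield BatchChildren(items[half:])

coord = Coordinator(list(range(100)))
# The coordinator fans out only 2 direct children; each batch fans out 50.
print(len(coord.children))                        # 2
print([len(c.children) for c in coord.children])  # [50, 50]
```

The point of the shape: no single pipeline ever yields more children than one fanout task payload can carry, and the final pipeline can take the batch futures as arguments so it runs only after both complete.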
Hope that helps!
-Brett
I had some time just now, so I've checked in a fix for your issue:
http://code.google.com/p/appengine-pipeline/source/detail?r=39
Can you pull the source again and try it out? Let me know whether it
fixes your issue.
-Brett
On Fri, Apr 29, 2011 at 5:38 PM, Aaron <aaron.t...@gmail.com> wrote:
> Thanks a lot for your time. Your update fixed the scenario when I'm
> only delaying a batch of ~100 tasks. However, when I am trying to
> handle larger data sets, I am still running into this error. In
> particular, the number of tasks that I need to wait for can range
> anywhere from 100-2000. I ran into the same error when I needed to
> wait for 1280 tasks before executing the last task.
>
> From the description of the fix, it seems like this isn't expected
> behavior. Is there something I'm missing?
Even with the optimization I added, you're going to hit the 10KB task
payload limit. Until that limit is raised, you should try the
workarounds I described in my first response to you. Essentially, I'm
suggesting you split the 1000+ child pipelines into multiple parts
(say, intermediate pipelines that yield 100 children each) and then
have a parent pipeline wait on those 10+ intermediate children (thus,
10+ intermediates * 100 grandchildren each = your 1000+ tasks in all).
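The batching arithmetic for a run like your 1280-task case could be sketched as follows (the helper name and the batch size of 100 are my assumptions, following the suggestion above):

```python
# Sketch: partition N children into batches so no single fanout task
# payload carries more than ~100 child pipeline keys. The helper name
# and batch_size=100 are assumptions, not part of the Pipeline API.
def make_batches(items, batch_size=100):
    """Partition items into consecutive batches of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

batches = make_batches(list(range(1280)))
print(len(batches))      # 13 batches for 1280 items
print(len(batches[0]))   # 100
print(len(batches[-1]))  # 80 (the remainder)
```

Each batch would become one intermediate pipeline yielding its items as children, and the parent waits on the 13 intermediates rather than on all 1280 children directly.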
-Brett