Execution Ordering - Newbie Question


Bardia D.

Oct 6, 2014, 12:26:48 PM
to app-engine-...@googlegroups.com
I have a series of pipelines and sub-pipelines, and I want to guarantee that the last item in the flow only gets executed after the earlier ones are done.  Something like:
  • PipelineA
    • Pipeline B, with pipeline.After(future_from_pipeline_a)
    • Pipeline C, with pipeline.After(future_from_pipeline_a_and_b)
      • Pipeline D (spawned from within PipelineC multiple times in a for loop)
  • Pipeline E (want to run this after A-D are in a done or finalizing state)
I'm noticing that Pipeline D (above) sometimes runs in parallel with E, even though we use pipeline.After and pipeline.InOrder statements.  All of our pipelines are generator pipelines (yield), but I'm thinking of converting them to plain synchronous pipelines (return), since the order of execution is critical here and we don't actually use the future values for the next pipeline (each pipeline and sub-pipeline is responsible for fetching its own state and data to run).

Happy to attach code if needed for further clarification; I didn't want to create a large post my first time.  Thank you in advance!

Jeffrey Tratner

Oct 9, 2014, 10:48:18 AM
to app-engine-...@googlegroups.com
Are you using pipeline.After(c_future) when spawning pipeline E?

Bardia D.

Oct 9, 2014, 11:25:54 AM
to app-engine-...@googlegroups.com
Yes sir. Here's the top-level code (below).  I spawn sub-pipelines within some of these.

a_future = yield PipelineA(token, force)
with pipeline.After(a_future):
    b_future = yield PipelineB(token, force)
    with pipeline.After(b_future):
        c_future = yield PipelineC(token, force)
        with pipeline.After(c_future):
            d_future = yield PipelineD(token, force)
            with pipeline.After(d_future):
                yield PipelineE(token, force)

Tom Kaitchuck

Oct 17, 2014, 5:37:46 PM
to app-engine-...@googlegroups.com
If the result of C does not depend on D, then C may finish and E may start even though D is still running. If the result of C depends on D, then E cannot start until D is complete.
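
To make the distinction concrete, here is a toy model in plain Python. It is not the Pipeline API: `waves` and the two dependency dictionaries are hypothetical names invented for this illustration, and the scheduling rule (a task may start once everything it depends on is done) is a simplifying assumption. Tasks that land in the same wave have nothing forcing an order between them, so they could run in parallel.

```python
def waves(deps):
    """Group tasks into waves; every task in a wave has all of its
    dependencies satisfied by earlier waves, so a wave could run in parallel."""
    done, order = set(), []
    while len(done) < len(deps):
        ready = sorted(t for t, d in deps.items()
                       if t not in done and set(d) <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown task")
        order.append(ready)
        done.update(ready)
    return order

# E waits only on C; D is spawned by C but nothing waits on D.
e_waits_on_c = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["C"]}

# E also waits on D (the "result of C depends on D" situation).
e_waits_on_d = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["C", "D"]}

print(waves(e_waits_on_c))
print(waves(e_waits_on_d))
```

In the first graph D and E share the final wave, which matches the parallel execution reported at the top of the thread. In the second graph E gets a wave of its own after D.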


Bardia D.

Oct 19, 2014, 12:36:47 PM
to app-engine-...@googlegroups.com
Thanks Tom.  I guess I misunderstood the pipeline.After() wrapper.  Within D, I have a yield common.List(*all_results_from_d), but I still don't know how to enforce that E only runs after A-D have finalized.  Can you please provide a code sample?

Tom Kaitchuck

Oct 20, 2014, 2:05:44 PM
to app-engine-...@googlegroups.com
I am a bit confused. In your earlier posts you indicated that D was spawned from within C and that E depends on C. In that case, if C does not depend on D in order to complete, D and E will run in parallel. But later you pasted code that looks like a simple sequence, in which case the order is equivalent to using InOrder, which makes all the steps run sequentially.

Either way, consider something like this:

class WordCountUrl(pipeline.Pipeline):

  def run(self, url):
    r = urlfetch.fetch(url)
    return len(r.data.split())

class Sum(pipeline.Pipeline):

  def run(self, *values):
    return sum(values)

class MySearchEnginePipeline(pipeline.Pipeline):

  def run(self, *urls):
    results = []
    for u in urls:
      results.append( (yield WordCountUrl(u)) )
    yield Sum(*results)  # Barrier waits
(Copied from the wiki)
Here Sum depends on the results of the WordCountUrl calls, so it must run after them. The WordCountUrl calls do not depend on one another, so they will execute in parallel.
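
The same fan-out-then-barrier shape can be sketched with only the Python standard library. This is an analogy rather than Pipeline code: threads stand in for child pipelines, in-memory strings stand in for fetched URLs, and `word_count` and `total_words` are hypothetical names chosen for the sketch.

```python
from concurrent.futures import ThreadPoolExecutor

def word_count(text):
    # Stand-in for WordCountUrl: count whitespace-separated words.
    return len(text.split())

def total_words(texts):
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Fan out: the counts are independent, so they may run concurrently.
        futures = [pool.submit(word_count, t) for t in texts]
        # Barrier: the sum needs every count, so result() blocks until each
        # one is available before the total can be produced.
        return sum(f.result() for f in futures)

print(total_words(["the quick brown fox", "jumps over", "the lazy dog"]))
```

The ordering comes purely from the data: nothing tells the sum to wait, it simply cannot be computed until every count exists.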

A simplified way to think about it is that if there is no data dependency between two things, they will run in parallel. (This is why After and InOrder were created: to allow an ordering dependency where there is no data dependency.)
