Chained jobs skip steps


Mike Busch

Dec 31, 2015, 2:40:10 PM
to Disco-development

My project is set up like so (I'll be happy to cut/paste real code if it would help):


myfile.py

from disco.core import Job, result_iterator
import collections, sys
from disco.worker.classic.func import chain_reader, nop_map
from disco.worker.classic.worker import Params

def helper1():
   #do stuff
   pass

def helper2():
   #do stuff
   pass
.
.
.
def helperN():
   #do stuff
   pass

class A(Job):
   @staticmethod
   def map_reader(fd, params):
      #Read input file
      yield line

   def map(self, line, params):
      #Process lines into dictionary
      #Iterate dictionary
      yield k, v

   def reduce(self, iter, out, params):
      print("A - REDUCE")
      #iterate iter
      #Process k, v into dictionary, aggregating values
      #Process dictionary
      #Iterate dictionary
      out.add(k, v)

class B(Job):

   map_reader = staticmethod(chain_reader)
   map = staticmethod(nop_map)

   def reduce(self, iter, out, params):
      print("B - REDUCE")
      #Process iter
      #iterate results
      out.add(k, v)


if __name__ == '__main__':
   from myfile import A, B
   job1 = A().run(input=[input_filename], params=Params(k=k))
   job2 = B().run(input=[job1.wait()], params=Params(k=k))
   with open(output_filename, 'w') as fp:
        for count, line in result_iterator(job2.wait(show=True)):
            fp.write(str(count) + ',' + line + '\n')

My problem is that the job flow completely skips A's reduce (no print appears in the output) and goes straight to B's reduce, even though the web dashboard shows a reduce step for A.


Any ideas what is going on here?

Mike Busch

Jan 2, 2016, 3:24:51 PM
to Disco-development
This was an easy but subtle problem: I didn't have a

show = True

on job1's wait(). With show set only for job2, the console was still showing me the map() and map-shuffle() steps from job1, so when I didn't get the final result I expected and the input to one of the job2 functions looked wrong, I jumped to the conclusion that job1's steps hadn't run properly (a conclusion further supported by the fact that I had verified the accuracy of job1's output before adding job2).
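For anyone hitting the same thing: the fix is simply passing show=True to the first job's wait() as well, so that job's step output (including the "A - REDUCE" print) is echoed to the console. A minimal sketch of the corrected driver block, reusing the names from the original post (input_filename, output_filename, k are placeholders from that post, and this assumes a running Disco cluster):

from disco.worker.classic.worker import Params
from disco.core import result_iterator
from myfile import A, B

job1 = A().run(input=[input_filename], params=Params(k=k))
# show=True here is what was missing: without it, job1's prints never appear,
# even though the job (and its reduce) runs fine on the cluster.
job2 = B().run(input=[job1.wait(show=True)], params=Params(k=k))

with open(output_filename, 'w') as fp:
    for count, line in result_iterator(job2.wait(show=True)):
        fp.write(str(count) + ',' + line + '\n')

Note that show only controls whether step output is displayed locally; it has no effect on whether the steps actually execute.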