passing data between tasks


Imran Akbar

Jan 28, 2014, 8:01:05 PM
to luigi...@googlegroups.com
Hi,
   I'm trying to pass some data between two tasks, from the child up to the parent (which requires the child task).  Is there any way to set the input of one task to be the output of the one it requires?  I tried setting that up, but got a bunch of errors.  I then tried to manually open the file that the child outputs to, but I got this error, which indicates that the self.output().open() method only accepts 'r' or 'w' as modes, not binary (which I needed to pickle some data):

  File "kickoff_workflow.py", line 135, in run
    f = self.output().open('wb')
  File "/Users/imran/Code/InfoScout/luigi/luigi/file.py", line 102, in open
    raise Exception('mode must be r/w')
Exception: mode must be r/w

thanks again,
imran

Erik Bernhardsson

Jan 28, 2014, 9:49:54 PM
to Imran Akbar, luigi...@googlegroups.com
Imran – using the output is exactly the way Luigi is supposed to be used. So in the upstream task (your Child) you should write to self.output().open('w'), and in the downstream task (your Parent) you should return the upstream task from requires() and read from self.input().open('r').

afaik using 'b' as a flag when reading files doesn't have any meaning on Linux – it does however on Windows. We do some pickling across tasks and it works well without the 'b' flag. If you are using Windows then that's probably something we have to add support for in Luigi. Happy to accept a pull request for it – it should be pretty easy


--
Erik Bernhardsson
Engineering Manager, Spotify, New York

Imran Akbar

Jan 31, 2014, 2:00:32 PM
to luigi...@googlegroups.com, Imran Akbar
that worked perfectly Erik, thanks!

for future reference, here is what the code would look like:

import pickle
from datetime import date

import luigi

class Child(luigi.Task):
    def output(self):
        return luigi.LocalTarget("workflow_output/" + date.today().isoformat() + "/child.txt")

    def run(self):
        ...
        # Python 2: pickle.dumps() returns a str, so a text-mode target accepts it
        with self.output().open('w') as f:
            f.write(pickle.dumps(your_data)) # to pass to the next task

class Parent(luigi.Task):
    def requires(self):
        return Child()

    def output(self):
        return luigi.LocalTarget("workflow_output/" + date.today().isoformat() + "/parent.txt")

    def run(self):
        # self.input() is the output() target of the required Child task
        with self.input().open('r') as f:
            your_data = pickle.load(f)

imran

Erik Bernhardsson

Jan 31, 2014, 3:02:12 PM
to Imran Akbar, luigi...@googlegroups.com
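Looks good. The one thing I would consider is making the date a parameter of the Parent (the Child needs the same parameter so that it can receive the value):

date = luigi.DateParameter(default=date.today())

and explicitly passing it to the Child in the Parent's requires():

def requires(self):
    return Child(self.date)

Both output() paths should then use self.date instead of calling date.today() again.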

Eyad Sibai

Aug 1, 2015, 12:07:41 PM
to Luigi, skun...@gmail.com
Hi!

It seems I have this problem with Python 3: the solution above doesn't work, because the file is somehow not opened in binary mode!

I am using Mac OS X, not Windows.
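
For reference, here is a minimal stdlib-only sketch of why the text-mode approach breaks on Python 3. It does not use Luigi at all: plain open() stands in for the target, and the payload and file name are made up for illustration.

```python
import os
import pickle
import tempfile

data = {"rows": [1, 2, 3]}    # hypothetical payload
payload = pickle.dumps(data)  # bytes on Python 3 (str on Python 2)

path = os.path.join(tempfile.mkdtemp(), "child.pkl")  # hypothetical file name

# A text-mode file only accepts str on Python 3, so writing the pickle fails.
text_mode_error = None
try:
    with open(path, "w") as f:
        f.write(payload)
except TypeError as exc:
    text_mode_error = exc

# A binary-mode file accepts the bytes and round-trips the object.
with open(path, "wb") as f:
    f.write(payload)
with open(path, "rb") as f:
    restored = pickle.load(f)
```

So on Python 3 the target has to hand back a binary file object, or the pickle has to be turned into text before it is written.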

Eyad Sibai

Aug 1, 2015, 5:29:59 PM
to Luigi, skun...@gmail.com
If I use format=MixedUnicodeBytes it works!

Eyad Sibai

Nov 26, 2015, 6:17:58 PM
to Luigi, skun...@gmail.com
Apparently MixedUnicodeBytes works only for LocalTarget.

pickle writes bytes in Python 3, while in Python 2 it writes a string.

So one way we did it is to convert the pickled object to a string (base64) when writing, then decode it when reading.
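
A rough sketch of that base64 workaround, using only the standard library. The helper names and the sample record are hypothetical; in a task you would write the encoded string to self.output().open('w') and read it back from self.input().open('r').

```python
import base64
import pickle


def encode_for_text_target(obj):
    """Pickle obj and return an ASCII str that is safe to write in text mode."""
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")


def decode_from_text_target(text):
    """Reverse encode_for_text_target: base64-decode, then unpickle."""
    return pickle.loads(base64.b64decode(text.encode("ascii")))


record = {"name": "example", "scores": [1, 2, 3]}  # hypothetical data
encoded = encode_for_text_target(record)   # plain str, fine for open('w')
restored = decode_from_text_target(encoded)
```

The trade-off is that base64 makes the file roughly a third larger, but the same code then works with any target that only offers text mode.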