Best practices for expressing parameter inheritance

ale...@gmail.com

Oct 16, 2013, 8:30:59 PM
to luigi...@googlegroups.com
Hi all,

I'm using Luigi to build out a workflow that ends in training a statistical model with a lot of free parameters on some processed data.

Schematically, the workflow is

RawData -> ProcessedData -> TrainedModel

where:
RawData is an external dependency that only has a single parameter that points to the data location
- data_name
ProcessedData is a Task that has a few parameters, e.g.
- smoothing_amount
- smoothing_type
TrainedModel is a Task that has a separate set of parameters for the statistical model, e.g.
- alpha
- beta
- gamma

All parameters besides "data_name" have useful defaults.

Ideally, I'd like to be able to run something like

python pipeline.py TrainedModel --alpha=0.5 --smoothing_amount=3 --data_name=/path/to/data.csv

Right now, I'm making TrainedModel be a subclass of ProcessedData, which is a subclass of RawData, and then doing a little bit of argument stuffing using get_params() and get_param_values() during initialization.

Is there a Luigi-approved way of passing parameters up a dependency chain without explicitly shuttling them around?

ale...@gmail.com

Oct 16, 2013, 8:36:40 PM
to luigi...@googlegroups.com, ale...@gmail.com
It's not difficult, just a little cumbersome, which feels out of place in this otherwise quite nice framework. My strategy is this:

# TaskB depends on TaskA

# I'm using inheritance to get around copy/pasting all the parameters.
# For the kinds of models I'm training, there can be a lot of knobs

class TaskB(TaskA):
    ...
    def requires(self):
        # Get all the arguments in common between TaskA and TaskB
        common_params = list(set.intersection(set(TaskA.get_params()), set(self.get_params())))
        common_kwargs = dict([(key, self.param_kwargs[key]) for key in dict(common_params).keys()])
        vals = dict(self.get_param_values(common_params, [], common_kwargs))
        return TaskA(**vals)
    ...
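
The selection step above amounts to filtering the task's values by the names the upstream task also declares. A minimal, library-free sketch of that idea follows, using the parameter names from the first post. `FakeTask`, `PARAM_NAMES`, and `common_kwargs` are hypothetical stand-ins for illustration and are not part of Luigi; only `param_kwargs` mirrors the attribute used in the snippet above.

```python
# Library-free sketch: stand-in classes, NOT Luigi's API.
def common_kwargs(task, upstream_cls):
    """Keep only the values whose names the upstream class also declares."""
    return {name: value for name, value in task.param_kwargs.items()
            if name in upstream_cls.PARAM_NAMES}


class FakeTask:
    """Hypothetical stand-in for a task that records its parameter values."""
    PARAM_NAMES = ()  # names this task declares (assumption for illustration)

    def __init__(self, **kwargs):
        unknown = set(kwargs) - set(self.PARAM_NAMES)
        if unknown:
            raise TypeError("unknown parameters: %s" % sorted(unknown))
        self.param_kwargs = dict(kwargs)


class TaskA(FakeTask):
    PARAM_NAMES = ("data_name", "smoothing_amount")


class TaskB(TaskA):
    # Inherits TaskA's names and adds its own model knob.
    PARAM_NAMES = TaskA.PARAM_NAMES + ("alpha",)

    def requires(self):
        # Forward only what TaskA understands; `alpha` stays behind.
        return TaskA(**common_kwargs(self, TaskA))


b = TaskB(data_name="/path/to/data.csv", smoothing_amount=3, alpha=0.5)
upstream = b.requires()
print(upstream.param_kwargs)
```

Filtering by name keeps `requires` short however many knobs the model has, at the cost of assuming that a shared name always means the same parameter.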

Joe Ennever

Oct 17, 2013, 1:31:13 PM
to luigi...@googlegroups.com, ale...@gmail.com
Here's one pattern that we use:

import luigi

class FooParamsMixin(object):
    param1 = luigi.Parameter()
    param2 = luigi.Parameter()
    ...

    def foo_params(self):
        # Collect the shared parameters so they can be forwarded as keyword arguments
        return {'param1': self.param1, 'param2': self.param2}  # ... and so on

class TaskA(FooParamsMixin, luigi.Task):
    def requires(self):
        return TaskB(**self.foo_params())  # plus any other params

class TaskB(FooParamsMixin, luigi.Task):
    pass

I'd be interested to hear if anyone else had a more succinct way to do this.
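
One possibly more succinct variant derives the dictionary from a tuple of names, so each parameter is not listed twice. The sketch below uses plain Python classes so that it runs without Luigi. `FOO_PARAM_NAMES` and the `PlainTask` base are hypothetical, and in a real pipeline the attributes would be luigi.Parameter declarations as in the pattern above.

```python
class PlainTask:
    """Hypothetical stand-in for luigi.Task: stores keyword args as attributes."""

    def __init__(self, **kwargs):
        for name, value in kwargs.items():
            setattr(self, name, value)


class FooParamsMixin:
    # Single place where the shared parameter names are listed (assumption).
    FOO_PARAM_NAMES = ("param1", "param2")

    def foo_params(self):
        # Build the forwarding dict from the names instead of writing it by hand.
        return {name: getattr(self, name) for name in self.FOO_PARAM_NAMES}


class TaskB(FooParamsMixin, PlainTask):
    pass


class TaskA(FooParamsMixin, PlainTask):
    def requires(self):
        # Shared parameters are forwarded; TaskA-only ones (here `extra`) are not.
        return TaskB(**self.foo_params())


a = TaskA(param1="x", param2="y", extra="only-on-A")
b = a.requires()
print(b.foo_params())
```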

Ron Reiter

Mar 7, 2014, 2:02:58 AM
to luigi...@googlegroups.com, ale...@gmail.com
There are two issues with inheritance: parameter passing and requirement addition.

Luigi could be modified to support both, so that when, for example, requires() is called on a task, the parent classes' requires() functions are also called and their results aggregated into one larger list.

Were there any attempts or other best practices that I am not aware of?
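
For the requirement-aggregation half of this idea, a rough sketch in plain Python is given here. It is not how Luigi behaves and is not a proposed patch: `AggregatingTask` and `own_requires` are hypothetical names, and dependencies are plain strings to keep the example self-contained.

```python
class AggregatingTask:
    """Hypothetical base class: requires() merges own_requires() from every ancestor."""

    def own_requires(self):
        return []

    def requires(self):
        collected = []
        # Walk the method resolution order from the most basic class to the
        # most derived one, so parent requirements come first.
        for cls in reversed(type(self).__mro__):
            method = cls.__dict__.get("own_requires")
            if method is None:
                continue  # this class does not define its own requirements
            for dep in method(self):
                if dep not in collected:
                    collected.append(dep)
        return collected


class ProcessedData(AggregatingTask):
    def own_requires(self):
        return ["RawData"]


class TrainedModel(ProcessedData):
    def own_requires(self):
        return ["ProcessedData"]


print(TrainedModel().requires())
```

Under this scheme a subclass only states what it adds, and the base class does the merging. Whether a task should also inherit its parent's requirements is a design choice this sketch assumes rather than settles.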

Samuel Lampa

Apr 16, 2014, 6:34:26 PM
to luigi...@googlegroups.com, ale...@gmail.com
On Thursday, October 17, 2013 7:31:13 PM UTC+2, Joe Ennever wrote:
Here's one pattern that we use:

class FooParamsMixin(object):
    param1 = luigi.Parameter()
    param2 = luigi.Parameter()
    ...

    def foo_params(self):
        return {'param1': self.param1, 'param2': self.param2}  # ... and so on

class TaskA(FooParamsMixin, luigi.Task):
    def requires(self):
        return TaskB(**self.foo_params())  # plus any other params

class TaskB(FooParamsMixin, luigi.Task):
    pass

I'd be interested to hear if anyone else had a more succinct way to do this.

That's very interesting! Thanks for sharing!

So far, what I came up with, which feels far from optimal, is something along the lines of:

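import ast
import luigi

class MetaTask(luigi.Task):
    # All workflow parameters, passed around as the string form of a dict
    args = luigi.Parameter()

    def get_arg(self, name):
        # Parse the stringified dict once and cache the result
        if not hasattr(self, 'args_dict'):
            self.args_dict = ast.literal_eval(self.args)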
        if name in self.args_dict:
            return str(self.args_dict[name])
        else:
            raise Exception("No argument {arg} in args dict!".format(arg=name))

class TaskA(MetaTask):
    def requires(self):
        ...

class TaskB(MetaTask):
    def requires(self):
        return TaskA(args=self.args)

class TaskC(MetaTask):
    def requires(self):
        return TaskB(args=self.args)

class RunAll(luigi.Task):
    param1 = luigi.Parameter()
    param2 = luigi.Parameter()

    def requires(self):
        args = str({'param1': self.param1,
                    'param2': self.param2})
        yield TaskC(args=args)

(discussed briefly here https://groups.google.com/d/msg/luigi-user/uZWti9HBrb8/EvM0e8igGZ8J )
... but your solution looks so much cleaner as, AFAICS, it allows specifying parameters for any task in the workflow in the "normal" Luigi way?

I should definitely test out your approach, thanks for sharing!
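
The string round trip that MetaTask and RunAll rely on can be checked in isolation. This sketch uses only the standard library. `pack_args` is a hypothetical helper name, and `get_arg` here is a free function mirroring the method above, except that it raises KeyError rather than a bare Exception.

```python
import ast


def pack_args(**params):
    """Serialize parameters the way RunAll does: str() of a dict."""
    return str(params)


def get_arg(args, name):
    """Parse the packed string and return one value as a string."""
    args_dict = ast.literal_eval(args)
    if name not in args_dict:
        raise KeyError("No argument {arg} in args dict!".format(arg=name))
    return str(args_dict[name])


packed = pack_args(param1="/path/to/data.csv", param2=3)
print(get_arg(packed, "param1"))
print(get_arg(packed, "param2"))
```

Because every value comes back as a string, numeric parameters have to be converted again by whichever task reads them, which is one reason the approach can feel less natural than declaring typed parameters on each task.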

Cheers
// Samuel