Initializing parameters based on the value of other parameters


ian.t....@gmail.com

May 12, 2016, 12:04:51 PM
to Luigi
I am trying to give a task's parameters default values that are derived from other parameter values read from a config file.

For example, in the config I might have

[mytask]
param_a = file.txt

and then in my script:

class mytask(luigi.Task):
    param_a = luigi.Parameter()
    param_b = luigi.Parameter(default=extract_info(param_a))


This would give param_b a default value when it is not defined in the config file, while still letting me override it if desired.

I can't see a way to do this, however. We can't actually assign parameters during the initialization phase of a task. At the point where extract_info(param_a) is executed, param_a does not have a value yet, as far as I can tell in my debugger.

Does anyone have any ideas on how this could be accomplished? This goes towards the larger question of dynamic parameters on task instantiation in general.

Dave Buchfuhrer

May 12, 2016, 1:00:29 PM
to Ian Fiddes, Luigi
The way I usually do this is to make param_b have a default of None, and use a getter to dynamically construct its value when it's None. You could also override __init__ to overwrite the None value.
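As a rough illustration of the getter idea, here is a minimal sketch. It is written as plain Python so it runs without luigi installed; in a real task, param_a and param_b would be luigi.Parameter attributes on a luigi.Task subclass. The class name MyTaskSketch, the property name effective_param_b, and the body of extract_info are all hypothetical, made up for this example.

```python
import os


def extract_info(path):
    # Hypothetical stand-in for the extract_info helper from the question:
    # here it simply returns the file name without its extension.
    return os.path.splitext(os.path.basename(path))[0]


class MyTaskSketch(object):
    # Plain-Python stand-in for a luigi.Task with two parameters.
    def __init__(self, param_a, param_b=None):
        self.param_a = param_a
        self.param_b = param_b  # None means "not supplied"

    @property
    def effective_param_b(self):
        # Derive the value from param_a only when param_b was left unset.
        if self.param_b is None:
            return extract_info(self.param_a)
        return self.param_b


derived = MyTaskSketch("file.txt").effective_param_b
explicit = MyTaskSketch("file.txt", param_b="custom").effective_param_b
```

Code that needs the value reads effective_param_b rather than param_b, so an explicitly supplied param_b still wins over the derived one.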



ian.t....@gmail.com

May 12, 2016, 1:07:34 PM
to Luigi, ian.t....@gmail.com
I read this SO post, which had the __init__ solution, but the comments pointed out that this is not really compatible with how a luigi Parameter works.

http://stackoverflow.com/questions/31008035/luigi-parameter-default-values-and-mocks

The dynamic getter seems reasonable to me, except that a similar problem occurs: the computed value is no longer one of the significant parameters that luigi uses to detect duplicate tasks.

Dave Buchfuhrer

May 12, 2016, 7:22:55 PM
to Ian Fiddes, Luigi
The SO way is weird. Instead, just do something like

class MyTask(luigi.Task):
    param_a = luigi.Parameter()
    param_b = luigi.Parameter(default=None)

    def __init__(self, *args, **kwargs):
        super(MyTask, self).__init__(*args, **kwargs)
        # Fill in param_b from param_a only when it was not supplied.
        if self.param_b is None:
            self.param_b = extract_info(self.param_a)

You can do this a bit more safely by overriding get_param_values, or much more safely by just explicitly specifying param_b. If you go the get_param_values route, try making it work through an optional argument to Parameter so you can put it in the base luigi code and make a pull request.

abhish...@gmail.com

Dec 29, 2017, 1:41:20 AM
to Luigi

This is what I used myself:


#!/usr/bin/env python2.7
from datetime import datetime, timedelta

import luigi
from luigi import LocalTarget


class FirstTest(luigi.Task):
    hours_delay = luigi.IntParameter(default=2)
    # The default is computed once, when the class body is executed.
    pdate = luigi.DateSecondParameter(
        default=datetime.now() - timedelta(hours=hours_delay.task_value('FirstTest', 'hours_delay')))

    def __init__(self, *args, **kwargs):
        super(FirstTest, self).__init__(*args, **kwargs)
        # If hours_delay differs from its configured value, rebuild pdate's
        # default from it and initialize again so the new default is used.
        if self.hours_delay != FirstTest.hours_delay.task_value('FirstTest', 'hours_delay'):
            FirstTest.pdate = luigi.DateSecondParameter(
                default=datetime.now() - timedelta(hours=self.hours_delay))
            super(FirstTest, self).__init__(*args, **kwargs)

    def run(self):
        print(self.pdate)

    def output(self):
        return LocalTarget("always_run.txt")


if __name__ == '__main__':
    luigi.run(main_task_cls=FirstTest)

