Most elegant way of using targets of one task to compute actions and targets of another?

nikolaus....@daltonmaag.com

Aug 6, 2018, 4:00:38 AM
to python-doit
Hi!
I use doit in a pipeline sort of way. I generate some binaries from source files and apply several additional modifications in several different tasks. The issue is that, depending on where I use doit, the number of initial binaries can vary greatly due to several factors. Changing all tasks to handle all of that quickly becomes unwieldy.

I'd like to be able to say something like

```
def task_one():
    yield {
        'actions': ['...'],
        'targets': [...],
    }

def task_two(task_one):
    for target in task_one.targets:
        yield ...
```

What would be the most elegant way of doing this currently? I'm not sure whether I can use getargs to do this?
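(As far as I understand, getargs only passes values saved by one task's Python action into another task's actions at run time, roughly like the sketch below, so it probably doesn't help with computing targets at task-creation time. The task and key names here are made up.)

```
def task_compute():
    def compute():
        # a Python action may return a dict; doit saves those values
        return {"out_files": ["hello1.txt", "hello2.txt"]}
    return {"actions": [compute]}


def task_use():
    def show(out_files):
        print(out_files)
    return {
        "actions": [show],
        # inject the value saved by task "compute" into this task's actions
        "getargs": {"out_files": ("compute", "out_files")},
    }
```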

Eduardo Schettino

Aug 6, 2018, 3:07:45 PM
to python-doit
I am not 100% clear on what exactly your problem is.

Task metadata can have a `basename` that will be used instead of the
function name `task_xxx`.

Then, inside a single task-creator function, you can use a for-loop to
yield tasks for more than one basename...


```
def task_all():
    for obj in some_list:
        targets = obj.get_targets()
        yield {
            'basename': 'one',
            'name': obj.name,
            'actions': ['...'],
            'targets': targets,
        }

        # sub-tasks of basename 'two' are built from the targets of 'one'
        for target in targets:
            yield {
                'basename': 'two',
                'name': target,
                'actions': ['...'],
            }
```

You could also create your own class that returns a bunch of task dicts.
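Just to illustrate (this is only a sketch, not a doit API; the class and method names are made up):

```
class PipelineTasks:
    """Illustrative helper that turns source objects into doit task dicts."""

    def __init__(self, objects):
        self.objects = objects

    def tasks(self):
        for obj in self.objects:
            targets = obj.get_targets()
            yield {
                'basename': 'one',
                'name': obj.name,
                'actions': ['...'],
                'targets': targets,
            }
            for target in targets:
                yield {
                    'basename': 'two',
                    'name': target,
                    'actions': ['...'],
                }


def task_all():
    # the dodo.py only delegates to the helper object
    yield from PipelineTasks(some_list).tasks()
```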

cheers


Nikolaus Waxweiler

Aug 7, 2018, 4:30:29 AM
to pytho...@googlegroups.com
My problem is: I have task_one, which produces a varying number of
targets depending on several factors, and task_two through task_nine,
which were designed to mix and match the rest of the pipeline. In a
project dodo.py, you'd import e.g. task_one, task_two and task_six to
get the job done in a specific way.

The problem now becomes: how do you most elegantly get at the task
dicts of the tasks you depend upon (explicitly or implicitly), so you
can simply iterate over the other tasks' targets? A less elegant way
would be to @create_after and Path().glob() what you need (roughly
like the sketch below)... but ugh. And you lose the distinction of
what was produced by which sub-task.
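A rough sketch of what I mean by that @create_after approach (the build directory and glob pattern are made up):

```
from pathlib import Path

from doit import create_after


@create_after(executed='one')
def task_two():
    # glob whatever task "one" happened to produce; the association
    # with the sub-task that produced each file is lost
    for source in Path('build').glob('*.bin'):
        yield {
            "name": source.stem,
            "actions": [f"modify {source} --output other/dir"],
            "targets": [f"other/dir/{source.name}"],
        }
```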

Putting everything into task_all is possible, but it messes with modularity.

What I want is:
```
def task_two():
    task_one_targets = ...
    for target in task_one_targets:
        yield {
            "actions": [f"modify {target} --output other/dir"],
            "targets": [...],
        }
```

Eduardo Schettino

Aug 7, 2018, 11:21:19 AM
to python-doit
The most elegant way is up to your taste.
doit does not care how you create/manage the task metadata.
That's the point of using a plain dictionary instead of an API.

Another example:

```
TASK = {
    'actions': [],
    'targets': [],
}

def task_one():
    return TASK

def task_two():
    for target in TASK['targets']:

        yield {
            "actions": [f"modify {target} --output other/dir"]
            "targets": [...]
        }
```

- The project letsdoit has some decorators and classes.
https://bitbucket.org/takluyver/letsdoit

- Nikola defines classes with a method `gen_tasks` that returns dicts with task metadata
https://github.com/getnikola/nikola/blob/master/nikola/plugins/task/posts.py#L51

cheers,
  Eduardo

nikolaus....@daltonmaag.com

Oct 17, 2018, 9:56:12 AM
to python-doit
Hi again,
finally getting back to this. None of these elegantly solves a gripe I have with calling task functions directly...

Imagine I have tasks A, B, and C. Task A produces some targets as sub-tasks, task B takes those and produces new sub-tasks from them, and task C takes these and produces a third set of sub-tasks. Task A does not depend on previous tasks. Task B depends on A and so calls task_A() again to get the targets it produces. Task C depends on B and calls task_B(), which calls task_A() yet again. The call chain grows quickly as I add more tasks, since they all depend on the tasks before them. My sources potentially contain thousands of files, so this is bound to be time-consuming...

I'm experimenting with memoization; the following works as long as you always produce sub-tasks instead of normal ones:

```
def memoize(func):
    def memoized_generator(cache=[]):
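        # the mutable default argument acts as a cache that persists across
        # calls, so the wrapped generator body only runs once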
        if not cache:
            cache[:] = [x for x in func()]
        yield from cache

    return memoized_generator


@memoize
def task_a():
    print("a")
    for i in [1, 2, 3]:
        yield {
            "name": i,
            "actions": [f"echo hello{i} > hello{i}.txt"],
            "targets": [f"hello{i}.txt"],
        }


@memoize
def task_b():
    print("b")
    for i, subtask in enumerate(task_a()):
        sources = subtask["targets"]
        yield {
            "name": i,
            "actions": [f"echo {sources} > hellob{i}.txt"],
            "targets": [f"hellob{i}.txt"],
        }


@memoize
def task_c():
    print("c")
    for i, subtask in enumerate(task_b()):
        sources = subtask["targets"]
        yield {
            "name": i,
            "actions": [f"echo {sources} > helloc{i}.txt"],
            "targets": [f"helloc{i}.txt"],
        }
```
Output:
```
> doit
a
b
c
.  a:1
.  a:2
.  a:3
.  b:0
.  b:1
.  b:2
.  c:0
.  c:1
.  c:2
```

Nikolaus Waxweiler

Jul 3, 2019, 7:43:54 AM
to python-doit
Aaargh... it turns out that this doesn't work cleanly with multiprocessing, because the wrappers can't be pickled or something. Something as simple as importing a decorated task function into a separate dodo.py can cause pickle errors when using multiprocessing. I have resigned myself to just building a giant amorphous global BuildSystemState object to transport state between tasks...
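Roughly what I have in mind is something like this sketch (the class and attribute names are placeholders, and it assumes doit fully consumes task_a's generator before it calls task_b):

```
class BuildSystemState:
    """Placeholder container for state shared between task functions."""

    def __init__(self):
        self.targets_by_task = {}


STATE = BuildSystemState()


def task_a():
    for i in [1, 2, 3]:
        target = f"hello{i}.txt"
        # record the target so later task functions can iterate over it
        STATE.targets_by_task.setdefault("a", []).append(target)
        yield {
            "name": str(i),
            "actions": [f"echo hello{i} > {target}"],
            "targets": [target],
        }


def task_b():
    # assumes task_a's generator has already been consumed by doit
    for i, source in enumerate(STATE.targets_by_task.get("a", [])):
        yield {
            "name": str(i),
            "actions": [f"echo {source} > hellob{i}.txt"],
            "targets": [f"hellob{i}.txt"],
        }
```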