branch grafts


Nathan Schneider

Dec 31, 2012, 10:05:16 PM
to ducttap...@googlegroups.com
I've been experimenting with branch grafts. I'm not sure if this is a
bug or a misunderstanding on my part:

$ cat branch.tape
task t1 < in=(B: one=big1.txt two=big2.txt) > out {
  echo $in > $out
}
task t2 < in=$out@t1[B:two] > out {
  echo $in > $out
}
task t3 < in=$out@t2 > out :: Q=(B: one=q1 two=q2) {
  echo $in > $out
  echo $Q >> $out
}
plan All {
  reach t3 via (B: *)
}

$ ducttape branch.tape
ducttape 0.2.1
By Jonathan Clark
Have 6 previous workflow versions
Finding hyperpaths contained in plan...
Finding vertices for plan: All
Have 5 candidate tasks matching plan's realizations: t1 t2 t3
Found 2 realizations of goal task t3: B.two Baseline.baseline
Found 4 vertices implied by realization plan All
Union of all planned vertices has size 4
Planned 4 vertices
Checking for completed steps...
Task incomplete t1/B.two: No previous output
Task incomplete t2/Baseline.baseline: No previous output
Task incomplete t3/Baseline.baseline: No previous output
Task incomplete t3/B.two: No previous output
Finding packages...
Found 0 packages
Checking for already built packages...
Checking inputs...
Work plan:
RUN: /usr0/nschneid/funkyworkflow/./t1/B.two
RUN: /usr0/nschneid/funkyworkflow/./t2/Baseline.baseline
RUN: /usr0/nschneid/funkyworkflow/./t3/Baseline.baseline
RUN: /usr0/nschneid/funkyworkflow/./t3/B.two
Are you sure you want to run these 4 tasks? [y/n] y
Retreiving code and building...
Moving previous partial output to the attic...
Executing tasks...
Acquiring lock for t1/B.two
Running t1/B.two in /usr0/nschneid/funkyworkflow/./t1/B.two
Using submitter shell
Completed t1/B.two
Acquiring lock for t2/Baseline.baseline
Running t2/Baseline.baseline in /usr0/nschneid/funkyworkflow/./t2/Baseline.baseline
Using submitter shell
Completed t2/Baseline.baseline
Acquiring lock for t3/Baseline.baseline
Running t3/Baseline.baseline in /usr0/nschneid/funkyworkflow/./t3/Baseline.baseline
Using submitter shell
Completed t3/Baseline.baseline
Acquiring lock for t3/B.two
Running t3/B.two in /usr0/nschneid/funkyworkflow/./t3/B.two
Using submitter shell
Completed t3/B.two

$ ls -lo t?
t1:
total 4
drwxr-xr-x 2 nschneid 4096 2012-12-31 21:45 B.two

t2:
total 4
drwxr-xr-x 2 nschneid 4096 2012-12-31 21:45 Baseline.baseline

t3:
total 8
drwxr-xr-x 2 nschneid 4096 2012-12-31 21:45 Baseline.baseline
lrwxrwxrwx 1 nschneid 17 2012-12-31 21:45 B.one -> Baseline.baseline
drwxr-xr-x 2 nschneid 4096 2012-12-31 21:45 B.two

$ head t?/B*/out
==> t1/B.two/out <==
/usr0/nschneid/funkyworkflow/big2.txt

==> t2/Baseline.baseline/out <==
/usr0/nschneid/funkyworkflow/./t1/B.two/out

==> t3/Baseline.baseline/out <==
/usr0/nschneid/funkyworkflow/./t2/Baseline.baseline/out
q1

==> t3/B.one/out <==
/usr0/nschneid/funkyworkflow/./t2/Baseline.baseline/out
q1

==> t3/B.two/out <==
/usr0/nschneid/funkyworkflow/./t2/Baseline.baseline/out
q2


Is this the expected behavior? Note that t3/Baseline.baseline uses
branch 'one', but its input comes from t2/Baseline.baseline, which uses
branch 'two'. Under what conditions is a symlink like t3/B.one
created?

Nathan

Jonathan Clark

Jan 2, 2013, 2:56:20 PM
to Nathan Schneider, ducttap...@googlegroups.com
Hi Nathan,

I think you're seeing the expected behavior. There are a few separate issues here.

1) What happens when you use a branch graft such as task t2 < in=$out@t1[B:two]?
Well, at task t2, you requested that $in use $out@t1 as its input and, because it's a branch graft, specifically that it should only ever use the "two" branch of the "B" branch point. Other than that, none of t1's downstream tasks (t2 and its descendants) will have any further knowledge that B.two was ever part of the derivation. I've taken to calling this the "destructive" nature of branch grafts: branches are removed from the realization (branch derivation) after a branch graft is applied. Since t2 has no other branch points, its realization is "Baseline.baseline". Later, task t3 re-introduces the branch point "B", and you'll see both branches of it there, but this is completely independent of task t2.
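
To make the "destructive" behavior concrete, here is a minimal Python sketch that models only how branch points propagate along the edges of branch.tape. It is not ducttape's implementation: the `TASKS` table and the `branch_points_seen` and `realizations` helpers are hypothetical, and plans are not modeled (the sketch enumerates every realization, whereas the plan above only ran t1/B.two). The assumption is simply that a graft pins a branch point on one edge and removes it from what the child inherits.

```python
from itertools import product

# Hypothetical model of branch.tape (not ducttape's internals).
# "branch_points": branch points a task declares itself (first branch = baseline).
# "parents": (parent task, graft) pairs; a graft pins branch points on that edge.
TASKS = {
    "t1": {"branch_points": {"B": ["one", "two"]}, "parents": []},
    "t2": {"branch_points": {}, "parents": [("t1", {"B": "two"})]},  # $out@t1[B:two]
    "t3": {"branch_points": {"B": ["one", "two"]}, "parents": [("t2", {})]},
}


def branch_points_seen(task):
    """Branch points that appear in `task`'s realizations."""
    seen = dict(TASKS[task]["branch_points"])
    for parent, graft in TASKS[task]["parents"]:
        for bp, branches in branch_points_seen(parent).items():
            if bp not in graft:  # grafted branch points are not inherited
                seen.setdefault(bp, branches)
    return seen


def realizations(task):
    """Every combination of branches, as dicts {branch point: branch}."""
    seen = branch_points_seen(task)
    names = sorted(seen)
    return [dict(zip(names, combo)) for combo in product(*(seen[n] for n in names))]


for task in ("t2", "t3"):
    print(task, realizations(task))
# t2 [{}]
# t3 [{'B': 'one'}, {'B': 'two'}]
```

In this model, t2 has exactly one realization (no branch points, i.e. Baseline.baseline), and both t3 realizations consume it; that single t2 realization always reads t1/B.two, which matches the `head` output in the original message.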

2) Where do symlinks in task directories come from?
There are two ways of naming a realization:
a) its full realization name, which explicitly enumerates all branch points (and the branch selected for each branch point) that are included in the realization
b) its canonical realization name, which is used to name the realization on disk for technical reasons.
The technical reason is extensibility: when you introduce a new branch point, you want to re-use previous runs of the workflow that are compatible, and canonical realization names make this possible. When you create a branch point, the first branch is the "baseline" branch and will be called "baseline" in the directory structure. Further, when the baseline branch is being used in a realization, it is dropped from the canonical realization name. This way, if you add a new branch point, none of the previous realization names suddenly become invalid (the baseline branch must mean the same thing as the previous task definition!). The symlinks are created from the full realization name to the canonical realization name just to make things slightly less confusing.
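
The naming rule can be sketched in a few lines of Python, assuming a realization is a dict from branch point to branch and that each branch point's baseline is its first branch. The function names, the alphabetical ordering of branch points, and the `+` separator between them are assumptions for illustration, not taken from ducttape's source.

```python
def full_name(realization):
    """Name listing every branch point and its branch, e.g. 'B.one'."""
    parts = [f"{bp}.{br}" for bp, br in sorted(realization.items())]
    return "+".join(parts) if parts else "Baseline.baseline"


def canonical_name(realization, baselines):
    """Drop branch points that use their baseline branch, then name the rest."""
    kept = {bp: br for bp, br in realization.items() if baselines.get(bp) != br}
    return full_name(kept)


baselines = {"B": "one"}  # "one" is listed first in (B: one=... two=...)
print(full_name({"B": "one"}), canonical_name({"B": "one"}, baselines))
# B.one Baseline.baseline
print(full_name({"B": "two"}), canonical_name({"B": "two"}, baselines))
# B.two B.two

# Adding a new branch point C whose baseline is "x" leaves existing names intact:
baselines_v2 = {"B": "one", "C": "x"}
print(canonical_name({"B": "two", "C": "x"}, baselines_v2))
# B.two
```

The last call illustrates the extensibility argument: a realization that used B.two before C existed still maps to the same canonical name once C is added, as long as it takes C's baseline branch.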

Jon



Nathan Schneider

Jan 2, 2013, 9:17:41 PM
to Jonathan Clark, ducttap...@googlegroups.com
> The symlinks are
> created from the full realization name to the canonical realization name
> just to make things slightly less confusing.

I understand the naming of the symlinks, but I don't understand why
they are only created for some realizations. There are lots of
Baseline.baseline realizations that don't have symlinks to them,
right? Is there a reason it's necessary with grafting?

Nathan

Jonathan Clark

Jan 2, 2013, 9:21:11 PM
to Nathan Schneider, ducttap...@googlegroups.com
A Baseline.baseline realization wouldn't have a symlink to it iff the full realization didn't actually have any user-defined branch points in it (i.e. if the full realization name is identical to the canonical realization name).
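
This rule can be checked with a small Python sketch that lays out task directories the way the `ls -lo t?` listing shows them: a real directory under the canonical name, plus a relative symlink from the full name only when the two names differ. `name` and `materialize` are hypothetical helpers built on an assumed naming convention (the first branch is the baseline; `+` joins multiple branch points), and the sketch needs a filesystem that supports symlinks.

```python
import os
import tempfile
from pathlib import Path


def name(realization):
    """Hypothetical naming: 'BP.branch' parts joined by '+', or Baseline.baseline."""
    parts = [f"{bp}.{br}" for bp, br in sorted(realization.items())]
    return "+".join(parts) if parts else "Baseline.baseline"


def materialize(task_dir, realization, baselines):
    """Create the canonical directory; add a full-name symlink iff the names differ."""
    full = name(realization)
    canonical = name({bp: br for bp, br in realization.items() if baselines.get(bp) != br})
    (task_dir / canonical).mkdir(parents=True, exist_ok=True)
    link = task_dir / full
    if full != canonical and not link.is_symlink():
        link.symlink_to(canonical)  # relative target, e.g. B.one -> Baseline.baseline
    return full, canonical


with tempfile.TemporaryDirectory() as root:
    t2, t3 = Path(root, "t2"), Path(root, "t3")
    materialize(t2, {}, {})  # no branch points: full == canonical, so no symlink
    for r in ({"B": "one"}, {"B": "two"}):
        materialize(t3, r, {"B": "one"})
    for task_dir in (t2, t3):
        for entry in sorted(task_dir.iterdir()):
            target = os.readlink(entry) if entry.is_symlink() else "(directory)"
            print(f"{task_dir.name}/{entry.name} -> {target}")
# t2/Baseline.baseline -> (directory)
# t3/B.one -> Baseline.baseline
# t3/B.two -> (directory)
# t3/Baseline.baseline -> (directory)
```

Under these assumptions, t2's Baseline.baseline gets no symlink because its full realization contains no user-defined branch points, while t3's does, because the full name B.one differs from its canonical name.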

Nathan Schneider

Jan 3, 2013, 2:09:16 AM
to Jonathan Clark, ducttap...@googlegroups.com
Cool, I think I understand now. Does this allow ducttape to properly
handle changes to workflow structure after it's been run (using the
symlinks it can tell, for instance, if a branch in the original
baseline realization has been removed or renamed)?