self-grafting


Nathan Schneider

Jul 17, 2013, 8:29:11 PM
to ducttap...@googlegroups.com

I have been thinking about how to formulate a task (such as training a model) that runs for a tunable number of iterations, where each iteration's intermediate result is saved, used to initialize the next iteration, and also available directly to downstream tasks.

A trivial but ugly solution would be to create a separate task for each iteration, and to use a branch point on the downstream task to select the output of one of those tasks.

Another thought is to use “self-grafting”, by which I mean feeding the output of one task into a subsequent realization of the same task by way of a branch graft. I tried this workflow:

$ cat selfgraft.tape
task preproc > trainingdata devdata {
  echo "" > $trainingdata
  echo "" > $devdata
}

task learn < in=$trainingdata@preproc init=(I: 0=/dev/null 1=$model@learn[I:0] 2=$model@learn[I:1] 3=$model@learn[I:2] 4=$model@learn[I:3]) > model {
  echo "./train --data $in --init-model $init" > $model
}

# can be run with any value of I
task predict_eval < in=$devdata@preproc model=@learn > preds scores {
  echo "./predict --data $in --model $model" > $preds
  echo "./eval --data $in --preds $preds" > $scores
}

This is not perfect because it still requires manually specifying the inputs for each iteration. But it is more compact than having a bunch of tasks, and conceptually, it seems to me like it should work: though the learn task takes its own output as an input, it is strictly from a completed realization, so the dependencies are correctly specified. Here's what happens:

$ ../ducttape selfgraft.tape -j4
ducttape 0.3
by Jonathan Clark
Loading workflow version history...
Have 0 previous workflow versions
No plans specified in workflow -- Using default one-off realization plan: Each realization will have no more than 1 non-baseline branch
Checking for completed tasks
Finding packages...
Found 0 packages
Checking for already built packages (if this takes a long time, consider switching to a local-disk git clone instead of a remote repository)...
Checking inputs...
Work plan (depth-first traversal):
RUN: /Users/nathan/dev/nlp-tools/ducttape-0.3/testexamples/./preproc/Baseline.baseline (Baseline.baseline)
RUN: /Users/nathan/dev/nlp-tools/ducttape-0.3/testexamples/./learn/Baseline.baseline (I.0)
RUN: /Users/nathan/dev/nlp-tools/ducttape-0.3/testexamples/./predict_eval/Baseline.baseline (I.0)
RUN: /Users/nathan/dev/nlp-tools/ducttape-0.3/testexamples/./learn/I.1 (I.1)
RUN: /Users/nathan/dev/nlp-tools/ducttape-0.3/testexamples/./predict_eval/I.1 (I.1)
RUN: /Users/nathan/dev/nlp-tools/ducttape-0.3/testexamples/./learn/I.2 (I.2)
RUN: /Users/nathan/dev/nlp-tools/ducttape-0.3/testexamples/./predict_eval/I.2 (I.2)
RUN: /Users/nathan/dev/nlp-tools/ducttape-0.3/testexamples/./learn/I.3 (I.3)
RUN: /Users/nathan/dev/nlp-tools/ducttape-0.3/testexamples/./predict_eval/I.3 (I.3)
RUN: /Users/nathan/dev/nlp-tools/ducttape-0.3/testexamples/./learn/I.4 (I.4)
RUN: /Users/nathan/dev/nlp-tools/ducttape-0.3/testexamples/./predict_eval/I.4 (I.4)
Are you sure you want to run these 11 tasks? [y/n] y
Exception in thread "main" java.lang.RuntimeException: Task not found: learn/Baseline.baseline/1
    at ducttape.versioner.WorkflowVersionStore$.dependencies(WorkflowVersionStore.scala:136)
    at ducttape.versioner.WorkflowVersionStore$$anonfun$6.apply(WorkflowVersionStore.scala:177)
    at ducttape.versioner.WorkflowVersionStore$$anonfun$6.apply(WorkflowVersionStore.scala:177)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at ducttape.versioner.WorkflowVersionStore$.create(WorkflowVersionStore.scala:177)
    at ducttape.versioner.TentativeWorkflowVersionInfo.commit(WorkflowVersionInfo.scala:101)
    at ducttape.cli.ExecuteMode$.run(ExecuteMode.scala:120)
    at Ducttape$$anonfun$main$8.apply(ducttape.scala:879)
    at ducttape.cli.ErrorUtils$.ex2err(ErrorUtils.scala:59)
    at Ducttape$.main(ducttape.scala:572)
    at Ducttape.main(ducttape.scala)

So the static analysis doesn't complain but the workflow fails to run.
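
To make the dependency argument concrete, here is a small Python sketch. It is not ducttape code: the `realization_graph` helper and the node names are made up for illustration. It models each realization as its own node and checks that the graph has a valid topological order. This only shows that the graft chain is acyclic at the realization level; it says nothing about where the versioner actually goes wrong.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def realization_graph(num_iters):
    """Map each realization to the set of realizations it depends on.

    Hypothetical model of selfgraft.tape: learn[I:k] reads the model
    written by learn[I:k-1], and learn[I:0] starts from /dev/null.
    """
    deps = {"preproc": set()}
    for i in range(num_iters):
        learn = f"learn[I:{i}]"
        deps[learn] = {"preproc"}
        if i > 0:
            deps[learn].add(f"learn[I:{i - 1}]")
        deps[f"predict_eval[I:{i}]"] = {"preproc", learn}
    return deps


graph = realization_graph(5)
# static_order() raises graphlib.CycleError if the graph has a cycle.
order = list(TopologicalSorter(graph).static_order())
for name in order:
    print(name)
```

With five iterations the model has 11 nodes, the same count as the work plan above, and every `learn` realization is ordered after the one it grafts from.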

Any thoughts on whether there's a better solution, or a way around this, or whether this is a bug?

Cheers,
Nathan

Nathan Schneider

Jul 18, 2013, 9:33:24 AM
to Greg Hanneman, ducttap...@googlegroups.com

Greg,

I think a sequence branch point only lets you specify that the branches are a range of integers: e.g., init=(I: 0..4) would be equivalent to init=(I: 0 1 2 3 4). As far as I know there's no way to choose other branch-specific values in a single expression, like init=(I: 0..4=$model@learn[$I-1]) (or i=(I: 0..4) and init=$model@learn[$I-1]). It would be nice to be able to do arithmetic on sequence branch points, though, because they are guaranteed to be integers.
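
In the meantime, one possible stopgap is to generate the branch point text with a script and paste it into the tape file. The sketch below is plain Python, not a ducttape feature, and `self_graft_spec` is a hypothetical helper name. It only saves typing: the output is the same text as the hand-written `init=` input in selfgraft.tape, so presumably it would run into the same exception.

```python
def self_graft_spec(name, task, output, num_iters, first="/dev/null"):
    """Build a branch point such as (I: 0=/dev/null 1=$model@learn[I:0] ...).

    Branch 0 gets the initial value; branch k grafts the output of the
    same task's branch k-1.
    """
    if num_iters < 1:
        raise ValueError("need at least one iteration")
    branches = [f"0={first}"]
    for i in range(1, num_iters):
        branches.append(f"{i}=${output}@{task}[{name}:{i - 1}]")
    return f"({name}: {' '.join(branches)})"


spec = self_graft_spec("I", "learn", "model", 5)
print("init=" + spec)
```

Changing the last argument changes the number of iterations without retyping the grafts by hand.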

Nathan



On Wed, Jul 17, 2013 at 9:30 PM, Greg Hanneman <ghan...@gmail.com> wrote:
Hey Nathan,

I've never tried it, but I remember reading something in the DuctTape tutorial about "sequence" branch points or sequence tasks or something like that. I think there was some syntax where you specify a number, and then the task is repeated that many times. Do you think that might work, possibly in conjunction with the branch grafts from the previous iteration you're trying now?

Greg.




Jon Clark

Jul 18, 2013, 10:26:45 AM
to Nathan Schneider, ducttap...@googlegroups.com
I like this pattern a lot. Self-grafting seems like a good name.

In principle, there's nothing wrong with this -- it is just an unforeseen use case. File a bug on it.

Warning: my cycles for this are currently a bit limited. If someone wants to investigate, I'm happy to advise via email.

