I am using ScalaNLP (among other Scala programs) to build a benchmark
suite for JVM research. As workloads for this benchmark I have chosen
the examples provided with TMT
<http://nlp.stanford.edu/software/tmt/tmt-0.3/>, pre-compiled as Scala
scripts. These reside side by side with the input data in a dedicated
scratch directory (whose location I have no control over):

    $SCRATCH/tmt/example-2-lda-learn.class
    $SCRATCH/tmt/pubmed-oa-subset.csv

Unfortunately, I have problems getting consistent outputs, since
ScalaNLP's outputs (System.out, System.err, names of created files) are
sensitive to the directory the scripts and their input data reside in.
In particular, lines like the following (from
<http://nlp.stanford.edu/software/tmt/tmt-0.3/examples/example-2-lda-learn.scala>)
cause trouble:

    val modelPath = file("lda-"+dataset.signature+"-"+params.signature);

Also, Parcel.signature is annoyingly sensitive to pathnames.
Is it possible to use Pipes.cd (and maybe setting "user.dir") in such a
way as to produce consistent signatures/outputs, no matter what
directory $SCRATCH actually refers to? Just cd'ing to $SCRATCH/tmt
doesn't do the trick, as these directories then become part of the
various signatures. (If need be, I can normalize directory prefixes in
the plain-text output, but that's pretty much impossible to do once a
path has been folded into a signature/hashcode.)
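
To make the distinction concrete, here is a minimal sketch of the kind
of prefix normalization I have in mind. The object NormalizeOutput, its
methods, and the two example scratch locations are made up for
illustration; pathHash is only a stand-in for some path-sensitive hash
and is not meant to reflect how Parcel.signature is actually computed.

```scala
import java.io.File

object NormalizeOutput {
  // Hypothetical helper: replace every literal occurrence of the absolute
  // scratch directory in captured output with the placeholder "$SCRATCH".
  // String.replace does plain (non-regex) substitution.
  def normalize(output: String, scratch: File): String =
    output.replace(scratch.getAbsolutePath, "$SCRATCH")

  // Stand-in for a path-sensitive signature (an assumption, NOT the
  // ScalaNLP implementation): a hex hash over the absolute path.
  def pathHash(f: File): String =
    Integer.toHexString(f.getAbsolutePath.hashCode)

  def main(args: Array[String]): Unit = {
    // Two made-up scratch locations holding the same input file.
    val scratchA = new File("/tmp/run-a")
    val scratchB = new File("/var/run-b")
    val dataA = new File(scratchA, "tmt/pubmed-oa-subset.csv")
    val dataB = new File(scratchB, "tmt/pubmed-oa-subset.csv")

    // Plain-text output: after normalization both lines are identical.
    println(normalize("Loading " + dataA.getAbsolutePath, scratchA))
    println(normalize("Loading " + dataB.getAbsolutePath, scratchB))

    // Hashed output: the prefix is no longer visible as text, so there
    // is nothing left to substitute after the fact.
    println("lda-" + pathHash(dataA))
    println("lda-" + pathHash(dataB))
  }
}
```

The first two lines printed come out the same regardless of the scratch
location, while the last two depend on it, which is exactly the part I
cannot repair by post-processing.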
Best wishes,
Andreas