Signatures: Producing consistent outputs regardless of current working directory


Andreas Sewe

Mar 10, 2011, 5:02:18 PM
to scal...@googlegroups.com
Hi all,

I am using ScalaNLP (among other Scala programs) to build a benchmark
suite for JVM research. As workloads for this benchmark, I have chosen
the examples provided with TMT
<http://nlp.stanford.edu/software/tmt/tmt-0.3/>, pre-compiled as Scala
scripts. These reside side by side with the input data in a dedicated
scratch directory over which I have no control:

$SCRATCH/tmt/example-2-lda-learn.class
$SCRATCH/tmt/pubmed-oa-subset.csv

Unfortunately, I am having trouble getting consistent outputs, since
ScalaNLP's outputs (System.out, System.err, names of created files) are
sensitive to the directory in which the scripts and their input data reside.
In particular, lines like the following (from
<http://nlp.stanford.edu/software/tmt/tmt-0.3/examples/example-2-lda-learn.scala>)
cause trouble:

val modelPath = file("lda-"+dataset.signature+"-"+params.signature);

Also, Parcel.signature is annoyingly sensitive to pathnames.

Is it possible to use Pipes.cd (and perhaps set "user.dir") in such a
way as to produce consistent signatures/outputs, no matter which
directory $SCRATCH actually refers to? Just cd'ing to $SCRATCH/tmt
doesn't do the trick, as these directories then become part of the
various signatures. (If need be, I can reliably normalize directory
prefixes in the textual output, but that's pretty much impossible to do
with signatures/hashcodes, since the prefix is no longer visible once
it has been hashed.)
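To illustrate why a hash cannot be normalized after the fact, here is a minimal Scala sketch. It assumes that a signature is simply a hash over a file path; I have not checked how Parcel.signature is actually computed. The helpers `md5Hex`, `absoluteSignature` and `relativeSignature` are hypothetical and not part of TMT or ScalaNLP, and the scratch paths are made up. Hashing the absolute path changes whenever $SCRATCH moves, whereas hashing the path relative to the data directory stays the same.

```scala
import java.nio.file.{Path, Paths}
import java.security.MessageDigest

// Hypothetical helpers for illustration only; NOT TMT/ScalaNLP API.
object StableSignature {
  // Hex-encoded MD5 digest of a UTF-8 string.
  def md5Hex(s: String): String =
    MessageDigest.getInstance("MD5")
      .digest(s.getBytes("UTF-8"))
      .map(b => f"${b & 0xff}%02x")
      .mkString

  // Hashes the absolute path: the scratch prefix leaks into the result.
  def absoluteSignature(file: Path): String =
    md5Hex(file.toAbsolutePath.normalize.toString)

  // Hashes the path relative to `base`: the same layout under a different
  // scratch directory yields the same signature. Separators are
  // normalized to '/' so the result is also platform-independent.
  def relativeSignature(base: Path, file: Path): String = {
    val rel = base.toAbsolutePath.normalize
      .relativize(file.toAbsolutePath.normalize)
    md5Hex(rel.toString.replace(java.io.File.separatorChar, '/'))
  }

  def main(args: Array[String]): Unit = {
    // Two made-up values of $SCRATCH/tmt.
    val a = Paths.get("/scratch/alice/tmt")
    val b = Paths.get("/tmp/run42/tmt")
    val fa = a.resolve("pubmed-oa-subset.csv")
    val fb = b.resolve("pubmed-oa-subset.csv")
    println(absoluteSignature(fa) == absoluteSignature(fb))       // false
    println(relativeSignature(a, fa) == relativeSignature(b, fb)) // true
  }
}
```

The same reasoning would apply to any signature that mixes in a path: unless the path is made relative before hashing, no amount of post-processing of the output can recover a stable value.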

Best wishes,

Andreas
