Hi devs,
I'm running Joshua v6.0.4 on my own training data, but it seems to have a problem with Thrax.
Before that, when I built Joshua v6.0.4, KenLM failed to build because the build could not detect my BOOST_ROOT. If I remember correctly, the problem is on line 40
of $JOSHUA/src/joshua/decoder/ff/lm/kenlm/Makefile.
I replaced the KenLM Makefile with the one from Joshua v6.0.3, and KenLM then built successfully.
(I'm using Ubuntu 15.04.)
After that I was able to build and run Joshua v6.0.4, but it fails while running Thrax.
Here is the tail of the console output:
------------------------------------------------------
[thrax-run] rebuilding...
dep=/media/thesis/working_directory/data/train/thrax-input-file [CHANGED]
dep=thrax-hiero.conf [CHANGED]
dep=grammar.gz [NOT FOUND]
cmd=hadoop/bin/hadoop jar /home/rezalesmana/joshua-v6.0.4/thrax/bin/thrax.jar -D mapred.child.java.opts='-Xmx10g' thrax-hiero.conf thrax > thrax.log 2>&1; rm -f grammar grammar.gz; hadoop/bin/hadoop fs -getmerge thrax/final/ grammar.gz
JOB FAILED (return code 255)
getmerge: File thrax/final does not exist.
--------------------------------------------------------
Then I checked whether Thrax reported an error in thrax.log. This is the tail of thrax.log:
-------------------------------------------------------------------------------------------------------------------------------
15/06/18 07:13:46 INFO mapred.MapTask: Finished spill 81
15/06/18 07:13:48 INFO mapred.LocalJobRunner:
15/06/18 07:13:48 INFO mapred.MapTask: Starting flush of map output
15/06/18 07:13:49 INFO mapred.MapTask: Finished spill 82
15/06/18 07:13:49 WARN mapred.LocalJobRunner: job_local_0003
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/jobcache/job_local_0003/attempt_local_0003_m_000004_0/output/file.out
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:343)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
    at org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:61)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1469)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1154)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:549)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:623)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
15/06/18 07:13:51 INFO mapred.LocalJobRunner:
[SCHED] class edu.jhu.thrax.hadoop.jobs.ExtractionJob in state FAILED
[SCHED] class edu.jhu.thrax.hadoop.features.annotation.AnnotationFeatureJob in state PREREQ_FAILED
[SCHED] class edu.jhu.thrax.hadoop.features.mapred.TargetPhraseGivenSourceFeature in state PREREQ_FAILED
[SCHED] class edu.jhu.thrax.hadoop.features.mapred.SourcePhraseGivenTargetFeature in state PREREQ_FAILED
[SCHED] class edu.jhu.thrax.hadoop.jobs.OutputJob in state PREREQ_FAILED
class edu.jhu.thrax.hadoop.jobs.ExtractionJob FAILED
class edu.jhu.thrax.hadoop.jobs.TargetWordGivenSourceWordProbabilityJob SUCCESS
class edu.jhu.thrax.hadoop.jobs.VocabularyJob SUCCESS
class edu.jhu.thrax.hadoop.features.annotation.AnnotationFeatureJob PREREQ_FAILED
class edu.jhu.thrax.hadoop.features.mapred.TargetPhraseGivenSourceFeature PREREQ_FAILED
class edu.jhu.thrax.hadoop.features.mapred.SourcePhraseGivenTargetFeature PREREQ_FAILED
class edu.jhu.thrax.hadoop.jobs.SourceWordGivenTargetWordProbabilityJob SUCCESS
class edu.jhu.thrax.hadoop.jobs.OutputJob PREREQ_FAILED
--------------------------------------------------------------------------------------------------------------------------------------------
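As far as I understand, this DiskErrorException comes from Hadoop's LocalDirAllocator when none of its local directories can accept the map output file, so running out of disk space is one possible cause. A minimal sketch for checking the free space where the map spills are written, assuming Hadoop runs in local mode with the default hadoop.tmp.dir of /tmp/hadoop-$USER (the function name hadoop_scratch_free_kb is hypothetical, and the directory may differ if the Joshua/Thrax configuration overrides it):

```shell
#!/bin/sh
# Hypothetical helper: print the free space (in KB) on the filesystem that
# holds Hadoop's scratch directory. Assumes the default hadoop.tmp.dir of
# /tmp/hadoop-$USER unless another directory is passed as the first argument.
hadoop_scratch_free_kb() {
  dir="${1:-/tmp/hadoop-$USER}"
  # Walk up to the nearest existing parent so df has a real path to stat.
  while [ ! -d "$dir" ]; do
    dir=$(dirname "$dir")
  done
  # -P forces one line per filesystem; column 4 is the available space.
  df -Pk "$dir" | awk 'NR == 2 { print $4 }'
}

hadoop_scratch_free_kb "$@"
```

If the reported number drops close to zero while ExtractionJob is running, a full disk would be consistent with the error above.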
With Joshua v6.0.3, Thrax runs fine (I think), but that version has a problem with the tuner (although I haven't reached the tuning step in Joshua v6.0.4 yet).
How can I solve this problem? Thanks a lot for your help.
I'm running Joshua with this pipeline command:
-------------------------------------------------------------------
$JOSHUA/bin/pipeline.pl --corpus input/train --tune input/tune --test input/test --source en --target id --aligner berkeley --joshua-mem 10g --threads 4 --hadoop-mem 10g --buildlm-mem 10g
-------------------------------------------------------------------
Regards,
Reza Lesmana