Joshua v6.0.4 KenLM and Thrax Problem

Reza Lesmana

Jun 18, 2015, 3:59:03 AM
to joshua_d...@googlegroups.com
Hi devs,

I'm running Joshua v6.0.4 on my own training data, but it seems to have a problem with Thrax.

Previously, when I built Joshua v6.0.4, KenLM failed to build because the build could not
detect my BOOST_ROOT. If I remember correctly, the problem is on line 40
of $JOSHUA/src/joshua/decoder/ff/lm/kenlm/Makefile

I replaced the KenLM Makefile with the one from Joshua v6.0.3, and KenLM then built successfully.
(I'm using Ubuntu 15.04.)

After I was able to build Joshua v6.0.4 and run it, it hit an error while running Thrax.

Here is the tail of the console output:
------------------------------------------------------
[thrax-run] rebuilding...
  dep=/media/thesis/working_directory/data/train/thrax-input-file [CHANGED]
  dep=thrax-hiero.conf [CHANGED]
  dep=grammar.gz [NOT FOUND]
  cmd=hadoop/bin/hadoop jar /home/rezalesmana/joshua-v6.0.4/thrax/bin/thrax.jar -D mapred.child.java.opts='-Xmx10g' thrax-hiero.conf thrax > thrax.log 2>&1; rm -f grammar grammar.gz; hadoop/bin/hadoop fs -getmerge thrax/final/ grammar.gz
  JOB FAILED (return code 255)
getmerge: File thrax/final does not exist.

--------------------------------------------------------

I checked whether Thrax reported an error in thrax.log, and this is the tail of thrax.log:

-------------------------------------------------------------------------------------------------------------------------------
15/06/18 07:13:46 INFO mapred.MapTask: Finished spill 81
15/06/18 07:13:48 INFO mapred.LocalJobRunner:
15/06/18 07:13:48 INFO mapred.MapTask: Starting flush of map output
15/06/18 07:13:49 INFO mapred.MapTask: Finished spill 82
15/06/18 07:13:49 WARN mapred.LocalJobRunner: job_local_0003
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/jobcache/job_local_0003/attempt_local_0003_m_000004_0/output/file.out
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:343)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
        at org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:61)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1469)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1154)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:549)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:623)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
15/06/18 07:13:51 INFO mapred.LocalJobRunner:
[SCHED] class edu.jhu.thrax.hadoop.jobs.ExtractionJob in state FAILED
[SCHED] class edu.jhu.thrax.hadoop.features.annotation.AnnotationFeatureJob in state PREREQ_FAILED
[SCHED] class edu.jhu.thrax.hadoop.features.mapred.TargetPhraseGivenSourceFeature in state PREREQ_FAILED
[SCHED] class edu.jhu.thrax.hadoop.features.mapred.SourcePhraseGivenTargetFeature in state PREREQ_FAILED
[SCHED] class edu.jhu.thrax.hadoop.jobs.OutputJob in state PREREQ_FAILED
class edu.jhu.thrax.hadoop.jobs.ExtractionJob   FAILED
class edu.jhu.thrax.hadoop.jobs.TargetWordGivenSourceWordProbabilityJob SUCCESS
class edu.jhu.thrax.hadoop.jobs.VocabularyJob   SUCCESS
class edu.jhu.thrax.hadoop.features.annotation.AnnotationFeatureJob     PREREQ_FAILED
class edu.jhu.thrax.hadoop.features.mapred.TargetPhraseGivenSourceFeature       PREREQ_FAILED
class edu.jhu.thrax.hadoop.features.mapred.SourcePhraseGivenTargetFeature       PREREQ_FAILED
class edu.jhu.thrax.hadoop.jobs.SourceWordGivenTargetWordProbabilityJob SUCCESS
class edu.jhu.thrax.hadoop.jobs.OutputJob       PREREQ_FAILED

--------------------------------------------------------------------------------------------------------------------------------------------

On Joshua v6.0.3, Thrax runs fine (I think), but that version has a problem with the tuner (although I haven't reached the tuning step in Joshua v6.0.4 yet).

How can I solve this problem? Thanks a lot for your help.

I'm running Joshua with this pipeline command:
-------------------------------------------------------------------
$JOSHUA/bin/pipeline.pl --corpus input/train --tune input/tune --test input/test --source en --target id --aligner berkeley --joshua-mem 10g --threads 4 --hadoop-mem 10g --buildlm-mem 10g
-------------------------------------------------------------------

Regards,
Reza Lesmana

Reza Lesmana

Jun 18, 2015, 4:13:41 AM
to joshua_d...@googlegroups.com
Oops, I think I may have found the problem. The disk where $JOSHUA is located seems to be full (only a bit more than 300MB available out of 30GB).

I run the pipeline in a working directory on a separate hard disk with plenty of space (more than 80GB available), but it seems
that Thrax's temporary folder is being written to the disk where $JOSHUA is located.
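
A diagnosis like this can be checked by comparing the free space on the candidate directories. The following is a minimal sketch, assuming a POSIX `df` and `awk`; the helper name `free_kb` is hypothetical and not part of Joshua or Hadoop, and the two directories checked are only examples.

```shell
#!/bin/sh
# Print the free space, in KiB, on the filesystem that holds the given path.
# free_kb is a hypothetical helper name, not part of Joshua or Hadoop.
free_kb() {
    # -P forces the portable one-line-per-filesystem format; -k reports KiB.
    # Field 4 of the second output line is the "Available" column.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Compare a system temporary directory with the current working directory.
for dir in /tmp .; do
    printf '%s\t%s KiB free\n' "$dir" "$(free_kb "$dir")"
done
```

If the directory that receives the temporary files reports far less free space than the working directory, a "Could not find any valid local directory" error from Hadoop is consistent with a full disk.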

Will update as soon as possible.

Regards,
Reza Lesmana

Matt Post

Jun 18, 2015, 7:30:02 AM
to joshua_d...@googlegroups.com
Hadoop is now unpacked into /tmp (or whatever you set --tmp to). If you want it to go in your current directory, you could add '--tmp .' to your pipeline invocation.
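
Applied to the pipeline command from the first message, that suggestion would look roughly like the sketch below. Every flag except `--tmp .` is copied from the original invocation. The `pipeline_cmd` wrapper and the `/path/to/joshua` fallback are hypothetical, added only so the snippet prints the command as a dry run instead of executing it.

```shell
#!/bin/sh
# Dry-run sketch: print the pipeline command with --tmp set to the current
# directory. Drop the "echo" inside pipeline_cmd to actually run it.
# /path/to/joshua is a placeholder used only when $JOSHUA is not set.
JOSHUA="${JOSHUA:-/path/to/joshua}"

pipeline_cmd() {
    echo "$JOSHUA/bin/pipeline.pl" \
        --corpus input/train --tune input/tune --test input/test \
        --source en --target id --aligner berkeley \
        --joshua-mem 10g --threads 4 --hadoop-mem 10g --buildlm-mem 10g \
        --tmp .
}

pipeline_cmd
```

With `--tmp .`, the pipeline should be started from the working directory on the disk that has enough free space.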



--
You received this message because you are subscribed to the Google Groups "Joshua Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to joshua_develop...@googlegroups.com.
To post to this group, send email to joshua_d...@googlegroups.com.
Visit this group at http://groups.google.com/group/joshua_developers.
For more options, visit https://groups.google.com/d/optout.

Reza Lesmana

Jun 18, 2015, 9:25:05 AM
to joshua_d...@googlegroups.com
Hi, Matt

I was looking for that, but I removed some unused folders left over from my previous failed experiments and
ran it once more.

It looks good and succeeded up to the [analyze-test] step. I'm currently trying to download the results from the server.


Will update as soon as possible.

Regards,
Reza Lesmana
