Errors while running Thrax on a machine with a Hadoop installation


Junhui Li

Jan 29, 2014, 11:04:00 AM
to joshua_...@googlegroups.com
Hi,

Our machine has a Hadoop installation, and the command "hadoop version" returns:
---------------------
Subversion file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hadoop-2.0.0-cdh4.4.0/src/hadoop-common-project/hadoop-common -r c0eba6cd38c984557e96a16ccd7356b7de835e79
Compiled by jenkins on Tue Sep  3 19:33:17 PDT 2013
From source with checksum ac7e170aa709b3ace13dc5f775487180
This command was run using /opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hadoop/hadoop-common-2.0.0-cdh4.4.0.jar
----------------------

When I run the following command ("which hadoop" returns /usr/bin/hadoop, so I set --hadoop to "/usr"):
      $JOSHUA/bin/pipeline.pl --first-step THRAX --last-step THRAX --alignment alignments/training.align --thrax-conf thrax-hiero.conf --corpus input/train --source cn --target en --hadoop /usr
I get the following output:
--------------------------
[thrax-input-file] cached, skipping...
[thrax-prep] rebuilding...
  dep=/fs/clip-scratch/lijunhui/joshua-v5.0/cn-en/data/train/thrax-input-file
  dep=grammar.gz [NOT FOUND]
  cmd=/usr/bin/hadoop fs -rmr pipeline-cn-en-hiero-_fs_clip-scratch_lijunhui_joshua-v5.0_cn-en; /usr/bin/hadoop fs -mkdir pipeline-cn-en-hiero-_fs_clip-scratch_lijunhui_joshua-v5.0_cn-en; /usr/bin/hadoop fs -put /fs/clip-scratch/lijunhui/joshua-v5.0/cn-en/data/train/thrax-input-file pipeline-cn-en-hiero-_fs_clip-scratch_lijunhui_joshua-v5.0_cn-en/input-file
  took 6 seconds (6s)
[thrax-run] rebuilding...
  dep=/fs/clip-scratch/lijunhui/joshua-v5.0/cn-en/data/train/thrax-input-file
  dep=thrax-hiero.conf
  dep=grammar.gz [NOT FOUND]
  cmd=/usr/bin/hadoop jar /fs/clip-scratch/lijunhui/joshua-v5.0/thrax/bin/thrax.jar -D mapred.child.java.opts='-Xmx2g' thrax-hiero.conf pipeline-cn-en-hiero-_fs_clip-scratch_lijunhui_joshua-v5.0_cn-en > thrax.log 2>&1; rm -f grammar grammar.gz; /usr/bin/hadoop fs -getmerge pipeline-cn-en-hiero-_fs_clip-scratch_lijunhui_joshua-v5.0_cn-en/final/ grammar.gz; /usr/bin/hadoop fs -rmr pipeline-cn-en-hiero-_fs_clip-scratch_lijunhui_joshua-v5.0_cn-en
  took 59 seconds (59s)
[thrax-prep] recaching...
  dep=/fs/clip-scratch/lijunhui/joshua-v5.0/cn-en/data/train/thrax-input-file
  dep=grammar.gz [NOT FOUND]
  cmd=/usr/bin/hadoop fs -rmr pipeline-cn-en-hiero-_fs_clip-scratch_lijunhui_joshua-v5.0_cn-en; /usr/bin/hadoop fs -mkdir pipeline-cn-en-hiero-_fs_clip-scratch_lijunhui_joshua-v5.0_cn-en; /usr/bin/hadoop fs -put /fs/clip-scratch/lijunhui/joshua-v5.0/cn-en/data/train/thrax-input-file pipeline-cn-en-hiero-_fs_clip-scratch_lijunhui_joshua-v5.0_cn-en/input-file
* Quitting at this step
--------------------------------

Looking a bit deeper into the attempt logs, I see the attempts failed for the following reason:
---------------------------------
2014-01-29 10:52:12,107 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
    at edu.jhu.thrax.hadoop.features.WordLexicalProbabilityCalculator$Map.map(WordLexicalProbabilityCalculator.java:56)
    at edu.jhu.thrax.hadoop.features.WordLexicalProbabilityCalculator$Map.map(WordLexicalProbabilityCalculator.java:28)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:338)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)

2014-01-29 10:52:12,213 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics system...
2014-01-29 10:52:12,214 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system stopped.
2014-01-29 10:52:12,214 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system shutdown complete.
-------------------------------------


I'm not sure whether the error is due to Hadoop version differences or something else. Any suggestions for resolving this?

Using the Hadoop provided with Joshua and running locally, everything works fine without the above errors. However, we plan to use larger training data, and running locally doesn't seem like a desirable route...

Thanks,

Junhui



Junhui Li

Jan 29, 2014, 11:40:12 AM
to joshua_...@googlegroups.com
The Hadoop provided with Joshua is hadoop-0.20.2 and works fine. When I switch to hadoop-0.23.10 and run locally, I get errors similar to the ones with the Hadoop installed on the machine:
---------------------------------
14/01/29 11:37:48 WARN mapred.LocalJobRunner: job_local296695389_0003
java.lang.Exception: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:399)
Caused by: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected

        at edu.jhu.thrax.hadoop.features.WordLexicalProbabilityCalculator$Map.map(WordLexicalProbabilityCalculator.java:56)
        at edu.jhu.thrax.hadoop.features.WordLexicalProbabilityCalculator$Map.map(WordLexicalProbabilityCalculator.java:28)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:726)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:231)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
-------------------------

Best,

Junhui

Matt Post

Jan 29, 2014, 11:58:08 AM
to joshua_...@googlegroups.com
Hi,

What version of Joshua are you using?

You shouldn't have to use --hadoop at all. Just set the environment variable $HADOOP and it will be automatically detected.

matt



Junhui Li

Jan 29, 2014, 12:17:26 PM
to joshua_...@googlegroups.com


On Wednesday, 29 January 2014 11:58:08 UTC-5, Matt Post wrote:
Hi,

What version of Joshua are you using?
Joshua 5.0.
 

You shouldn't have to use --hadoop at all. Just set the environment variable $HADOOP and it will be automatically detected.



The same thing happens if I set the environment variable $HADOOP to /usr and don't use --hadoop.

Thanks,

Junhui

Juri Ganitkevitch

Jan 29, 2014, 12:46:41 PM
to Joshua Support
Hi Junhui,

The error itself seems to be a type conflict. While Joshua is built against 0.22, I haven't had any trouble running Thrax with Hadoop 1.0.

Setting $HADOOP to /usr seems incorrect though – you'll want to set it to the actual Hadoop directory, and not the enclosing one.
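
For example, with the CDH parcel layout shown in the "hadoop version" output above, something along these lines should work (just a sketch; the parcel path and pipeline flags are taken from the earlier messages, so adjust them to your install):

      # point $HADOOP at the Hadoop directory itself, not an enclosing prefix like /usr
      export HADOOP=/opt/cloudera/parcels/CDH/lib/hadoop
      # then rerun the pipeline without --hadoop; it will pick the installation up from $HADOOP
      $JOSHUA/bin/pipeline.pl --first-step THRAX --last-step THRAX \
          --alignment alignments/training.align --thrax-conf thrax-hiero.conf \
          --corpus input/train --source cn --target en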

-- Juri

Junhui Li

Jan 29, 2014, 1:22:04 PM
to joshua_...@googlegroups.com


On Wednesday, 29 January 2014 12:46:41 UTC-5, Juri wrote:
Hi Junhui,

The error itself seems to be a type conflict. While Joshua is built against 0.22, I haven't had any trouble running Thrax with Hadoop 1.0.

Setting $HADOOP to /usr seems incorrect though – you'll want to set it to the actual Hadoop directory, and not the enclosing one.

Thanks.

I pointed $HADOOP to /opt/cloudera/parcels/CDH and still no luck.

Someone told me that there are significant changes between Hadoop 1.0 and 2.0...

Junhui

 

Juri Ganitkevitch

Jan 29, 2014, 1:25:59 PM
to Joshua Support
I haven't followed the changes in Hadoop, but if using 1.0 is an option for you I'd recommend switching to that. If you are stuck with 2.0 and the conflicts are limited to the counters (as the errors suggest), you can quite easily comment those out in the source code. The counters provide diagnostic information, and are not essential to the grammar extraction.

Junhui Li

Jan 29, 2014, 2:59:37 PM
to joshua_...@googlegroups.com


On Wednesday, 29 January 2014 13:25:59 UTC-5, Juri wrote:
I haven't followed the changes in Hadoop, but if using 1.0 is an option for you I'd recommend switching to that. If you are stuck with 2.0 and the conflicts are limited to the counters (as the errors suggest), you can quite easily comment those out in the source code. The counters provide diagnostic information, and are not essential to the grammar extraction.

Thanks for the suggestion.

I downloaded Thrax and compiled it against Hadoop 2.0 (this required some modifications to build.xml).

It works now, without the earlier errors.
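
For reference, the rebuild was roughly the following (a sketch only; the exact build.xml edits depend on your Thrax checkout and on where the CDH jars live):

      # rebuild Thrax against the cluster's Hadoop 2 jars
      cd $JOSHUA/thrax
      # edit build.xml so its Hadoop classpath points at the installed jars, e.g.
      #   /opt/cloudera/parcels/CDH/lib/hadoop/*.jar
      ant clean
      ant    # or whichever target rebuilds bin/thrax.jar, which the pipeline runs via "hadoop jar"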

Thanks,

Junhui

Junhui Li

Jan 30, 2014, 11:02:21 PM
to joshua_...@googlegroups.com
I am running Thrax on parallel data with ~5M sentence pairs. The extraction command was "$JOSHUA/bin/pipeline.pl --first-step THRAX --last-step THRAX --alignment input/train.al --thrax-conf thrax-hiero.conf --corpus input/train --source zh --target en --hadoop-mem 120g" and it failed at the "collection" step with the following error:

-output for attempt_1390542135497_0134_m_000263_0
2014-01-30 13:58:35,509 INFO [fetcher#9] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 7414066, inMemoryMapOutputs.size() -> 19, commitMemory -> 109146813, usedMemory ->474594763
2014-01-30 13:58:35,511 INFO [fetcher#9] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#9 about to shuffle output of map attempt_1390542135497_0134_m_000283_0 decomp: 7498823 len: 7498827 to MEMORY
2014-01-30 13:58:35,544 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 7460535 bytes from map-output for attempt_1390542135497_0134_m_000178_0
2014-01-30 13:58:35,544 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 7460535, inMemoryMapOutputs.size() -> 20, commitMemory -> 116560879, usedMemory ->482093586
2014-01-30 13:58:35,818 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: bespin04b.umiacs.umd.edu:8080 freed by fetcher#2 in 3397s
2014-01-30 13:58:35,818 ERROR [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:lijunhui (auth:SIMPLE) cause:org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#2
2014-01-30 13:58:35,820 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#2
	at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:121)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:379)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
Caused by: java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:58)
	at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:45)
	at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.<init>(InMemoryMapOutput.java:63)
	at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.unconditionalReserve(MergeManagerImpl.java:297)
	at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:287)
	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:360)
	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:295)
	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:154)

2014-01-30 13:58:35,822 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
2014-01-30 13:58:35,832 INFO [fetcher#9] org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 7498823 bytes from map-output for attempt_1390542135497_0134_m_000283_0
2014-01-30 13:58:35,833 INFO [fetcher#9] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 7498823, inMemoryMapOutputs.size() -> 21, commitMemory -> 124021414, usedMemory ->604052459
2014-01-30 13:58:35,833 INFO [fetcher#9] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: bespin04a.umiacs.umd.edu:8080 freed by fetcher#9 in 3412s
2014-01-30 13:58:35,838 WARN [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Could not delete hdfs://bespinrm.umiacs.umd.edu:8020/user/lijunhui/pipeline-zh-en-hiero-_fs_clip-scratch_lijunhui_joshua-v5.0_zh-en/collected/_temporary/1/_temporary/attempt_1390542135497_0134_r_000013_1
2014-01-30 13:58:35,940 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ReduceTask metrics system...
2014-01-30 13:58:35,941 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics system stopped.
2014-01-30 13:58:35,941 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics system shutdown complete.


I suppose the error is due to running out of memory. Since I have already set "--hadoop-mem 120g", does that mean 120g is still not sufficient, or do I need to set the memory requirement somewhere else?

Thanks,

Junhui

Matt Post

Jan 31, 2014, 9:44:43 AM
to joshua_...@googlegroups.com
You should never need anything near 120 GB of memory. This is the memory given to individual Hadoop mappers and reducers, which are only dealing with small amounts of data. The default of 2 GB is probably fine, but you could increase it to 4 GB.
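
Concretely, that just means rerunning the same command with a small per-task heap, e.g. (reusing the flags from the earlier message):

      $JOSHUA/bin/pipeline.pl --first-step THRAX --last-step THRAX \
          --alignment input/train.al --thrax-conf thrax-hiero.conf \
          --corpus input/train --source zh --target en \
          --hadoop-mem 4g    # per mapper/reducer; the 2g default is usually fine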

Junhui Li

Feb 1, 2014, 10:47:30 AM
to joshua_...@googlegroups.com

I am still stuck at the "Collect" step. The map jobs are successful, but the reduce jobs die with the following errors.

Can someone help me?

Thanks,

-----------------

Log Type: syslog (Log Length: 2631683 bytes; the stderr and stdout logs are empty; showing 4096 bytes of the total):

put for attempt_1390542135497_0221_m_000288_0
2014-02-01 01:03:39,774 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 7199194, inMemoryMapOutputs.size() -> 35, commitMemory -> 203307256, usedMemory ->504809368
2014-02-01 01:03:39,774 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#2 about to shuffle output of map attempt_1390542135497_0221_m_000139_0 decomp: 2 len: 6 to MEMORY
2014-02-01 01:03:39,774 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 2 bytes from map-output for attempt_1390542135497_0221_m_000139_0
2014-02-01 01:03:39,774 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 2, inMemoryMapOutputs.size() -> 36, commitMemory -> 210506450, usedMemory ->504809370
2014-02-01 01:03:39,774 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#2 about to shuffle output of map attempt_1390542135497_0221_m_000138_0 decomp: 2 len: 6 to MEMORY
2014-02-01 01:03:39,774 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 2 bytes from map-output for attempt_1390542135497_0221_m_000138_0
2014-02-01 01:03:39,774 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 2, inMemoryMapOutputs.size() -> 37, commitMemory -> 210506452, usedMemory ->504809372
2014-02-01 01:03:40,053 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: bespin03b.umiacs.umd.edu:8080 freed by fetcher#2 in 12881s
2014-02-01 01:03:40,054 ERROR [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:lijunhui (auth:SIMPLE) cause:org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#2
2014-02-01 01:03:40,055 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#2
	at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:121)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:379)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
Caused by: java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:58)
	at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:45)
	at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.<init>(InMemoryMapOutput.java:63)
	at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.unconditionalReserve(MergeManagerImpl.java:297)
	at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:287)
	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:360)
	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:295)
	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:154)

2014-02-01 01:03:40,061 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
2014-02-01 01:03:40,081 WARN [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Could not delete hdfs://bespinrm.umiacs.umd.edu:8020/user/lijunhui/pipeline-zh-en-hiero-_fs_clip-scratch_lijunhui_joshua-v5.0_zh-en/collected/_temporary/1/_temporary/attempt_1390542135497_0221_r_000001_0
2014-02-01 01:03:40,188 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ReduceTask metrics system...
2014-02-01 01:03:40,188 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics system stopped.
2014-02-01 01:03:40,188 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics system shutdown complete.



Junhui Li

Feb 6, 2014, 5:06:08 PM
to joshua_...@googlegroups.com
Thanks all.

I got the above errors because the input files contained "(" and "[" tokens. It's fixed now.
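
In case it helps anyone else, a quick way to check for those tokens is something like the following (a sketch only; the file names assume the standard --corpus/--source/--target naming from the command above, and the placeholder mapping is just one possible way to handle them):

      # count lines containing bare parentheses or square brackets in the training files
      grep -c '[][()]' input/train.zh input/train.en
      # one option is to map them to placeholder tokens before extraction
      sed -i 's/(/-LRB-/g; s/)/-RRB-/g; s/\[/-LSB-/g; s/\]/-RSB-/g' input/train.zh input/train.en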

Junhui