--
You received this message because you are subscribed to the Google Groups "Joshua Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to joshua_suppor...@googlegroups.com.
To post to this group, send email to joshua_...@googlegroups.com.
Visit this group at http://groups.google.com/group/joshua_support.
For more options, visit https://groups.google.com/groups/opt_out.
Hi,

What version of Joshua are you using?
You shouldn't have to use --hadoop at all. Just set the environment variable $HADOOP and it will be automatically detected.
Hi Junhui,

The error itself seems to be a type conflict. While Joshua is built against 0.22, I haven't had any trouble running Thrax with Hadoop 1.0. Setting $HADOOP to /usr seems incorrect though – you'll want to set it to the actual Hadoop directory, and not the enclosing one.
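For illustration, setting the variable correctly might look like the sketch below. The /opt/hadoop path is only a placeholder for this example; use whatever directory actually contains bin/hadoop on your cluster:

```shell
# Point $HADOOP at the Hadoop installation itself, not its parent:
#   wrong: export HADOOP=/usr               (the enclosing directory)
#   right: export HADOOP=/usr/lib/hadoop    (or wherever bin/hadoop lives)
export HADOOP=/opt/hadoop   # placeholder path for this sketch
# With $HADOOP set, the Joshua pipeline detects Hadoop automatically,
# so no --hadoop flag is needed.
```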
I haven't followed the changes in Hadoop, but if using 1.0 is an option for you I'd recommend switching to that. If you are stuck with 2.0 and the conflicts are limited to the counters (as the errors suggest), you can quite easily comment those out in the source code. The counters provide diagnostic information, and are not essential to the grammar extraction.
-output for attempt_1390542135497_0134_m_000263_0
2014-01-30 13:58:35,509 INFO [fetcher#9] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 7414066, inMemoryMapOutputs.size() -> 19, commitMemory -> 109146813, usedMemory ->474594763
2014-01-30 13:58:35,511 INFO [fetcher#9] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#9 about to shuffle output of map attempt_1390542135497_0134_m_000283_0 decomp: 7498823 len: 7498827 to MEMORY
2014-01-30 13:58:35,544 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 7460535 bytes from map-output for attempt_1390542135497_0134_m_000178_0
2014-01-30 13:58:35,544 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 7460535, inMemoryMapOutputs.size() -> 20, commitMemory -> 116560879, usedMemory ->482093586
2014-01-30 13:58:35,818 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: bespin04b.umiacs.umd.edu:8080 freed by fetcher#2 in 3397s
2014-01-30 13:58:35,818 ERROR [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:lijunhui (auth:SIMPLE) cause:org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#2
2014-01-30 13:58:35,820 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#2
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:121)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:379)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:58)
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:45)
    at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.<init>(InMemoryMapOutput.java:63)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.unconditionalReserve(MergeManagerImpl.java:297)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:287)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:360)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:295)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:154)
2014-01-30 13:58:35,822 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
2014-01-30 13:58:35,832 INFO [fetcher#9] org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 7498823 bytes from map-output for attempt_1390542135497_0134_m_000283_0
2014-01-30 13:58:35,833 INFO [fetcher#9] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 7498823, inMemoryMapOutputs.size() -> 21, commitMemory -> 124021414, usedMemory ->604052459
2014-01-30 13:58:35,833 INFO [fetcher#9] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: bespin04a.umiacs.umd.edu:8080 freed by fetcher#9 in 3412s
2014-01-30 13:58:35,838 WARN [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Could not delete hdfs://bespinrm.umiacs.umd.edu:8020/user/lijunhui/pipeline-zh-en-hiero-_fs_clip-scratch_lijunhui_joshua-v5.0_zh-en/collected/_temporary/1/_temporary/attempt_1390542135497_0134_r_000013_1
2014-01-30 13:58:35,940 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ReduceTask metrics system...
2014-01-30 13:58:35,941 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics system stopped.
2014-01-30 13:58:35,941 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics system shutdown complete.
I suppose the error is due to running out of memory. Since I have already set "--hadoop-mem 120g", does that mean 120g is still not sufficient, or do I need to set the memory requirement somewhere else?
Thanks,
Junhui
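For reference, the stack trace shows the reducer's in-memory shuffle buffer failing to allocate heap, so the reduce-side JVM settings are the ones that matter; a pipeline-level flag like --hadoop-mem may not reach them. A hedged sketch of the standard Hadoop 2.x (YARN) properties involved follows; the property names are stock Hadoop settings, not Joshua options, and the values are illustrative placeholders only:

```xml
<!-- mapred-site.xml sketch; values are placeholders to tune per cluster -->
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>8192</value> <!-- container size requested for each reduce task -->
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx6g</value> <!-- JVM heap inside that container -->
</property>
<property>
  <name>mapreduce.reduce.shuffle.input.buffer.percent</name>
  <value>0.30</value> <!-- fraction of heap used to buffer fetched map outputs -->
</property>
```

Lowering the shuffle buffer fraction trades in-memory merging for disk spills, which is slower but avoids exactly this OutOfMemoryError in the fetchers.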
I am still stuck at the "Collect" step. The map jobs are successful, but the reduce jobs died with the following errors.
Can someone help me?
Thanks,
-----------------
Log Type: stderr
Log Length: 0
Log Type: stdout
Log Length: 0
Log Type: syslog
Log Length: 2631683
Showing 4096 bytes of 2631683 total.
put for attempt_1390542135497_0221_m_000288_0
2014-02-01 01:03:39,774 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 7199194, inMemoryMapOutputs.size() -> 35, commitMemory -> 203307256, usedMemory ->504809368
2014-02-01 01:03:39,774 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#2 about to shuffle output of map attempt_1390542135497_0221_m_000139_0 decomp: 2 len: 6 to MEMORY
2014-02-01 01:03:39,774 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 2 bytes from map-output for attempt_1390542135497_0221_m_000139_0
2014-02-01 01:03:39,774 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 2, inMemoryMapOutputs.size() -> 36, commitMemory -> 210506450, usedMemory ->504809370
2014-02-01 01:03:39,774 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#2 about to shuffle output of map attempt_1390542135497_0221_m_000138_0 decomp: 2 len: 6 to MEMORY
2014-02-01 01:03:39,774 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 2 bytes from map-output for attempt_1390542135497_0221_m_000138_0
2014-02-01 01:03:39,774 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 2, inMemoryMapOutputs.size() -> 37, commitMemory -> 210506452, usedMemory ->504809372
2014-02-01 01:03:40,053 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: bespin03b.umiacs.umd.edu:8080 freed by fetcher#2 in 12881s
2014-02-01 01:03:40,054 ERROR [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:lijunhui (auth:SIMPLE) cause:org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#2
2014-02-01 01:03:40,055 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#2
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:121)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:379)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:58)
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:45)
    at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.<init>(InMemoryMapOutput.java:63)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.unconditionalReserve(MergeManagerImpl.java:297)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:287)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:360)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:295)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:154)
2014-02-01 01:03:40,061 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
2014-02-01 01:03:40,081 WARN [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Could not delete hdfs://bespinrm.umiacs.umd.edu:8020/user/lijunhui/pipeline-zh-en-hiero-_fs_clip-scratch_lijunhui_joshua-v5.0_zh-en/collected/_temporary/1/_temporary/attempt_1390542135497_0221_r_000001_0
2014-02-01 01:03:40,188 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ReduceTask metrics system...
2014-02-01 01:03:40,188 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics system stopped.
2014-02-01 01:03:40,188 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics system shutdown complete.