Hi,
I am currently trying to build an SAMT grammar on training data of about 2 million sentence pairs. The average sentence length is about 13 words and 15 words for the two languages, respectively. With this corpus, I run into an OutOfMemoryError. The error output is as follows:
13/03/07 04:13:48 INFO mapred.MapTask: record buffer = 262144/327680
13/03/07 04:13:48 INFO compress.CodecPool: Got brand-new decompressor
13/03/07 04:13:48 INFO mapred.MapTask: io.sort.mb = 100
13/03/07 04:13:52 WARN mapred.LocalJobRunner: job_local_0005
java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:781)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:524)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
13/03/07 04:13:55 INFO mapred.LocalJobRunner:
13/03/07 04:13:55 INFO mapred.LocalJobRunner:
13/03/07 04:13:55 INFO mapred.LocalJobRunner:
After this error, the system keeps running, but then hits another OutOfMemoryError:
13/03/07 04:14:44 INFO mapred.LocalJobRunner:
13/03/07 04:14:46 WARN mapred.LocalJobRunner: job_local_0004
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at edu.jhu.thrax.hadoop.datatypes.AlignmentArray.readFields(AlignmentArray.java:47)
        at edu.jhu.thrax.hadoop.datatypes.RuleWritable.readFields(RuleWritable.java:99)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
        at org.apache.hadoop.io.SequenceFile$Reader.deserializeKey(SequenceFile.java:2102)
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2068)
        at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.nextKeyValue(SequenceFileRecordReader.java:68)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
        at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
13/03/07 04:14:50 INFO mapred.LocalJobRunner:
13/03/07 04:14:50 INFO mapred.LocalJobRunner:
I have tried to solve this problem by increasing the memory available to Hadoop from 2g to 70g and then to 130g. However, the system still fails to produce the SAMT grammar because of this error.
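For reference, by "the memory available to Hadoop" I mean settings roughly like the ones below. This is only a sketch of the standard Hadoop 1.x knobs I have been adjusting; I am not certain which of them is actually honored when the job runs under LocalJobRunner, as in the traces above.

    # conf/hadoop-env.sh -- heap for the Hadoop client/daemon JVM, in MB
    # (earlier attempts used the equivalents of 2g and 70g)
    export HADOOP_HEAPSIZE=130000

    <!-- conf/mapred-site.xml -- heap for each map/reduce child JVM -->
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx130g</value>
    </property>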
Is memory really the cause of this error? If so, is there a way to resolve the problem without buying more memory?
Thank you very much!
Best,
wsknow