Hello all,
I'm using 4.6rc6, and I'm running into an issue with LZO and CombineFileInputFormat (CFIF).
If all the input files are smaller than my combine size, then everything works as expected.
_I believe_ the problem occurs when one of the input files is larger than my target split size.
In my case I targeted 256MB, and some of the files are 330MB.
I either get the lzo1x_decompress_safe -6 error (first trace below), or it appears that I lose the newline and pick up partway through the next line, which Cascading catches for me (second trace below).
I'm working on a test case...
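In the meantime, here's a sketch of the arithmetic I think triggers it. LZO files are only splittable at compressed-block boundaries (via the .lzo.index), so my guess is that when a file is larger than the target split size, a split start falls mid-block and the decompressor reads garbage. Everything here is illustrative: the block offsets are made up, not read from a real index, and alignToNextBlock just stands in for what an LzoIndex lookup would do.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitAlignmentSketch {
    static final long MB = 1024L * 1024L;

    // Naive splits: cut the file every targetSplitSize bytes.
    static List<Long> naiveSplitOffsets(long fileLen, long targetSplitSize) {
        List<Long> offsets = new ArrayList<>();
        for (long off = 0; off < fileLen; off += targetSplitSize) {
            offsets.add(off);
        }
        return offsets;
    }

    // Aligned splits: snap a cut point forward to the next LZO block
    // boundary (stand-in for a real .lzo.index lookup).
    static long alignToNextBlock(long offset, long[] blockOffsets) {
        for (long b : blockOffsets) {
            if (b >= offset) return b;
        }
        return -1; // past the last block: no further split
    }

    public static void main(String[] args) {
        long fileLen = 330 * MB;   // one of my oversized input files
        long target = 256 * MB;    // my target split size
        // Hypothetical compressed-block boundaries for illustration only.
        long[] blocks = {0, 100 * MB, 200 * MB, 300 * MB};

        List<Long> naive = naiveSplitOffsets(fileLen, target);
        // The second naive split starts at 256MB -- mid-block, so decoding
        // from there could produce the -6 error or a torn line.
        long badStart = naive.get(1);
        long goodStart = alignToNextBlock(badStart, blocks);
        System.out.println("naive split start:   " + badStart / MB + " MB");
        System.out.println("aligned split start: " + goodStart / MB + " MB");
    }
}
```

If that's what's happening, the combine logic would need to align each per-file chunk to the index before handing it to the record reader.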
cascading.flow.FlowException: internal error during mapper execution
at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:148)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:348)
at org.apache.hadoop.mapred.Child$4.run(Child.java:282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1117)
at org.apache.hadoop.mapred.Child.main(Child.java:271)
Caused by: java.lang.InternalError: lzo1x_decompress_safe returned: -6
at com.hadoop.compression.lzo.LzoDecompressor.decompressBytesDirect(Native Method)
at com.hadoop.compression.lzo.LzoDecompressor.decompress(LzoDecompressor.java:315)
at com.hadoop.compression.lzo.LzopDecompressor.decompress(LzopDecompressor.java:122)
at com.hadoop.compression.lzo.LzopInputStream.decompress(LzopInputStream.java:247)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:77)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:205)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:169)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:294)
at util.lzo.LzoLineRecordReader.nextKeyValue(LzoLineRecordReader.java:79)
at com.uss.utils.lzo.CompositeRecordReader.nextKeyValue(CompositeRecordReader.java:84)
at util.lzo.DeprecatedInputFormatWrapper$RecordReaderWrapper.next(DeprecatedInputFormatWrapper.java:330)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:227)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:212)
at cascading.tap.hadoop.util.MeasuredRecordReader.next(MeasuredRecordReader.java:61)
at cascading.scheme.hadoop.TextDelimited.source(TextDelimited.java:1005)
at cascading.tuple.TupleEntrySchemeIterator.getNext(TupleEntrySchemeIterator.java:163)
at cascading.tuple.TupleEntrySchemeIterator.hasNext(TupleEntrySchemeIterator.java:136)
at cascading.flow.stream.SourceStage.map(SourceStage.java:76)
at cascading.flow.stream.SourceStage.run(SourceStage.java:58)
at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:130)
... 7 more
cascading.tuple.TupleException: unable to read from input identifier: 'unknown'
at cascading.tuple.TupleEntrySchemeIterator.hasNext(TupleEntrySchemeIterator.java:149)
at cascading.flow.stream.SourceStage.map(SourceStage.java:76)
at cascading.flow.stream.SourceStage.run(SourceStage.java:58)
at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:130)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:348)
at org.apache.hadoop.mapred.Child$4.run(Child.java:282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1117)
at org.apache.hadoop.mapred.Child.main(Child.java:271)
Caused by: cascading.tap.TapException: did not parse correct number of values from input data, expected: 8, got: 14:ACCOUNT:123735572,15:42:06,20131211,1,EMAIL_OPEN_SALE,LC,12.11.13 lc 2TWACCOUNT:100883092,12:02:18,20140111,1,EMAIL_SEND_CORP_OTHER,NM,01.11.14 nm johnny was - remainder,
at cascading.scheme.util.DelimitedParser.onlyParseLine(DelimitedParser.java:404)
at cascading.scheme.util.DelimitedParser.parseLine(DelimitedParser.java:341)
at cascading.scheme.hadoop.TextDelimited.source(TextDelimited.java:1015)
at cascading.tuple.TupleEntrySchemeIterator.getNext(TupleEntrySchemeIterator.java:163)
at cascading.tuple.TupleEntrySchemeIterator.hasNext(TupleEntrySchemeIterator.java:136)
... 10 more