Re: CombineFiles + Lzo continued


Chris K Wensel

Dec 13, 2013, 10:17:08 PM
to cascadi...@googlegroups.com
I can't comment much on it.

the code is a contrib, and the contributors are now realizing there are some core flaws in the CIF classes in Hadoop.

we are pondering next steps to overcome the issues.

My recommendation, unless an obvious path to making it work emerges, is to stop trying for now.

fwiw, this is why we rarely take patches.

i'm very sorry for the inconvenience.

ckw

On Dec 10, 2013, at 11:42 AM, Jeremy Davis <jdavis....@gmail.com> wrote:


Has anyone run into this exception, or can anyone shed any light on it?
It happened after I finally got CIF working: the job starts, and I can see it has far fewer map tasks (so success there!).
I'm currently straddling this list and elephantbird-dev, and I almost have this thing working…


cascading.flow.FlowException: internal error during mapper execution
	at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
	at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: cascading.flow.stream.DuctException: failure resolving tuple entry
	at cascading.flow.stream.TrapHandler.handleException(TrapHandler.java:139)
	at cascading.flow.stream.TrapHandler.handleException(TrapHandler.java:115)
	at cascading.flow.stream.ElementStage.handleException(ElementStage.java:145)
	at cascading.flow.stream.SourceStage.map(SourceStage.java:93)
	at cascading.flow.stream.SourceStage.run(SourceStage.java:58)
	at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:127)
	... 7 more
Caused by: cascading.tuple.TupleException: unable to read from input identifier: 'unknown'
	at cascading.tuple.TupleEntrySchemeIterator.hasNext(TupleEntrySchemeIterator.java:127)
	at cascading.flow.stream.SourceStage.map(SourceStage.java:76)
	... 9 more
Caused by: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at org.apache.hadoop.mapred.lib.CombineFileRecordReader.initNextRecordReader(CombineFileRecordReader.java:157)
	at org.apache.hadoop.mapred.lib.CombineFileRecordReader.next(CombineFileRecordReader.java:62)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:228)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:213)
	at cascading.tap.hadoop.util.MeasuredRecordReader.next(MeasuredRecordReader.java:61)
	at cascading.scheme.hadoop.TextDelimited.source(TextDelimited.java:998)
	at cascading.tuple.TupleEntrySchemeIterator.getNext(TupleEntrySchemeIterator.java:140)
	at cascading.tuple.TupleEntrySchemeIterator.hasNext(TupleEntrySchemeIterator.java:120)
	... 10 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.mapred.lib.CombineFileRecordReader.initNextRecordReader(CombineFileRecordReader.java:149)
	... 17 more
Caused by: java.io.IOException: Compressed length 1970168680 exceeds max block size 67108864 (probably corrupt file)
	at com.hadoop.compression.lzo.LzopInputStream.getCompressedData(LzopInputStream.java:286)
	at com.hadoop.compression.lzo.LzopInputStream.decompress(LzopInputStream.java:256)
	at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:77)
	at java.io.InputStream.read(InputStream.java:101)
	at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:205)
	at org.apache.hadoop.util.LineReader.readLine(LineReader.java:169)
	at org.apache.hadoop.util.LineReader.readLine(LineReader.java:304)
	at com.hadoop.mapred.DeprecatedLzoLineRecordReader.<init>(DeprecatedLzoLineRecordReader.java:64)
	at com.hadoop.mapred.DeprecatedLzoTextInputFormat.getRecordReader(DeprecatedLzoTextInputFormat.java:158)
	at cascading.tap.hadoop.io.CombineFileRecordReaderWrapper.<init>(CombineFileRecordReaderWrapper.java:61)
	... 22 more
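
For anyone following along, my working theory (an assumption on my part, not something confirmed): the bottom of the trace shows the combine reader constructing DeprecatedLzoLineRecordReader at some split offset, and a "compressed length" of 1970168680 looks like arbitrary bytes being parsed as an LZO block header, i.e. the combine split does not start on an LZO block boundary. To rule out missing indexes on my end, I'm indexing every .lzo input first. A minimal sketch using hadoop-lzo's LzoIndexer (the IndexLzoInputs class name and command-line paths are mine, just for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import com.hadoop.compression.lzo.LzoIndexer;

// Hypothetical helper, not part of the failing job above.
public class IndexLzoInputs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // LzoIndexer writes a side-by-side .lzo.index file recording each
    // compressed block's offset; LZO-aware input formats use it to
    // align splits on block boundaries.
    LzoIndexer indexer = new LzoIndexer(conf);
    for (String arg : args) {
      indexer.index(new Path(arg)); // recurses into directories
    }
  }
}

If all the inputs are indexed and the plain (non-combined) LZO input format reads them cleanly, that points at the combine split logic ignoring the index rather than at the files themselves.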




Jeremy Davis

Dec 16, 2013, 2:00:51 PM
to cascadi...@googlegroups.com
No inconvenience really,
just casting as wide a net as possible to pull in any hints and help (I'm also on elephantbird-dev).
I'm right there with everyone else trying to get CIF to work. It will be a big improvement for us.


-JD

schit...@marketshare.com

Aug 25, 2015, 5:57:10 AM
to cascading-user
Why does the following error occur?

2015-08-25 02:36:07,811 INFO com.twitter.elephantbird.mapreduce.input.LzoRecordReader: input split: maprfs:/attribution/H11001/etl/input_mms_staged/tellapart/PB_impressions_report_20150604.txt.lzo 134313398:193146564
2015-08-25 02:36:07,828 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
2015-08-25 02:36:07,867 WARN com.hadoop.compression.lzo.LzopInputStream: IOException in getCompressedData; likely LZO corruption.
java.io.IOException: Compressed length 1366582898 exceeds max block size 67108864 (probably corrupt file)
	at com.hadoop.compression.lzo.LzopInputStream.getCompressedData(LzopInputStream.java:286)
	at com.hadoop.compression.lzo.LzopInputStream.decompress(LzopInputStream.java:256)
	at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:77)
	at java.io.InputStream.read(InputStream.java:101)
	at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:205)
	at org.apache.hadoop.util.LineReader.readLine(LineReader.java:169)
	at org.apache.hadoop.util.LineReader.readLine(LineReader.java:294)
	at com.twitter.elephantbird.mapreduce.input.LzoLineRecordReader.skipToNextSyncPoint(LzoLineRecordReader.java:64)
	at com.twitter.elephantbird.mapreduce.input.LzoRecordReader.initialize(LzoRecordReader.java:97)
	at com.twitter.elephantbird.mapreduce.input.combine.CompositeRecordReader$DelayedRecordReader.createRecordReader(CompositeRecordReader.java:72)
	at com.twitter.elephantbird.mapreduce.input.combine.CompositeRecordReader.nextKeyValue(CompositeRecordReader.java:120)
	at com.twitter.elephantbird.mapred.input.DeprecatedInputFormatWrapper$RecordReaderWrapper.initKeyValueObjects(DeprecatedInputFormatWrapper.java:271)
	at com.twitter.elephantbird.mapred.input.DeprecatedInputFormatWrapper$RecordReaderWrapper.createKey(DeprecatedInputFormatWrapper.java:291)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.createKey(MapTask.java:203)
	at cascading.tap.hadoop.util.MeasuredRecordReader.createKey(MeasuredRecordReader.java:76)
	at com.uss.utils.genericparser.GenericTextDelimited.sourcePrepare(GenericTextDelimited.java:119)
	at cascading.tuple.TupleEntrySchemeIterator.<init>(TupleEntrySchemeIterator.java:107)
	at cascading.tap.hadoop.io.HadoopTupleEntrySchemeIterator.<init>(HadoopTupleEntrySchemeIterator.java:49)
	at cascading.tap.hadoop.io.HadoopTupleEntrySchemeIterator.<init>(HadoopTupleEntrySchemeIterator.java:44)
	at cascading.tap.hadoop.Hfs.openForRead(Hfs.java:518)
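
The split range in the first log line (134313398:193146564) shows the reader starting mid-file, so I can't tell whether the file is genuinely corrupt or whether the split just doesn't start on an LZO block boundary, which produces the same error. To rule out real corruption I am decompressing the whole file from offset 0, roughly like this (a sketch assuming hadoop-lzo's LzopCodec; the VerifyLzoFile class name is mine and the path is passed on the command line):

import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import com.hadoop.compression.lzo.LzopCodec;

// Hypothetical check, not part of the failing job above.
public class VerifyLzoFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    LzopCodec codec = new LzopCodec();
    codec.setConf(conf);
    Path path = new Path(args[0]);
    FileSystem fs = path.getFileSystem(conf);
    // Read the stream end to end: a truly corrupt file fails here as
    // well, while a file that only fails when read from a mid-file
    // split offset points at split alignment rather than corruption.
    try (InputStream in = codec.createInputStream(fs.open(path))) {
      byte[] buf = new byte[64 * 1024];
      long total = 0;
      for (int n; (n = in.read(buf)) > 0; ) {
        total += n;
      }
      System.out.println("decompressed OK: " + total + " bytes");
    }
  }
}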
