FAILED_TO_UNCOMPRESS(5) with Snappy on Hadoop pipeline - how to debug?


Ye ilho

Aug 4, 2020, 3:42:11 PM
to Xerial

Hi, I am getting a FAILED_TO_UNCOMPRESS(5) error when trying to decompress one of the Thrift fields that we set earlier in the pipeline. We run a Hadoop job to read this data, and it runs into the exception shown in the following call stack:

org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:98)
org.xerial.snappy.SnappyNative.rawUncompress(Native Method)
org.xerial.snappy.Snappy.rawUncompress(Snappy.java:474)
org.xerial.snappy.Snappy.uncompress(Snappy.java:513)
org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:147)
org.xerial.snappy.SnappyInputStream.readHeader(SnappyInputStream.java:99)
org.xerial.snappy.SnappyInputStream.<init>(SnappyInputStream.java:59)

The code that throws an exception is this line:

SnappyInputStream inputStream = new SnappyInputStream(byteArrayInputStream);

The parent Thrift struct has been serialized/deserialized using the binary Thrift protocol, and I can read all the other fields fine. I have tried to debug this issue, but because the code that throws the exception is native C++ code, it seems I cannot step into it from IntelliJ.
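In case it helps, this is roughly how I am inspecting the failing buffer (a minimal sketch; fieldBytes stands for the compressed Thrift field and is loaded from a file here just for the repro). The idea is to print the size and the first bytes, and to check whether the buffer is at least a valid raw Snappy block:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.xerial.snappy.Snappy;

public class SnappyFieldDebug {
    public static void main(String[] args) throws IOException {
        // fieldBytes: the compressed Thrift field, dumped to a file beforehand
        byte[] fieldBytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("field length: " + fieldBytes.length);

        // Hex-dump the first 16 bytes to see what kind of header (if any)
        // SnappyInputStream.readHeader is actually looking at
        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < Math.min(16, fieldBytes.length); i++) {
            hex.append(String.format("%02x ", fieldBytes[i]));
        }
        System.out.println("first bytes : " + hex);

        // True only if the buffer is a valid raw Snappy block
        // (raw format as produced by Snappy.compress, as opposed to the
        // framed format written by SnappyOutputStream)
        System.out.println("valid raw block: " + Snappy.isValidCompressedBuffer(fieldBytes));
    }
}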

What are my options for debugging this? Please let me know! I am trying to force snappy-java to use the pure-Java implementation, but I am having some other trouble with that at the moment. If any of you have seen issues like this in a Hadoop pipeline, please let me know.
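For reference, this is how I am trying to force the pure-Java implementation; as far as I can tell from the snappy-java source, the switch is the org.xerial.snappy.purejava system property, but please correct me if that is not the right flag:

import java.io.ByteArrayInputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.xerial.snappy.SnappyInputStream;

public class PureJavaSnappyCheck {
    public static void main(String[] args) throws Exception {
        // Must be set before any org.xerial.snappy class is loaded;
        // equivalently pass -Dorg.xerial.snappy.purejava=true to the JVM
        System.setProperty("org.xerial.snappy.purejava", "true");

        // Same dump of the failing Thrift field as above
        byte[] fieldBytes = Files.readAllBytes(Paths.get(args[0]));
        try (SnappyInputStream in =
                 new SnappyInputStream(new ByteArrayInputStream(fieldBytes))) {
            // Reading anything goes through the same decompression path
            System.out.println("first byte read: " + in.read());
        }
    }
}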

Thanks!

Ilho Ye

Aug 4, 2020, 5:44:36 PM
to Xerial
I managed to use pure Java and got some more detailed messages. I am now seeing PARSING_ERROR with the following call stack:

org.xerial.snappy.SnappyError: [PARSING_ERROR] position: 5

at org.xerial.snappy.pure.SnappyRawDecompressor.uncompressAll(SnappyRawDecompressor.java:155)


Unfortunately, this still does not give me enough insight into the underlying problem. The issue happens consistently on one Hadoop cluster, so I would imagine something is incompatible there, but I can't tell what it is...
As a side note, it looks like the length read from the byte array is far smaller than the actual size of the buffer. Also, the byte array coming out of the Thrift field is much bigger than usual; i.e., if we compare the input byte arrays, the one on this Hadoop cluster is much bigger.
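To make that concrete, this is the small check I am using to compare the uncompressed length recorded in the Snappy preamble with the size of the buffer we actually receive (assuming the field is a raw Snappy block; names are just for illustration):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.xerial.snappy.Snappy;

public class SnappyLengthCheck {
    public static void main(String[] args) throws IOException {
        byte[] fieldBytes = Files.readAllBytes(Paths.get(args[0]));
        // Size of the compressed buffer coming out of the Thrift field
        System.out.println("compressed buffer size: " + fieldBytes.length);
        // Uncompressed length claimed by the varint preamble of the block;
        // a big mismatch with what we expect hints at truncated or corrupted data
        System.out.println("claimed uncompressed size: " + Snappy.uncompressedLength(fieldBytes));
    }
}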
