org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:362)
at org.apache.thrift.protocol.TBinaryProtocol.readByte(TBinaryProtocol.java:251)
at org.apache.thrift.protocol.TBinaryProtocol.readFieldBegin(TBinaryProtocol.java:215)
at streamcorpus.StreamItem$StreamItemStandardScheme.read(StreamItem.java:1496)
at streamcorpus.StreamItem$StreamItemStandardScheme.read(StreamItem.java:1489)
at streamcorpus.StreamItem.read(StreamItem.java:1329)
at test.ReadThrift.main(ReadThrift.java:28)
I know my question duplicates this post:
https://groups.google.com/forum/#!topic/streamcorpus/u8oNK3CqiCs but I couldn't find a solution to the problem there, and the script on git doesn't seem to have been updated. Is there a workaround for extracting the text content from these files, or an equivalent class in Python?