Does the scalding-parquet library support reading Snappy-compressed Parquet files?
I am trying to read Parquet files with the following schema:
> hadoop jar parquet-tools-1.10.1.jar schema /my/path/part-00000.snappy.parquet
message spark_schema {
  optional fixed_len_byte_array(8) fieldName1 (DECIMAL(18,0));
  optional fixed_len_byte_array(2) fieldName2 (DECIMAL(4,0));
  optional binary fieldName3 (UTF8);
}
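For reference, my understanding of the encoding is that a DECIMAL backed by fixed_len_byte_array stores the unscaled value as big-endian two's-complement bytes, so decoding it by hand would look roughly like this (a minimal sketch; decodeDecimal is my own helper, not anything from the library):

import java.math.{BigDecimal => JBigDecimal, BigInteger}

// A fixed_len_byte_array DECIMAL holds the unscaled value in
// big-endian two's-complement form, padded to the fixed width.
def decodeDecimal(bytes: Array[Byte], scale: Int): JBigDecimal =
  new JBigDecimal(new BigInteger(bytes), scale)

// e.g. fieldName1 is DECIMAL(18,0), so: decodeDecimal(rawBytes, 0)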
I am using the following code:
import cascading.tuple.Fields
import com.twitter.scalding._
import com.twitter.scalding.parquet.tuple.ParquetTupleSource

val fields = new Fields("fieldName1", "fieldName2", "fieldName3")
ParquetTupleSource(fields, inputPath)
  .read
  .write(Tsv(outputPath))
The fieldName3 column produces normal output that matches the input strings; however, the fieldName1 and fieldName2 columns produce garbage. Does the scalding-parquet library support Snappy-compressed Parquet files? Does it support reading the fixed_len_byte_array type, and if so, how do I specify this in the TypedParquet setting?
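In case it clarifies what I am after, this is the kind of workaround I have been considering: decoding the raw bytes myself in the pipeline. It is only a sketch and assumes the fixed_len_byte_array values arrive as Array[Byte]; DecodeDecimalsJob and the decoding logic are my own, not part of scalding-parquet:

import cascading.tuple.Fields
import com.twitter.scalding._
import com.twitter.scalding.parquet.tuple.ParquetTupleSource
import java.math.{BigDecimal => JBigDecimal, BigInteger}

class DecodeDecimalsJob(args: Args) extends Job(args) {
  val fields = new Fields("fieldName1", "fieldName2", "fieldName3")

  // Assumes the decimal fields come through as raw byte arrays;
  // decode them into BigDecimal (both fields have scale 0) before
  // writing, instead of relying on the default toString.
  ParquetTupleSource(fields, args("input"))
    .read
    .map('fieldName1 -> 'fieldName1) { raw: Array[Byte] =>
      new JBigDecimal(new BigInteger(raw), 0) // DECIMAL(18,0)
    }
    .map('fieldName2 -> 'fieldName2) { raw: Array[Byte] =>
      new JBigDecimal(new BigInteger(raw), 0) // DECIMAL(4,0)
    }
    .write(Tsv(args("output")))
}

Is something along these lines necessary, or does the library handle the decoding itself?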
Thank you for your help!
Best,
Yuri