Writing data in ORC format


krishna tiwari

Mar 7, 2016, 4:05:03 PM
to gobblin-users
Hello,

My input data is in text format, and I have some logic in a converter that modifies the data. Currently I write the output as text and everything works fine.
Now I need to change the output format from text to ORC. Are there any useful pointers I should consider?

krishna tiwari

Mar 7, 2016, 5:23:54 PM
to gobblin-users
I tried setting the following configuration for the branch:

writer.builder.class.0=gobblin.writer.HiveWritableHdfsDataWriterBuilder

writer.writable.class.0=org.apache.hadoop.hive.ql.io.orc.OrcSerde

writer.output.format.class.0=org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat


but the MR job fails with the following error:


java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: Failed to create writer
	at gobblin.writer.PartitionedDataWriter.write(PartitionedDataWriter.java:110)

Ziyang Liu

Mar 9, 2016, 8:09:55 PM
to gobblin-users
Hi Krishna, if your input is CSV and your output is ORC, you can try using the HiveSerDeConverter to do the conversion, with the following properties:

serde.serializer.type=ORC

serde.deserializer.type=org.apache.hadoop.hive.serde2.OpenCSVSerde

serde.deserializer.input.format.type=org.apache.hadoop.mapred.TextInputFormat

serde.deserializer.output.format.type=org.apache.hadoop.mapred.TextOutputFormat


If the converter successfully converts the record, the HiveWritableHdfsDataWriter should be able to write it.
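
For reference, here is a minimal job-config sketch that combines the converter properties above with the writer properties from the earlier message. The converter.classes value (gobblin.converter.serde.HiveSerDeConverter) and the .0 branch suffixes are assumptions; check the exact class path and whether branch-specific keys are needed for your Gobblin version.

# hypothetical sketch only -- class paths and branch suffixes may differ by Gobblin version
converter.classes=gobblin.converter.serde.HiveSerDeConverter

serde.serializer.type=ORC
serde.deserializer.type=org.apache.hadoop.hive.serde2.OpenCSVSerde
serde.deserializer.input.format.type=org.apache.hadoop.mapred.TextInputFormat
serde.deserializer.output.format.type=org.apache.hadoop.mapred.TextOutputFormat

writer.builder.class.0=gobblin.writer.HiveWritableHdfsDataWriterBuilder
writer.writable.class.0=org.apache.hadoop.hive.ql.io.orc.OrcSerde
writer.output.format.class.0=org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat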
