Is there a way to create a Parquet file from an XML/JSON input file without an .avsc file and without Impala?

srinivasarao daruna

Mar 29, 2016, 12:31:29 PM
to CDK Development

I want to convert my input file (XML/JSON) to Parquet. I already have one solution that works with Spark and creates the required Parquet file.

However, due to other client requirements, I might need to create a solution that does not involve Hadoop ecosystem components such as Hive, Impala, Spark or MapReduce.

Also, the Kite SDK uses an .avsc file to create Parquet data; kindly correct me if I am wrong. I might be short-sighted, but it looks like it needs an Avro schema file. So, is there any library that can create a Parquet file from self-describing files such as XML or JSON?


Note: If this does not seem like a proper approach, I would like to understand why it is not recommended, so that I can learn something or understand the areas I might have missed.

Antwnis

Mar 29, 2016, 2:50:09 PM
to srinivasarao daruna, CDK Development
You can probably use Spark in local mode; that shouldn't require a Hadoop ecosystem.

Another alternative is the `eel` tool, which was built around use cases like the one you describe.
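
On the .avsc question: tools that read JSON without a hand-written schema (Spark's JSON reader is one) get by because the schema can be inferred from the records themselves. Below is a rough sketch of that idea in plain Python, standard library only. It does not write Parquet; it only derives an Avro-style record schema from one sample record. The names `infer_avro_type` and `infer_avro_record` are made up for illustration, and it assumes arrays are homogeneous and that a single record is representative of the whole file.

```python
import json


def infer_avro_type(value, name):
    """Map one parsed JSON value to an Avro-style type (hypothetical helper)."""
    if value is None:
        return "null"
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        # Assumption: arrays are homogeneous, so the first element decides
        # the item type; an empty array falls back to "string".
        items = infer_avro_type(value[0], name) if value else "string"
        return {"type": "array", "items": items}
    if isinstance(value, dict):
        # Nested objects become nested records named after their field.
        return infer_avro_record(value, name)
    raise TypeError(f"unsupported JSON value for field {name!r}: {value!r}")


def infer_avro_record(record, name):
    """Build an Avro-style record schema from one parsed JSON object."""
    fields = [
        {"name": key, "type": infer_avro_type(val, key)}
        for key, val in record.items()
    ]
    return {"type": "record", "name": name, "fields": fields}


# Made-up sample record, standing in for one line of the input file.
sample = json.loads(
    '{"id": 1, "name": "a", "score": 2.5, "tags": ["x"], "active": true}'
)
schema = infer_avro_record(sample, "Sample")
print(json.dumps(schema, indent=2))
```

A real converter would also have to merge types across many records (for example, turning a field that is sometimes missing or null into a nullable union), which is the part a single-record sketch like this skips.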
