Sinking to parquet files


David Kincaid

Jun 14, 2016, 8:36:06 PM
to cascalog-user
Does anyone have an example of sinking a Cascalog query output to Parquet files? I'm especially interested in how one would sink a nested data structure using a Cascalog query. Is anyone doing this now?



Andy Xue

Jun 21, 2016, 3:01:31 PM
to cascalog-user
+1 to this question

Sam Ritchie

Jun 21, 2016, 3:23:07 PM
There are a few other libraries out there that can help with this:

Haven't used them myself, but I'd start here!


Sam Ritchie, Stripe Inc.


David Kincaid

Jun 26, 2016, 2:32:50 PM
to cascalog-user
Thanks, Sam. I ended up using the ParquetTBaseScheme from parquet-mr as a guide and created my own ParquetAvroScheme. I've only tested it as a sink, since that's all I need right now, and it's only set up to use Avro GenericRecords, but at least it's something that works. The ones from Datasio and Pulseio are too dated to work, and the ParquetTBaseScheme example made it pretty easy to implement.

- Dave