Sinking to parquet files

David Kincaid

Jun 14, 2016, 8:36:06 PM
to cascalog-user
Does anyone have an example of sinking a Cascalog query output to Parquet files? I'm especially interested in how one would sink a nested data structure using a Cascalog query. Is anyone doing this now?

Thanks,

Dave

Andy Xue

Jun 21, 2016, 3:01:31 PM
to cascalog-user
+1 to this question

Sam Ritchie

Jun 21, 2016, 3:23:07 PM
to cascal...@googlegroups.com
There are a few other libraries out there that can help with this:


Haven't used them myself, but I'd start here!

--
Sam Ritchie, Stripe Inc.

(Too brief? Here's why! http://emailcharter.org)

David Kincaid

Jun 26, 2016, 2:32:50 PM
to cascalog-user
Thanks, Sam. I ended up using the ParquetTBaseScheme from parquet-mr as a guide and created my own ParquetAvroScheme (https://github.com/dkincaid/cascading-avro-parquet). I only tested it as a sink, since that's all I needed right now, and it's only set up to use Avro GenericRecords, but at least it's something that works. The ones from Datasio and Pulseio are too dated to work, and the ParquetTBaseScheme example made it pretty easy to implement.

- Dave