[Scalding] Filter predicate pushdown with parquet/avro

Ishaaq Chandy

Mar 12, 2015, 8:26:11 AM
to cascadi...@googlegroups.com
Hi all,
I've managed to get Scalding to process Parquet/Avro files, primarily using code that Oleksii Iepishkin forked here: https://github.com/epishkin/scalding/tree/parquet_avro/scalding-parquet (thanks, Oleksii! By the way, are there any plans to merge this code back into Scalding?)

While I've been able to use projections to reduce the number of fields read, I haven't worked out how to use filter predicate push-down to reduce the number of rows iterated over. Any pointers or examples?

Thanks,
Ishaaq

Neville Li

Mar 12, 2015, 9:50:16 AM
to cascadi...@googlegroups.com
You want to build a FilterPredicate expression tree with this API:

There's also a library that uses macros to compile Scala lambdas into projections and predicates: https://github.com/nevillelyh/parquet-avro-extra
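
For anyone finding this thread later, here is a minimal sketch of what building a FilterPredicate tree by hand could look like with Parquet's FilterApi. It is an illustration, not code from the fork above: the column names ("age", "country") and the object name RowFilterSketch are made up, and the imports assume the org.apache.parquet package names (older parquet-mr releases use a plain parquet. prefix instead). It also assumes parquet-column, parquet-hadoop and hadoop-common are on the classpath.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.parquet.filter2.predicate.{FilterApi, FilterPredicate}
import org.apache.parquet.hadoop.ParquetInputFormat
import org.apache.parquet.io.api.Binary

object RowFilterSketch {
  // Hypothetical schema: "age" is an int32 column, "country" a string (binary) column.
  // The tree below reads as: age > 18 AND country == "AU".
  val adultsInAu: FilterPredicate = FilterApi.and(
    // FilterApi is a Java API, so the Scala Int has to be boxed explicitly.
    FilterApi.gt(FilterApi.intColumn("age"), Int.box(18)),
    FilterApi.eq(FilterApi.binaryColumn("country"), Binary.fromString("AU"))
  )

  // Stores the predicate in the Hadoop job configuration, where
  // ParquetInputFormat (and its Avro subclass) can pick it up when reading.
  def configure(conf: Configuration): Unit =
    ParquetInputFormat.setFilterPredicate(conf, adultsInAu)
}
```

How the predicate gets from here into a Scalding source depends on how the source builds its job configuration, which this sketch does not cover; the configure helper only shows the Hadoop-level hook.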

