Configuring sub-hour bucket resolution


tomas...@gmail.com

Mar 5, 2014, 3:33:09 PM
to camu...@googlegroups.com
Is there an easy way to set Camus up so that the HDFS data is bucketed at temporal resolutions smaller than one hour? (Say, every 30 mins, or every 5.)

If not, is changing the Partitioner the way to go?

Thanks,

- Tomas

Ken Goodhope

Mar 5, 2014, 10:05:41 PM
to tomas...@gmail.com, camu...@googlegroups.com
You can partition the files on smaller boundaries, but if you want
directories based on smaller buckets you will need to create a new
Partitioner.

Ken
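
A new Partitioner along the lines Ken suggests boils down to rounding each record's timestamp down to the start of its N-minute bucket before building the output directory path. A minimal sketch of that rounding step (an illustration only, not the actual Camus Partitioner interface):

```java
public class BucketDemo {
    // Round a millisecond timestamp down to the start of its N-minute bucket.
    static long bucketStart(long timestampMs, int bucketMinutes) {
        long bucketMs = bucketMinutes * 60_000L;
        return (timestampMs / bucketMs) * bucketMs;
    }

    public static void main(String[] args) {
        long ts = 1394040789000L; // an example record timestamp (ms since epoch)
        System.out.println(bucketStart(ts, 30)); // start of its 30-minute bucket
        System.out.println(bucketStart(ts, 60)); // start of its hour, for comparison
    }
}
```

A real Partitioner would then format this bucket-start timestamp with a date pattern that goes down to minutes to produce the HDFS directory name.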

Tomas Uribe

Mar 12, 2014, 3:25:37 PM
to camu...@googlegroups.com, tomas...@gmail.com
Changing OUTPUT_DATE_FORMAT to YYYY/MM/dd/HH/mm in the default Partitioner, and setting etl.output.file.time.partition.mins to a value smaller than 60, seems to work: you can bucket by 5, 10, or 15 minutes this way.
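
For reference, the properties side of this might look like the fragment below (the 15-minute value is just an example; note that OUTPUT_DATE_FORMAT is a constant in the Partitioner source, not a property, so that part is a code change):

```properties
# Partition output files into 15-minute buckets instead of hourly
# (values that divide 60 evenly, e.g. 5, 10, 15, 30, were reported to work)
etl.output.file.time.partition.mins=15
```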

Thanks,

- Tomas

Zhu Wayne

Apr 7, 2014, 9:16:03 AM
to camu...@googlegroups.com, tomas...@gmail.com
Since a Hive partition's location can't contain subdirectories, you can't keep hourly Hive partitions in that case; you would have to create a sub-hour partition per bucket for users to query in Hive.
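
Concretely, with sub-hour directories each bucket needs its own Hive partition, registered one per directory. A sketch with hypothetical table, column, and path names:

```sql
-- Hypothetical: one Hive partition per 15-minute Camus output directory
ALTER TABLE events ADD PARTITION (dt='2014-03-05', hr='17', min='30')
  LOCATION '/camus/dest/events/2014/03/05/17/30';
```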