Re: Rotate files system in HDFS


Gaurav Gupta

Apr 1, 2014, 6:34:43 PM
to Marcelo Valle, camu...@googlegroups.com
Yup, that's the right way to go.
The generatePartitionedPath function takes a set of parameters and builds a path from them.

In your case, you would also need to pass in the Kafka message number.
You might want to make that a part of the IEtlKey itself.
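As an illustration of the two steps described above (derive a value from the message's key, then build an output path from it), here is a minimal sketch that groups messages by Kafka offset. It assumes the offset is available for each message. The class `OffsetBucketPartitioner` and its methods `encodeBucket` and `buildPath` are hypothetical and are not the actual Camus `Partitioner` signatures; the path layout is likewise an assumption.

```java
import java.util.Locale;

/**
 * Hypothetical sketch (not the Camus API): put every N consecutive Kafka
 * offsets into the same output directory by turning the offset into a
 * bucket number and building a path from that bucket.
 */
public class OffsetBucketPartitioner {
    private final long messagesPerBucket;

    public OffsetBucketPartitioner(long messagesPerBucket) {
        if (messagesPerBucket <= 0) {
            throw new IllegalArgumentException("messagesPerBucket must be positive");
        }
        this.messagesPerBucket = messagesPerBucket;
    }

    /** Encodes the offset as its bucket number, e.g. offsets 0..999 map to "0" when the bucket size is 1000. */
    public String encodeBucket(long offset) {
        return Long.toString(offset / messagesPerBucket);
    }

    /** Builds a relative output path from the topic, Kafka partition id and encoded bucket. */
    public String buildPath(String topic, int partitionId, String encodedBucket) {
        return String.format(Locale.ROOT, "%s/partition=%d/bucket=%s", topic, partitionId, encodedBucket);
    }

    public static void main(String[] args) {
        OffsetBucketPartitioner partitioner = new OffsetBucketPartitioner(1000);
        long[] offsets = {0, 999, 1000, 2500};
        for (long offset : offsets) {
            // Prints which hypothetical directory each offset would be written to.
            System.out.println(offset + " -> "
                    + partitioner.buildPath("images", 3, partitioner.encodeBucket(offset)));
        }
    }
}
```

With a bucket size of 1000, offsets 0 and 999 share `images/partition=3/bucket=0`, offset 1000 starts `bucket=1`, and offset 2500 lands in `bucket=2`. A bucket size of 1 would give one directory per message, which is the layout Marcelo asks about, with the small-files cost Gaurav mentions.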

Gaurav

From: Marcelo Valle <mvall...@gmail.com>
Date: Tuesday, April 1, 2014 at 3:20 PM
To: Gaurav Gupta <ggu...@linkedin.com>
Subject: Re: Rotate files system in HDFS

Hi Gaurav,

Thanks for the reply.
Yes, my idea is to create an HDFS file for each Kafka message (I'm thinking of big binary files such as large images).

Exactly: I want to partition the messages into different HDFS files, either by Kafka message number or by HDFS file size (for example, writing a set of 256 MB HDFS files).
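The size-based option can be sketched as a simple rollover rule: keep appending messages to the current file and start a new one when the next message would push it past the limit. This is a minimal, self-contained sketch that only computes file indexes; the class `SizeBasedRoller` is hypothetical, it does not touch HDFS, and the 256 MB limit is simply the figure from the message above.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: decide which output file each message goes to so that
 * no file grows past a size limit. A real job would open a new HDFS file
 * whenever the returned index changes; here only the indexes are computed.
 */
public class SizeBasedRoller {
    private final long maxBytesPerFile;
    private long bytesInCurrentFile = 0;
    private int currentFileIndex = 0;

    public SizeBasedRoller(long maxBytesPerFile) {
        if (maxBytesPerFile <= 0) {
            throw new IllegalArgumentException("maxBytesPerFile must be positive");
        }
        this.maxBytesPerFile = maxBytesPerFile;
    }

    /** Returns the index of the file that a message of the given size should be written to. */
    public int assign(long messageBytes) {
        // Roll over when this message would push a non-empty file past the limit.
        // A message larger than the limit therefore ends up alone in its own file.
        if (bytesInCurrentFile > 0 && bytesInCurrentFile + messageBytes > maxBytesPerFile) {
            currentFileIndex++;
            bytesInCurrentFile = 0;
        }
        bytesInCurrentFile += messageBytes;
        return currentFileIndex;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        SizeBasedRoller roller = new SizeBasedRoller(256 * mb);
        long[] sizes = {100 * mb, 100 * mb, 100 * mb, 300 * mb, 10 * mb};
        List<Integer> assignments = new ArrayList<>();
        for (long size : sizes) {
            assignments.add(roller.assign(size));
        }
        System.out.println(assignments); // prints [0, 0, 1, 2, 3]
    }
}
```

The first two 100 MB messages fit in file 0, the third starts file 1, the oversized 300 MB message gets file 2 to itself, and the final 10 MB message opens file 3. Checking before writing (rather than after) keeps every multi-message file under the limit at the cost of occasionally leaving a file partly empty.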

I've been searching for information, and it seems the way to do this is in code.

Am I on the right track?
Thanks!!

Marcelo.




2014-04-01 19:26 GMT+02:00 Gaurav Gupta <ggu...@linkedin.com>:
Hi Marcelo,

I am not really sure what you mean by "rotate".

You want to create an HDFS file for each message? (You might want to reconsider this, as it will create a lot of small files on HDFS.)
These messages are currently partitioned by time, and you want them to be partitioned by the Kafka message number. Is that it?

Gaurav

From: Marcelo Valle <mvall...@gmail.com>
Date: Tuesday, April 1, 2014 at 2:37 AM
To: "camu...@googlegroups.com" <camu...@googlegroups.com>
Subject: Rotate files system in HDFS

Hello again community,

I'm trying to create one file in HDFS for each message in a Kafka topic (my idea is to write binary files from Kafka directly into HDFS), but HDFS files are rotated based on the current time or per batch (Camus job execution).

Is there any way to rotate files based on a number of Kafka messages?
Similarly, can we rotate HDFS files by file size?

Thanks!

--
You received this message because you are subscribed to the Google Groups "Camus - Kafka ETL for Hadoop" group.
To unsubscribe from this group and stop receiving emails from it, send an email to camus_etl+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
