Preserving original file format

basam

Aug 17, 2010, 1:37:58 PM
to Flume Users

Hey,
I have a simple Flume setup to get to know Flume better: a single
agent (a01) feeds a single collector (c01), which then sinks to
HDFS.
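
To make that concrete, the node mappings look roughly like this in
the flume shell (the hostnames a01/c01, the port, and the log path
are placeholders, not my exact config):
----
a01 : tail("/var/log/httpd/access_log") | agentSink("c01", 35853) ;
c01 : collectorSource(35853) | collectorSink("hdfs://namenode/flume/weblogs/", "access-") ;
----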

I am trying to transfer access logs from web servers over to Hadoop.
Is there a way to preserve the original file format when the sink is
HDFS? The default writes out Hadoop sequence files; is there any way
to store these files as-is in HDFS?

thanks,
Sridhar

Jonathan Hsieh

Aug 17, 2010, 2:02:12 PM
to basam, Flume Users
Sridhar,

Right now the file name ends in .seq, but the default is to write in an avrojson format.  Enough folks have asked how to get the raw/original data back that it sounds like a good idea to make that the default output format.

Add the following to flume-site.xml:
----
<property>
    <name>flume.collector.output.format</name>
    <value>raw</value>
    <description>The output format for the data written by a Flume 
    collector node.  There are several formats available:
      syslog - outputs events in a syslog-like format
      log4j - outputs events in a pattern similar to Hadoop's log4j pattern 
      avrojson - this outputs data as json encoded by avro
      avrodata - this outputs data as avro binary encoded data
      debug - used only for debugging
      raw - output only the event body, no metadata
    </description>
  </property>  
----
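Once that property is set and the collector is restarted, the files it
writes should contain only the raw event bodies. A quick way to
spot-check (the HDFS path below is just a placeholder for wherever
your collectorSink writes):
----
hadoop fs -ls /flume/weblogs/
hadoop fs -cat /flume/weblogs/<some-output-file> | head
----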
Jon
--
// Jonathan Hsieh (shay)
// j...@cloudera.com