JSON splittable FileFormat and RecordReader in non-LZO (Snappy)


D Campo

Jul 23, 2014, 12:38:29 PM
to elephant...@googlegroups.com
I'm new to Elephant Bird (EB), though not to Hadoop or Java development.
I have JSON files that are larger than the block size. Records do not necessarily begin on a new line or with any similar marker, and newlines can also appear between ordinary lines of the file, so a line break is not a reliable record boundary.
The files are compressed with Snappy, not LZO.
They feed a series of MapReduce jobs that form an ETL pipeline.
There are no Hive or Pig requirements.
Actual usage: each JSON file has some header fields, and one of those fields holds the actual array of records.
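
To make the layout concrete, here is a minimal sketch in plain Java (no Elephant Bird or Hadoop classes) of what "one field holds the array of records" means for a reader. The field name `records`, the class name `RecordArrayScanner`, and the sample document are all made up for illustration. The sketch assumes the array contains only JSON objects, finds the field with a naive text search, and works on a string already in memory, so it says nothing about splitting or Snappy; it only shows that record boundaries have to be found by matching braces rather than by newlines.

```java
import java.util.ArrayList;
import java.util.List;

public class RecordArrayScanner {

    /**
     * Returns the text of each top-level JSON object inside the array that
     * follows the given field name. Braces inside string literals are ignored.
     * Assumption: the array holds only objects, and the field name does not
     * also appear earlier in the document as a string value.
     */
    public static List<String> extractRecords(String json, String arrayField) {
        List<String> records = new ArrayList<>();
        int key = json.indexOf("\"" + arrayField + "\"");
        if (key < 0) {
            return records;
        }
        int open = json.indexOf('[', key);
        if (open < 0) {
            return records;
        }

        int depth = 0;            // brace depth relative to the array
        int start = -1;           // index where the current record began
        boolean inString = false;
        for (int i = open + 1; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++;          // skip the escaped character
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
            } else if (c == '{') {
                if (depth == 0) {
                    start = i;
                }
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) {
                    records.add(json.substring(start, i + 1));
                }
            } else if (c == ']' && depth == 0) {
                break;            // end of the records array
            }
        }
        return records;
    }

    public static void main(String[] args) {
        // Hypothetical file: header fields first, then the array of records.
        // Records span several lines and one string value contains a brace.
        String sample = String.join("\n",
            "{ \"source\": \"sensor-a\", \"version\": 3,",
            "  \"records\": [",
            "    { \"id\": 1,",
            "      \"note\": \"has } brace\" },",
            "    { \"id\": 2, \"tags\": [\"x\", \"y\"] }",
            "  ]",
            "}");
        for (String record : extractRecords(sample, "records")) {
            System.out.println(record);
        }
    }
}
```

A reader that starts in the middle of such a file has no header and no newline marker to align on, which is the part I am hoping EB's JSON support can help with.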

A reply in a previous thread on this forum says that LZO is not required: https://groups.google.com/d/msg/elephantbird-dev/lD8lAt351VY/5mpcNkG8CDgJ
I would like to know whether I can use, or benefit from, EB's JSON RecordReader and FileFormats without having to use LZO.
Would they be applicable to the needs described above?

Thanks a lot

PS. I was pointed here by the book Hadoop in Practice. I don't know how outdated that information may be :)