I'm new to Elephant Bird (EB), though not to Hadoop or Java development.
I have bigger-than-block-size JSON files whose records do not necessarily begin on a newline or any similar marker, and newlines also occur as ordinary separators elsewhere in the data, so they can't be used as record delimiters.
I am currently using Snappy as the compression format.
I am using the files in a series of map-reduce jobs that form an ETL.
No Hive/Pig requirements.
Actual usage: the JSON has some header fields, and one of those fields contains the actual array of records.
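For illustration, a file in this situation might look something like the following (the field names here are hypothetical, not taken from my actual data):

```json
{
  "source": "some-export",
  "exported_at": "2014-06-01T00:00:00Z",
  "records": [
    {"id": 1, "payload": "..."},
    {"id": 2, "payload": "..."}
  ]
}
```

The point being that the whole file is one JSON document, so a plain line-oriented reader can't split it on newlines.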
I would like to know whether I can use, or benefit from, EB's JSON RecordReader and FileFormats without having to use LZO. Would they be applicable to the needs described above?
Thanks a lot
PS. I was pointed here by Hadoop in Practice; I don't know how outdated that information may be :)