I'm new to Elephant Bird (EB), though not to Hadoop or Java development.
I have bigger-than-block-size JSON files whose records do not necessarily begin on a newline or any similar marker, and newlines also occur as ordinary separators elsewhere in the data, so they can't be used as record delimiters.
I am currently using Snappy as the compression format.
I am using the files in a series of map-reduce jobs that form an ETL.
No Hive/Pig requirements.
Actual usage: the JSON has some header fields, and one of those fields contains the actual array of records.
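For illustration, a file in this situation might look something like the following (the field names here are hypothetical, not taken from my actual data):

```json
{
  "source": "some-export",
  "exported_at": "2014-06-01T00:00:00Z",
  "records": [
    {"id": 1, "payload": "..."},
    {"id": 2, "payload": "..."}
  ]
}
```

The point being that the whole file is one JSON document, so a plain line-oriented reader can't split it on newlines.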
I would like to know whether I can use, or benefit from, EB's JSON RecordReader and FileFormats without having to use LZO. Would they be applicable to the needs described above?
Thanks a lot
PS. I was pointed here by Hadoop in Practice; I don't know how outdated that information may be :)