In one of our jobs we read 4.5 TB of Snappy-compressed Avro data and apply a filter to it. We expect the filtered output to be much smaller than the input, but Scoobi writes out an intermediate sequence file of 8 TB to the /tmp/scoobi-user/ directory.
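The job is essentially equivalent to the following sketch (the record type, predicate, and paths are placeholders, not our actual code):

```scala
import com.nicta.scoobi.Scoobi._

object FilterJob extends ScoobiApp {
  def run() {
    // Read the Snappy-compressed Avro input as a distributed list.
    val records: DList[MyRecord] = fromAvroFile[MyRecord]("hdfs://.../input")

    // Apply the filter; we expect this to drop most records.
    val filtered: DList[MyRecord] = records.filter(r => keep(r))

    // Persist the filtered output.
    persist(toAvroFile(filtered, "hdfs://.../output"))
  }
}
```

The 8 TB file appears under /tmp/scoobi-user/ while this pipeline is running, before the final output is written.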
Why is the intermediate data nearly double the input size in spite of the filter being applied? We understand the input is Snappy-compressed, so some expansion on deserialization is expected, but 2x after filtering surprised us. Is this expected behavior from Scoobi?