Scoobi generated Intermediate files


shil...@gmail.com

Oct 6, 2014, 2:20:38 PM
to scoobi...@googlegroups.com
For one of our jobs, we read 4.5 TB of Snappy-compressed Avro data and apply a filter to it. We expect the filtered output to be much smaller than the input, but we see Scoobi writing an 8 TB intermediate sequence file to the /tmp/scoobi-user/ directory.

Why is the intermediate data nearly double the input size even though a filter is applied? Is this expected behavior from Scoobi?


