In a multi-step job, Scoobi writes the output of each intermediate step as sequence files under /tmp/scoobi-<user>/, but these sequence files are not compressed. Is there a way to enable compression for these temporary sequence files?
Here are the parameters I see Scoobi setting:
mapreduce.output.fileoutputformat.outputdir /tmp/scoobi-user/ReportingProcess$-1009-192939-832207653/tmp-out-step_2_of_6
scoobi.output.213:405.format org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
scoobi.output.213:405.key org.apache.hadoop.io.NullWritable
scoobi.output.213:405.value BSd857af22-5a4f-4cf2-b5b4-0c1a66abfb6f
scoobi.output.231:451.format org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
scoobi.output.231:451.key org.apache.hadoop.io.NullWritable
scoobi.output.231:451.value BS6b1148ad-0c5e-4c07-bb3b-47f3165acaf3
The compression parameters are also set:
mapreduce.map.output.compress true (from mapred-site.xml)
mapreduce.map.output.compress.codec com.hadoop.compression.lzo.LzoCodec (from mapred-site.xml)
mapreduce.output.fileoutputformat.compress true (from mapred-site.xml)
mapreduce.output.fileoutputformat.compress.codec org.apache.hadoop.io.compress.GzipCodec (from mapred-site.xml)
mapreduce.output.fileoutputformat.compress.type BLOCK
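Since the compression flags appear to be set, one way to confirm that the intermediate files really came out uncompressed is to read the SequenceFile header directly: after the `SEQ` magic and the key/value class names, the header stores a compressed flag, a block-compressed flag, and (when compressed) the codec class name. A minimal sketch in Python, assuming format version >= 6 and class names under 128 bytes (so the vint length prefix is a single byte); the file path is a placeholder for one of the step outputs copied out with `hadoop fs -get`:

```python
def sequencefile_compression(path):
    """Report the compression flags recorded in a Hadoop SequenceFile header.

    Assumes header version >= 6 and class names shorter than 128 bytes,
    so each length-prefixed string uses a single-byte vint.
    """
    with open(path, "rb") as f:
        magic = f.read(3)
        if magic != b"SEQ":
            raise ValueError("not a SequenceFile: %r" % magic)
        version = f.read(1)[0]
        # key and value class names: 1-byte vint length + UTF-8 bytes
        key_cls = f.read(f.read(1)[0]).decode("utf-8")
        val_cls = f.read(f.read(1)[0]).decode("utf-8")
        compressed = f.read(1) == b"\x01"       # record compression on?
        block_compressed = f.read(1) == b"\x01" # block compression on?
        codec = None
        if compressed:
            # codec class name is only present when compression is enabled
            codec = f.read(f.read(1)[0]).decode("utf-8")
        return {"version": version, "key": key_cls, "value": val_cls,
                "compressed": compressed, "block": block_compressed,
                "codec": codec}
```

Running this over the files in one of the tmp-out-step directories should show `compressed: False` if the per-step outputs are indeed ignoring `mapreduce.output.fileoutputformat.compress`.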