Scoobi Sequence files in /tmp/scoobi-user directory not compressed?

shil...@gmail.com

Oct 13, 2014, 3:06:16 PM
to scoob...@googlegroups.com
In a multi-step job, Scoobi writes the output of each step as sequence files to the /tmp/scoobi-<user>/ directory, but these sequence files are not compressed. Is there a way to enable compression for these temporary sequence files?

Here are the parameters I see that Scoobi sets:

mapreduce.output.fileoutputformat.outputdir /tmp/scoobi-user/ReportingProcess$-1009-192939-832207653/tmp-out-step_2_of_6
scoobi.output.213:405.format org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
scoobi.output.213:405.key org.apache.hadoop.io.NullWritable
scoobi.output.213:405.value BSd857af22-5a4f-4cf2-b5b4-0c1a66abfb6f
scoobi.output.231:451.format org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
scoobi.output.231:451.key org.apache.hadoop.io.NullWritable
scoobi.output.231:451.value BS6b1148ad-0c5e-4c07-bb3b-47f3165acaf3

The compression params are also set (they come from mapred-site.xml and show up in the job.xml):

mapreduce.map.output.compress true
mapreduce.map.output.compress.codec com.hadoop.compression.lzo.LzoCodec
mapreduce.output.fileoutputformat.compress true
mapreduce.output.fileoutputformat.compress.codec org.apache.hadoop.io.compress.GzipCodec
mapreduce.output.fileoutputformat.compress.type BLOCK
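
For reference, these settings expressed through the plain Hadoop API would look like the sketch below (standard Hadoop 2.x calls only, not Scoobi code; the object and method names are just mine for illustration):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.SequenceFile.CompressionType
import org.apache.hadoop.io.compress.GzipCodec
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.output.{FileOutputFormat, SequenceFileOutputFormat}

object OutputCompression {
  // Programmatic equivalent of the mapreduce.output.fileoutputformat.compress* properties above.
  def withOutputCompression(conf: Configuration): Job = {
    val job = Job.getInstance(conf)
    FileOutputFormat.setCompressOutput(job, true)                                  // ...compress = true
    FileOutputFormat.setOutputCompressorClass(job, classOf[GzipCodec])             // ...compress.codec = GzipCodec
    SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK)  // ...compress.type = BLOCK
    job
  }
}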

Eric Torreborre

Oct 14, 2014, 9:34:56 PM
to scoob...@googlegroups.com
There is no way to do that at the moment but that seems very doable.

Unfortunately I can't spend much time on Scoobi at the moment, but pull requests are welcome as usual :-)

and just call compressWith on the newly created bridge, then you should see those intermediate files being compressed.
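
Something along these lines, as a rough sketch: only compressWith mirrors the real call, the bridge trait below is just a stand-in for Scoobi's internal intermediate sink:

import org.apache.hadoop.io.SequenceFile.CompressionType
import org.apache.hadoop.io.compress.{CompressionCodec, GzipCodec}

// Stand-in for the intermediate bridge that backs the tmp-out-step_* sequence files;
// only compressWith is meant to mirror the real API, the rest is illustrative.
trait IntermediateBridge {
  def compressWith(codec: CompressionCodec,
                   compressionType: CompressionType = CompressionType.BLOCK): IntermediateBridge
}

object CompressBridge {
  // Call this right where the bridge for a step is created, so the intermediate
  // sequence files get written gzip/BLOCK compressed.
  def apply(bridge: IntermediateBridge): IntermediateBridge =
    bridge.compressWith(new GzipCodec)
}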

Eric.