John--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To view this discussion on the web visit https://groups.google.com/d/msg/cascading-user/-/L46JE-L3LDUJ.
To post to this group, send email to cascadi...@googlegroups.com.
To unsubscribe from this group, send email to cascading-use...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/cascading-user?hl=en.
i noticed that scalding in general ignores the compression setting in our mapred-site.xml
we are using 0.8.1
On Wed, Nov 14, 2012 at 7:19 PM, Oscar Boykin <> wrote:
What version of scalding are you using? I expect this to work too, so I'm a little confused.
On Wed, Nov 14, 2012 at 3:20 AM, John <> wrote:
Hello,

I am struggling with a simple task: I would like to compress the entire Tsv output of a scalding job.

Basically, the equivalent of this plain old mapreduce job configuration:

    conf.setOutputFormat(TextOutputFormat.class);
    TextOutputFormat.setCompressOutput(conf, true);
    TextOutputFormat.setOutputCompressorClass(conf, BZip2Codec.class);

I tried to override the config method in my scalding job like this:

    override def config(implicit mode: Mode) = super.config ++ Map("mapred.output.compress" -> "true", "mapred.output.compression.codec" -> "org.apache.hadoop.io.compress.BZip2Codec")

But it somehow gets ignored, and when I look in the job configuration through the jobtracker web interface it says mapred.output.compress = false.

Any help would be greatly appreciated, thanks in advance.
John
--
Oscar Boykin :: @posco :: https://twitter.com/intent/user?screen_name=posco
i ran into this issue again, and realized it's because cascading's TextLine and TextDelimited have sinkCompression set to Compress.DISABLE by default, which means they override whatever was set in the jobConf. i find this confusing, however i don't think this will change in Cascading.
how about if in scalding we have all the Source classes that use TextLine or TextDelimited set sinkCompression to Compress.DEFAULT? that way i think (have to check) the sinks will respect mapred.output.compress
On Wednesday, November 14, 2012 8:19:12 PM UTC-5, Koert wrote:
i noticed that scalding in general ignores the compression setting in our mapred-site.xml
we are using 0.8.1
HadoopSchemeInstance(new CHTextDelimited(fields, null, skipHeader, writeHeader, separator, strict, quote, types, safe))
Method setSinkCompression sets the sinkCompression of this TextLine object. If null, compression will remain disabled.
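To make the interaction described in this thread concrete, here is a small self-contained model of it in plain Scala. It is only a sketch and does not use Cascading or scalding: `SinkCompressionModel`, `resolve` and `sinkConf` are hypothetical names, the job configuration is modelled as a `Map`, and the `DEFAULT` branch (leave the job configuration untouched) is the behaviour assumed above, which still has to be checked against Cascading itself.

```scala
// Simplified model of the behaviour discussed in the thread.
// This is NOT Cascading's source; all names here are illustrative stand-ins.
object SinkCompressionModel {
  // Stand-in for the three sinkCompression values named in the thread.
  sealed trait Compress
  case object DEFAULT extends Compress
  case object ENABLE extends Compress
  case object DISABLE extends Compress

  val CompressKey = "mapred.output.compress"

  // Per the quoted Javadoc: passing null leaves compression disabled.
  def resolve(requested: Compress): Compress =
    Option(requested).getOrElse(DISABLE)

  // Models how the sink's setting is applied on top of the job configuration.
  def sinkConf(jobConf: Map[String, String], requested: Compress): Map[String, String] =
    resolve(requested) match {
      case DISABLE => jobConf.updated(CompressKey, "false") // overrides the user's setting
      case ENABLE  => jobConf.updated(CompressKey, "true")  // forces compression on
      case DEFAULT => jobConf                               // assumed: job configuration wins
    }

  def main(args: Array[String]): Unit = {
    // The settings John passed through config in his job.
    val userConf = Map(
      CompressKey -> "true",
      "mapred.output.compression.codec" -> "org.apache.hadoop.io.compress.BZip2Codec"
    )
    println(sinkConf(userConf, null)(CompressKey))    // prints "false"
    println(sinkConf(userConf, DEFAULT)(CompressKey)) // prints "true"
  }
}
```

In this model the `null` passed as the second argument in the `CHTextDelimited` call above resolves to `DISABLE`, which is why the jobtracker shows mapred.output.compress = false even though the job set it to true.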