Hi Alexy,
How are you doing the Avro compression? It should get pretty small. I have a small example (using Pig rather than Scoobi, but the compression parts are the same). Without specifying any compression settings, the Avro output size looks like this:
-rw-r--r-- 3 csevers gid-csevers 4821995 2012-07-03 17:51 /user/csevers/testavro2/part-m-00000.avro
If I add the following in Pig (I know there are regular Hadoop equivalents):
SET avro.mapred.deflate.level 6;
SET mapred.output.compress true;
then for the same input data I get this:
-rw-r--r-- 3 csevers gid-csevers 1468737 2012-07-03 17:48 /user/csevers/testavro/part-m-00000.avro
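For reference, here is a rough sketch of what the regular Hadoop equivalent might look like from Scala: the same two property names set directly on a Hadoop Configuration. The object and method names (AvroCompressionSettings, withDeflate) are made up for illustration, it assumes hadoop-common is on the classpath, and it assumes the job's Avro output format actually reads these keys, which I have not checked for Scoobi.

```scala
import org.apache.hadoop.conf.Configuration

// Hypothetical helper, not part of Scoobi or Avro.
object AvroCompressionSettings {

  // Sets the same two properties as the Pig SET statements above.
  // Assumption: the output format used by the job honors these keys.
  def withDeflate(conf: Configuration, level: Int = 6): Configuration = {
    conf.setInt("avro.mapred.deflate.level", level)
    conf.setBoolean("mapred.output.compress", true)
    conf
  }

  def main(args: Array[String]): Unit = {
    val conf = withDeflate(new Configuration())
    // Print the values back to confirm they were stored.
    println(conf.get("avro.mapred.deflate.level"))
    println(conf.get("mapred.output.compress"))
  }
}
```

The resulting Configuration would then have to be handed to whatever submits the job; how to do that in Scoobi is the open question.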
I don't know if this is possible right now in Scoobi. I think the Avro support in general needs to be slightly modified to be more generic.
Regards,
Chris