Spill failed when tweaking parameters


jd

Oct 4, 2012, 2:07:06 PM10/4/12
to cascadi...@googlegroups.com
This is an odd one, and I'm not sure if cascading is necessarily to blame, but maybe someone has some insight.
Recently I was tweaking these params:
io.sort.record.percent
io.sort.spill.percent
io.sort.factor
io.sort.mb
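(For reference, a sketch of how such properties can be passed to a job; the values below are illustrative, not the ones I was using. The keys are the Hadoop 0.20-era names, and Cascading forwards properties like these to the job via the FlowConnector.)

```java
import java.util.Properties;

public class SortParams
  {
  public static Properties sortTuning()
    {
    // Hadoop 0.20-era map-side sort settings; values here are example
    // tunings only (later Hadoop renames these, e.g. io.sort.mb
    // becomes mapreduce.task.io.sort.mb).
    Properties props = new Properties();
    props.setProperty( "io.sort.mb", "200" );              // total sort buffer size, in MB
    props.setProperty( "io.sort.factor", "25" );           // number of streams merged at once
    props.setProperty( "io.sort.spill.percent", "0.80" );  // buffer fill ratio that triggers a spill
    props.setProperty( "io.sort.record.percent", "0.05" ); // fraction of buffer reserved for record metadata
    return props;
    // in Cascading these would be handed to the connector,
    // e.g. new HadoopFlowConnector( props )
    }
  }
```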

For some reason, doing so started causing the errors below, which I suspect are related to deserializing a serialized object or SequenceFile. I also have large jobs that only use TextDelimited, and they were not affected.


java.io.IOException: Spill failed
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1342)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:406)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1109)
	at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: cascading.CascadingException: unable to compare Tuples, likely a CoGroup is being attempted on fields of different types or custom comparators are incorrectly set on Fields
	at cascading.tuple.hadoop.util.TupleElementComparator.compare(TupleElementComparator.java:80)
	at cascading.tuple.hadoop.util.TupleElementComparator.compare(TupleElementComparator.java:32)
	at cascading.tuple.hadoop.util.DelegatingTupleElementComparator.compare(DelegatingTupleElementComparator.java:74)
	at cascading.tuple.hadoop.util.DelegatingTupleElementComparator.compare(DelegatingTupleElementComparator.java:34)
	at cascading.tuple.hadoop.util.DeserializerComparator.compareTuples(DeserializerComparator.java:148)
	at cascading.tuple.hadoop.util.IndexTupleCoGroupingComparator.compare(IndexTupleCoGroupingComparator.java:41)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.comparekeys(MapTask.java:939)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$KvOffset.compare(MapTask.java:1016)
	at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:95)
	at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:59)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1495)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$2000(MapTask.java:733)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1413)
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
	at java.lang.Integer.compareTo(Integer.java:37)
	at cascading.tuple.hadoop.util.TupleElementComparator$1.compare(TupleElementComparator.java:48)
	at cascading.tuple.hadoop.util.TupleElementComparator$1.compare(TupleElementComparator.java:35)
	at cascading.tuple.hadoop.util.TupleElementComparator.compare(TupleElementComparator.java:76)
	... 12 more
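(For context: the inner ClassCastException can be reproduced in plain Java, independent of Hadoop or Cascading. A raw Comparable comparison between an Integer key and a String key fails exactly this way, which is what the "fields of different types" message is pointing at. A minimal sketch:)

```java
public class MixedTypeCompare
  {
  public static void main( String[] args )
    {
    // mimics one join key deserializing as Integer while the other
    // deserializes as String: Integer.compareTo casts its argument
    // to Integer and throws, as in the stack trace above
    Comparable left = Integer.valueOf( 42 );
    Object right = "42";

    try
      {
      left.compareTo( right );
      }
    catch( ClassCastException exception )
      {
      System.out.println( "ClassCastException: " + exception.getMessage() );
      }
    }
  }
```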

also:

java.io.IOException: Spill failed
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1342)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:406)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1109)
	at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: cascading.CascadingException: unable to compare Tuples, likely a CoGroup is being attempted on fields of different types or custom comparators are incorrectly set on Fields
	at cascading.tuple.hadoop.util.TupleElementComparator.compare(TupleElementComparator.java:80)
	at cascading.tuple.hadoop.util.TupleElementComparator.compare(TupleElementComparator.java:32)
	at cascading.tuple.hadoop.util.DelegatingTupleElementComparator.compare(DelegatingTupleElementComparator.java:74)
	at cascading.tuple.hadoop.util.DelegatingTupleElementComparator.compare(DelegatingTupleElementComparator.java:34)
	at cascading.tuple.hadoop.util.DeserializerComparator.compareTuples(DeserializerComparator.java:148)
	at cascading.tuple.hadoop.util.IndexTupleCoGroupingComparator.compare(IndexTupleCoGroupingComparator.java:41)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.comparekeys(MapTask.java:939)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$KvOffset.compare(MapTask.java:1016)
	at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:95)
	at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:59)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1495)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$2000(MapTask.java:733)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1413)
Caused by: java.io.EOFException
	at java.io.DataInputStream.readFully(DataInputStream.java:180)
	at java.io.DataInputStream.readFully(DataInputStream.java:152)
	at org.apache.hadoop.io.WritableUtils.readString(WritableUtils.java:116)
	at cascading.tuple.hadoop.io.HadoopTupleInputStream.readString(HadoopTupleInputStream.java:75)
	at cascading.tuple.hadoop.io.HadoopTupleInputStream.readType(HadoopTupleInputStream.java:85)
	at cascading.tuple.hadoop.io.HadoopTupleInputStream.getNextElement(HadoopTupleInputStream.java:52)
	at cascading.tuple.hadoop.util.TupleElementComparator.compare(TupleElementComparator.java:73)
	... 12 more



Chris K Wensel

Oct 9, 2012, 10:55:15 AM10/9/12
to cascadi...@googlegroups.com
My initial reaction is that some combination of those parameters is confusing the output/input streams during the map spill.

Cascading attempts to use a RawComparator (rebranded internally as a StreamComparator) to improve comparator performance during sorting, by lazily deserializing the underlying tuple elements.

If the underlying stream (managed by Hadoop) is borked, Cascading can't skip or roll back bytes read from the stream reliably.

What Hadoop version and vendor are you using?

--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To view this discussion on the web visit https://groups.google.com/d/msg/cascading-user/-/HLw-pTqVvHoJ.
To post to this group, send email to cascadi...@googlegroups.com.
To unsubscribe from this group, send email to cascading-use...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/cascading-user?hl=en.


Brandon Mason

Oct 9, 2012, 1:23:46 PM10/9/12
to cascadi...@googlegroups.com
hadoop-0.20.2
MapR distro

Chris K Wensel

Oct 9, 2012, 5:37:43 PM10/9/12
to cascadi...@googlegroups.com
This could very well be a function of any compression used.

I think there were some prior issues with compression in MapR, namely conflicts with using Hadoop compression codecs when MapR already compresses data, or some such.

ckw



Brandon Mason

Oct 9, 2012, 7:03:35 PM10/9/12
to cascadi...@googlegroups.com
We can retest using a MapR volume that isn't compressed. I believe we tried that, but since some other variables were changing, we can repeat that test.

Girish kathalagiri

unread,
Oct 9, 2012, 7:28:53 PM10/9/12
to cascadi...@googlegroups.com
Below is sample code that causes the same exception. It looks like the tmp sequence file loses the Field types.

Am I doing something wrong here? I am on CDH4.


Fields inFields = new Fields( "ID", "Date" );
Tap inTap = new Hfs( new TextDelimited( inFields, "\t" ), "Transaction" );
Tap outTap = new Hfs( new TextDelimited( inFields, "\t" ), "ActiveTransactions" );
Tap outTapActive = new Hfs( new TextDelimited( new Fields( "ID" ), "\t" ), "ActiveID" );

Pipe copyPipe = new Pipe( "Test" );
copyPipe = new Coerce( copyPipe, inFields, Integer.class, String.class );

// do some date filtering; here DF is a date Filter defined elsewhere
copyPipe = new Each( copyPipe, DF );

Pipe uniqs = new Pipe( "copied", copyPipe );
uniqs = new Retain( uniqs, new Fields( "ID" ) );
uniqs = new Unique( uniqs, new Fields( "ID" ) );
uniqs = new Rename( uniqs, new Fields( "ID" ), new Fields( "ActiveID" ) );

Pipe joined = new CoGroup( uniqs, new Fields( "ActiveID" ), copyPipe, new Fields( "ID" ), new LeftJoin() );
joined = new Retain( joined, new Fields( "ID", "Date" ) );

FlowDef flowDef = FlowDef.flowDef()
    .addSource( copyPipe, inTap )
    .addTailSink( joined, outTap )
    .addSink( uniqs, outTapActive );

// run the flow (flowConnector created elsewhere)
flowConnector.connect( flowDef ).complete();

Regards
Girish S Kathalagiri




Chris K Wensel

Oct 9, 2012, 11:12:27 PM10/9/12
to cascadi...@googlegroups.com
FWIW, CDH4 is not on our compatibility list. See prior posts.


ckw