Spill failed when tweaking parameters


jd

Oct 4, 2012, 2:07:06 PM10/4/12
to cascadi...@googlegroups.com
This is an odd one, and I'm not sure if cascading is necessarily to blame, but maybe someone has some insight.
Recently I was tweaking these params:
io.sort.record.percent
io.sort.spill.percent
io.sort.factor
io.sort.mb
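(For reference, a sketch of how such properties can be passed to a job; the values below are illustrative, not the ones I was using. The keys are the Hadoop 0.20-era names, and Cascading forwards properties like these to the job via the FlowConnector.)

```java
import java.util.Properties;

public class SortParams
  {
  public static Properties sortTuning()
    {
    // Hadoop 0.20-era map-side sort settings; values here are example
    // tunings only (later Hadoop renames these, e.g. io.sort.mb
    // becomes mapreduce.task.io.sort.mb).
    Properties props = new Properties();
    props.setProperty( "io.sort.mb", "200" );              // total sort buffer size, in MB
    props.setProperty( "io.sort.factor", "25" );           // number of streams merged at once
    props.setProperty( "io.sort.spill.percent", "0.80" );  // buffer fill ratio that triggers a spill
    props.setProperty( "io.sort.record.percent", "0.05" ); // fraction of buffer reserved for record metadata
    return props;
    // in Cascading these would be handed to the connector,
    // e.g. new HadoopFlowConnector( props )
    }
  }
```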

For some reason, doing so started causing the errors below, which I suspect are related to deserializing a serialized object or SequenceFile. I also have large jobs that only use TextDelimited, and they were not affected.


java.io.IOException: Spill failed
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1342)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:406)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1109)
	at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: cascading.CascadingException: unable to compare Tuples, likely a CoGroup is being attempted on fields of different types or custom comparators are incorrectly set on Fields
	at cascading.tuple.hadoop.util.TupleElementComparator.compare(TupleElementComparator.java:80)
	at cascading.tuple.hadoop.util.TupleElementComparator.compare(TupleElementComparator.java:32)
	at cascading.tuple.hadoop.util.DelegatingTupleElementComparator.compare(DelegatingTupleElementComparator.java:74)
	at cascading.tuple.hadoop.util.DelegatingTupleElementComparator.compare(DelegatingTupleElementComparator.java:34)
	at cascading.tuple.hadoop.util.DeserializerComparator.compareTuples(DeserializerComparator.java:148)
	at cascading.tuple.hadoop.util.IndexTupleCoGroupingComparator.compare(IndexTupleCoGroupingComparator.java:41)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.comparekeys(MapTask.java:939)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$KvOffset.compare(MapTask.java:1016)
	at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:95)
	at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:59)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1495)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$2000(MapTask.java:733)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1413)
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
	at java.lang.Integer.compareTo(Integer.java:37)
	at cascading.tuple.hadoop.util.TupleElementComparator$1.compare(TupleElementComparator.java:48)
	at cascading.tuple.hadoop.util.TupleElementComparator$1.compare(TupleElementComparator.java:35)
	at cascading.tuple.hadoop.util.TupleElementComparator.compare(TupleElementComparator.java:76)
	... 12 more
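(For context: the inner ClassCastException can be reproduced in plain Java, independent of Hadoop or Cascading. A raw Comparable comparison between an Integer key and a String key fails exactly this way, which is what the "fields of different types" message is pointing at. A minimal sketch:)

```java
public class MixedTypeCompare
  {
  public static void main( String[] args )
    {
    // mimics one join key deserializing as Integer while the other
    // deserializes as String: Integer.compareTo casts its argument
    // to Integer and throws, as in the stack trace above
    Comparable left = Integer.valueOf( 42 );
    Object right = "42";

    try
      {
      left.compareTo( right );
      }
    catch( ClassCastException exception )
      {
      System.out.println( "ClassCastException: " + exception.getMessage() );
      }
    }
  }
```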

also:

java.io.IOException: Spill failed
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1342)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:406)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1109)
	at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: cascading.CascadingException: unable to compare Tuples, likely a CoGroup is being attempted on fields of different types or custom comparators are incorrectly set on Fields
	at cascading.tuple.hadoop.util.TupleElementComparator.compare(TupleElementComparator.java:80)
	at cascading.tuple.hadoop.util.TupleElementComparator.compare(TupleElementComparator.java:32)
	at cascading.tuple.hadoop.util.DelegatingTupleElementComparator.compare(DelegatingTupleElementComparator.java:74)
	at cascading.tuple.hadoop.util.DelegatingTupleElementComparator.compare(DelegatingTupleElementComparator.java:34)
	at cascading.tuple.hadoop.util.DeserializerComparator.compareTuples(DeserializerComparator.java:148)
	at cascading.tuple.hadoop.util.IndexTupleCoGroupingComparator.compare(IndexTupleCoGroupingComparator.java:41)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.comparekeys(MapTask.java:939)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$KvOffset.compare(MapTask.java:1016)
	at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:95)
	at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:59)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1495)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$2000(MapTask.java:733)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1413)
Caused by: java.io.EOFException
	at java.io.DataInputStream.readFully(DataInputStream.java:180)
	at java.io.DataInputStream.readFully(DataInputStream.java:152)
	at org.apache.hadoop.io.WritableUtils.readString(WritableUtils.java:116)
	at cascading.tuple.hadoop.io.HadoopTupleInputStream.readString(HadoopTupleInputStream.java:75)
	at cascading.tuple.hadoop.io.HadoopTupleInputStream.readType(HadoopTupleInputStream.java:85)
	at cascading.tuple.hadoop.io.HadoopTupleInputStream.getNextElement(HadoopTupleInputStream.java:52)
	at cascading.tuple.hadoop.util.TupleElementComparator.compare(TupleElementComparator.java:73)
	... 12 more



Chris K Wensel

Oct 9, 2012, 10:55:15 AM10/9/12
to cascadi...@googlegroups.com
My initial reaction is that some combination of those parameters is confusing the output/input streams during the map spill.

Cascading attempts to use a RawComparator (rebranded internally as a StreamComparator) to improve comparator performance during sorting, by lazily deserializing the underlying tuple elements.

If the underlying stream (managed by Hadoop) is borked, Cascading can't skip or roll back bytes read from the stream reliably.

What Hadoop version and vendor are you using?

--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To view this discussion on the web visit https://groups.google.com/d/msg/cascading-user/-/HLw-pTqVvHoJ.
To post to this group, send email to cascadi...@googlegroups.com.
To unsubscribe from this group, send email to cascading-use...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/cascading-user?hl=en.


Brandon Mason

Oct 9, 2012, 1:23:46 PM10/9/12
to cascadi...@googlegroups.com
hadoop-0.20.2
MapR distro

Chris K Wensel

Oct 9, 2012, 5:37:43 PM10/9/12
to cascadi...@googlegroups.com
This could very well be a function of any compression used.

I think there were some prior issues with compression in MapR, namely conflicts with using Hadoop compression codecs when MapR already compresses data, or some such.

ckw



Brandon Mason

Oct 9, 2012, 7:03:35 PM10/9/12
to cascadi...@googlegroups.com
We can retest using a MapR volume that isn't compressed. I believe we tried that, but since some other variables were changing, we can repeat that test.

Girish kathalagiri

unread,
Oct 9, 2012, 7:28:53 PM10/9/12
to cascadi...@googlegroups.com
Below is sample code that causes the same exception. It looks like the tmp sequence file loses the Field types.

Am I doing something wrong here? I am on CDH4.


Fields inFields = new Fields( "ID", "Date" );
Tap inTap = new Hfs( new TextDelimited( inFields, "\t" ), "Transaction" );
Tap outTap = new Hfs( new TextDelimited( inFields, "\t" ), "ActiveTransactions" );
Tap outTapActive = new Hfs( new TextDelimited( new Fields( "ID" ), "\t" ), "ActiveID" );

Pipe copyPipe = new Pipe( "Test" );
copyPipe = new Coerce( copyPipe, inFields, Integer.class, String.class );

// do some date filtering; here DF is a date Filter defined elsewhere
copyPipe = new Each( copyPipe, DF );

Pipe uniqs = new Pipe( "copied", copyPipe );
uniqs = new Retain( uniqs, new Fields( "ID" ) );
uniqs = new Unique( uniqs, new Fields( "ID" ) );
uniqs = new Rename( uniqs, new Fields( "ID" ), new Fields( "ActiveID" ) );

Pipe joined = new CoGroup( uniqs, new Fields( "ActiveID" ), copyPipe, new Fields( "ID" ), new LeftJoin() );
joined = new Retain( joined, new Fields( "ID", "Date" ) );

FlowDef flowDef = FlowDef.flowDef()
    .addSource( copyPipe, inTap )
    .addTailSink( joined, outTap )
    .addSink( uniqs, outTapActive );

// run the flow (flowConnector created elsewhere)
flowConnector.connect( flowDef ).complete();

Regards
Girish S Kathalagiri




Chris K Wensel

Oct 9, 2012, 11:12:27 PM10/9/12
to cascadi...@googlegroups.com
FWIW, CDH4 is not on our compatibility list. See prior posts.


ckw