Suggestion for how to debug Tez error

Benjamin Ross

Jul 22, 2016, 11:29:26 AM
to cascading-user
I'm having trouble with an error deep within the bowels of Tez. I'm also having difficulty reproducing the issue locally; it only reproduces on our HDFS cluster. One of my big issues is that I can't figure out how to correlate the Tez DAG with the Cascading DAG in order to pinpoint the vertex where the problem is occurring. Any suggestions are appreciated. Thanks.

Vertex failed, vertexName=D5EE2A54CD5444268212860379F1B95D, vertexId=vertex_1468858462978_7108_1_01, diagnostics=[Task failed, taskId=task_1468858462978_7108_1_01_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task: attempt_1468858462978_7108_1_01_000000_0:cascading.CascadingException: unable to compare stream elements in position: 0
at cascading.tuple.hadoop.util.DeserializerComparator.compareUnTypedTuples(DeserializerComparator.java:239)
at cascading.tuple.hadoop.util.DeserializerComparator.compareTuples(DeserializerComparator.java:194)
at cascading.tuple.hadoop.util.GroupingSortingComparator.compare(GroupingSortingComparator.java:62)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.compareKeys(PipelinedSorter.java:941)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.compare(PipelinedSorter.java:956)
at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:74)
at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:126)
at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:126)
at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:63)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.sort(PipelinedSorter.java:902)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.flush(PipelinedSorter.java:631)
at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.close(OrderedPartitionedKVOutput.java:182)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.close(LogicalIOProcessorRuntimeTask.java:378)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:80)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:59)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:59)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:36)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: cascading.CascadingException: unable to compare Tuples, likely a CoGroup is being attempted on fields of different types or custom comparators are incorrectly set on Fields, lhs: 'null' rhs: 'null'
at cascading.tuple.hadoop.util.TupleElementComparator.compare(TupleElementComparator.java:91)
at cascading.tuple.hadoop.util.TupleElementComparator.compare(TupleElementComparator.java:33)
at cascading.tuple.hadoop.util.DeserializerComparator.compareUnTypedTuples(DeserializerComparator.java:235)
... 24 more
Caused by: java.lang.NullPointerException
at java.util.Collections$ReverseComparator.compare(Collections.java:3578)
at java.util.Collections$ReverseComparator.compare(Collections.java:3569)
at cascading.tuple.hadoop.util.TupleElementComparator.compare(TupleElementComparator.java:87)
... 26 more
], TaskAttempt 1 failed, info=[Error: Failure while running task: attempt_1468858462978_7108_1_01_000000_1:cascading.CascadingException: unable to compare stream elements in position: 0
at cascading.tuple.hadoop.util.DeserializerComparator.compareUnTypedTuples(DeserializerComparator.java:239)
at cascading.tuple.hadoop.util.DeserializerComparator.compareTuples(DeserializerComparator.java:194)
at cascading.tuple.hadoop.util.GroupingSortingComparator.compare(GroupingSortingComparator.java:62)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.compareKeys(PipelinedSorter.java:941)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.compare(PipelinedSorter.java:956)
at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:74)
at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:126)
at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:126)
at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:63)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.sort(PipelinedSorter.java:902)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.flush(PipelinedSorter.java:631)
at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.close(OrderedPartitionedKVOutput.java:182)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.close(LogicalIOProcessorRuntimeTask.java:378)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:80)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:59)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:59)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:36)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: cascading.CascadingException: unable to compare Tuples, likely a CoGroup is being attempted on fields of different types or custom comparators are incorrectly set on Fields, lhs: 'null' rhs: 'null'
at cascading.tuple.hadoop.util.TupleElementComparator.compare(TupleElementComparator.java:91)
at cascading.tuple.hadoop.util.TupleElementComparator.compare(TupleElementComparator.java:33)
at cascading.tuple.hadoop.util.DeserializerComparator.compareUnTypedTuples(DeserializerComparator.java:235)
... 24 more
Caused by: java.lang.NullPointerException
at java.util.Collections$ReverseComparator.compare(Collections.java:3578)
at java.util.Collections$ReverseComparator.compare(Collections.java:3569)
at cascading.tuple.hadoop.util.TupleElementComparator.compare(TupleElementComparator.java:87)
... 26 more
], TaskAttempt 2 failed, info=[Error: Failure while running task: attempt_1468858462978_7108_1_01_000000_2:cascading.CascadingException: unable to compare stream elements in position: 0
at cascading.tuple.hadoop.util.DeserializerComparator.compareUnTypedTuples(DeserializerComparator.java:239)
at cascading.tuple.hadoop.util.DeserializerComparator.compareTuples(DeserializerComparator.java:194)
at cascading.tuple.hadoop.util.GroupingSortingComparator.compare(GroupingSortingComparator.java:62)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.compareKeys(PipelinedSorter.java:941)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.compare(PipelinedSorter.java:956)
at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:74)
at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:126)
at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:126)
at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:63)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.sort(PipelinedSorter.java:902)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.flush(PipelinedSorter.java:631)
at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.close(OrderedPartitionedKVOutput.java:182)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.close(LogicalIOProcessorRuntimeTask.java:378)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:80)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:59)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:59)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:36)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: cascading.CascadingException: unable to compare Tuples, likely a CoGroup is being attempted on fields of different types or custom comparators are incorrectly set on Fields, lhs: 'null' rhs: 'null'
at cascading.tuple.hadoop.util.TupleElementComparator.compare(TupleElementComparator.java:91)
at cascading.tuple.hadoop.util.TupleElementComparator.compare(TupleElementComparator.java:33)
at cascading.tuple.hadoop.util.DeserializerComparator.compareUnTypedTuples(DeserializerComparator.java:235)
... 24 more
Caused by: java.lang.NullPointerException
at java.util.Collections$ReverseComparator.compare(Collections.java:3578)
at java.util.Collections$ReverseComparator.compare(Collections.java:3569)
at cascading.tuple.hadoop.util.TupleElementComparator.compare(TupleElementComparator.java:87)
... 26 more
], TaskAttempt 3 failed, info=[Error: Failure while running task: attempt_1468858462978_7108_1_01_000000_3:cascading.CascadingException: unable to compare stream elements in position: 0
at cascading.tuple.hadoop.util.DeserializerComparator.compareUnTypedTuples(DeserializerComparator.java:239)
at cascading.tuple.hadoop.util.DeserializerComparator.compareTuples(DeserializerComparator.java:194)
at cascading.tuple.hadoop.util.GroupingSortingComparator.compare(GroupingSortingComparator.java:62)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.compareKeys(PipelinedSorter.java:941)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.compare(PipelinedSorter.java:956)
at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:74)
at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:126)
at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:126)
at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:63)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.sort(PipelinedSorter.java:902)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.flush(PipelinedSorter.java:631)
at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.close(OrderedPartitionedKVOutput.java:182)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.close(LogicalIOProcessorRuntimeTask.java:378)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:80)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:59)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:59)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:36)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: cascading.CascadingException: unable to compare Tuples, likely a CoGroup is being attempted on fields of different types or custom comparators are incorrectly set on Fields, lhs: 'null' rhs: 'null'
at cascading.tuple.hadoop.util.TupleElementComparator.compare(TupleElementComparator.java:91)
at cascading.tuple.hadoop.util.TupleElementComparator.compare(TupleElementComparator.java:33)
at cascading.tuple.hadoop.util.DeserializerComparator.compareUnTypedTuples(DeserializerComparator.java:235)
... 24 more
Caused by: java.lang.NullPointerException
at java.util.Collections$ReverseComparator.compare(Collections.java:3578)
at java.util.Collections$ReverseComparator.compare(Collections.java:3569)
at cascading.tuple.hadoop.util.TupleElementComparator.compare(TupleElementComparator.java:87)
... 26 more
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1468858462978_7108_1_01 [D5EE2A54CD5444268212860379F1B95D] killed/failed due to:OWN_TASK_FAILURE]
Vertex killed, vertexName=BB39865F8DFA4C32A744A14E8D849E5B, vertexId=vertex_1468858462978_7108_1_02, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, Vertex vertex_1468858462978_7108_1_02 [BB39865F8DFA4C32A744A14E8D849E5B] killed/failed due to:OTHER_VERTEX_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1

Benjamin Ross

Jul 22, 2016, 11:46:21 AM
to cascading-user
This is using Tez 0.8.2 and Cascading 3.1.0.  Thanks.

Chris K Wensel

Jul 22, 2016, 4:30:35 PM
to cascadi...@googlegroups.com
Are you doing a reverse sort on a grouping operation?

If so, try the app without the reverse sort.

If the NPE goes away, see if you can write a simple test that reproduces the issue, which we can add to the test suite.

Given all the above, the fix could be to rewrite the Java Collections reverse comparator so that it tolerates null values, and to issue a new WIP build to try out.

ckw

-- 
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cascading-use...@googlegroups.com.
To post to this group, send email to cascadi...@googlegroups.com.
Visit this group at https://groups.google.com/group/cascading-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/cascading-user/fc901689-8485-4f2c-8e59-3dca51446e3d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Chris K Wensel

Jul 22, 2016, 6:51:09 PM
to cascadi...@googlegroups.com
I think I figured out a way to reproduce this. I’ll sort out what I can with my error. Having a test case to confirm it’s the same as your issue will be helpful, but it’s not a blocker on getting a WIP out.

ckw


Chris K Wensel

Jul 22, 2016, 7:06:21 PM
to cascadi...@googlegroups.com
Sorry, premature reply. I can’t repro the issue; my bug was in the test itself, which did not test for null.

That said, don’t use Collections.reverseOrder() as a field comparator if you have null values. Avoiding it will probably fix your issue, as that comparator throws an NPE on null values.
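
To illustrate the point above, here is a minimal JDK-only sketch of the failure. The class and method names (ReverseOrderNullDemo, throwsNpe) are made up for this example, and no Cascading or Tez classes are involved; it only shows how the comparator returned by Collections.reverseOrder() behaves when either side is null.

```java
import java.util.Collections;
import java.util.Comparator;

// Hypothetical demo class: reproduces the NPE with the JDK alone.
class ReverseOrderNullDemo {

    // Returns true if Collections.reverseOrder() throws a
    // NullPointerException when comparing the two values.
    static boolean throwsNpe(String lhs, String rhs) {
        Comparator<String> reverse = Collections.reverseOrder();
        try {
            reverse.compare(lhs, rhs);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsNpe("a", "b"));   // false: no nulls involved
        System.out.println(throwsNpe(null, "b"));  // true
        System.out.println(throwsNpe("a", null));  // true
        System.out.println(throwsNpe(null, null)); // true: the lhs 'null' rhs 'null' case from the trace
    }
}
```

The comparator simply reverses the natural ordering by calling compareTo, so any null operand ends in a NullPointerException.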

I’ll also replace any incidental internal use of Collections.reverseOrder with a custom reverse comparator, just in case.
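
For readers wondering what a custom reverse comparator could look like, here is a rough sketch. It is not the comparator that ended up in Cascading; the class name NullSafeReverseComparator is hypothetical, and sorting nulls last is an assumed policy. It implements Serializable on the assumption that comparators set on fields need to be serialized along with the flow.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical null-tolerant reverse comparator; a sketch only.
class NullSafeReverseComparator<T extends Comparable<T>>
        implements Comparator<T>, Serializable {
    private static final long serialVersionUID = 1L;

    @Override
    public int compare(T lhs, T rhs) {
        if (lhs == null && rhs == null) return 0; // two nulls are equal
        if (lhs == null) return 1;                // assumed policy: nulls sort last
        if (rhs == null) return -1;
        return rhs.compareTo(lhs);                // reversed natural order
    }

    public static void main(String[] args) {
        List<String> values = new ArrayList<>(Arrays.asList("b", null, "c", "a"));
        Collections.sort(values, new NullSafeReverseComparator<String>());
        System.out.println(values); // [c, b, a, null]
    }
}
```

Whether nulls should sort first or last is a design choice; the only requirement for avoiding the NPE is that null is handled before compareTo is called.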

Regardless, if you are not using Collections.reverseOrder as a field comparator, do make a pull request with a test case.


ckw



Benjamin Ross

Jul 22, 2016, 7:17:55 PM
to cascadi...@googlegroups.com
Hey Chris,
Yeah, that's exactly what it was. I actually figured it out a while back and forgot to post a reply. I was using Collections.reverseOrder() as the comparator for a Fields. I solved it by just wrapping it in a NullComparator from Apache Commons.
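
The wrapping approach described above can be sketched with the JDK alone, assuming Java 8 or later (the cluster in the trace appears to run an older JVM, where the Apache Commons NullComparator fills the same role). Comparator.nullsLast stands in for the Commons wrapper here, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class: wraps the reverse comparator so nulls never reach it.
class WrappedReverseOrderDemo {

    // JDK 8+ stand-in for wrapping Collections.reverseOrder() in a
    // null-handling comparator; nulls are placed after all non-null values.
    static Comparator<String> nullTolerantReverse() {
        return Comparator.nullsLast(Collections.<String>reverseOrder());
    }

    // Sorts a copy of the input in descending order, tolerating nulls.
    static List<String> sortDescending(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(nullTolerantReverse());
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sortDescending(Arrays.asList("b", null, "c", null, "a")));
        // [c, b, a, null, null]
    }
}
```

The wrapper handles null operands itself and only delegates to the reverse comparator when both values are non-null, which is why the NPE disappears.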

On a related note:  Seriously, Java?  Your default comparator contractually throws NPEs when the values to sort on are null?  I can't think of a single use case where you would want that kind of behavior - why make that the default...

Thanks again!


Chris K Wensel

Jul 22, 2016, 7:27:47 PM
to cascadi...@googlegroups.com
good news, thanks for the update.

Also, to be safe, 3.2 won’t ever call Collections.reverseOrder for the singleton, but it will still delegate to it for the decorator, though I don’t think the singleton was ever applied otherwise.

ckw

