DuctException in cascading


Bhavesh Shah

Feb 4, 2016, 12:06:06 PM
to cascading-user
Hi,

For the last couple of days I have been hitting a DuctException, and I don't understand what this exception means. When I run a job, it sometimes works and sometimes throws DuctException. I checked the Hadoop logs but could not work out the cause from them. Below is the detailed stack trace:
Exception:
2016-02-03 14:33:03,719 FATAL [IPC Server handler 2 on 56181] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1453327553320_17435_m_000004_0 - exited : cascading.flow.FlowException: internal error during mapper execution
at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:160)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: cascading.flow.stream.duct.DuctException: internal error: [null, '-1105210807895', 'AAAA  ', 'AAAA  ', '1', null, '20000309025000']
at cascading.flow.hadoop.stream.HadoopGroupGate.receive(HadoopGroupGate.java:116)
at cascading.flow.hadoop.stream.HadoopGroupGate.receive(HadoopGroupGate.java:45)
at cascading.flow.stream.duct.Stage.receive(Stage.java:30)
at cascading.flow.stream.element.SourceStage.map(SourceStage.java:110)
at cascading.flow.stream.element.SourceStage.run(SourceStage.java:66)
at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:142)
... 7 more
Caused by: java.lang.IllegalArgumentException: invalid index: 2, length: 2
at cascading.tuple.util.OverrideTupleList.get(OverrideTupleList.java:79)
at java.util.AbstractList$Itr.next(AbstractList.java:358)
at cascading.tuple.io.TupleOutputStream.write(TupleOutputStream.java:88)
at cascading.tuple.io.TupleOutputStream.writeTuple(TupleOutputStream.java:64)
at cascading.tuple.hadoop.io.TupleSerializer.serialize(TupleSerializer.java:37)
at cascading.tuple.hadoop.io.TupleSerializer.serialize(TupleSerializer.java:28)
at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1327)
at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:76)
at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.collect(MapTask.java:858)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:610)
at cascading.tap.hadoop.util.MeasuredOutputCollector.collect(MeasuredOutputCollector.java:70)
at cascading.flow.hadoop.stream.element.HadoopGroupByGate.wrapGroupingAndCollect(HadoopGroupByGate.java:54)
at cascading.flow.hadoop.stream.HadoopGroupGate.receive(HadoopGroupGate.java:103)
... 12 more

Below is the visualization of the Cascading job in Driven (image not preserved):


In what cases is this exception thrown?

Thanks,
Bhavesh






Chris K Wensel

Feb 4, 2016, 12:20:43 PM
to cascadi...@googlegroups.com
If you are on Cascading 3, give 3.1-wip a try.

Also, merging before a GroupBy is redundant. If you just remove the Merge and use the GroupBy as a merge, your error will probably go away.

ckw

--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cascading-use...@googlegroups.com.
To post to this group, send email to cascadi...@googlegroups.com.
Visit this group at https://groups.google.com/group/cascading-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/cascading-user/e59ebc1a-a1ec-4c14-af31-ea2de0ed36d3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Chris K Wensel
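
A library-free sketch of why the explicit Merge adds nothing, under stated assumptions: tuples are modelled as plain lists of strings, and `MergeThenGroupSketch`, `merge`, and `groupBy` are hypothetical names for illustration only, not Cascading classes or methods. Grouping over several inputs directly produces the same groups as concatenating them first. In Cascading itself, the corresponding change would be to hand the upstream pipes straight to the GroupBy rather than to a Merge placed in front of it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MergeThenGroupSketch {

    // Concatenate several tuple streams into one (the explicit merge step).
    @SafeVarargs
    public static List<List<String>> merge(List<List<String>>... inputs) {
        List<List<String>> merged = new ArrayList<>();
        for (List<List<String>> input : inputs) {
            merged.addAll(input);
        }
        return merged;
    }

    // Group tuples from one or more streams by the value at keyIndex.
    // Reading every input in turn already acts as a merge.
    @SafeVarargs
    public static Map<String, List<List<String>>> groupBy(int keyIndex, List<List<String>>... inputs) {
        Map<String, List<List<String>>> groups = new TreeMap<>();
        for (List<List<String>> input : inputs) {
            for (List<String> tuple : input) {
                groups.computeIfAbsent(tuple.get(keyIndex), k -> new ArrayList<>()).add(tuple);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<List<String>> lhs = Arrays.asList(Arrays.asList("a", "1"), Arrays.asList("b", "2"));
        List<List<String>> rhs = Arrays.asList(Arrays.asList("a", "3"));

        Map<String, List<List<String>>> viaMerge = groupBy(0, merge(lhs, rhs));
        Map<String, List<List<String>>> direct = groupBy(0, lhs, rhs);

        System.out.println(viaMerge.equals(direct)); // prints true
    }
}
```

This only models the semantics; it says nothing about why the planner or runtime in the report above fails when the Merge is present.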




Bhavesh Shah

Feb 4, 2016, 5:46:06 PM
to cascading-user
Thanks Chris for the reply.
I have tried it with 3.1-wip, but it still fails with the same exception. This time, though, it fails before the flows get merged.

Thanks,
Bhavesh

Pushpender Garg

Feb 5, 2016, 3:05:24 AM
to cascading-user
If that's the case, then I think it would be better if the planner could take care of it, or at least fail during the planning phase rather than at execution.

Chris K Wensel

Feb 5, 2016, 12:43:56 PM
to cascadi...@googlegroups.com
What’s the case? Removing the Merge before the GroupBy?

It could, but we do not ship with logical optimizations in place, for two simple reasons:

1. Without them, it is easier to understand both what the developer did and what is happening (what the planner did). It is easier to build a cost model in your head of the implications of the code you write as you are writing and executing that code.

2. Having logical optimization rules would result in more rules to debug across edge cases, making the resulting execution plans less stable.

That said, in 3.x you are welcome to create rules that make changes that aren’t strictly necessary. This is especially useful for projects like Scalding or Lingual, and for some of the commercial companies embedding Cascading, where the auto-generated Cascading code may have artifacts that are more easily dealt with during planning than during generation.

ckw


Chris K Wensel




Chris K Wensel

Feb 5, 2016, 12:44:41 PM
to cascadi...@googlegroups.com