Cascade Hanging?

Will Briggs

May 1, 2015, 3:48:40 PM
to cascadi...@googlegroups.com
I have a cascade with multiple flows, which I submit hourly via Oozie. In the past day or two, I have started seeing the Cascade finish (and log) a flow but never start the subsequent flow. It just sits there forever, with no additional log output. Killing and restarting the whole cascade typically "fixes" the problem, but that's obviously not ideal. The problem is intermittent, which makes it even more frustrating to debug. Has anyone run into this sort of thing? I'm running Cascading 2.6.1 on YARN, with CDH 5.3.0.

Thanks,
Will

Andre Kelpe

May 2, 2015, 8:12:29 AM
to cascadi...@googlegroups.com
Is the flow not started at all, or does it submit jobs that are then never started? I have seen behavior like this when YARN believes it is overcommitted on resources (memory) but never tells you so. There are some config options to ease that, but before we go there, I'd like to know which behavior you are seeing.

- André

--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cascading-use...@googlegroups.com.
To post to this group, send email to cascadi...@googlegroups.com.
Visit this group at http://groups.google.com/group/cascading-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/cascading-user/d9288abf-1105-4f5e-83c6-956abc99af24%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

William Briggs

May 4, 2015, 9:26:28 AM
to cascadi...@googlegroups.com
Hi Andre, from what I can tell based on logging, the flow is never started - I can't find any jobs in the NEW, SUBMITTED or ACCEPTED state in the YARN resource manager, and the last output from the cascade is a statement telling me that the previous flow finished.

-Will

Chris K Wensel

May 4, 2015, 3:51:06 PM
to cascadi...@googlegroups.com
Are you running the mapreduce job history server?

It's probable that the job tracker shuts down before the client side knows that the last job in the tracker has completed.

Also, can you take a stack/thread dump of the Cascading client side and post it? It would be useful to know where the hang actually is.

ckw
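
For readers following along: a thread dump of the client JVM is usually taken from outside the process with the JDK's jstack tool. If that is not convenient, the sketch below shows roughly the same information gathered from inside the JVM through the standard java.lang.management API. The class name ThreadDumpSketch is hypothetical, and the code only sees threads of the JVM it runs in, so it would have to be called from within the hung client process (for example from a watchdog thread).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of Cascading.
public class ThreadDumpSketch {

    /** Returns a text dump (name, state, stack) of every live thread in this JVM. */
    public static String dumpAllThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Only ask for lock details when the JVM supports reporting them.
        boolean monitors = threads.isObjectMonitorUsageSupported();
        boolean synchronizers = threads.isSynchronizerUsageSupported();

        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(monitors, synchronizers)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState());
            if (info.getLockName() != null) {
                // The object this thread is blocked or waiting on, if any.
                out.append(" on ").append(info.getLockName());
            }
            out.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

Threads that stay in the BLOCKED or WAITING state across several dumps taken a few seconds apart are the ones worth looking at first.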



William Briggs

May 5, 2015, 10:11:03 AM
to cascadi...@googlegroups.com
Thanks to both of you for the responses; we are running the MapReduce history server. I have attached a copy of the stack dump for the most recent version of the hung app.

Thanks,
Will

hung_cascade.stack

Gera Shegalov

May 5, 2015, 1:49:30 PM
to cascading-user
Multiple job submitter threads hang in the DBInputFormat split calculation. There could be too many concurrent connections to your Vertica database.




--
@gerashegalov

William Briggs

May 11, 2015, 10:28:51 AM
to cascadi...@googlegroups.com
Thanks Gera; unfortunately, that doesn't seem to be the problem. As you noticed, the stack trace shows that the code is blocked when the DriverManager is instantiating JDBC drivers via reflection. This is still an intermittent issue, and it ends up causing the whole Cascade to hang, since none of the Flows that use the JDBC taps as input can run. Does anyone have some suggestions on debugging this issue further? It's driving me nuts.

Thanks,
Will
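
One thing that might be worth ruling out, and this is an assumption rather than something the thread confirms, is several threads triggering first-time JDBC driver loading at the same moment. The sketch below initializes java.sql.DriverManager and loads the driver classes once, on a single thread, before any flows are submitted concurrently. The class name DriverPreloadSketch and the idea of calling it at the start of the client's main method are hypothetical; the driver class names would be whatever the JDBC taps actually use.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper, not part of Cascading or cascading-jdbc.
public class DriverPreloadSketch {

    /**
     * Initializes DriverManager and loads the given driver classes on the
     * calling thread. Returns the class names that could not be found.
     */
    public static List<String> preload(String... driverClassNames) {
        // Calling getDrivers() makes sure DriverManager has finished its
        // one-time discovery of service-registered drivers on this thread.
        Enumeration<Driver> registered = DriverManager.getDrivers();
        while (registered.hasMoreElements()) {
            registered.nextElement();
        }

        List<String> missing = new ArrayList<String>();
        for (String name : driverClassNames) {
            try {
                // Runs the driver's static initializer, which normally
                // registers the driver with DriverManager.
                Class.forName(name);
            } catch (ClassNotFoundException e) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Driver class names are passed on the command line.
        List<String> missing = preload(args);
        System.out.println("Drivers not found on classpath: " + missing);
    }
}
```

If the hang disappears with the preload in place, that points toward a class-initialization race; if it does not, the preload is harmless and the cause lies elsewhere.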

Andre Kelpe

May 11, 2015, 10:54:05 AM
to cascadi...@googlegroups.com
Can you confirm that you are able to load the JDBC driver from a simple Java app on your cluster? Are you able to run the standard cascading-jdbc tests?

- André
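
A minimal sketch of the kind of "simple Java app" check described above, assuming the driver jar is on the classpath and that the driver class name, JDBC URL, user, and password are supplied on the command line. The class name JdbcDriverCheck and its argument order are hypothetical. It uses only java.sql calls from the JDK.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical standalone check, not part of cascading-jdbc.
public class JdbcDriverCheck {

    /** True if the named class can be found and initialized. */
    public static boolean canLoad(String driverClassName) {
        try {
            Class.forName(driverClassName);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** True if a connection can be opened and validated within the timeout. */
    public static boolean canConnect(String url, String user, String password,
                                     int timeoutSeconds) {
        // Bound the wait so the check fails instead of hanging.
        DriverManager.setLoginTimeout(timeoutSeconds);
        try (Connection connection = DriverManager.getConnection(url, user, password)) {
            return connection.isValid(timeoutSeconds);
        } catch (SQLException e) {
            System.err.println("Connection failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length < 4) {
            System.err.println("usage: JdbcDriverCheck <driverClass> <jdbcUrl> <user> <password>");
            return;
        }
        System.out.println("driver loads: " + canLoad(args[0]));
        System.out.println("connects:     " + canConnect(args[1], args[2], args[3], 10));
    }
}
```

Running it from one of the cluster nodes, with the same classpath the Cascading job uses, would show whether the driver loads and connects outside of Cascading.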

