Azkaban Flows stuck for a really long time

ravio...@gmail.com

Sep 28, 2015, 9:34:24 AM
to azkaban
The flow has 4 jobs, and 2 of them get stuck in the same way.
Even though I have set flow.num.job.threads=10, the jobs stay stuck in the RUNNING state.
Can someone please help? We have been stuck on this issue for a long time.
Any help is really appreciated.

Log:
28-09-2015 03:48:09 IST test_job INFO - 2 commands to execute.
28-09-2015 03:48:09 IST test_job INFO - Command: chmod +x tmp/test_job
28-09-2015 03:48:09 IST test_job INFO - Environment variables: {JOB_NAME=test_job, ....
28-09-2015 03:48:09 IST test_job INFO - Working directory: <some_path>/azkaban-exec-server/executions/224932
28-09-2015 03:48:09 IST test_job INFO - Process completed successfully in 0 seconds.
28-09-2015 03:48:09 IST test_job INFO - Command: ./tmp/test_job_run_file
28-09-2015 03:48:09 IST test_job INFO - Environment variables: {JOB_NAME=test_job,...}
28-09-2015 03:48:09 IST test_job INFO - Working directory: <some_path>/azkaban-exec-server/executions/224932
28-09-2015 04:12:57 IST test_job INFO - SLF4J: Class path contains multiple SLF4J bindings.
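
The telling part of this log is the timestamps: the second command starts at 03:48:09 and nothing more is logged until 04:12:57. A minimal Java sketch for spotting such silences automatically, assuming every log line begins with a dd-MM-yyyy HH:mm:ss timestamp as above (LogGap and maxGapSeconds are hypothetical names, not Azkaban APIs):

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;

// Hypothetical helper (not part of Azkaban): finds the longest silence between
// consecutive job-log lines, assuming each line starts with "dd-MM-yyyy HH:mm:ss".
// The zone suffix (e.g. "IST") is ignored, so all lines must share one zone.
class LogGap {
    private static final DateTimeFormatter TIMESTAMP =
            DateTimeFormatter.ofPattern("dd-MM-yyyy HH:mm:ss");

    // Returns the largest gap, in seconds, between consecutive timestamped lines.
    // Lines without a parseable leading timestamp are skipped.
    static long maxGapSeconds(List<String> lines) {
        LocalDateTime previous = null;
        long maxGap = 0;
        for (String line : lines) {
            LocalDateTime current;
            try {
                current = LocalDateTime.parse(
                        line.substring(0, Math.min(19, line.length())), TIMESTAMP);
            } catch (DateTimeParseException e) {
                continue; // not a timestamped log line
            }
            if (previous != null) {
                long gap = Duration.between(previous, current).getSeconds();
                maxGap = Math.max(maxGap, gap);
            }
            previous = current;
        }
        return maxGap;
    }
}
```

Fed the two lines at 03:48:09 and 04:12:57, it reports 1488 seconds (24 minutes 48 seconds); a threshold on that value could be used to flag a job as possibly hung.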


Settings:

# Azkaban Executor settings

executor.maxThreads=50

executor.port=12321

executor.flow.threads=50


flow.num.job.threads=10
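
If the settings file really does list some executor keys twice, that is untidy but not by itself a cause of the hang. A minimal Java sketch, assuming the file is read with standard java.util.Properties semantics (Azkaban's own loader may differ in details): a repeated key simply takes the value from its later line, and a small hypothetical lint (SettingsCheck.duplicateKeys) can report such repeats.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Properties;
import java.util.Set;

// Sketch, assuming java.util.Properties semantics for the executor settings file.
class SettingsCheck {
    // Loads the settings; for a repeated key, the value on the later line wins.
    static Properties load(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    // Hypothetical lint: reports keys defined more than once, in first-repeat order.
    // Handles only simple "key=value" / "key:value" lines, not escapes or continuations.
    static Set<String> duplicateKeys(String text) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#") || line.startsWith("!")) {
                continue; // blank line or comment
            }
            int sep = -1;
            for (int i = 0; i < line.length(); i++) {
                char c = line.charAt(i);
                if (c == '=' || c == ':') {
                    sep = i;
                    break;
                }
            }
            String key = (sep < 0 ? line : line.substring(0, sep)).trim();
            if (!seen.add(key)) {
                duplicates.add(key);
            }
        }
        return duplicates;
    }
}
```

On a file with each executor key written twice, it would report executor.maxThreads, executor.port and executor.flow.threads; since each repeat here carries the same value, the loaded settings are unchanged.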

Thanks,
Ravi

psss...@gmail.com

Oct 26, 2015, 4:06:25 AM
to azkaban, ravio...@gmail.com
I have run into the same problem several times. What is your Azkaban version? Mine is 2.5.

On Monday, September 28, 2015 at 9:34:24 PM UTC+8, ravio...@gmail.com wrote:

ravi teja

Oct 26, 2015, 5:42:00 AM
to psss...@gmail.com, azkaban
Figured out the issue.
I went through the Azkaban code and added some debug logs to find out what was happening.
Azkaban spawns a new process for the given command; in our case the underlying process was hanging because of:

1) High CPU load
2) Limited disk space for the hadoop jar command: hadoop jar unpacks (unjars) the job jar, which needs a lot of space, and since many of these processes were spawned at once, they stayed stuck for a really long time.
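
Both causes can be checked before and during the spawn. A minimal Java sketch, assuming you control the code that launches the command (for example in a custom wrapper; GuardedCommand and its methods are hypothetical, and this is not how Azkaban itself launches processes): it checks the load average per core and the free space in a directory, and bounds the child with Process.waitFor(timeout) so a hung process is killed instead of leaving the job in RUNNING indefinitely.

```java
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical guard around a spawned job command (not Azkaban's implementation).
class GuardedCommand {
    // True when the 1-minute load average per core is below maxLoadPerCore.
    // getSystemLoadAverage() returns a negative value where it is unavailable;
    // in that case the check is skipped.
    static boolean cpuLoadOk(double maxLoadPerCore) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load = os.getSystemLoadAverage();
        if (load < 0) {
            return true;
        }
        return load / os.getAvailableProcessors() < maxLoadPerCore;
    }

    // True when dir has at least minFreeBytes usable space (0 if dir does not exist).
    static boolean diskSpaceOk(File dir, long minFreeBytes) {
        return dir.getUsableSpace() >= minFreeBytes;
    }

    // Runs the command in workDir; if it has not exited within the timeout,
    // it is forcibly killed and -1 is returned instead of an exit code.
    static int runWithTimeout(List<String> command, File workDir, long timeout, TimeUnit unit)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .directory(workDir)
                .inheritIO() // child writes to our stdout/stderr, so no unread pipe fills up
                .start();
        if (!process.waitFor(timeout, unit)) {
            process.destroyForcibly();
            process.waitFor(); // reap the killed process
            return -1;
        }
        return process.exitValue();
    }
}
```

For example, one might call diskSpaceOk(new File(System.getProperty("java.io.tmpdir")), 1L << 30) before launching hadoop jar. The directory hadoop jar actually unpacks into depends on your Hadoop configuration, so java.io.tmpdir is only a stand-in, and the 1 GiB threshold, load limit and timeout are placeholders to tune.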

Thanks,
Ravi