More examples on Spark Submit Task


tsusuki....@gmail.com

unread,
May 17, 2015, 1:02:30 PM5/17/15
to luigi...@googlegroups.com
Hi, I am a beginner with Luigi. I would like to request more examples of spark-submit task workflows. I currently have an assignment involving three different Spark jars that should be connected into a single linear workflow.
I have a basic idea of how to write a Luigi script to schedule these Spark jobs together, but my script keeps hitting an error.
The error log is as follows:

Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/luigi/worker.py", line 165, in run
task_gen = self.task.run()
File "/usr/local/lib/python2.7/dist-packages/luigi/contrib/spark.py", line 248, in run
raise SparkJobError('Spark job failed {0}'.format(repr(args)), out=stdout, err=stderr)
SparkJobError: Spark job failed ['$SPARK_HOME/bin/spark-submit', '--class', 'euworkflow.ImageProcess', '--driver-memory', '128M', '/home/ubuntu/luigiWD/featureSelection-2.jar']

Since the error output is quite terse, I cannot tell which part of the script is wrong. What I can confirm is that the Spark jar works fine when I run it with plain spark-submit (together with those arguments).

Hence I hope the author can put up more examples of Spark workflows with Luigi in the GitHub repo. Thanks.

Have a nice day.

Thierry Jossermoz

unread,
May 18, 2015, 1:48:20 AM5/18/15
to luigi...@googlegroups.com
Hi Miya,

Can you please try replacing $SPARK_HOME in your spark config with the actual path to the spark-submit binary?

For example, in your client.cfg, instead of:
[spark]
spark-submit: $SPARK_HOME/bin/spark-submit

Have:
[spark]
spark-submit: /absolute/path/to/spark/bin/spark-submit

Let me know how it goes.

Thanks,
Thierry

Miya Kazusaki

unread,
May 18, 2015, 7:56:13 AM5/18/15
to luigi...@googlegroups.com
Hi Thierry,

Thanks for your reply.
I changed the spark-submit configuration to the full path of my spark-submit and ran it again, but the problem persists.
The return code of the subprocess.Popen is 255. What error does that refer to?
error log:
INFO: Running: ['/usr/local/spark/bin/spark-submit', '--class', 'euworkflow.ImageProcess', '--driver-memory', '128M', '/home/ubuntu/luigiWD/featureSelection-2.jar']
['/usr/local/spark/bin/spark-submit', '--class', 'euworkflow.ImageProcess', '--driver-memory', '128M', '/home/ubuntu/luigiWD/featureSelection-2.jar']
INFO: None
return code 255
ERROR: [pid 6557] Worker Worker(salt=564676788, workers=1, host=ip-10-170-116-99, username=ubuntu, pid=6557) failed    ImageProcess()
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/luigi/worker.py", line 165, in run
    task_gen = self.task.run()
  File "/usr/local/lib/python2.7/dist-packages/luigi/contrib/spark.py", line 249, in run
    raise SparkJobError('Spark job failed {0}'.format(repr(args)), out=stdout, err=stderr)
SparkJobError: Spark job failed ['/usr/local/spark/bin/spark-submit', '--class', 'euworkflow.ImageProcess', '--driver-memory', '128M', '/home/ubuntu/luigiWD/featureSelection-2.jar']
STDERR: Usage: spark-submit [options] <app jar | python file> [app options]
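Side note for anyone debugging a similar failure: 255 is just the exit status the spark-submit process returned, and the real clue is usually in the captured stderr (here, the usage line). A quick stdlib sketch of reproducing such a call and inspecting both the code and stderr, using the Python interpreter as a stand-in for spark-submit:

```python
import subprocess
import sys

# Launch a command the same way Luigi's SparkSubmitTask does (Popen with
# pipes), then print both the exit code and the captured stderr so the
# actual failure message is visible. The Python interpreter stands in
# for spark-submit here.
proc = subprocess.Popen(
    [sys.executable, '-c',
     "import sys; print('usage hint', file=sys.stderr); sys.exit(255)"],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()
print('return code:', proc.returncode)   # -> return code: 255
print('stderr:', err.decode().strip())   # -> stderr: usage hint
```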

Arash Rouhani

unread,
May 18, 2015, 8:45:55 AM5/18/15
to Miya Kazusaki, luigi...@googlegroups.com
It seems like Luigi is shelling out to a command. Try running that command without Luigi first, to make sure it works on its own.

--
You received this message because you are subscribed to the Google Groups "Luigi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to luigi-user+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Miya Kazusaki

unread,
May 18, 2015, 11:12:29 AM5/18/15
to luigi...@googlegroups.com

Yep, it is successful if I run the subprocess command without passing in any parameters other than the spark-submit arguments (in another Python script). What is meant by shelling out the command?

Arash Rouhani

unread,
May 18, 2015, 11:14:10 AM5/18/15
to Miya Kazusaki, luigi...@googlegroups.com
I mean try running just

/usr/local/spark/bin/spark-submit --class euworkflow.ImageProcess --driver-memory 128M /home/ubuntu/luigiWD/featureSelection-2.jar

from the cmd line


Miya Kazusaki

unread,
May 18, 2015, 11:26:53 AM5/18/15
to luigi...@googlegroups.com, tsusuki....@gmail.com
Yes, it is successful.

Arash Rouhani

unread,
May 18, 2015, 11:30:19 AM5/18/15
to Miya Kazusaki, luigi...@googlegroups.com
Ah, sorry. By Luigi "shelling out", I mean Luigi just launching the subprocess, the same way you did manually.

If you get exit code 0 when running it manually but Luigi gets 255, I'm clueless. This isn't a Luigi problem anyway.
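For future readers: one common reason a command succeeds in an interactive shell but fails under a scheduler is that the worker's environment differs, e.g. variables like SPARK_HOME or PATH are unset where the Luigi worker runs. A small stdlib demonstration (illustrative only) of how a child process sees only the environment it is given:

```python
import os
import subprocess
import sys

# Launch a child with a deliberately minimal environment, mimicking a
# worker started outside the interactive shell: the child will not see
# SPARK_HOME even if it is set in your login shell.
code = "import os; print(os.environ.get('SPARK_HOME', 'NOT SET'))"
env = {'PATH': os.environ.get('PATH', '')}   # minimal environment
out = subprocess.check_output([sys.executable, '-c', code], env=env)
print(out.decode().strip())  # -> NOT SET
```

Comparing the output of `env` in the shell where spark-submit works against the environment of the Luigi worker process can quickly confirm or rule this out.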


Thierry Jossermoz

unread,
May 19, 2015, 5:42:50 PM5/19/15
to Arash Rouhani, Miya Kazusaki, luigi...@googlegroups.com
Hi Miya,

The error message suggests the command is missing an option (Usage: spark-submit [options] <app jar | python file> [app options]).

Do you have a default spark master set in your $SPARK_HOME/conf/spark-defaults.conf?
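For reference, a default master set in spark-defaults.conf looks like this (the host and port are placeholders):

```
# $SPARK_HOME/conf/spark-defaults.conf
spark.master    spark://master-host:7077
```

If no master is configured there and none is passed on the command line, spark-submit can refuse the invocation with exactly this usage message.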

Can you add the following lines to your client.cfg to try and isolate the issue:

[spark]
spark-submit: /usr/local/spark/bin/spark-submit
master: local[*]

Thanks,
Thierry

