Passing args for Dataproc batch submit (PySpark)


Rodel van Rooijen

Jul 6, 2023, 1:36:32 PM
to Google Cloud Dataproc Discussions
I've been trying to pass command-line arguments to the Python (PySpark) script that is launched by a DataprocCreateBatchOperator job, but I can't get it to work. I've tried several options, and the script always fails with "error: unrecognized arguments: --key value". My batch configuration is below. Using "--key value" doesn't work either. Does anyone know the correct way to pass args?
"pyspark_batch": {
    "main_python_file_uri": PYTHON_FILE_LOCATION,
    "jar_file_uris": [SPARK_BIGQUERY_JAR_FILE],
    "args": [
        "--key",
        "value",
    ],
},

Kristopher Kane

Jul 6, 2023, 1:54:12 PM
to Google Cloud Dataproc Discussions
That sounds like an error in the Python script's argparse configuration, not in the operator or the API. Otherwise, the batch configuration above looks OK.
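
The thread does not show the script's parser, so the following is only a minimal sketch of how argparse can produce this kind of error. It assumes the script never declared the --key option. The function names build_parser and parse_batch_args are hypothetical, and parse_known_args is used here only so the leftover arguments can be inspected instead of the script exiting.

```python
import argparse


def build_parser(declare_key: bool) -> argparse.ArgumentParser:
    """Build a parser for a hypothetical PySpark batch entry point."""
    parser = argparse.ArgumentParser(description="hypothetical batch script")
    if declare_key:
        # Every flag passed in the batch "args" list must be declared here.
        parser.add_argument("--key", help="hypothetical option")
    return parser


def parse_batch_args(argv, declare_key=True):
    """Return (namespace, leftovers) for the given argument list.

    parse_args() would exit with "unrecognized arguments: ..." whenever
    leftovers is non-empty; parse_known_args() returns them instead.
    """
    parser = build_parser(declare_key)
    return parser.parse_known_args(argv)


# The same list the batch config passes: ["--key", "value"].
batch_args = ["--key", "value"]

# Parser without --key: both tokens are left over, which parse_args()
# would report as unrecognized arguments.
_, leftovers = parse_batch_args(batch_args, declare_key=False)
print("undeclared:", leftovers)

# Parser with --key declared: the value is parsed and nothing is left over.
namespace, leftovers = parse_batch_args(batch_args, declare_key=True)
print("declared:", namespace.key, leftovers)
```

In a real script the list would normally come from sys.argv[1:], which is where the "args" entries from the batch configuration end up.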

Rodel van Rooijen

Jul 6, 2023, 2:46:44 PM
to Google Cloud Dataproc Discussions
Right! It was the argparse setup in the Python script.
