Can you fail SparkSubmitTask on a failed Spark job running in cluster mode on Kubernetes?

Piotr Makarewicz

Mar 19, 2024, 9:31:11 AM
to Luigi
Hello, Luigi community!

I am running Spark on Kubernetes, with jobs triggered by Luigi's SparkSubmitTask. What makes monitoring these tasks cumbersome is that, in cluster mode, spark-submit exits with code 0 even if the Spark job has failed. As a result, Luigi's SparkSubmitTask is displayed as succeeded even though the underlying Spark job has crashed.
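
For context, my tasks look roughly like this (the app path, master URL, and options are just placeholders for my actual setup):

from luigi.contrib.spark import SparkSubmitTask


class MySparkJob(SparkSubmitTask):
    # Placeholder application path inside the driver image
    app = 'local:///opt/app/my_job.py'
    master = 'k8s://https://kubernetes.default.svc:443'
    deploy_mode = 'cluster'

    def app_options(self):
        # Placeholder arguments passed to the application
        return ['--date', '2024-03-19']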

It would be clearer to the end user if the Luigi task also failed in such a case. I considered two workarounds, both of which require adding more code:
  • checking the Spark driver pod status via the Kubernetes API server (rough sketch below; the downside is tight coupling to Kubernetes as the resource manager)
  • checking the Spark application status in the Spark History Server (this only works if a History Server exists, and it makes the pipelines depend on it being up all the time)
As you can see, there are issues with both approaches. I was looking for configuration options that would provide an out-of-the-box solution, but could not find any. Are you aware of any plugins, extensions, etc. for either Luigi or Spark that could achieve this?
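
To frame the question, here is a rough sketch of the first workaround. It assumes the driver pod name is pinned via spark.kubernetes.driver.pod.name and that the kubernetes Python client is installed and configured; the class and parameter names are made up for illustration:

import luigi
from luigi.contrib.spark import SparkSubmitTask
from kubernetes import client, config


class K8sCheckedSparkSubmitTask(SparkSubmitTask):
    """Fails the Luigi task if the Spark driver pod did not end in Succeeded."""

    # Hypothetical parameters; adjust to your deployment
    driver_namespace = luigi.Parameter(default='spark')
    driver_pod_name = luigi.Parameter()

    @property
    def conf(self):
        # Pin the driver pod name so we know which pod to inspect afterwards
        return {'spark.kubernetes.driver.pod.name': self.driver_pod_name}

    def run(self):
        # spark-submit itself returns 0 even when the driver fails
        super().run()
        config.load_kube_config()  # or config.load_incluster_config()
        pod = client.CoreV1Api().read_namespaced_pod(
            name=self.driver_pod_name, namespace=self.driver_namespace)
        if pod.status.phase != 'Succeeded':
            raise RuntimeError(
                'Spark driver pod %s ended in phase %s' %
                (self.driver_pod_name, pod.status.phase))

A check like this would make the Luigi task fail, but it ties the pipeline code to Kubernetes, which is exactly what I would like to avoid.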

Thank you for any suggestions,
Piotr Makarewicz