Hello, Luigi community!
I am running Spark on Kubernetes with jobs triggered by Luigi's SparkSubmitTask. What makes monitoring these tasks cumbersome is that spark-submit exits with code 0 even if the Spark job has failed. As a result, Luigi's SparkSubmitTask is displayed as succeeded even though the underlying Spark job has crashed.
It would be clearer to the end user if the Luigi task also failed in such a case. I have considered two workarounds, both of which require adding more code:
- checking the Spark driver pod status via the Kubernetes API server (the downside is tight coupling to Kubernetes as the resource manager)
- checking the Spark application status in the Spark history server (this only works if a history server exists, and it makes the pipelines depend on it being up all the time)
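For context, the first workaround could be sketched roughly as below. This is only an illustration of the idea, not Luigi or Spark API: the helper name `spark_driver_failed`, the namespace, and the exact label selector are my assumptions (Spark on Kubernetes does label driver pods with `spark-role=driver`, but the app label may differ by version).

```python
# Workaround 1 sketch: after spark-submit returns, inspect the driver pod's
# phase and fail the Luigi task if it did not end in "Succeeded".

def spark_driver_failed(pod_phase: str) -> bool:
    """Kubernetes pod phases are Pending / Running / Succeeded / Failed /
    Unknown. Once spark-submit has returned, any phase other than
    "Succeeded" means the Spark job did not complete cleanly."""
    return pod_phase != "Succeeded"

# In a SparkSubmitTask subclass, one could then do something like this
# (requires the `kubernetes` Python client; commented out, sketch only):
#
# from kubernetes import client, config
# config.load_incluster_config()
# pods = client.CoreV1Api().list_namespaced_pod(
#     namespace="spark",  # assumed namespace
#     label_selector="spark-role=driver",  # label set by Spark on K8s
# )
# if any(spark_driver_failed(p.status.phase) for p in pods.items):
#     raise RuntimeError("Spark driver pod failed; failing the Luigi task")

print(spark_driver_failed("Failed"))
print(spark_driver_failed("Succeeded"))
```

This illustrates the tight coupling I mentioned: the check only works where the driver runs as a Kubernetes pod.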
As you can see, both approaches have issues. I looked for configuration options that would provide an out-of-the-box solution but could not find any. Are you aware of any plugins, extensions, etc., for either Luigi or Spark that could achieve this?
Thank you for any suggestions,
Piotr Makarewicz