Question 1:
I have a Structured Streaming job running on Dataproc, and a Prometheus installation on GKE (Kubernetes). Prometheus scrapes metrics from Kafka, kube-state-metrics, etc., which are installed on the same GKE cluster but in different namespaces.
Is there a way to monitor Spark jobs running on Dataproc using Prometheus on GKE?
Should the Dataproc nodes and the GKE pods be on the same VPC?
Any pointers on this are appreciated!
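For context, here is the kind of scrape job I imagine adding under `scrape_configs` in the GKE Prometheus config (ConfigMap or Helm values), federating the `spark_*` series from a Prometheus running on the Dataproc master (see question 2 below). This is only a sketch: the job name, the master's internal IP, port 9090, and network reachability from the GKE pods are all assumptions.
```
scrape_configs:
  - job_name: 'dataproc-spark-federate'
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{__name__=~"spark_.*"}'
    static_configs:
      # Placeholder: internal IP of the Dataproc master and the default
      # Prometheus port; requires VPC-level reachability plus a firewall rule.
      - targets: ['10.128.0.10:9090']
```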
Question 2:
Here is a link that describes the steps to monitor Spark jobs on Dataproc using Prometheus installed on the same Dataproc cluster:
https://github.com/GoogleCloudDataproc/initialization-actions/blob/master/prometheus/prometheus.sh
I was able to set this up on Dataproc, but I see only the following metrics:
```
spark_app_name
spark_application_<app_id>_applicationMaster_numContainersPendingAllocate
spark_application_<app_id>_applicationMaster_numExecutorsFailed
spark_application_<app_id>_applicationMaster_numExecutorsRunning
spark_application_<app_id>_applicationMaster_numLocalityAwareTasks
spark_application_<app_id>_applicationMaster_numReleasedContainers
```
Note: this uses StatsD, not PrometheusServlet, which can also be used to monitor Apache Spark and seems to expose many more master/driver/executor metrics.
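For reference, my understanding is that PrometheusServlet (Spark 3.0+) is enabled through `metrics.properties` rather than StatsD. Here is a minimal sketch of what that could look like inside an init action such as prometheus.sh; the sink names and paths follow Spark's metrics.properties.template, and the conf location is an assumption about the Dataproc image:
```
# Assumption: /etc/spark/conf is the Spark conf dir on the Dataproc image.
cat >> /etc/spark/conf/metrics.properties <<'EOF'
*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
*.sink.prometheusServlet.path=/metrics/prometheus
master.sink.prometheusServlet.path=/metrics/master/prometheus
applications.sink.prometheusServlet.path=/metrics/applications/prometheus
EOF
```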
I've made the following changes to prometheus.sh and re-created the Dataproc cluster; both changes are in the modified prometheus.sh (attached) included in the initialization action.
Here is the script to create the Dataproc cluster:
```
gcloud beta dataproc clusters create $CNAME \
    --enable-component-gateway \
    --bucket $BUCKET \
    --region $REGION \
    --zone $ZONE \
    --no-address \
    --master-machine-type $TYPE \
    --master-boot-disk-size 500 \
    --master-boot-disk-type pd-ssd \
    --num-workers $NUM_WORKER \
    --worker-machine-type $TYPE \
    --worker-boot-disk-type pd-ssd \
    --worker-boot-disk-size 1000 \
    --image-version $IMG_VERSION \
    --scopes 'https://www.googleapis.com/auth/cloud-platform' \
    --project $PROJECT \
    --initialization-actions 'gs://dataproc-spark-configs/pip_install.sh','gs://dataproc-spark-configs/connectors-feb1.sh','gs://dataproc-spark-configs/prometheus.sh' \
    --metadata 'gcs-connector-version=2.0.0' \
    --metadata 'bigquery-connector-version=1.2.0' \
    --properties 'dataproc:dataproc.logging.stackdriver.job.driver.enable=true,dataproc:job.history.to-gcs.enabled=true,spark:spark.dynamicAllocation.enabled=true,spark:spark.eventLog.dir=gs://dataproc-spark-logs/joblogs,spark:spark.history.fs.logDirectory=gs://dataproc-spark-logs/joblogs'
```
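Since the cluster is created with `--no-address`, the GKE Prometheus would have to reach the master over its internal IP, so I assume a firewall rule along these lines is also needed (a sketch only: the rule name, network, GKE pod CIDR, and network tag are all placeholders, and it assumes the cluster is created with a matching `--tags` value):
```
# All values below are placeholders; adjust the network, the GKE pod address
# range, and the tag. Assumes the Dataproc cluster was created with
# "--tags dataproc-prometheus" so the rule matches its VMs.
gcloud compute firewall-rules create allow-gke-pods-to-dataproc-prometheus \
    --project $PROJECT \
    --network default \
    --allow tcp:9090 \
    --source-ranges 10.8.0.0/14 \
    --target-tags dataproc-prometheus
```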
Command to run the Apache Spark streaming job:
```
gcloud dataproc jobs submit pyspark main.py \
    --cluster $CLUSTER \
    --properties ^#^spark.jars.packages=org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2,org.mongodb.spark:mongo-spark-connector_2.12:3.0.2#spark.dynamicAllocation.enabled=true#spark.shuffle.service.enabled=true#spark.sql.autoBroadcastJoinThreshold=150m#spark.ui.prometheus.enabled=true#spark.kubernetes.driver.annotation.prometheus.io/scrape=true#spark.kubernetes.driver.annotation.prometheus.io/path=/metrics/executors/prometheus/#spark.kubernetes.driver.annotation.prometheus.io/port=4040#spark.app.name=structuredstreaming-versa \
    --region us-east1
```
The following are set in the SparkConf (as part of the command above) when running the Spark Structured Streaming job on Dataproc:
```
spark.ui.prometheus.enabled=true
spark.kubernetes.driver.annotation.prometheus.io/scrape=true
spark.kubernetes.driver.annotation.prometheus.io/path=/metrics/executors/prometheus/
spark.kubernetes.driver.annotation.prometheus.io/port=4040
```
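As a sanity check, this is the kind of probe I would expect to work once the job is running, since `spark.ui.prometheus.enabled=true` exposes executor metrics on the driver UI (the driver host is a placeholder, and 4040 is only the default UI port; Spark increments it if that port is already taken):
```
# Hypothetical check, run from a host that can reach the driver
# (e.g., the Dataproc master node):
curl -s http://<driver-host>:4040/metrics/executors/prometheus/
```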
Any inputs on this?
What needs to be done to debug/enable these metrics?
tia!
P.S.: attached is the modified prometheus.sh used to create the Dataproc cluster.