Monitoring Apache Spark (running on Dataproc) metrics from Prometheus on GKE - 2 questions

karan alang

Aug 29, 2022, 12:53:14 AM
to Google Cloud Dataproc Discussions

Question 1:

I have a Structured Streaming job running on Dataproc, and a Prometheus installation on GKE (Kubernetes). That Prometheus instance collects metrics from Kafka, kube-state-metrics, etc., which are installed on the same GKE cluster but in different namespaces.

Is there a way to monitor Spark jobs running on Dataproc using Prometheus on GKE?

Should the Dataproc cluster and the GKE pods be on the same VPC?

Any pointers on this are appreciated!
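To make the question concrete, here is a rough sketch of the kind of scrape job the GKE Prometheus would need. It assumes the GKE pods can reach the Dataproc VMs over the network (for example, same or peered VPC), that the driver runs on the master node with its UI on port 4040, and that the endpoints are served there directly. `DATAPROC_MASTER`, `DRIVER_UI_PORT`, `OUT_FILE` and the job names are placeholders, not values from my setup.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical values: replace with the Dataproc master's internal hostname or IP.
DATAPROC_MASTER="${DATAPROC_MASTER:-my-cluster-m}"
DRIVER_UI_PORT="${DRIVER_UI_PORT:-4040}"
# The fragment is written to a temporary file unless OUT_FILE is set.
OUT_FILE="${OUT_FILE:-$(mktemp)}"

# Generate a fragment meant to be placed under scrape_configs: in the
# Prometheus configuration. /metrics/prometheus assumes the PrometheusServlet
# sink is enabled; /metrics/executors/prometheus assumes
# spark.ui.prometheus.enabled=true.
cat > "${OUT_FILE}" <<EOF
- job_name: spark-driver
  metrics_path: /metrics/prometheus
  static_configs:
    - targets: ['${DATAPROC_MASTER}:${DRIVER_UI_PORT}']
- job_name: spark-executors
  metrics_path: /metrics/executors/prometheus
  static_configs:
    - targets: ['${DATAPROC_MASTER}:${DRIVER_UI_PORT}']
EOF

cat "${OUT_FILE}"
```

Static targets are used only to keep the sketch short; whether this works still depends on firewall rules and on how the Spark UI behaves behind YARN.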

Question 2:

Here is a link to the initialization action that sets up monitoring of Spark jobs on Dataproc using Prometheus installed on the same Dataproc cluster.

https://github.com/GoogleCloudDataproc/initialization-actions/blob/master/prometheus/prometheus.sh

I was able to set this up on Dataproc, but I see only the following metrics.

```
spark_app_name
spark_application_<app_id>_applicationMaster_numContainersPendingAllocate
spark_application_<app_id>_applicationMaster_numExecutorsFailed
spark_application_<app_id>_applicationMaster_numExecutorsRunning
spark_application_<app_id>_applicationMaster_numLocalityAwareTasks
spark_application_<app_id>_applicationMaster_numReleasedContainers
```

Note: this setup uses statsd, not PrometheusServlet. PrometheusServlet can also be used to monitor Apache Spark and seems to expose many more Master/Driver/Executor metrics.

I made the following changes to prometheus.sh and re-created the Dataproc cluster:

  1. added PrometheusServlet-related settings to $SPARK_HOME/conf/metrics.properties
  2. added Master/Driver/Executor scraping in prometheus.yaml

Both of these changes are in the prometheus.sh (attached) used as an initialization action.
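For reference, the first change is roughly along these lines. This is a sketch, not a copy of the attached script: the property names follow Spark's metrics.properties.template (Spark 3.0+), and `SPARK_CONF_DIR` falls back to a temporary directory here so the snippet can be tried without touching a real installation.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: SPARK_CONF_DIR points at Spark's conf directory
# (e.g. $SPARK_HOME/conf); a temporary directory is used when it is unset.
SPARK_CONF_DIR="${SPARK_CONF_DIR:-$(mktemp -d)}"
METRICS_FILE="${SPARK_CONF_DIR}/metrics.properties"

# Append the PrometheusServlet sink settings for all instances, with
# separate paths for the standalone master and applications instances.
cat >> "${METRICS_FILE}" <<'EOF'
*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
*.sink.prometheusServlet.path=/metrics/prometheus
master.sink.prometheusServlet.path=/metrics/master/prometheus
applications.sink.prometheusServlet.path=/metrics/applications/prometheus
EOF

echo "Wrote PrometheusServlet settings to ${METRICS_FILE}"
```

With these settings the driver is expected to serve metrics under /metrics/prometheus on its UI port; the master and applications paths apply to a standalone master and may not be relevant on YARN.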

Here is the script to create the Dataproc cluster:

```
gcloud beta dataproc clusters create $CNAME \
  --enable-component-gateway \
  --bucket $BUCKET \
  --region $REGION \
  --zone $ZONE \
  --no-address \
  --master-machine-type $TYPE \
  --master-boot-disk-size 500 \
  --master-boot-disk-type pd-ssd \
  --num-workers $NUM_WORKER \
  --worker-machine-type $TYPE \
  --worker-boot-disk-type pd-ssd \
  --worker-boot-disk-size 1000 \
  --image-version $IMG_VERSION \
  --scopes 'https://www.googleapis.com/auth/cloud-platform' \
  --project $PROJECT \
  --initialization-actions 'gs://dataproc-spark-configs/pip_install.sh','gs://dataproc-spark-configs/connectors-feb1.sh','gs://dataproc-spark-configs/prometheus.sh' \
  --metadata 'gcs-connector-version=2.0.0' \
  --metadata 'bigquery-connector-version=1.2.0' \
  --properties 'dataproc:dataproc.logging.stackdriver.job.driver.enable=true,dataproc:job.history.to-gcs.enabled=true,spark:spark.dynamicAllocation.enabled=true,spark:spark.eventLog.dir=gs://dataproc-spark-logs/joblogs,spark:spark.history.fs.logDirectory=gs://dataproc-spark-logs/joblogs'
```

Command to run the Apache Spark streaming job:

 gcloud dataproc jobs submit pyspark main.py \
   --cluster $CLUSTER \
   --properties ^#^spark.jars.packages=org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2,org.mongodb.spark:mongo-spark-connector_2.12:3.0.2#spark.dynamicAllocation.enabled=true#spark.shuffle.service.enabled=true#spark.sql.autoBroadcastJoinThreshold=150m#spark.ui.prometheus.enabled=true#spark.kubernetes.driver.annotation.prometheus.io/scrape=true#spark.kubernetes.driver.annotation.prometheus.io/path=/metrics/executors/prometheus/#spark.kubernetes.driver.annotation.prometheus.io/port=4040#spark.app.name=structuredstreaming-versa \
   --region us-east1


 The following are set in the SparkConf (as part of the command above) when running the Spark Structured Streaming job on Dataproc:

 spark.ui.prometheus.enabled=true
 spark.kubernetes.driver.annotation.prometheus.io/scrape=true
 spark.kubernetes.driver.annotation.prometheus.io/path=/metrics/executors/prometheus/
 spark.kubernetes.driver.annotation.prometheus.io/port=4040
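One way to check whether these settings have any effect might be to query the driver UI directly from a machine that can reach it. As far as I understand, the spark.kubernetes.* annotations are read only by Spark on Kubernetes, so on Dataproc (YARN) the endpoints would have to be scraped by address. The helper names below are made up for illustration, `my-cluster-m` is a placeholder host, and `--location` is there because the request may be redirected through the YARN proxy.

```shell
#!/usr/bin/env bash

# Build the URL of a Spark driver metrics endpoint from host, port and path.
spark_metrics_url() {
  local host="$1" port="$2" path="$3"
  printf 'http://%s:%s%s\n' "$host" "$port" "$path"
}

# Fetch the endpoint and print the first 20 lines of the response,
# or report on stderr that nothing usable was returned.
probe_spark_metrics() {
  local url body
  url="$(spark_metrics_url "$1" "$2" "$3")"
  if body="$(curl --silent --show-error --fail --location --max-time 10 "$url")"; then
    printf '%s\n' "$body" | head -n 20
  else
    echo "no metrics returned from ${url}" >&2
    return 1
  fi
}

# Example calls (placeholder host; run while the streaming job is active):
#   probe_spark_metrics my-cluster-m 4040 /metrics/executors/prometheus
#   probe_spark_metrics my-cluster-m 4040 /metrics/prometheus
```

If the first call returns text in Prometheus exposition format, spark.ui.prometheus.enabled is working; the second path depends on the PrometheusServlet sink being configured in metrics.properties.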


Any input on this?

What needs to be done to debug/enable this?

TIA!


P.S.: Attached is the modified prometheus.sh used to create the Dataproc cluster.

prometheus.sh

karan alang

Aug 29, 2022, 12:54:27 AM
to Google Cloud Dataproc Discussions

karan alang

Aug 29, 2022, 1:41:46 PM
to Google Cloud Dataproc Discussions
Hello all,
Resending this to check whether anyone has done something similar, i.e. monitored Apache Spark (Structured Streaming) on Dataproc using Prometheus on GKE.