Hi Kai,
Now I am trying to run my model on a multi-node Databricks cluster (1 worker with 4 cores). My Spark cluster configuration is as follows:
spark.serializer org.apache.spark.serializer.JavaSerializer
spark.databricks.delta.preview.enabled true
spark.sql.files.openCostInBytes 100
spark.executor.cores 4
spark.driver.maxResultSize 10g
spark.cores.max 8
spark.sql.adaptive.enabled false
spark.sql.files.maxPartitionBytes 200
spark.default.parallelism 4
spark.driver.cores 4
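
For completeness, here is the same configuration expressed programmatically; this is only a PySpark sketch (on the actual cluster I set these values through the Databricks Spark config above):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.serializer", "org.apache.spark.serializer.JavaSerializer")
    # both of the following are interpreted as bytes
    .config("spark.sql.files.maxPartitionBytes", "200")
    .config("spark.sql.files.openCostInBytes", "100")
    .config("spark.default.parallelism", "4")
    .config("spark.sql.adaptive.enabled", "false")
    .getOrCreate()
)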
My issue is that only one core is used whenever I apply an action on a Spark DataFrame that I read from Azure Data Lake Storage. The container holds 320 images with a total size of about 7 MB. I have tried every configuration I could find (as listed above), but I still cannot achieve full parallelism or full utilization of the cluster cores: only one task is active, using a single core, which makes everything take much longer. On the other hand, when I ran in local mode (a single-node cluster), I observed that Spark used all available cores, and the number of tasks was 4 when the driver was configured with 4 cores.
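
For context, the read and the action look roughly like this (a simplified sketch: the path is a placeholder, and I am using the binaryFile data source here as an example):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# read the 320 images (~7 MB total) from the ADLS container
df = (
    spark.read.format("binaryFile")
    .load("abfss://<container>@<account>.dfs.core.windows.net/images/")
)

# how many partitions did the read actually produce?
print(df.rdd.getNumPartitions())

# an action like this runs with only one active task
df.count()

If getNumPartitions() returns 1 here, that would match the single active task I see in the Spark UI.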
Any advice would be much appreciated.
Best,
Mohammed