Spark memory issue


Sindhuja Balaji

Nov 1, 2016, 7:45:14 PM
to spark-conn...@lists.datastax.com
I have a 3-node cluster and am getting the error below when I run a machine learning algorithm. I have also included my spark-env.sh configuration; please let me know how to fix this error.

Error

16/11/01 17:36:06 ERROR TaskSchedulerImpl: Lost executor 4 on cassandra104-01.dev.wgu.edu: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/01 17:36:06 WARN TaskSetManager: Lost task 4.3 in stage 33.0 (TID 130, cassandra104-01.dev.wgu.edu): ExecutorLostFailure (executor 4 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

spark-env.sh
export JAVA_HOME=/usr/lib/jvm/java
export SPARK_MASTER_IP=10.20.20.165
export SPARK_WORKER_CORES=2
export SPARK_WORKER_MEMORY=2g
export SPARK_WORKER_INSTANCES=2
export SPARK_LOCAL_IP=10.20.20.165

spark-defaults.conf

#spark.executor.extraClassPath      /usr/local/spark/lib/spark-cassandra-connector-assembly-2.0.0-M1-2-g70018a6.jar
spark.executor.extraClassPath      /usr/local/spark/lib/spark-cassandra-connector-1.6.0-M1-s_2.10.jar:/usr/local/spark/lib/cassandra-driver-core-3.0.0.jar:/usr/local/spark/lib/slf4j-api-1.7.5.jar:/usr/local/spark/lib/guava-16.0.1.jar:/usr/local/spark/lib/metrics-core-3.0.2.jar:/usr/local/spark/lib/netty-3.9.0.Final.jar

spark.driver.extraClassPath      /usr/local/spark/lib/spark-cassandra-connector-1.6.0-M1-s_2.10.jar:/usr/local/spark/lib/cassandra-driver-core-3.0.0.jar:/usr/local/spark/lib/slf4j-api-1.7.5.jar:/usr/local/spark/lib/guava-16.0.1.jar:/usr/local/spark/lib/metrics-core-3.0.2.jar:/usr/local/spark/lib/netty-3.9.0.Final.jar
     

--
Thanks,
Sindhuja

Russell Spitzer

Nov 1, 2016, 8:51:54 PM
to spark-conn...@lists.datastax.com
I would recommend against manually setting up the classpath like that (use spark-submit instead :) ).

But as to your error: you need to check the executor log on cassandra104-01.dev.wgu.edu; it should be in work/app-#/#/std[out|err].
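For example, on a standalone cluster the executor logs live under the worker's work directory. A minimal sketch (the app directory and executor id here are hypothetical; substitute the ones from your own driver log, and the Spark install path from your setup):

```shell
# On the worker node that lost the executor:
ls /usr/local/spark/work/                 # find the app-... directory for your job
cat /usr/local/spark/work/app-*/4/stderr  # executor 4's stderr; OOMs and class errors show up here
```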

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

Sindhuja Balaji

Nov 1, 2016, 10:47:39 PM
to spark-conn...@lists.datastax.com
Yes, in the executor log I can see a NoClassDefFoundError.
The error is thrown when I write data to Cassandra; the write code is below. Am I missing something? Do I need to add the jsr166e artifact to my classpath?

     val options = Map("table" -> "tr_otp_output"/*"tr_otp_merged_data"*/, "keyspace" -> "edw_data_import")

     df.write.format("org.apache.spark.sql.cassandra").options(options).save()



java.lang.NoClassDefFoundError: com/twitter/jsr166e/LongAdder
	at org.apache.spark.metrics.OutputMetricsUpdater$TaskMetricsSupport$class.$init$(OutputMetricsUpdater.scala:107)
	at org.apache.spark.metrics.OutputMetricsUpdater$TaskMetricsUpdater.<init>(OutputMetricsUpdater.scala:151)
	at org.apache.spark.metrics.OutputMetricsUpdater$.apply(OutputMetricsUpdater.scala:75)
	at com.datastax.spark.connector.writer.TableWriter.write(TableWriter.scala:141)
	at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:37)
	at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:37)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)





--
Thanks,
Sindhuja

kant kodali

Nov 1, 2016, 10:59:37 PM
to spark-conn...@lists.datastax.com
You are referencing the class com/twitter/jsr166e/LongAdder, which is part of some jar. I believe that jar needs to be under spark/jars on all executor and driver machines if you are running in client mode (the default), and on just the executor machines if you are running in cluster mode. Alternatively, you may be able to get away with a fat jar that bundles all transitive dependencies as well.

As @Russell pointed out, using spark-submit is the best way, because spark-submit overrides the default class loader.
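A sketch of the copy-to-every-node approach, for a three-node standalone cluster (the hostnames and the Spark lib directory are assumptions; Spark 1.x ships its jars under lib/, Spark 2.x under jars/):

```shell
# Copy the missing dependency to the same path on every node,
# then restart the workers so new executors pick it up.
for host in node1 node2 node3; do
  scp jsr166e-1.1.0.jar "$host":/usr/local/spark/lib/
done
```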
 

Sindhuja Balaji

Nov 1, 2016, 11:06:40 PM
to spark-conn...@lists.datastax.com
I added the jar path to spark-submit, but it still cannot reference the jar and I get the same NoClassDefFoundError. Is the command below correct? The other jars are added via extraClassPath in spark-defaults.conf.

/usr/local/spark/bin/spark-submit \
  --class main.scala.StudentAssessmentClassifier \
  --master spark://10.20.20.165:7077 \
  --jars /home/sindhuja.dhamodaran/poc/jsr166e-1.1.0.jar \
  /home/sindhuja.dhamodaran/poc/ScalaSaprkIntegration1.4.jar 





--
Thanks,
Sindhuja

Russell Spitzer

Nov 1, 2016, 11:23:55 PM
to spark-conn...@lists.datastax.com
It's hard for me to know without your build file, but in almost all cases the right thing to do is use Spark Packages:
https://spark-packages.org/package/datastax/spark-cassandra-connector

Then to run
spark-submit --master spark://10.20.20.165:7077 --packages datastax:spark-cassandra-connector:2.0.0-M2-s_2.10 yourJar

Of course, change that depending on your connector version and Scala version. In general, if you are using --jars for anything but your own complete dependencies, it is wrong. The only way around using --packages and --jars is to build a fat jar of your application code. The main issue you run into with --jars is that it pushes only the jar itself and none of its dependencies.
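Putting that together for the job in this thread, a sketch (the master URL, class name, and jar path are taken from the earlier spark-submit command; the package coordinate mirrors the connector jar already on your classpath, so adjust it to your Spark and Scala versions):

```shell
/usr/local/spark/bin/spark-submit \
  --master spark://10.20.20.165:7077 \
  --packages datastax:spark-cassandra-connector:1.6.0-M1-s_2.10 \
  --class main.scala.StudentAssessmentClassifier \
  /home/sindhuja.dhamodaran/poc/ScalaSaprkIntegration1.4.jar
```

With --packages, Spark resolves the connector and its transitive dependencies (including jsr166e) for both the driver and the executors, so the extraClassPath entries in spark-defaults.conf should no longer be needed.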




Ashish Raisardar

Mar 24, 2017, 5:33:20 PM
to DataStax Spark Connector for Apache Cassandra
This worked for me.

I passed the jars to spark-shell:


spark-shell \
  --driver-memory 8g \
  --executor-memory 8g \
  --total-executor-cores 16 \
  --jars /root/pratik/test/lib/jsr166e-1.1.0.jar,/root/pratik/test/lib/cassandra-driver-core-3.0.2.jar,/root/pratik/test/lib/guava-19.0.jar,/root/pratik/test/lib/spark-cassandra-connector_2.10-1.6.0.jar,/root/pratik/test/lib/spark-csv_2.10-1.4.0.jar \
  --conf "spark.cassandra.connection.host=3.26.5.200,3.26.5.204,3.26.4.154" \
  --conf "spark.cassandra.auth.username=xxxxxxx" \
  --conf "spark.cassandra.auth.password=xxxxx"