I've created a simple scala application in order to reproduce the problem I have. This application just use SparkContext to read data from Apache Cassandra 3.9 hosted on Amazon EC2 instance.
This works fine if I'm running spark locally, but when I deploy application to Spark standalone one node cluster, it connects to cassandra cluster and disconnects after some time without reading data.
I'm using:
scala 2.11.6
spark 2.1.0 (both for standalone spark and dependency in application)
spark-cassandra-connector 2.0.0-M3
Cassandra Java driver 3.0.0
Apache Cassandra 3.9
My spark configuration is:
object SparkContextLoader {
val sparkConf = new SparkConf()
sparkConf.setMaster("spark://127.0.1.1:7077")
sparkConf.set("spark.cores_max","2")
sparkConf.set("spark.executor.memory","2g")
sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
sparkConf.setAppName("Test application")
sparkConf.set("spark.cassandra.connection.host", "xxx.xxx.xxx.xxx")
sparkConf.set("spark.cassandra.connection.port", "9042")
sparkConf.set("spark.ui.port","4041")
val oneJar="/samplesparkmaven/target/samplesparkmaven-jar.jar"
sparkConf.setJars(List(oneJar))
@transient val sc = new SparkContext(sparkConf)
}
Any idea what could be the problem?
Thanks,
Zoran
I see two things
Guava conflicts with the driver see the scc FAQ . TLDR don't include the Cassandra Java driver as a dependency.
Issues with not using spark submit, manually adding your app jar.
The code listed also has no Cassandra interaction but I assume this was just skipped.
There should be some error somewhere with most of these problems, check your executor logs for them if nothing pops up in the Spark driver
--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.
The code listed also has no Cassandra interaction but I assume this was just skipped.
val sc = SparkContextLoader.getSC def runSparkJob():Unit={ val table =sc.cassandraTable("prosolo_logs_zj", "logevents") println(table.collect().mkString("\n")) }
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/02/14 23:11:25 INFO SparkContext: Running Spark version 2.1.0
17/02/14 23:11:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/02/14 23:11:27 WARN Utils: Your hostname, zoran-Latitude-E5420 resolves to a loopback address: 127.0.1.1; using 192.168.2.68 instead (on interface wlp2s0)
17/02/14 23:11:27 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
17/02/14 23:11:27 INFO SecurityManager: Changing view acls to: zoran
17/02/14 23:11:27 INFO SecurityManager: Changing modify acls to: zoran
17/02/14 23:11:27 INFO SecurityManager: Changing view acls groups to:
17/02/14 23:11:27 INFO SecurityManager: Changing modify acls groups to:
17/02/14 23:11:27 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(zoran); groups with view permissions: Set(); users with modify permissions: Set(zoran); groups with modify permissions: Set()
17/02/14 23:11:28 INFO Utils: Successfully started service 'sparkDriver' on port 33995.
17/02/14 23:11:28 INFO SparkEnv: Registering MapOutputTracker
17/02/14 23:11:28 INFO SparkEnv: Registering BlockManagerMaster
17/02/14 23:11:28 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/02/14 23:11:28 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/02/14 23:11:28 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-7b25a4cc-cb37-4332-a59b-e36fa45511cd
17/02/14 23:11:28 INFO MemoryStore: MemoryStore started with capacity 870.9 MB
17/02/14 23:11:28 INFO SparkEnv: Registering OutputCommitCoordinator
17/02/14 23:11:28 INFO Utils: Successfully started service 'SparkUI' on port 4041.
17/02/14 23:11:28 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.2.68:4041
17/02/14 23:11:28 INFO SparkContext: Added JAR /samplesparkmaven/target/samplesparkmaven-jar.jar at spark://192.168.2.68:33995/jars/samplesparkmaven-jar.jar with timestamp 1487142688817
17/02/14 23:11:28 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://127.0.1.1:7077...
17/02/14 23:11:28 INFO TransportClientFactory: Successfully created connection to /127.0.1.1:7077 after 62 ms (0 ms spent in bootstraps)
17/02/14 23:11:29 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20170214231129-0016
17/02/14 23:11:29 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 36901.
17/02/14 23:11:29 INFO NettyBlockTransferService: Server created on 192.168.2.68:36901
17/02/14 23:11:29 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/02/14 23:11:29 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.2.68, 36901, None)
17/02/14 23:11:29 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.2.68:36901 with 870.9 MB RAM, BlockManagerId(driver, 192.168.2.68, 36901, None)
17/02/14 23:11:29 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.2.68, 36901, None)
17/02/14 23:11:29 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.2.68, 36901, None)
17/02/14 23:11:29 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
17/02/14 23:11:29 INFO NettyUtil: Found Netty's native epoll transport in the classpath, using it
17/02/14 23:11:31 INFO Cluster: New Cassandra host /xxx.xxx.xxx.xxx:9042 added
17/02/14 23:11:31 INFO CassandraConnector: Connected to Cassandra cluster: Test Cluster
17/02/14 23:11:32 INFO SparkContext: Starting job: collect at SparkConnector.scala:28
17/02/14 23:11:32 INFO DAGScheduler: Got job 0 (collect at SparkConnector.scala:28) with 6 output partitions
17/02/14 23:11:32 INFO DAGScheduler: Final stage: ResultStage 0 (collect at SparkConnector.scala:28)
17/02/14 23:11:32 INFO DAGScheduler: Parents of final stage: List()
17/02/14 23:11:32 INFO DAGScheduler: Missing parents: List()
17/02/14 23:11:32 INFO DAGScheduler: Submitting ResultStage 0 (CassandraTableScanRDD[0] at RDD at CassandraRDD.scala:18), which has no missing parents
17/02/14 23:11:32 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 8.4 KB, free 870.9 MB)
17/02/14 23:11:32 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 4.4 KB, free 870.9 MB)
17/02/14 23:11:32 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.2.68:36901 (size: 4.4 KB, free: 870.9 MB)
17/02/14 23:11:32 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:996
17/02/14 23:11:32 INFO DAGScheduler: Submitting 6 missing tasks from ResultStage 0 (CassandraTableScanRDD[0] at RDD at CassandraRDD.scala:18)
17/02/14 23:11:32 INFO TaskSchedulerImpl: Adding task set 0.0 with 6 tasks
17/02/14 23:11:39 INFO CassandraConnector: Disconnected from Cassandra cluster: Test Cluster
To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsub...@lists.datastax.com.
--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsub...@lists.datastax.com.
Your executor log looks very strange. Did you merge the logs of several applications together (/var/lib/spark/worker/appid/#/? )It doesn't seem to even have a Cassandra connection initialization. Also is that stdout and stderr merged together? Does the driver UI show anything when running? How about the worker logs?
I'm going to sleep now but I'll check this out tomorrow if no one else gets back to you.
To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.
--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.
--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.
17/02/14 23:11:31 INFO Cluster: New Cassandra host /xxx.xxx.xxx.xxx:9042 added
17/02/14 23:11:31 INFO CassandraConnector: Connected to Cassandra cluster: Test Cluster
Thanks,
Zoran
To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsub...@lists.datastax.com.
--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsub...@lists.datastax.com.
--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsub...@lists.datastax.com.
To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsub...@lists.datastax.com.
--
--
--