logistic regression running slowly and failing on >3 iterations


Jay Yonamine

Jun 17, 2013, 11:34:06 AM
to spark...@googlegroups.com
Hi All, 

I'm trying to run a logistic regression on an HDFS file. The file is 1.5 GB (110k rows, 1000 columns, copied in from S3 with distcp). My logistic regression code is just a slight modification of the sample code so that it reads from HDFS instead of generating the data internally. The code is here: https://github.com/jayyonamine/spark-tutorial/blob/master/src/main/scala/sparktutorial/SparkLRhdfs2.scala
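
The relevant change has roughly this shape (a simplified sketch, not the exact file; DataPoint and Vector follow the SparkLR example, and the parsing details may differ from what's in the repo):

// Sketch: read points from HDFS instead of generating them as SparkLR does.
// The label-first, space-separated layout is an assumption for illustration.
case class DataPoint(x: spark.util.Vector, y: Double)

def readPoint(line: String): DataPoint = {
  val tok = line.split(' ')
  DataPoint(new spark.util.Vector(tok.tail.map(_.toDouble)), tok.head.toDouble)
}

val points = sc.textFile(args(0)).map(readPoint _).cache()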

I spun up a cluster of 3 m1.large instances and, based on reading through lots of previous threads, changed the config file to settings that seemed appropriate.

#!/usr/bin/env bash

# Set Spark environment variables for your site in this file. Some useful
# variables to set are:
# - MESOS_NATIVE_LIBRARY, to point to your Mesos native library (libmesos.so)
# - SCALA_HOME, to point to your Scala installation
# - SPARK_CLASSPATH, to add elements to Spark's classpath
# - SPARK_JAVA_OPTS, to add JVM options
# - SPARK_MEM, to change the amount of memory used per node (this should
#   be in the same format as the JVM's -Xmx option, e.g. 300m or 1g).
# - SPARK_LIBRARY_PATH, to add extra search paths for native libraries.

export SCALA_HOME=/root/scala-2.9.3
export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so

# Set Spark's memory per machine; note that you can also comment this out
# and have the master's SPARK_MEM variable get passed to the workers.
export SPARK_MEM=6g

# Set JVM options and Spark Java properties
#SPARK_JAVA_OPTS+=" -Dspark.local.dir=/mnt/spark,/mnt2/spark"
SPARK_JAVA_OPTS+=" -Dspark.worker.timeout=30000 -Dspark.akka.timeout=30000 -Dspark.storage.blockManagerHeartBeatMs=30000 -Dspark.akka.retry.wait=30000 -Dspark.akka.frameSize=10000"
export SPARK_JAVA_OPTS


# Uncomment the following to connect shells to the cluster by default
#export MASTER=`cat /root/spark-ec2/cluster-url`


I have 2 questions: 
First, the code runs fine with 3 iterations but seems very slow (~130 seconds) given how small the data is relative to the cluster. Is there anything I can do to my code to make it more efficient?

Second, with any more than 3 iterations, the code fails.  The output is long so I've included it as an attachment.

thanks so much for the help, 

-Jay
error_log.txt

Ian O'Connell

Jun 17, 2013, 11:54:04 AM
to spark...@googlegroups.com
Your log is full of OOM errors:

13/06/17 15:27:20 ERROR local.LocalScheduler: Exception in task 6
java.lang.OutOfMemoryError: Java heap space
        at java.lang.String.substring(String.java:1913)
        at java.lang.String.split(String.java:2288)
        at java.lang.String.split(String.java:2355)
        at sparktutorial.SparkLRhdfs2$.readPoint(SparkLRhdfs2.scala:20)


Have you altered the
    val sc = new SparkContext("local", "SparkLRhdfs2", "/home/jayyonamine/devel/spark", List("target/scala-2.9.2/spark-tutorial_2.9.2-0.1.jar"))

line when you are running it? It seems obvious, but "local" there means it won't use your cluster nodes at all; the log references the local scheduler a lot and only ever stores data on one node.
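
For a spark-ec2 standalone cluster it should look something like this (the hostname is just an example; the launch scripts write the real URL to /root/spark-ec2/cluster-url):

// Sketch: point the context at the standalone master rather than "local".
val sc = new SparkContext(
  "spark://ec2-174-129-64-103.compute-1.amazonaws.com:7077",  // example master URL
  "SparkLRhdfs2",
  "/root/spark",
  List("target/scala-2.9.2/spark-tutorial_2.9.2-0.1.jar"))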





Jay Yonamine

Jun 17, 2013, 2:11:44 PM
to spark...@googlegroups.com, i...@ianoconnell.com
Hi, thanks for that. Definitely an obvious mistake. I fixed it to be spark://masteruri:7077 so that it looks just like this example from another thread: spark://ec2-174-129-64-103.compute-1.amazonaws.com:7077. Following the instructions on that thread, I made sure that port 7077 was open on the master and slave instances. But I'm still getting this error:

[error] (run-main) spark.SparkException: Job failed: Error: Disconnected from Spark cluster
spark.SparkException: Job failed: Error: Disconnected from Spark cluster
        at spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:629)
        at spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:627)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:627)
        at spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:297)
        at spark.scheduler.DAGScheduler.spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:358)
        at spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:102)
[trace] Stack trace suppressed: run last compile:run-main for the full output.
13/06/17 18:07:14 INFO network.ConnectionManager: Selector thread was interrupted!
java.lang.RuntimeException: Nonzero exit code: 1
        at scala.sys.package$.error(package.scala:27)
[trace] Stack trace suppressed: run last compile:run-main for the full output.
[error] (compile:run-main) Nonzero exit code: 1
[error] Total time: 171 s, completed Jun 17, 2013 6:07:14 PM
13/06/17 18:07:14 INFO cluster.SparkDeploySchedulerBackend: Executor 10 disconnected, so removing it
13/06/17 18:07:14 ERROR cluster.ClusterScheduler: Lost executor 10 on ip-10-232-52-182.ec2.internal: remote Akka client shutdown
13/06/17 18:07:14 INFO cluster.SparkDeploySchedulerBackend: Executor 11 disconnected, so removing it
13/06/17 18:07:14 ERROR cluster.ClusterScheduler: Lost executor 11 on ip-10-233-27-38.ec2.internal: remote Akka client shutdown


any ideas?

Josh Rosen

Jun 17, 2013, 2:24:11 PM
to spark...@googlegroups.com
Are you trying to connect to a remote Spark EC2 master from your local machine?


Another thing we aren't totally clear about in the docs: it's not possible to submit jobs over a WAN. The reason is that the driver also spawns a server, and it needs to be able to receive incoming connections from the scheduler.

Jay Yonamine

Jun 17, 2013, 2:40:20 PM
to spark...@googlegroups.com
Hi Josh, I ssh into the master before running the job.

Here are the steps I follow:
./spark-ec2 -k file -i file.pem -s 2 -t m1.large -w 360 launch Spark  # spin up cluster
./spark-ec2 -k file -i file.pem login Spark  # ssh into the master

then I grab the data from S3 and the scripts from GitHub:
$ ./build package
$ ./build
> run-main etc......

everything works fine when 
val sc = new SparkContext(args(5), "SparkLRhdfs2", "/root/spark/", List("target/scala-2.9.2/spark-tutorial_2.9.2-0.1.jar"))

args(5)=local, but when args(5)=spark://master:7077, I get the errors.

thanks for the help!

Ian O'Connell

Jun 17, 2013, 3:02:50 PM
to spark...@googlegroups.com
You should try ssh'ing into one of the worker nodes and taking a peek at the logs. I suspect something is amiss with the configuration and the executor is shutting down immediately on startup. Possibly a JVM path issue or similar.

You really shouldn't see it lose executors in any case where you aren't losing nodes.


@Josh I haven't launched much with standalone over Mesos yet, but with the EC2 launch scripts, do you know of a command we can give people to gather these sorts of logs onto the master for posting/inspection?

Jay Yonamine

Jun 17, 2013, 3:57:41 PM
to spark...@googlegroups.com, i...@ianoconnell.com
Ian, thanks again for helping me out with this. I ssh'd into one of the slave nodes using ssh -i pem.pem ec2-...@ec2-numbers-compute-1.amazonaws.com, and that appeared to work just fine. It's not clear to me where the log files are, though.

On the master node, they are in /root/spark/work, but the folders below appear to be the only ones on the slave instance, and I can't seem to find the appropriate log file in any of them.

bin     dev   lib    lost+found  mnt2  nfs   root     srv  usr
boot    etc   lib64  media       mnt3  opt   sbin     sys  var
cgroup  home  local  mnt         mnt4  proc  selinux  tmp  vol



Thanks, 

-Jay

Ian O'Connell

Jun 17, 2013, 4:05:49 PM
to spark...@googlegroups.com
I normally use it over Mesos, so this might need correction, but I think /var/lib/spark/ contains the logs by default. A fallback approach is to look for files called stdout or stderr under /, which will pop out the location of the work directories.

Reynold Xin

Jun 17, 2013, 4:08:17 PM
to spark...@googlegroups.com
In standalone mode, the default location for worker logs is

/path/to/spark/work

--
Reynold Xin, AMPLab, UC Berkeley

Josh Rosen

Jun 17, 2013, 4:29:53 PM
to spark...@googlegroups.com, i...@ianoconnell.com
For Spark EC2 clusters using Spark standalone mode (the default mode for recent spark-ec2 versions), I'm pretty sure that the logs can be found in /root/spark/logs or in /root/spark/work/.

@Ian: it's probably possible to write a shell script that will SSH into the worker nodes and gather all of the Spark logs to the master node.  More generally, it would be cool to write a Spark EC2 diagnostics script that gathers the logs and collects other debugging information, such as the AMI id, Spark / Shark / Java / Scala versions, EC2 instance type, etc.
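
A minimal sketch of that idea in Scala (assuming spark-ec2 writes the worker hostnames to /root/spark-ec2/slaves and that workers use the standalone work directory; both paths are assumptions to verify):

import scala.sys.process._

object GatherLogs {
  def main(args: Array[String]) {
    // Read worker hostnames, then scp each worker's work dir back to the master.
    val slaves = scala.io.Source.fromFile("/root/spark-ec2/slaves").getLines().toList
    for (host <- slaves) {
      Seq("mkdir", "-p", "/root/collected-logs/" + host).!
      Seq("scp", "-r", "root@" + host + ":/root/spark/work", "/root/collected-logs/" + host).!
    }
  }
}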

A fancier solution would be to use a log forwarder to automatically send the worker logs to the master.  In my personal Spark experiments, I've been playing around with using Splunk to gather and analyze the worker logs at the Spark master.  I've written a set of scripts to automate the deployment of Splunk on Spark EC2 (https://github.com/JoshRosen/spark-splunk) and started to build a list of queries that I've found useful (https://github.com/JoshRosen/spark-splunk/wiki).

A few disclaimers about my Splunk script:

- This is my own personal project, not an official Spark subproject.
- There may be good, free alternatives to Splunk, but I haven't tried them.
- I'm a total Splunk beginner, so this may not be the right way to configure or deploy Splunk forwarders.

Jay Yonamine

Jun 17, 2013, 4:33:13 PM
to spark...@googlegroups.com
So, on my master node there are log files in /root/spark/work/, but again, when I ssh into the workers there is nothing in the Spark folders. When I use the web interface (master:8080), I can pull up the error files on the workers just fine. Here is the output of one of them, with the error message coming at the bottom. I also tried increasing the heartbeat in spark/conf/spark-env.sh:

13/06/17 20:21:19 INFO slf4j.Slf4jEventHandler: Slf4jEventHandler started
13/06/17 20:21:20 INFO actor.ActorSystemImpl: RemoteServerStarted@akka://sparkE...@ip-10-232-52-182.ec2.internal:59847
13/06/17 20:21:20 INFO executor.StandaloneExecutorBackend: Connecting to driver: akka://sp...@10.170.9.137:39863/user/StandaloneScheduler
13/06/17 20:21:20 INFO actor.ActorSystemImpl: RemoteClientStarted@akka://sp...@10.170.9.137:39863
13/06/17 20:21:20 INFO executor.StandaloneExecutorBackend: Successfully registered with driver
13/06/17 20:21:20 INFO slf4j.Slf4jEventHandler: Slf4jEventHandler started
13/06/17 20:21:20 INFO actor.ActorSystemImpl: RemoteServerStarted@akka://sp...@ip-10-232-52-182.ec2.internal:57851
13/06/17 20:21:20 INFO spark.SparkEnv: Connecting to BlockManagerMaster: akka://sp...@10.170.9.137:39863/user/BlockManagerMaster
13/06/17 20:21:20 INFO storage.MemoryStore: MemoryStore started with capacity 3.8 GB.
13/06/17 20:21:20 INFO storage.DiskStore: Created local directory at /mnt/spark/spark-local-20130617202120-b19c
13/06/17 20:21:20 INFO storage.DiskStore: Created local directory at /mnt2/spark/spark-local-20130617202120-d058
13/06/17 20:21:20 INFO network.ConnectionManager: Bound socket to port 41050 with id = ConnectionManagerId(ip-10-232-52-182.ec2.internal,41050)
13/06/17 20:21:20 INFO storage.BlockManagerMaster: Trying to register BlockManager
13/06/17 20:21:20 INFO actor.ActorSystemImpl: RemoteClientStarted@akka://sp...@10.170.9.137:39863
13/06/17 20:21:30 WARN storage.BlockManagerMaster: Error sending message to BlockManagerMaster in 1 attempts
java.util.concurrent.TimeoutException: Futures timed out after [10000] milliseconds
	at akka.dispatch.DefaultPromise.ready(Future.scala:870)
	at akka.dispatch.DefaultPromise.result(Future.scala:874)
	at akka.dispatch.Await$.result(Future.scala:74)
	at spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:136)
	at spark.storage.BlockManagerMaster.tell(BlockManagerMaster.scala:115)
	at spark.storage.BlockManagerMaster.registerBlockManager(BlockManagerMaster.scala:46)
	at spark.storage.BlockManager.initialize(BlockManager.scala:138)
	at spark.storage.BlockManager.<init>(BlockManager.scala:123)
	at spark.storage.BlockManager.<init>(BlockManager.scala:130)
	at spark.SparkEnv$.createFromSystemProperties(SparkEnv.scala:102)
	at spark.executor.Executor.<init>(Executor.scala:68)
	at spark.executor.StandaloneExecutorBackend$$anonfun$receive$1.apply(StandaloneExecutorBackend.scala:39)
	at spark.executor.StandaloneExecutorBackend$$anonfun$receive$1.apply(StandaloneExecutorBackend.scala:36)
	at akka.actor.Actor$class.apply(Actor.scala:318)
	at spark.executor.StandaloneExecutorBackend.apply(StandaloneExecutorBackend.scala:16)
	at akka.actor.ActorCell.invoke(ActorCell.scala:626)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:197)
	at akka.dispatch.Mailbox.run(Mailbox.scala:179)
	at akka.dispatch.ForkJoinExecutorConfigurator$MailboxExecutionTask.exec(AbstractDispatcher.scala:516)
	at akka.jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:259)
	at akka.jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:975)
	at akka.jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
	at akka.jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
13/06/17 20:21:43 WARN storage.BlockManagerMaster: Error sending message to BlockManagerMaster in 2 attempts
java.util.concurrent.TimeoutException: Futures timed out after [10000] milliseconds
	at akka.dispatch.DefaultPromise.ready(Future.scala:870)
	at akka.dispatch.DefaultPromise.result(Future.scala:874)
	at akka.dispatch.Await$.result(Future.scala:74)
	at spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:136)
	at spark.storage.BlockManagerMaster.tell(BlockManagerMaster.scala:115)
	at spark.storage.BlockManagerMaster.registerBlockManager(BlockManagerMaster.scala:46)
	at spark.storage.BlockManager.initialize(BlockManager.scala:138)
	at spark.storage.BlockManager.<init>(BlockManager.scala:123)
	at spark.storage.BlockManager.<init>(BlockManager.scala:130)
	at spark.SparkEnv$.createFromSystemProperties(SparkEnv.scala:102)
	at spark.executor.Executor.<init>(Executor.scala:68)
	at spark.executor.StandaloneExecutorBackend$$anonfun$receive$1.apply(StandaloneExecutorBackend.scala:39)
	at spark.executor.StandaloneExecutorBackend$$anonfun$receive$1.apply(StandaloneExecutorBackend.scala:36)
	at akka.actor.Actor$class.apply(Actor.scala:318)
	at spark.executor.StandaloneExecutorBackend.apply(StandaloneExecutorBackend.scala:16)
	at akka.actor.ActorCell.invoke(ActorCell.scala:626)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:197)
	at akka.dispatch.Mailbox.run(Mailbox.scala:179)
	at akka.dispatch.ForkJoinExecutorConfigurator$MailboxExecutionTask.exec(AbstractDispatcher.scala:516)
	at akka.jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:259)
	at akka.jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:975)
	at akka.jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
	at akka.jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
13/06/17 20:21:56 WARN storage.BlockManagerMaster: Error sending message to BlockManagerMaster in 3 attempts
java.util.concurrent.TimeoutException: Futures timed out after [10000] milliseconds
	at akka.dispatch.DefaultPromise.ready(Future.scala:870)
	at akka.dispatch.DefaultPromise.result(Future.scala:874)
	at akka.dispatch.Await$.result(Future.scala:74)
	at spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:136)
	at spark.storage.BlockManagerMaster.tell(BlockManagerMaster.scala:115)
	at spark.storage.BlockManagerMaster.registerBlockManager(BlockManagerMaster.scala:46)
	at spark.storage.BlockManager.initialize(BlockManager.scala:138)
	at spark.storage.BlockManager.<init>(BlockManager.scala:123)
	at spark.storage.BlockManager.<init>(BlockManager.scala:130)
	at spark.SparkEnv$.createFromSystemProperties(SparkEnv.scala:102)
	at spark.executor.Executor.<init>(Executor.scala:68)
	at spark.executor.StandaloneExecutorBackend$$anonfun$receive$1.apply(StandaloneExecutorBackend.scala:39)
	at spark.executor.StandaloneExecutorBackend$$anonfun$receive$1.apply(StandaloneExecutorBackend.scala:36)
	at akka.actor.Actor$class.apply(Actor.scala:318)
	at spark.executor.StandaloneExecutorBackend.apply(StandaloneExecutorBackend.scala:16)
	at akka.actor.ActorCell.invoke(ActorCell.scala:626)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:197)
	at akka.dispatch.Mailbox.run(Mailbox.scala:179)
	at akka.dispatch.ForkJoinExecutorConfigurator$MailboxExecutionTask.exec(AbstractDispatcher.scala:516)
	at akka.jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:259)
	at akka.jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:975)
	at akka.jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
	at akka.jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
13/06/17 20:21:59 ERROR executor.StandaloneExecutorBackend: Error sending message to BlockManagerMaster [message = RegisterBlockManager(BlockManagerId(11, ip-10-232-52-182.ec2.internal, 41050),4081511301,Actor[akka://spark/user/BlockManagerActor1])]
spark.SparkException: Error sending message to BlockManagerMaster [message = RegisterBlockManager(BlockManagerId(11, ip-10-232-52-182.ec2.internal, 41050),4081511301,Actor[akka://spark/user/BlockManagerActor1])]
	at spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:150)
	at spark.storage.BlockManagerMaster.tell(BlockManagerMaster.scala:115)
	at spark.storage.BlockManagerMaster.registerBlockManager(BlockManagerMaster.scala:46)
	at spark.storage.BlockManager.initialize(BlockManager.scala:138)
	at spark.storage.BlockManager.<init>(BlockManager.scala:123)
	at spark.storage.BlockManager.<init>(BlockManager.scala:130)
	at spark.SparkEnv$.createFromSystemProperties(SparkEnv.scala:102)
	at spark.executor.Executor.<init>(Executor.scala:68)
	at spark.executor.StandaloneExecutorBackend$$anonfun$receive$1.apply(StandaloneExecutorBackend.scala:39)
	at spark.executor.StandaloneExecutorBackend$$anonfun$receive$1.apply(StandaloneExecutorBackend.scala:36)
	at akka.actor.Actor$class.apply(Actor.scala:318)
	at spark.executor.StandaloneExecutorBackend.apply(StandaloneExecutorBackend.scala:16)
	at akka.actor.ActorCell.invoke(ActorCell.scala:626)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:197)
	at akka.dispatch.Mailbox.run(Mailbox.scala:179)
	at akka.dispatch.ForkJoinExecutorConfigurator$MailboxExecutionTask.exec(AbstractDispatcher.scala:516)
	at akka.jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:259)
	at akka.jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:975)
	at akka.jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
	at akka.jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000] milliseconds
	at akka.dispatch.DefaultPromise.ready(Future.scala:870)
	at akka.dispatch.DefaultPromise.result(Future.scala:874)
	at akka.dispatch.Await$.result(Future.scala:74)
	at spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:136)
	... 19 more
13/06/17 20:21:59 INFO executor.StandaloneExecutorBackend: Connecting to driver: akka://sp...@10.170.9.137:39863/user/StandaloneScheduler
13/06/17 20:21:59 INFO executor.StandaloneExecutorBackend: Got assigned task 22
13/06/17 20:21:59 ERROR executor.StandaloneExecutorBackend: Received launchTask but executor was null

Ian O'Connell

Jun 17, 2013, 8:18:36 PM
to spark...@googlegroups.com
Thanks for sharing the links and the approach.

You're right, an expanded script to grab all those variables would definitely be useful; mixed versions of things seem to be a common issue.

The Splunk cookbook certainly sounds interesting. The EMR approach for logs might get us around some of the gathering semantics: just have an S3 bucket and use a clusterid/nodename tree structure. That would set things up nicely for querying with Spark itself, I suppose, though it's not much use for the initial launch/early-dev issues that crop up often (and the queries are probably far more verbose to get in place than your cookbook).
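
Quick sketch of what that querying could look like (the bucket name and path layout here are hypothetical):

// Assumes logs land under s3n://<bucket>/<clusterid>/<nodename>/stderr
val logs = sc.textFile("s3n://some-log-bucket/cluster-1/*/stderr")
println(logs.filter(_.contains("ERROR")).count())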

For the new-user debugging scenario, a script to pair with spark-ec2 that pulls the logs from the latest few tasks back to the master as a gzip might be a good way to get a snapshot of what is going on?

Jay Yonamine

Jun 18, 2013, 2:42:12 PM
to spark...@googlegroups.com, i...@ianoconnell.com
Hi guys, 

Thanks again for all the help; it's very much appreciated. I'm still having the same problem. Does anyone have any further suggestions? Thanks so much.

Ian O'Connell

Jun 18, 2013, 3:23:45 PM
to Jay Yonamine, spark...@googlegroups.com
OK, just to isolate the issue: does any hello-world job work against your cluster as-is? You were originally just running against localhost. (Is it code/use-case related, or cluster init?)

Jay Yonamine

Jun 28, 2013, 4:44:11 PM
to spark...@googlegroups.com, Jay Yonamine, i...@ianoconnell.com
Hi, sorry for the delay. To isolate the issue, I've tried running this simple code:

package sparktutorial

import spark.SparkContext
import SparkContext._
import spark._

object WordCount2 {
  def main(args: Array[String]) {
    val sc = new SparkContext(args(1), "Wordcount2", "/root/spark/", List("target/scala-2.9.2/spark-tutorial_2.9.2-0.1.jar"))
    val file = sc.textFile(args(2)).cache()
    val counts = file.flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    println(counts)
  }
}

I get this at the end of my log:

[success] Total time: 4 s, completed Jun 28, 2013 8:22:31 PM
13/06/28 20:22:32 ERROR client.Client$ClientActor: Connection to master failed; stopping client
13/06/28 20:22:32 ERROR cluster.SparkDeploySchedulerBackend: Disconnected from Spark cluster!
13/06/28 20:22:32 ERROR cluster.ClusterScheduler: Exiting due to error from cluster scheduler: Disconnected from Spark cluster

any ideas there?

Ian O'Connell

Jun 28, 2013, 6:35:30 PM
to spark...@googlegroups.com, Jay Yonamine
You have no 'action' there, only transformations, so Spark will never actually do anything. At the end, you could add a collect or count to force it to do something.
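
E.g. something like this (counts being the RDD from your snippet):

// Actions force evaluation; collect() brings results to the driver.
counts.collect().take(10).foreach(println)
// or, if you only need to trigger the computation:
println(counts.count())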