SparkTachyonPi example error


Puja Gupta

May 3, 2015, 8:41:44 PM
to tachyo...@googlegroups.com
Hi,
I am new to Spark and Tachyon. I was trying to run the SparkTachyonPi example and am getting the following error. I applied the settings mentioned on the Tachyon wiki for setting up Spark + Tachyon. Any pointers appreciated.

 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.NoSuchMethodError: tachyon.client.TachyonFS.exist(Ljava/lang/String;)Z
        at org.apache.spark.storage.TachyonBlockManager$$anonfun$createTachyonDirs$2.apply(TachyonBlockManager.scala:117)
        at org.apache.spark.storage.TachyonBlockManager$$anonfun$createTachyonDirs$2.apply(TachyonBlockManager.scala:106)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)

Thanks!

Haoyuan Li

May 3, 2015, 9:57:20 PM
to Puja Gupta, tachyo...@googlegroups.com
What Spark and Tachyon versions are you running?

Haoyuan




--
Haoyuan Li

Puja Gupta

May 3, 2015, 10:01:34 PM
to tachyo...@googlegroups.com, pmgup...@gmail.com
I am using Tachyon 0.6.1 (the prebuilt binary; I did not build it myself) and Spark 1.3. The Tachyon tests pass, and the Spark examples other than the Tachyon ones work, so I think the basic setup is fine.

Haoyuan Li

May 3, 2015, 10:11:38 PM
to Puja Gupta, tachyo...@googlegroups.com
I see. By default, Spark 1.3.1 runs with Tachyon 0.5.0, but the Spark master branch has already been updated to work with Tachyon 0.6.4.

If you want to use Spark 1.3.1 with Tachyon 0.6.x, you need to update Spark's pom file with the right Tachyon version and recompile Spark.

Best,

Haoyuan
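A sketch of that pom edit, using a hypothetical stand-in snippet (the real tachyon-client dependency entry lives in Spark's poms and its exact shape may differ):

```shell
# Hypothetical stand-in for Spark's tachyon-client dependency entry;
# find the real one in Spark's core/pom.xml before editing.
cat > /tmp/pom-snippet.xml <<'EOF'
<dependency>
  <groupId>org.tachyonproject</groupId>
  <artifactId>tachyon-client</artifactId>
  <version>0.5.0</version>
</dependency>
EOF

# Point the dependency at the Tachyon release actually deployed
sed -i 's|<version>0.5.0</version>|<version>0.6.4</version>|' /tmp/pom-snippet.xml
grep '<version>' /tmp/pom-snippet.xml   # -> <version>0.6.4</version>
```

After editing the real pom, rebuild Spark (for example with `mvn -DskipTests clean package`) so the shipped assembly uses a client that matches the running Tachyon master.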

Puja Gupta

May 4, 2015, 12:44:14 AM
to tachyo...@googlegroups.com, pmgup...@gmail.com
Thank you, Haoyuan. I have Tachyon 0.5 now. One last question: I now get the following error

  INFO : Connecting local worker @ ri/127.0.0.1:29998
15/05/04 00:37:08 ERROR : java.net.ConnectException: Connection refused

I know my Tachyon worker is not running on this IP, but I can't figure out which file to change. I tried changing a couple of files, but it still uses localhost.

Calvin Jia

May 4, 2015, 12:58:08 AM
to tachyo...@googlegroups.com, pmgup...@gmail.com
Hi,

I recommend using Spark master with Tachyon 0.6.4 if at all possible. As for the worker issue, which addresses did you specify in your conf/workers file?

Thanks,
Calvin
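For reference, a minimal single-node layout might look like the sketch below (`ri` is the host alias from this thread; `TACHYON_MASTER_ADDRESS` in conf/tachyon-env.sh is the usual place to pin the master's address):

```shell
# conf/tachyon-env.sh (fragment) -- pin the master to the address clients
# will use, then restart with tachyon-stop.sh / tachyon-start.sh
export TACHYON_MASTER_ADDRESS=ri

# conf/workers should list one worker hostname per line; for a single-node
# setup it would contain just the line:
#   ri
```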

Puja Gupta

May 4, 2015, 1:12:33 AM
to tachyo...@googlegroups.com, pmgup...@gmail.com
Hi Calvin, I am running the master and worker on the same node. My conf/workers has ri, and /etc/hosts resolves it to 192.168.140.254. My Tachyon worker logs show

2015-05-04 01:08:41,281 INFO  WORKER_LOGGER (TachyonWorker.java:<init>) - The worker server tries to start @ ri/192.168.140.254:29998
2015-05-04 01:08:41,294 INFO  WORKER_LOGGER (TachyonWorker.java:start) - The worker server started @ ri/192.168.140.254:29998

and the jps command shows them running too.
Also, my slaves file in Spark has this address.

Thanks,
Puja

cc

May 4, 2015, 5:50:31 AM
to tachyo...@googlegroups.com, pmgup...@gmail.com
Hi Puja,

Where is the error showing up: the Tachyon master log? The Spark shell?

On Monday, May 4, 2015 at 1:12:33 PM UTC+8, Puja Gupta wrote:

Puja Gupta

May 4, 2015, 7:36:47 AM
to tachyo...@googlegroups.com, pmgup...@gmail.com
I get that error when trying to run Spark. I am running this command:

./bin/run-example SparkTachyonPi

cc

May 4, 2015, 10:33:45 AM
to tachyo...@googlegroups.com, pmgup...@gmail.com
You can try:

./bin/run-example SparkTachyonPi --conf spark.tachyonStore.url=tachyon://192.168.140.254:19998 

On Monday, May 4, 2015 at 7:36:47 PM UTC+8, Puja Gupta wrote:

cc

May 4, 2015, 10:36:26 AM
to tachyo...@googlegroups.com, pmgup...@gmail.com
The problem is that Spark doesn't know where your Tachyon master is, since you didn't tell it. The best it can assume is `hostname`:19998, and if you type `hostname`
you may get "localhost" or "ri", but "ri" resolves to "127.0.0.1".
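That first-match lookup can be sketched with a small awk illustration against a hosts-style file (illustrative only; the real resolver reads /etc/hosts):

```shell
# Build a hosts-style file where the alias "ri" sits on the loopback line
cat > /tmp/hosts.example <<'EOF'
127.0.0.1   localhost ri
EOF

# Resolve the alias the way a hosts lookup would: first matching entry wins
awk -v h=ri '{ for (i = 2; i <= NF; i++) if ($i == h) { print $1; exit } }' /tmp/hosts.example
# -> 127.0.0.1, so Spark's default tachyon://`hostname`:19998 would point at
#    loopback even though the worker bound to 192.168.140.254
```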

On Monday, May 4, 2015 at 10:33:45 PM UTC+8, cc wrote:

cc

May 4, 2015, 10:37:03 AM
to tachyo...@googlegroups.com, pmgup...@gmail.com
Also, you can find more about spark configuration here: http://spark.apache.org/docs/latest/configuration.html

On Monday, May 4, 2015 at 10:36:26 PM UTC+8, cc wrote:

Puja Gupta

May 4, 2015, 11:21:34 AM
to tachyo...@googlegroups.com, pmgup...@gmail.com
I had set it in conf/spark-defaults.conf, so I wasn't passing it on the command line. I tried passing it the way you mentioned; it still doesn't work.

15/05/04 11:19:09 INFO netty.NettyBlockTransferService: Server created on 45212
15/05/04 11:19:09 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/05/04 11:19:09 INFO storage.BlockManagerMasterActor: Registering block manager localhost:45212 with 265.4 MB RAM, BlockManagerId(<driver>, localhost, 45212)
15/05/04 11:19:09 INFO storage.BlockManagerMaster: Registered BlockManager
Exception in thread "main" java.lang.NumberFormatException: For input string: "--conf"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:492)

I appreciate your help. Thanks!

cc

May 4, 2015, 11:30:07 AM
to tachyo...@googlegroups.com, pmgup...@gmail.com
That error message is different from the previous one.

Could you share your conf/spark-defaults.conf and more of the log? I'd like to dig into the root cause, thanks.

On Monday, May 4, 2015 at 11:21:34 PM UTC+8, Puja Gupta wrote:

Puja Gupta

May 4, 2015, 11:55:33 AM
to tachyo...@googlegroups.com, pmgup...@gmail.com
Yes, the error changed when I passed it on the command line.
conf/spark-defaults.conf
spark.tachyonStore.url tachyon://ri:19998

conf/spark-env.sh
export SPARK_WORKER_DIR=/home/guptapu/spark-1.3.0-bin-hadoop2.3
export SPARK_CLASSPATH=/home/guptapu/tachyon-0.5.0/core/target/tachyon-0.5.0-jar-with-dependencies.jar
export SPARK_MASTER_IP=192.168.140.254

I tried using the IP earlier, when it wasn't connecting, to rule out a DNS issue.

logs:

15/05/04 11:46:55 WARN component.AbstractLifeCycle: FAILED org.spark-project.jetty.server.Server@68db7c81: java.net.BindException: Address already in use
java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:444)
        at sun.nio.ch.Net.bind(Net.java:436)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
        at org.spark-project.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187)
        at org.spark-project.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316)
        at org.spark-project.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265)
        at org.spark-project.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
        at org.spark-project.jetty.server.Server.doStart(Server.java:293)
        at org.spark-project.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
        at org.apache.spark.ui.JettyUtils$.org$apache$spark$ui$JettyUtils$$connect$1(JettyUtils.scala:199)
        at org.apache.spark.ui.JettyUtils$$anonfun$2.apply(JettyUtils.scala:209)
        at org.apache.spark.ui.JettyUtils$$anonfun$2.apply(JettyUtils.scala:209)
        at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1832)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
        at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1823)
        at org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:209)
        at org.apache.spark.ui.WebUI.bind(WebUI.scala:102)
        at org.apache.spark.SparkContext$$anonfun$12.apply(SparkContext.scala:307)
        at org.apache.spark.SparkContext$$anonfun$12.apply(SparkContext.scala:307)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:307)
        at org.apache.spark.examples.SparkTachyonPi$.main(SparkTachyonPi.scala:32)
        at org.apache.spark.examples.SparkTachyonPi.main(SparkTachyonPi.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
15/05/04 11:46:55 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
15/05/04 11:46:55 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
15/05/04 11:46:55 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
15/05/04 11:46:55 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
15/05/04 11:46:55 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
15/05/04 11:46:55 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
15/05/04 11:46:55 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
15/05/04 11:46:55 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
15/05/04 11:46:55 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
15/05/04 11:46:55 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
15/05/04 11:46:55 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
15/05/04 11:46:55 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
15/05/04 11:46:55 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
15/05/04 11:46:55 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
15/05/04 11:46:55 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
15/05/04 11:46:55 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
15/05/04 11:46:55 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
15/05/04 11:46:55 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
15/05/04 11:46:55 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
15/05/04 11:46:55 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
15/05/04 11:46:55 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
15/05/04 11:46:55 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
15/05/04 11:46:55 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
15/05/04 11:46:55 WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
15/05/04 11:46:55 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/05/04 11:46:55 INFO server.AbstractConnector: Started SelectChann...@0.0.0.0:4041
15/05/04 11:46:55 INFO util.Utils: Successfully started service 'SparkUI' on port 4041.
15/05/04 11:46:55 INFO ui.SparkUI: Started SparkUI at http://ri:4041
15/05/04 11:46:57 INFO spark.SparkContext: Added JAR file:/home/guptapu/spark-1.3.0-bin-hadoop2.3/lib/spark-examples-1.3.0-hadoop2.3.0.jar at http://192.168.140.254:57307/jars/spark-examples-1.3.0-hadoop2.3.0.jar with timestamp 1430754417664
15/05/04 11:46:57 INFO executor.Executor: Starting executor ID <driver> on host localhost
15/05/04 11:46:57 INFO util.AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@ri:51041/user/HeartbeatReceiver
15/05/04 11:46:57 INFO netty.NettyBlockTransferService: Server created on 33815
15/05/04 11:46:57 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/05/04 11:46:57 INFO storage.BlockManagerMasterActor: Registering block manager localhost:33815 with 265.4 MB RAM, BlockManagerId(<driver>, localhost, 33815)
15/05/04 11:46:57 INFO storage.BlockManagerMaster: Registered BlockManager

Exception in thread "main" java.lang.NumberFormatException: For input string: "--conf"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:492)
        at java.lang.Integer.parseInt(Integer.java:527)
        at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
        at scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
        at org.apache.spark.examples.SparkTachyonPi$.main(SparkTachyonPi.scala:34)
        at org.apache.spark.examples.SparkTachyonPi.main(SparkTachyonPi.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

cc

May 4, 2015, 12:26:12 PM
to tachyo...@googlegroups.com, pmgup...@gmail.com
You mean that when you changed ri to 192.168.140.254 in conf/spark-defaults.conf, it still didn't work?

But in the earlier error message, Spark tried to connect to 127.0.0.1:19998.

Let's try an ultimate solution:
modify the last section of bin/run-example from:

exec "$FWDIR"/bin/spark-submit \
  --master $EXAMPLE_MASTER \
  --class $EXAMPLE_CLASS \
  "$SPARK_EXAMPLES_JAR" \
  "$@"

to 

exec "$FWDIR"/bin/spark-submit \
  --master $EXAMPLE_MASTER \
  --class $EXAMPLE_CLASS \
  --conf spark.tachyonStore.url=tachyon://192.168.140.254:19998 \
  "$SPARK_EXAMPLES_JAR" \
  "$@"

then run `./bin/run-example SparkTachyonPi`. If the error happens again, please share the log.

Also, could you share tachyon master's log?

On Monday, May 4, 2015 at 11:55:33 PM UTC+8, Puja Gupta wrote:

cc

May 4, 2015, 12:32:08 PM
to tachyo...@googlegroups.com, pmgup...@gmail.com
Let's try modifying the last section of bin/run-example from

exec "$FWDIR"/bin/spark-submit \
  --master $EXAMPLE_MASTER \
  --class $EXAMPLE_CLASS \
  "$SPARK_EXAMPLES_JAR" \
  "$@"

to 


exec "$FWDIR"/bin/spark-submit \
  --master $EXAMPLE_MASTER \
  --class $EXAMPLE_CLASS \
  --conf spark.tachyonStore.url=tachyon://192.168.140.254:19998 \
  "$SPARK_EXAMPLES_JAR" \
  "$@"

then run `./bin/run-example SparkTachyonPi`

I'm curious about the log of this command.


On Monday, May 4, 2015 at 11:55:33 PM UTC+8, Puja Gupta wrote:

Puja Gupta

May 4, 2015, 3:18:16 PM
to tachyo...@googlegroups.com, pmgup...@gmail.com
Yes, in the earlier message it was trying to connect to 127.0.0.1, which was the problem. I think the IP is correct now, but it's still a little far from working.

tachyon master logs:
2015-05-04 12:23:30,055 INFO  server.AbstractConnector (AbstractConnector.java:doStart) - Started SelectChann...@head.ri.cse.ohio-state.edu:19999
2015-05-04 12:23:30,055 INFO  MASTER_LOGGER (UIWebServer.java:startWebServer) - Tachyon Master Server started @ head.ri.cse.ohio-state.edu/192.168.140.254:19999
2015-05-04 12:23:30,056 INFO  MASTER_LOGGER (TachyonMaster.java:start) - The master server started @ head.ri.cse.ohio-state.edu/192.168.140.254:19998
2015-05-04 12:23:58,460 INFO  MASTER_LOGGER (MasterInfo.java:registerWorker) - registerWorker(): WorkerNetAddress: ri/192.168.140.254:29998
2015-05-04 12:23:58,461 INFO  MASTER_LOGGER (MasterInfo.java:registerWorker) - registerWorker(): MasterWorkerInfo( ID: 1430756000001, ADDRESS: ri/192.168.140.254:29998, TOTAL_BYTES: 1073741824, mUsedBytes: 0, mAvailableBytes: 1073741824, mLastUpdatedTimeMs: 1430756638461, mBlocks: [ ] )

Current error after the change in run-example:
15/05/04 15:15:49 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
15/05/04 15:15:49 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1340 bytes)
15/05/04 15:15:49 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 1397 bytes)
15/05/04 15:15:49 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
15/05/04 15:15:49 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
15/05/04 15:15:49 INFO executor.Executor: Fetching http://192.168.140.254:54074/jars/spark-examples-1.3.0-hadoop2.3.0.jar with timestamp 1430766947004
15/05/04 15:15:50 INFO util.Utils: Fetching http://192.168.140.254:54074/jars/spark-examples-1.3.0-hadoop2.3.0.jar to /tmp/spark-9e642526-7b88-4c79-b250-efd0ead5fb18/userFiles-94222478-0b97-4e10-b16b-064e357f7317/fetchFileTemp4691570834654315045.tmp
15/05/04 15:15:50 INFO executor.Executor: Adding file:/tmp/spark-9e642526-7b88-4c79-b250-efd0ead5fb18/userFiles-94222478-0b97-4e10-b16b-064e357f7317/spark-examples-1.3.0-hadoop2.3.0.jar to class loader
15/05/04 15:15:50 INFO spark.CacheManager: Partition rdd_0_0 not found, computing it
15/05/04 15:15:50 INFO spark.CacheManager: Partition rdd_0_1 not found, computing it
15/05/04 15:15:50 INFO : Trying to connect master @ /192.168.140.254:19998
15/05/04 15:15:50 ERROR : Failed to connect (1) to master ri/192.168.140.254:19998 : java.net.ConnectException: Connection refused
15/05/04 15:15:51 ERROR : Failed to connect (2) to master ri/192.168.140.254:19998 : java.net.ConnectException: Connection refused
15/05/04 15:15:52 ERROR : Failed to connect (3) to master ri/192.168.140.254:19998 : java.net.ConnectException: Connection refused
15/05/04 15:15:53 ERROR : Failed to connect (4) to master ri/192.168.140.254:19998 : java.net.ConnectException: Connection refused
15/05/04 15:15:54 ERROR : Failed to connect (5) to master ri/192.168.140.254:19998 : java.net.ConnectException: Connection refused
15/05/04 15:15:55 WARN storage.TachyonBlockManager: Attempt 1 to create tachyon dir null failed
java.io.IOException: Failed to connect to master ri/192.168.140.254:19998 after 5 attempts
        at tachyon.client.TachyonFS.connect(TachyonFS.java:293)
        at tachyon.client.TachyonFS.getFileId(TachyonFS.java:1011)
        at tachyon.client.TachyonFS.exist(TachyonFS.java:633)

        at org.apache.spark.storage.TachyonBlockManager$$anonfun$createTachyonDirs$2.apply(TachyonBlockManager.scala:117)
        at org.apache.spark.storage.TachyonBlockManager$$anonfun$createTachyonDirs$2.apply(TachyonBlockManager.scala:106)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
        at org.apache.spark.storage.TachyonBlockManager.createTachyonDirs(TachyonBlockManager.scala:106)
        at org.apache.spark.storage.TachyonBlockManager.<init>(TachyonBlockManager.scala:57)
        at org.apache.spark.storage.BlockManager.tachyonStore$lzycompute(BlockManager.scala:93)
        at org.apache.spark.storage.BlockManager.tachyonStore(BlockManager.scala:87)
        at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:772)
        at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:637)
        at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:154)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:79)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:64)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: tachyon.org.apache.thrift.TException: Failed to connect to master ri/192.168.140.254:19998 after 5 attempts
        at tachyon.master.MasterClient.connect(MasterClient.java:178)
        at tachyon.client.TachyonFS.connect(TachyonFS.java:290)
        ... 28 more
Caused by: tachyon.org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
        at tachyon.org.apache.thrift.transport.TSocket.open(TSocket.java:185)
        at tachyon.org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
        at tachyon.master.MasterClient.connect(MasterClient.java:156)
        ... 29 more
Caused by: java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:579)
        at tachyon.org.apache.thrift.transport.TSocket.open(TSocket.java:180)

cc

May 4, 2015, 7:40:28 PM
to tachyo...@googlegroups.com, pmgup...@gmail.com
Aha, your master log says: The master server started @ head.ri.cse.ohio-state.edu/192.168.140.254:19998

This seems to be an FQDN problem: 192.168.140.254 maps to head.ri.xxx instead of ri.

I suspect that when you ran "./tachyon-start.sh", you hadn't yet added "ri" to "/etc/hosts", right?

If that's the case, please run "./tachyon-stop.sh" and then "./tachyon-start.sh all SudoMount" to restart Tachyon :)

On Tuesday, May 5, 2015 at 3:18:16 AM UTC+8, Puja Gupta wrote:

Puja Gupta

May 4, 2015, 7:48:02 PM
to tachyo...@googlegroups.com, pmgup...@gmail.com
My /etc/hosts looks like this
192.168.140.254 ri head head.ri head.ri.cse.ohio-state.edu
127.0.0.1       localhost.localdomain   localhost

ri and head.xx are aliases. I should have mentioned this before, sorry.

Even if I give the hard-coded IP instead of the hostname, I get the connection refused error. I don't have sudo permissions since it is a college cluster, so I had run tachyon-start all Mount instead of SudoMount. Does this make a difference?

cc

May 4, 2015, 9:45:34 PM
to tachyo...@googlegroups.com, pmgup...@gmail.com
Weird... what did you set TACHYON_MASTER_ADDRESS to in conf/tachyon-env.sh?

By the way, can you ping 192.168.140.254?

Also, what does `ifconfig` or `ip addr` show?

On Tuesday, May 5, 2015 at 7:48:02 AM UTC+8, Puja Gupta wrote:

Puja Gupta

May 4, 2015, 11:36:42 PM
to tachyo...@googlegroups.com, pmgup...@gmail.com
OK, I moved the master and workers to another node, and it worked with settings similar to before. Thanks, cc, for your help. I really appreciate it :)

cc

May 5, 2015, 2:38:33 AM
to tachyo...@googlegroups.com, pmgup...@gmail.com
Glad to know it works. Really weird.

On Tuesday, May 5, 2015 at 11:36:42 AM UTC+8, Puja Gupta wrote: