Running Spark on Tachyon (v0.5.0) using spark-ec2 script (spark-1.2.1)


Junghoon Kang

Apr 4, 2015, 5:16:30 PM
to tachyo...@googlegroups.com
Hello,

I am having trouble following the "Running Tachyon on a Cluster" guide for v0.5.0: I am getting a "Failed to connect (#) to master" error.
Here is what I did:
  1. I started an AWS EC2 cluster of two nodes (1 master, 1 slave) using the spark-ec2.sh script (I am using spark-1.2.1).

  2. I SSHed into the master node.

  3. The guide, http://tachyon-project.org/v0.5.0/Running-Tachyon-on-a-Cluster.html,
    says "If you use Spark to launch an EC2 cluster, Tachyon will be installed and configured by default",
    so I did not change any configuration.

  4. Here are the lines I typed inside the master's /root/tachyon/bin directory and the lines that were printed on the terminal:

Any ideas on what I am doing wrong?



Thank you in advance,

Junghoon Kang

cc

Apr 4, 2015, 11:59:47 PM
to tachyo...@googlegroups.com
Could you share your master logs?

On Sunday, April 5, 2015 at 5:16:30 AM UTC+8, Junghoon Kang wrote:

Junghoon Kang

Apr 5, 2015, 1:08:13 AM
to tachyo...@googlegroups.com
Here is the log I got from the master node ( /root/tachyon/logs/maste...@10.36.114.86_04-05-2015 ):

2015-04-05 05:00:11,235 INFO  MASTER_LOGGER (Image.java:load) - Image /root/tachyon/libexec/../journal/image.data does not exist.
2015-04-05 05:00:11,239 INFO  MASTER_LOGGER (EditLog.java:load) - Edit Log /root/tachyon/libexec/../journal/log.data does not exist.
2015-04-05 05:00:11,239 INFO  MASTER_LOGGER (Image.java:create) - Creating the image file: /root/tachyon/libexec/../journal/image.data.tmp
2015-04-05 05:00:11,762 INFO  MASTER_LOGGER (Image.java:create) - Succefully created the image file: /root/tachyon/libexec/../journal/image.data.tmp
2015-04-05 05:00:11,763 INFO  MASTER_LOGGER (Image.java:create) - Renamed /root/tachyon/libexec/../journal/image.data.tmp to /root/tachyon/libexec/../journal/image.data
2015-04-05 05:00:11,763 INFO  MASTER_LOGGER (EditLog.java:<init>) - Creating edit log file /root/tachyon/libexec/../journal/log.data
2015-04-05 05:00:11,766 INFO  MASTER_LOGGER (EditLog.java:<init>) - Created file /root/tachyon/libexec/../journal/log.data
2015-04-05 05:00:12,009 INFO  server.Server (Server.java:doStart) - jetty-7.x.y-SNAPSHOT
2015-04-05 05:00:12,268 INFO  handler.ContextHandler (ContextHandler.java:startContext) - started o.e.j.w.WebAppContext{/,file:/root/tachyon/core/src/main/webapp/},/root/tachyon/libexec/../core/src/main/webapp
2015-04-05 05:00:12,345 INFO  server.AbstractConnector (AbstractConnector.java:doStart) - Started SelectChannelConnector@localhost:19999
2015-04-05 05:00:12,346 INFO  MASTER_LOGGER (UIWebServer.java:startWebServer) - Tachyon Master Server started @ localhost/127.0.0.1:19999
2015-04-05 05:00:12,346 INFO  MASTER_LOGGER (TachyonMaster.java:start) - The master server started @ localhost/127.0.0.1:19998

myairia

Apr 5, 2015, 1:17:29 AM
to Junghoon Kang, tachyo...@googlegroups.com
The log shows the master started @ localhost/127.0.0.1. Could you show conf/tachyon-env.sh too? I guess TACHYON_MASTER_ADDRESS is defined as “localhost” there.
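
A quick way to read the bound host out of a master log is sketched below. It assumes the log contains a line of the form "The master server started @ host/ip:port", as in the 0.5.0 log quoted in this thread; the function name master_bind_host is only illustrative and other Tachyon versions may word the line differently.

```shell
#!/usr/bin/env bash
# Sketch: print the host the Tachyon master reports binding to.
# Assumption: the log has a line like
#   ... - The master server started @ localhost/127.0.0.1:19998
# master_bind_host is a hypothetical helper, not part of Tachyon.
master_bind_host() {
  local log_file="$1"
  # Keep the most recent start-up line and extract the text between
  # "started @ " and the following "/".
  grep 'The master server started @ ' "$log_file" \
    | tail -n 1 \
    | sed -E 's|.*started @ ([^/]+)/.*|\1|'
}
```

Calling master_bind_host on the master log file under /root/tachyon/logs prints the host name the master bound to, or nothing if no such line is found.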


Calvin Jia

Apr 5, 2015, 1:33:12 AM
to tachyo...@googlegroups.com, junghoon....@gmail.com
Hi,

If possible, try deploying the newest version of Tachyon with the script. Also, are you using the newest spark-ec2 script? I've heard reports of this specific issue using it.

Thanks,
Calvin

Junghoon Kang

Apr 5, 2015, 1:35:42 AM
to tachyo...@googlegroups.com, junghoon....@gmail.com
The IP address of the master is different from the log above because I shut down the EC2 cluster and started a new one, but with the same versions and everything else unchanged.
Here is the content of tachyon-env.sh:

#!/usr/bin/env bash

# This file contains environment variables required to run Tachyon. Copy it as tachyon-env.sh and
# edit that to configure Tachyon for your site. At a minimum,
# the following variables should be set:
#
# - JAVA_HOME, to point to your JAVA installation
# - TACHYON_MASTER_ADDRESS, to bind the master to a different IP address or hostname
# - TACHYON_UNDERFS_ADDRESS, to set the under filesystem address.
# - TACHYON_WORKER_MEMORY_SIZE, to set how much memory to use (e.g. 1000mb, 2gb) per worker
# - TACHYON_RAM_FOLDER, to set where worker stores in memory data
#
# The following gives an example:

if [[ `uname -a` == Darwin* ]]; then
  # Assuming Mac OS X
  export JAVA_HOME=$(/usr/libexec/java_home)
  export TACHYON_RAM_FOLDER=/Volumes/ramdisk
  export TACHYON_JAVA_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="
else
  # Assuming Linux
  if [ -z "$JAVA_HOME" ]; then
    export JAVA_HOME=/usr/lib/jvm/java-1.7.0
  fi
  export TACHYON_RAM_FOLDER=/mnt/ramdisk
fi

export JAVA="$JAVA_HOME/bin/java"
export TACHYON_MASTER_ADDRESS=ec2-23-22-44-16.compute-1.amazonaws.com
export TACHYON_UNDERFS_ADDRESS=hdfs://ec2-23-22-44-16.compute-1.amazonaws.com:9000
#export TACHYON_UNDERFS_ADDRESS=hdfs://localhost:9000
export TACHYON_WORKER_MEMORY_SIZE=6154MB
export TACHYON_UNDERFS_HDFS_IMPL=org.apache.hadoop.hdfs.DistributedFileSystem

CONF_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

export TACHYON_JAVA_OPTS+="
  -Dlog4j.configuration=file:$CONF_DIR/log4j.properties
  -Dtachyon.debug=false
  -Dtachyon.underfs.address=$TACHYON_UNDERFS_ADDRESS
  -Dtachyon.underfs.hdfs.impl=$TACHYON_UNDERFS_HDFS_IMPL
  -Dtachyon.data.folder=$TACHYON_UNDERFS_ADDRESS/tachyon/data
  -Dtachyon.workers.folder=$TACHYON_UNDERFS_ADDRESS/tachyon/workers
  -Dtachyon.worker.memory.size=$TACHYON_WORKER_MEMORY_SIZE
  -Dtachyon.worker.data.folder=$TACHYON_RAM_FOLDER/tachyonworker/
  -Dtachyon.master.hostname=$TACHYON_MASTER_ADDRESS
  -Dtachyon.master.journal.folder=$TACHYON_HOME/journal/
  -Dtachyon.master.pinlist=/pinfiles;/pindata
"

It shows that TACHYON_MASTER_ADDRESS is set to the correct public DNS of the master node.
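
For anyone who wants to check this without reading the whole file, here is a minimal sketch that pulls the configured value out of tachyon-env.sh and tests whether it is a loopback name. Both function names are hypothetical helpers, and the sketch assumes the variable is set with a plain "export TACHYON_MASTER_ADDRESS=value" line as in the file above.

```shell
#!/usr/bin/env bash
# Sketch: print the value assigned to TACHYON_MASTER_ADDRESS in an env file.
# Commented-out lines are ignored; if the variable is exported more than
# once, the last assignment wins. (Hypothetical helper, not part of Tachyon.)
configured_master_address() {
  local env_file="$1"
  sed -n -E 's/^[[:space:]]*export[[:space:]]+TACHYON_MASTER_ADDRESS=(.*)$/\1/p' "$env_file" \
    | tail -n 1
}

# Succeed (exit status 0) if the given name looks like a loopback address.
is_loopback_name() {
  case "$1" in
    localhost|127.*|::1) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_loopback_name "$(configured_master_address conf/tachyon-env.sh)"` fails for the file above, which matches the observation that the configured address is the public DNS name rather than localhost.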

Junghoon Kang

Apr 5, 2015, 1:43:08 AM
to tachyo...@googlegroups.com, junghoon....@gmail.com
Hello Calvin,

Spark-1.2.1 uses Tachyon-0.5.0 by default; even though neither is the most recent version, I assumed that it would at least run the simple test provided by Tachyon.
As you suggested, I will install Spark-1.3.0 (the most recent version), use its spark-ec2 script to start an EC2 cluster, follow the same procedure, and see if it gives a different result.


Thank you,

Junghoon

Junghoon Kang

Apr 5, 2015, 3:53:15 AM
to tachyo...@googlegroups.com, Junghoon Kang
Hello guys,

I used the spark-ec2.sh script from spark-1.3.0 this time. It also installs Tachyon v0.5.0 by default, and it still gave me the same error.

So I downloaded Tachyon v0.6.3 onto the master node.
Since it was a fresh download, I configured the JAVA_HOME, TACHYON_MASTER_ADDRESS, and TACHYON_UNDERFS_ADDRESS variables inside tachyon/conf/tachyon-env.sh.
I also modified tachyon/conf/slaves to have the correct address of the slave node in the cluster and added a tachyon/conf/workers file.
Then I deployed the new tachyon directory to the slave node.

Inside the master node's /root/tachyon/bin directory, I typed:
  • $ sudo ./tachyon format

  • $ sudo ./tachyon-start.sh all Mount

I checked the master log, and this time it printed "Tachyon Master Server started @ ip-10-159-26-157.ec2.internal/10.159.26.157:19999" instead of "Tachyon Master Server started @ localhost/127.0.0.1:19999" as before.

I ran:
  • $ sudo ./tachyon runTest Basic CACHE_THROUGH
and it gave me a different error: "The machine does not have any local worker".

I added a localhost line to the master node's tachyon/conf/slaves and tachyon/conf/workers files, and now it passes all the tests.
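
The last step can be sketched as a small idempotent helper, so that re-running it does not add duplicate entries. The name ensure_line is hypothetical, and the conf/ paths in the comments are the ones mentioned above, relative to the tachyon directory; the example calls are commented out because they modify a real install.

```shell
#!/usr/bin/env bash
# Sketch: append a line to a file only if that exact line is not already there.
# Assumes any existing content in the file ends with a newline.
# ensure_line is a hypothetical helper, not part of Tachyon.
ensure_line() {
  local line="$1" file="$2"
  touch "$file"                      # create the file if it does not exist
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example, run from the tachyon directory on the master node:
# ensure_line localhost conf/slaves
# ensure_line localhost conf/workers
```

After that, restarting Tachyon and re-running the runTest command from this post would show whether the master node now has a local worker.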


Thank you all for your time and help,

Junghoon

Calvin Jia

Apr 6, 2015, 2:05:24 PM
to tachyo...@googlegroups.com, junghoon....@gmail.com
Thanks for the update, Junghoon. Glad it worked out.