Why MR on alluxio and Spark on alluxio both perform worse than not using alluxio?


Kaiming Wan

Oct 8, 2016, 9:10:48 AM10/8/16
to Alluxio Users, fanb...@gmail.com
I used Alluxio 1.3.0 to run an experiment that counts the lines in a 10GB file. The results are:


MR without Alluxio: 59s
MR on Alluxio: 2m 01s
Spark without Alluxio: 1m 27s
Spark on Alluxio: 1m 38s

Pei Sun

Oct 31, 2016, 3:50:28 PM10/31/16
to Kaiming Wan, Alluxio Users, Bin Fan
Hi Kaiming,
    Do you still have problems with your experiment? If so, can you provide more details, such as the machine specs, how you set up the cluster, and where the 10GB file is stored?

Pei




--
Pei Sun

Kaiming Wan

Nov 3, 2016, 2:50:48 AM11/3/16
to Alluxio Users, wan...@gmail.com, fanb...@gmail.com



Hi Pei Sun,

    Thanks for your response, and sorry for the late reply; I have been busy with other things this week.

    I did some experiments today and still hit problems when using Spark on Alluxio.


Basic experiment info:
  • machine spec:
        alluxio1 (sq-hbase1.800best.com): 120GB memory, 40 cores
        alluxio2 (sq-hbase2.800best.com): 120GB memory, 40 cores
        alluxio3 (sq-hbase3.800best.com): 120GB memory, 40 cores
  • role of hosts:
        alluxio1: Alluxio master, Alluxio worker, namenode
        alluxio2: Alluxio worker, datanode
        alluxio3: Alluxio worker, datanode
  • version info:
        Alluxio: 1.3.0
        JDK: 1.8
        Hadoop: 2.7.2
        Spark: 2.0.0
        Scala: 2.11.8

Spark job spec: a Spark count job over a 90GB file stored evenly across the three nodes.
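(A quick sanity check, not from the original post: before timing a run you can ask the Alluxio shell how much of the file is actually held in Alluxio memory; the 1.x listing reports the file's in-memory state.)

   bin/alluxio fs ls /90G-spark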


Experiment one (ample resources for Spark; worker memory set to 30G):

Spark configuration:

spark-env.sh (the only per-node difference is SPARK_LOCAL_HOSTNAME):



export JAVA_HOME=/home/appadmin/jdk1.8.0_77
export SPARK_HOME=/home/appadmin/spark-2.0.0-bin-without-hadoop
export HADOOP_HOME=/home/appadmin/hadoop-2.7.2
export SPARK_DIST_CLASSPATH=$(/home/appadmin/hadoop-2.7.2/bin/hadoop classpath)
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_LIBRARY_PATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$HADOOP_HOME/lib/native
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
SPARK_MASTER_HOST=10.8.12.16
SPARK_MASTER_WEBUI_PORT=28686
SPARK_LOCAL_DIRS=/home/appadmin/spark-2.0.0-bin-without-hadoop/sparkdata/local
SPARK_WORKER_DIR=/home/appadmin/spark-2.0.0-bin-without-hadoop/sparkdata/work
SPARK_LOG_DIR=/home/appadmin/spark-2.0.0-bin-without-hadoop/logs
SPARK_WORKER_MEMORY=30g
SPARK_LOCAL_HOSTNAME=sq-hbase1.800best.com


spark-defaults.conf:

spark.driver.memory              5g
spark.executor.instances = 12
spark.executor.cores = 9
spark.executor.memory = 30g
spark.cores.max=30
spark.driver.extraClassPath=/home/appadmin/alluxio-1.3.0/core/client/target/alluxio-core-client-1.3.0-jar-with-dependencies.jar
spark.executor.extraClassPath=/home/appadmin/alluxio-1.3.0/core/client/target/alluxio-core-client-1.3.0-jar-with-dependencies.jar
spark.files /home/appadmin/spark-2.0.0-bin-without-hadoop/conf/hdfs-site.xml,/home/appadmin/spark-2.0.0-bin-without-hadoop/conf/core-site.xml


Running spark-shell with this command:
spark-shell --master spark://10.8.12.16:7077 --executor-memory 8G


I also cleared the OS cache before running the experiment:
sudo sh -c 'free && sync && echo 3 > /proc/sys/vm/drop_caches && free'


Scala code in spark-shell:

   val a = sc.textFile("alluxio://10.8.12.16:19998/90G-spark")
   val b = sc.textFile("hdfs://10.8.12.16:9000/alluxio/data/90G-spark")

   a.count()
   b.count()
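(For reference, one way to time each count from inside spark-shell. This helper is hypothetical and not from the original post; the reported numbers may equally have been read off the Spark web UI.)

   // Hypothetical timing helper (not in the original post).
   def timed[T](label: String)(body: => T): T = {
     val start = System.nanoTime()
     val result = body
     println(f"$label took ${(System.nanoTime() - start) / 1e9}%.1f s")
     result
   }

   timed("count via Alluxio") { a.count() }
   timed("count via HDFS") { b.count() }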




Experiment result:

Spark on Alluxio: 12 min (restarting the job improves this to about 3 min, which is still much slower than Spark without Alluxio)
Spark without Alluxio: 1.3 min




Experiment two (constrained resources for Spark: worker memory set to 10G, executor and driver memory left at their default values):
Spark configuration:

spark-env.sh (the only per-node difference is SPARK_LOCAL_HOSTNAME):



export JAVA_HOME=/home/appadmin/jdk1.8.0_77
export SPARK_HOME=/home/appadmin/spark-2.0.0-bin-without-hadoop
export HADOOP_HOME=/home/appadmin/hadoop-2.7.2
export SPARK_DIST_CLASSPATH=$(/home/appadmin/hadoop-2.7.2/bin/hadoop classpath)
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_LIBRARY_PATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$HADOOP_HOME/lib/native
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
SPARK_MASTER_HOST=10.8.12.16
SPARK_MASTER_WEBUI_PORT=28686
SPARK_LOCAL_DIRS=/home/appadmin/spark-2.0.0-bin-without-hadoop/sparkdata/local
SPARK_WORKER_DIR=/home/appadmin/spark-2.0.0-bin-without-hadoop/sparkdata/work
SPARK_LOG_DIR=/home/appadmin/spark-2.0.0-bin-without-hadoop/logs
SPARK_WORKER_MEMORY=10g
SPARK_LOCAL_HOSTNAME=sq-hbase1.800best.com


spark-defaults.conf:

#spark.driver.memory              5g
#spark.executor.instances = 12
#spark.executor.cores = 9
#spark.executor.memory = 30g
#spark.cores.max=30
spark.driver.extraClassPath=/home/appadmin/alluxio-1.3.0/core/client/target/alluxio-core-client-1.3.0-jar-with-dependencies.jar
spark.executor.extraClassPath=/home/appadmin/alluxio-1.3.0/core/client/target/alluxio-core-client-1.3.0-jar-with-dependencies.jar
spark.files /home/appadmin/spark-2.0.0-bin-without-hadoop/conf/hdfs-site.xml,/home/appadmin/spark-2.0.0-bin-without-hadoop/conf/core-site.xml


Command to start spark-shell:
spark-shell --master spark://10.8.12.16:7077

Scala code in spark-shell:

   val a = sc.textFile("alluxio://10.8.12.16:19998/90G-spark")
   val b = sc.textFile("hdfs://10.8.12.16:9000/alluxio/data/90G-spark")

   a.count()
   b.count()


Experiment result:

  • 1. The Spark job on Alluxio ran into Java heap space problems.

I got the following output when running the count job:
14:32:00 WARN scheduler.TaskSetManager: Lost task 37.0 in stage 0.0 (TID 36, sq-hbase3.800best.com): java.lang.OutOfMemoryError: Java heap space


16/11/03 14:32:00 WARN server.TransportChannelHandler: Exception in connection from /10.8.12.18:16723
java.io.IOException: Connection reset by peer
 at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
 at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
 at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
 at sun.nio.ch.IOUtil.read(IOUtil.java:192)
 at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
 at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311)
 at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
 at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
 at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
 at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
 at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
 at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
 at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
 at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
 at java.lang.Thread.run(Thread.java:745)
16/11/03 14:32:00 ERROR scheduler.TaskSchedulerImpl: Lost executor 2 on sq-hbase3.800best.com: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 100.0 in stage 0.0 (TID 146, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 77.1 in stage 0.0 (TID 137, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 59.1 in stage 0.0 (TID 128, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 21.1 in stage 0.0 (TID 131, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 67.1 in stage 0.0 (TID 122, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 87.1 in stage 0.0 (TID 140, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 103.0 in stage 0.0 (TID 149, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 51.1 in stage 0.0 (TID 134, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 37.1 in stage 0.0 (TID 125, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 105.0 in stage 0.0 (TID 151, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 3.1 in stage 0.0 (TID 142, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 1.1 in stage 0.0 (TID 124, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 65.1 in stage 0.0 (TID 133, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 31.1 in stage 0.0 (TID 127, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 45.1 in stage 0.0 (TID 136, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 99.0 in stage 0.0 (TID 145, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 116.0 in stage 0.0 (TID 154, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 79.1 in stage 0.0 (TID 139, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 57.1 in stage 0.0 (TID 130, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 102.0 in stage 0.0 (TID 148, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 95.0 in stage 0.0 (TID 121, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 101.0 in stage 0.0 (TID 147, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 93.1 in stage 0.0 (TID 129, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 7.1 in stage 0.0 (TID 138, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 94.0 in stage 0.0 (TID 120, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 104.0 in stage 0.0 (TID 150, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 17.1 in stage 0.0 (TID 132, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 9.1 in stage 0.0 (TID 141, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 85.0 in stage 0.0 (TID 105, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 43.1 in stage 0.0 (TID 123, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 98.0 in stage 0.0 (TID 144, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 75.0 in stage 0.0 (TID 81, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 109.0 in stage 0.0 (TID 153, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 53.1 in stage 0.0 (TID 126, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 32.0 in stage 0.0 (TID 27, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 81.1 in stage 0.0 (TID 135, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 86.0 in stage 0.0 (TID 108, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 76.0 in stage 0.0 (TID 84, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 108.0 in stage 0.0 (TID 152, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:00 WARN scheduler.TaskSetManager: Lost task 39.1 in stage 0.0 (TID 143, sq-hbase3.800best.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
[Stage 0:>                                                      (0 + 80) / 1440]16/11/03 14:32:02 WARN server.TransportChannelHandler: Exception in connection from /10.8.12.17:45935
java.io.IOException: Connection reset by peer
 at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
 at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
 at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
 at sun.nio.ch.IOUtil.read(IOUtil.java:192)
 at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
 at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311)
 at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
 at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
 at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
 at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
 at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
 at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
 at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
 at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
 at java.lang.Thread.run(Thread.java:745)
16/11/03 14:32:02 ERROR scheduler.TaskSchedulerImpl: Lost executor 1 on sq-hbase2.800best.com: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 72.1 in stage 0.0 (TID 173, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 14.1 in stage 0.0 (TID 182, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 35.1 in stage 0.0 (TID 164, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 39.2 in stage 0.0 (TID 155, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 38.1 in stage 0.0 (TID 176, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 52.1 in stage 0.0 (TID 185, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 73.1 in stage 0.0 (TID 167, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 11.1 in stage 0.0 (TID 158, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 24.1 in stage 0.0 (TID 184, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 20.1 in stage 0.0 (TID 175, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 115.1 in stage 0.0 (TID 166, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 42.1 in stage 0.0 (TID 169, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 131.1 in stage 0.0 (TID 160, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 2.1 in stage 0.0 (TID 178, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 28.1 in stage 0.0 (TID 187, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 130.1 in stage 0.0 (TID 181, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 119.1 in stage 0.0 (TID 163, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 90.1 in stage 0.0 (TID 172, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 46.0 in stage 0.0 (TID 46, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 111.0 in stage 0.0 (TID 91, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 70.0 in stage 0.0 (TID 67, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 12.0 in stage 0.0 (TID 13, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 118.1 in stage 0.0 (TID 157, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 61.0 in stage 0.0 (TID 61, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 114.1 in stage 0.0 (TID 189, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 110.1 in stage 0.0 (TID 180, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 60.1 in stage 0.0 (TID 183, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 34.1 in stage 0.0 (TID 174, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 107.1 in stage 0.0 (TID 156, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 91.1 in stage 0.0 (TID 165, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 26.1 in stage 0.0 (TID 186, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 10.1 in stage 0.0 (TID 168, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 22.1 in stage 0.0 (TID 177, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 49.1 in stage 0.0 (TID 159, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 63.1 in stage 0.0 (TID 171, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 121.1 in stage 0.0 (TID 162, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 4.1 in stage 0.0 (TID 188, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 56.1 in stage 0.0 (TID 179, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 106.1 in stage 0.0 (TID 170, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/11/03 14:32:02 WARN scheduler.TaskSetManager: Lost task 53.2 in stage 0.0 (TID 161, sq-hbase2.800best.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
[Stage 0:>                                                     (0 + 120) / 1440]16/11/03 14:32:06 WARN server.TransportChannelHandler: Exception in connection from /10.8.12.16:8008
java.io.IOException: Connection reset by peer
 at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
 at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
 at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
 at sun.nio.ch.IOUtil.read(IOUtil.java:192)
 at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
 at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311)
 at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
 at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
 at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
 at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
 at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
 at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
 at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
 at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
 at java.lang.Thread.run(Thread.java:745)
16/11/03 14:32:06 ERROR scheduler.TaskSchedulerImpl: Lost executor 0 on sq-hbase1.800best.com: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.


I checked worker.log and master.log, and there is no output after 14:00.


  • 2. The Spark job without Alluxio ran successfully and cost




I am confused about why the Spark job without Alluxio gives no warnings or errors about insufficient JVM heap space. It is strange that the Spark job on Alluxio hits this warning and error, because Alluxio does not use the JVM heap to store data. The experiment result also does not match the claims in the official Alluxio blog.


On Tuesday, November 1, 2016 at 3:50:28 AM UTC+8, Pei Sun wrote:
Hi Kaiming,
    Do you still have problems with your experiment? If so, can you provide more details, such as the machine specs, how you set up the cluster, and where the 10GB file is stored?

Pei
On Sat, Oct 8, 2016 at 6:10 AM, Kaiming Wan <wan...@gmail.com> wrote:
I used Alluxio 1.3.0 to run an experiment that counts the lines in a 10GB file. The results are:


MR without Alluxio: 59s
MR on Alluxio: 2m 01s
Spark without Alluxio: 1m 27s
Spark on Alluxio: 1m 38s




--
Pei Sun

Pei Sun

Nov 3, 2016, 1:50:58 PM11/3/16
to Kaiming Wan, Alluxio Users, Bin Fan
Hi Kaiming,
    In our experiment, we ran Spark in standalone mode and only tuned spark.executor.memory. You need to make sure that the memory allocated to the Spark executors plus Alluxio does not exceed the total machine memory.
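(For example, a hypothetical budget for these 120GB nodes; the numbers below are assumptions, not values from this thread. If the Alluxio worker ramdisk takes 30GB and the OS plus HDFS daemons need roughly 10GB, that leaves about 80GB for Spark.)

   # Illustrative sizing only; adjust to the actual Alluxio worker memory.
   # 120GB total - 30GB Alluxio ramdisk - ~10GB OS/HDFS daemons = ~80GB for Spark
   SPARK_WORKER_MEMORY=80g          # spark-env.sh
   spark.executor.memory   20g      # spark-defaults.conf: 4 executors x 20g = 80g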

Pei

...

[Message clipped]  



--
Pei Sun