Spark SQL on Alluxio vs. on HDFS

Cam Mach

Dec 13, 2017, 5:30:05 PM

to Alluxio Developers

Hello everyone, I am running a stress test on Alluxio using Spark. It's supposed to perform better than running Spark on HDFS, right? But it turned out worse. Are there any configurations in Alluxio that I should turn on or adjust to make it work better? Note that all of my data is fully loaded into memory.

Here is the specific scenario: generate 2 GB of arbitrary data on HDFS and on Alluxio. Then run Spark SQL to read the data from HDFS, filter it, and count the rows. Do the same for Alluxio, and measure the time taken for both.
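For reference, the scenario could be sketched roughly as follows in PySpark. This is only an illustration under several assumptions that are not stated in the post: the data is stored as Parquet, the host names, ports, and paths are placeholders, the filter predicate `value > 0` is made up, and the Alluxio client jar is already on the Spark classpath. The warm-up run and the median over several runs are additions meant to keep one-time startup costs out of the comparison.

```python
import statistics
import time


def time_runs(action, warmups=1, repeats=3):
    """Call `action` repeatedly; return wall-clock seconds for each measured run."""
    for _ in range(warmups):
        action()  # warm-up run, not measured (JIT, connection setup, metadata lookups)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        action()
        timings.append(time.perf_counter() - start)
    return timings


def filter_count(spark, path, predicate):
    """Read the data at `path`, apply a SQL filter string, and count matching rows."""
    # Assumption: the data is Parquet; swap in the reader for the actual format.
    return spark.read.parquet(path).filter(predicate).count()


def compare(spark, paths, predicate, warmups=1, repeats=3):
    """Run the same filter-and-count against each path; return {label: median seconds}."""
    results = {}
    for label, path in paths.items():
        timings = time_runs(
            lambda p=path: filter_count(spark, p, predicate),
            warmups=warmups,
            repeats=repeats,
        )
        results[label] = statistics.median(timings)
    return results


def run_benchmark():
    """Example driver; every host, port, path, and the predicate are hypothetical."""
    from pyspark.sql import SparkSession  # imported here so the helpers work without PySpark

    spark = SparkSession.builder.appName("alluxio-vs-hdfs").getOrCreate()
    paths = {
        "hdfs": "hdfs://namenode:8020/bench/data",
        "alluxio": "alluxio://alluxio-master:19998/bench/data",
    }
    try:
        print(compare(spark, paths, "value > 0"))
    finally:
        spark.stop()
```

Calling `run_benchmark()` from a script submitted with `spark-submit` would print the median time per storage backend. Running both reads in the same Spark session, as done here, keeps executor startup out of the measurement.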