Hello everyone, I am running a stress test on Alluxio using Spark. It is supposed to perform better than running Spark on HDFS, right? But it turned out worse. Are there any configurations in Alluxio that I should turn on or adjust to make it work better? Note that all of my data is fully loaded into memory.
Here is the specific scenario: I generate 2 GB of arbitrary data on HDFS and on Alluxio. Then I run Spark SQL to read the data from HDFS, filter it, and count the rows. I do the same for Alluxio and measure the time taken for both.
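To make the scenario concrete, here is a minimal PySpark sketch of this kind of measurement. It is not the exact job: the host names, ports, paths, the Parquet format, and the `value > 0` predicate are all placeholders assumed for illustration, and the helper names `time_runs` and `filter_count_job` are hypothetical. It runs one untimed warm-up pass before timing, so that JVM and Spark start-up cost does not get charged to whichever source is read first.

```python
import time
from statistics import median


def time_runs(job, warmup=1, repeats=3):
    """Call job() `warmup` times untimed, then `repeats` times timed.

    Returns (result_of_last_run, list_of_elapsed_seconds).
    """
    for _ in range(warmup):
        job()
    timings = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = job()
        timings.append(time.perf_counter() - start)
    return result, timings


def filter_count_job(spark, path, predicate):
    """Build a zero-argument job that reads `path`, filters, and counts rows."""
    def job():
        return spark.read.parquet(path).filter(predicate).count()
    return job


def main():
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hdfs-vs-alluxio").getOrCreate()

    # Placeholder locations: replace with the real NameNode / Alluxio master.
    sources = {
        "hdfs": "hdfs://namenode:8020/bench/data",
        "alluxio": "alluxio://alluxio-master:19998/bench/data",
    }
    for name, path in sources.items():
        # "value > 0" is an assumed predicate on an assumed column.
        count, timings = time_runs(filter_count_job(spark, path, "value > 0"))
        print(f"{name}: count={count} median={median(timings):.2f}s")

    spark.stop()
```

To run it, call `main()` at the bottom of the script and submit it with `spark-submit`; the Alluxio client jar has to be on the Spark driver and executor classpath for the `alluxio://` scheme to resolve.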
I appreciate your help.