Hi all, I'm new to RHadoop.
Here is my test data:
P =
  do.call(
    rbind,
    rep(
      list(
        matrix(
          rnorm(9000000, sd = 10),
          ncol = 30000)),
      10)) +
  matrix(rnorm(90000000), ncol = 30000)
I save it as a .csv file. The following is the part of the kmeans example that I changed:
out = list()
## for local mode
# library(bigmemory)
# ID00 = read.big.matrix("/usr/local/AMI/20150422_AMI/ID01_96_analysis_data.csv")
ptm <- proc.time()
for(be in c("hadoop")) {
  rmr.options(backend = be)
  set.seed(0)
  out[[be]] =
    kmeans.mr(   # function name restored here; the arguments match the rmr2 kmeans tutorial
      '/usr/local/3000x20000.csv',
      ## for local mode
      # to.dfs(ID00[1:3000, 1:20000]),
      num.clusters = 3,
      num.iter = 1,
      combine = FALSE,
      in.memory.combine = FALSE)
}
proc.time() - ptm
proc.time() - ptm
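One thing that may matter here: rmr2's default input format is its own native serialization, not csv, so passing a raw .csv path usually needs an explicit input format. A minimal sketch (an assumption about your setup, not part of the original code) using the standard rmr2 helpers `make.input.format` and `from.dfs`:

```r
library(rmr2)

# rmr2 will not parse a plain .csv with its default (native) format.
# Declare a csv input format; extra arguments are passed to read.table.
csv.fmt = make.input.format("csv", sep = ",")

# Read the HDFS file back as key/value pairs; values() yields the data.
dat = from.dfs("/usr/local/3000x20000.csv", format = csv.fmt)
head(values(dat))
```

This only illustrates the input-format step; the k-means call itself would need the same format wired into its internal mapreduce() call.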
Here is my result:

                   local    hadoop
  processing time  60s      1000s
Is it normal that processing is much faster in local mode than on Hadoop?
My setup is a 3-node cluster (memory: 6 GB, 8 GB, 12 GB), running Hadoop 2.6.0.
thanks
Xu