library(rmr2)
library(rhdfs)
hdfs.init()
rmr.options(backend="hadoop")
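# Reducer: add up the partial cross-product matrices for a key
Sum = function(., YY) {
  keyval(1, list(Reduce('+', YY)))
}
# Compute X'X by summing t(Xi) %*% Xi over the input chunks
XtX = values(from.dfs(mapreduce(
  input = "/data/c.csv",
  input.format = make.input.format("csv", sep = ","),
  map = function(., Xi) {
    Xi = as.matrix(Xi)
    keyval(1, list(t(Xi) %*% Xi))
  },
  reduce = Sum,
  combine = TRUE)))[[1]]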
This code works perfectly with c.csv but does not work with a.csv. The only difference between the two files is that a.csv has a header row and c.csv does not: I removed the header from a.csv in Excel and saved the result as c.csv. How can I run the job on a.csv directly, removing the header inside the function?
I have uploaded both files, a.csv and c.csv.
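One possible way to drop the header inside the map step is sketched below. The likely cause of the failure is that the header line of a.csv is read as an ordinary data row, so the chunk that contains it is no longer numeric and `%*%` fails. The names `drop_header` and `map_xtx` are hypothetical helpers, not part of rmr2, and the sketch assumes every data column is numeric, so the header is the only row that becomes entirely NA after conversion.

```r
# Hypothetical helper (not part of rmr2): coerce a chunk to a numeric matrix
# and drop any row that is entirely non-numeric, such as a header line.
drop_header <- function(chunk) {
  chunk <- as.data.frame(chunk, stringsAsFactors = FALSE)
  # Convert each column to numeric; text such as column names becomes NA.
  num <- vapply(
    chunk,
    function(col) suppressWarnings(as.numeric(as.character(col))),
    numeric(nrow(chunk))
  )
  # vapply returns a plain vector for a one-row chunk, so restore the shape.
  num <- matrix(num, nrow = nrow(chunk), ncol = ncol(chunk))
  # Keep only rows with at least one numeric value.
  num[rowSums(!is.na(num)) > 0, , drop = FALSE]
}

# Assumed replacement for the map function in the question; keyval() comes
# from rmr2, so this is only called inside a mapreduce() job.
map_xtx <- function(., Xi) {
  Xi <- drop_header(Xi)
  keyval(1, list(t(Xi) %*% Xi))
}
```

With this in place, the job could be pointed at "/data/a.csv" and given `map = map_xtx`, leaving the reducer unchanged. A data row with missing values is kept as is, so any NA in it would still propagate into the cross-product.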