Problem running matrix multiplication on a csv file that contains a header

Ohelig Pojke

Nov 21, 2014, 11:28:13 AM

to rha...@googlegroups.com

library(rmr2)
library(rhdfs)
hdfs.init()
rmr.options(backend = "hadoop")

# Reducer (also used as the combiner): add up the partial t(Xi) %*% Xi matrices.
Sum = function(., YY) {
  keyval(1, list(Reduce('+', YY)))
}

XtX = values(from.dfs(mapreduce(
  input = "/data/c.csv",
  input.format = make.input.format("csv", sep = ","),
  # Mapper: each split Xi contributes its own cross-product.
  map = function(., Xi) {
    Xi = as.matrix(Xi)
    keyval(1, list(t(Xi) %*% Xi))
  },
  reduce = Sum,
  combine = TRUE)))[[1]]
This code works perfectly with c.csv but does not work with a.csv. The only difference between the two files is that a.csv has a header and c.csv does not: I removed the header from a.csv in Excel and saved the result as c.csv. How can I run a.csv through mapreduce by removing the header inside the function?
I have uploaded both files, a.csv and c.csv.
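For readers who want to check the idea without a Hadoop cluster, here is a minimal base R sketch of what the job above computes: each chunk of rows contributes t(Xi) %*% Xi, and the sum of those pieces equals t(X) %*% X for the whole matrix. The random matrix and the three-way split are made up for illustration and are not taken from a.csv or c.csv.

```r
# Illustrative data only: a random 10 x 4 matrix stands in for the csv contents.
set.seed(1)
X = matrix(rnorm(40), nrow = 10, ncol = 4)

# Assign the rows to three chunks, standing in for the input splits.
chunks = split(seq_len(nrow(X)), rep(1:3, length.out = nrow(X)))

# "Map" step: one partial cross-product per chunk.
partials = lapply(chunks, function(rows) {
  Xi = X[rows, , drop = FALSE]
  t(Xi) %*% Xi
})

# "Reduce" step: add the partial results, as Sum does above.
XtX = Reduce(`+`, partials)

# Compare with the cross-product of the full matrix.
print(all.equal(XtX, crossprod(X)))
```

Because addition does not care about order or grouping, the same function can serve as both combiner and reducer.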

a.csv

c.csv

Antonio Piccolboni

Nov 21, 2014, 11:35:05 AM

to rha...@googlegroups.com

It's a more than reasonable request, but it's harder to implement than it seems. Each mapper process reads only a portion of the file, and only the first portion contains the header, so most processes never see the header information. To keep things simple, we decided not to support csv files with headers. If anyone wants to contribute this improvement, pull requests are welcome.
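One possible workaround, sketched here in base R only, is to filter the header out inside the map function. It assumes that the header line reaches the first mapper as an ordinary data row and that the column names are known in advance. The helper strip_header and the column names "x" and "y" are hypothetical; they are not part of rmr2 and not taken from a.csv.

```r
# Hypothetical helper, not an rmr2 function: drop any row whose first field
# equals the first known column name, then return a numeric matrix.
strip_header = function(chunk, header) {
  chunk = as.data.frame(chunk, stringsAsFactors = FALSE)
  first_col = trimws(as.character(chunk[[1]]))
  keep = !(first_col %in% header[1])
  body = chunk[keep, , drop = FALSE]
  # Go through character for non-numeric columns so that factor codes are
  # never mistaken for data values.
  to_num = function(col) {
    if (is.numeric(col)) col else as.numeric(as.character(col))
  }
  unname(do.call(cbind, lapply(body, to_num)))
}

# Simulated first split: the header line arrives as a data row.
first_split = read.csv(text = "x,y\n1,2\n3,4", header = FALSE,
                       stringsAsFactors = FALSE)
# Simulated later split: data rows only.
later_split = read.csv(text = "5,6\n7,8", header = FALSE)

header = c("x", "y")  # assumed column names
X1 = strip_header(first_split, header)
X2 = strip_header(later_split, header)
```

In a map function this would mean calling strip_header(Xi, header) in place of as.matrix(Xi). The sketch breaks if a genuine data row starts with the same text as the first column name.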

Ohelig Pojke

Nov 21, 2014, 11:57:16 AM

to rha...@googlegroups.com

Xi <- as.matrix(setNames(Xi, rep(" ", length(Xi)))) # One way to eliminate the header; I tried this.
colnames(Xi) <- NULL # I also tried this.
Xi <- matrix(Xi, ncol = ncol(Xi), dimnames = NULL) # This also removes the header and gives a matrix without one. So why is my code still not working?
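A likely reason these attempts change nothing can be reproduced in base R, assuming the header line is read as a data row rather than as column names. Clearing the names then leaves that row in place, and the matrix stays character, so %*% fails. The two-column sample text is made up for illustration.

```r
# Simulate a split whose first line is the header, read with header = FALSE.
Xi = read.csv(text = "x,y\n1,2\n3,4", header = FALSE,
              stringsAsFactors = FALSE)

M = as.matrix(Xi)
dimnames(M) = NULL        # removes the names V1, V2 but not the "x","y" row

print(nrow(M))            # the header row is still counted
print(is.character(M))    # one text row makes the whole matrix character

# The cross-product then fails; capture the error instead of stopping.
result = tryCatch(t(M) %*% M, error = function(e) e)
print(inherits(result, "error"))
```

Dropping the offending row and converting the rest to numeric, rather than clearing the names, is what the situation calls for.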