Problem executing matrix multiplication on a csv file that contains a header

Ohelig Pojke

Nov 21, 2014, 11:28:13 AM
to rha...@googlegroups.com
library(rmr2)
library(rhdfs)
hdfs.init()
rmr.options(backend="hadoop")
Sum = function(.,YY){
  keyval(1,list(Reduce('+',YY)))
}
XtX = values(from.dfs(mapreduce(
  input = "/data/c.csv",input.format=make.input.format("csv",sep=","),
  map=
    function(.,Xi){ 
      Xi=as.matrix(Xi)
      keyval(1,list(t(Xi)%*% Xi))
    },
  reduce = Sum ,
  combine = TRUE)))[[1]]
This code works perfectly with c.csv but does not work with a.csv. The only difference between the two files is that a.csv has a header row and c.csv does not: I removed the header from a.csv in Excel and saved the result as c.csv. How can I run a.csv through mapreduce by removing the header inside the function?
I have uploaded both files, a.csv and c.csv.
a.csv
c.csv

Antonio Piccolboni

Nov 21, 2014, 11:35:05 AM
to rha...@googlegroups.com
It's a more than reasonable request, but it's harder to implement than it seems. Each mapper process reads a portion of the file, and only the first portion contains the header, so most processes never receive the header information. To keep things simple, we decided not to support csv files with headers. If anyone wants to contribute this improvement, pull requests are welcome.
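As a rough illustration of the point above, here is a minimal sketch in plain R (no Hadoop needed) of one possible workaround: each chunk drops any row that matches header names known in advance. The helper `drop_header`, the column names `x1`/`x2`, and the sample data are all hypothetical, and the sketch assumes the header line arrives as an ordinary data row in whichever chunk happens to contain it.

```r
# Hypothetical helper: drop any row whose fields equal the known header names,
# then coerce the remaining rows to a numeric matrix.
drop_header <- function(chunk, header) {
  is_header <- apply(chunk, 1, function(row) {
    isTRUE(all(as.character(row) == header))
  })
  body <- chunk[!is_header, , drop = FALSE]
  matrix(as.numeric(as.matrix(body)), ncol = ncol(body))
}

# Made-up data simulating the chunks two mappers might receive.
# Only the first chunk contains the header line.
header <- c("x1", "x2")
chunk1 <- read.csv(text = "x1,x2\n1,2\n3,4", header = FALSE,
                   stringsAsFactors = FALSE)
chunk2 <- read.csv(text = "5,6\n7,8", header = FALSE,
                   stringsAsFactors = FALSE)

# "Map" step: each chunk computes its own t(Xi) %*% Xi.
partial <- lapply(list(chunk1, chunk2), function(Xi) {
  Xi <- drop_header(Xi, header)
  t(Xi) %*% Xi
})

# "Reduce" step: sum the partial products.
XtX <- Reduce(`+`, partial)
print(XtX)
```

Under the same assumption, the call to `drop_header(Xi, header)` would take the place of `Xi=as.matrix(Xi)` inside the map function of the original job. The obvious limitation is that the header names have to be hard-coded, and a data row identical to the header would be dropped too.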

Ohelig Pojke

Nov 21, 2014, 11:57:16 AM
to rha...@googlegroups.com
Xi <- as.matrix(setNames(Xi, rep(" ", length(Xi)))) # This is one way to eliminate the header. I tried this one as well as the following.
colnames(Xi) <- NULL # I also tried this.
Xi <- matrix(Xi, ncol = ncol(Xi), dimnames = NULL) # This also removes the header and gives a matrix without one. So why is my code still not working?
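A likely explanation, offered as an assumption rather than a confirmed diagnosis: if the csv reader does not know about headers, the header line is read as the first data row, so it never appears in the column names at all. The plain R sketch below uses made-up data to show what that would look like; clearing `colnames` then changes nothing about the data, and the matrix stays character-typed.

```r
# Made-up data standing in for the first chunk of a csv file with a header,
# read the way a header-unaware reader would see it.
Xi <- read.csv(text = "x1,x2\n1,2\n3,4", header = FALSE,
               stringsAsFactors = FALSE)

print(colnames(Xi))   # "V1" "V2": the real header is not in the column names

M <- as.matrix(Xi)
colnames(M) <- NULL   # strips "V1"/"V2" only; the "x1","x2" row is still data
print(is.character(M))  # TRUE: the header row forced every column to character

# The cross product fails on a character matrix, so capture the error.
result <- tryCatch(t(M) %*% M, error = function(e) "error")
print(result)
```

If this is what happens in the job, the header has to be removed as a row, not as a set of names, before the matrix is converted to numeric.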