Access a subset of data using Rhadoop?

Liang Zhou

unread,

Apr 7, 2014, 5:45:44 PM4/7/14

to rha...@googlegroups.com

I am doing a simple Principal Component transformation on a dataset for each sample, the data looks like

Sample X Y Z

1 ... (some entries for X Y and Z)

1 ...

2 ...

3 ...

When I write the map function, I did

unqsample <- unique(data$Sample)

pca_mapper <- function(k,input) {

generate.sample = function(i) {

select.input = Data[Data$Sample==i,]

keyval(i,select.input)

}

c.keyval(lapply(unqsample, generate.sample))

}

The data is pre-loaded as csv file,

column = c("Sample","X","Y","Z")

pca.input.format =

make.input.format(

"csv",

sep=",",

row.names=NULL,

col.names=column,

na.strings=c("NA"),

colClasses=c(Sample="numeric",

X="numeric",

Y="numeric",

Z="numeric"

)

It reported error when I run

mapreduce(input="rawpca.csv", input.format = pca.input.format, map=pca_mapper)

I wonder if it is because the data is distributed across different data node so that Data[Data$Sample==i,] is not subsetting from the full dataset?

I am new to Hadoop, and wonder what is the best strategy to write the map function so that I can have key being the Sample ID, and its value is the subset of the data corresponding to the specific Sample ID.

Thank you for your help!

Antonio Piccolboni

unread,

Apr 7, 2014, 5:58:50 PM4/7/14

to RHadoop Google Group

Real data, real code, real error information => real help. pseudo-data, pseudo-code, no error details => guesses.

My guess per your specs:

pca_mapper = function(k, input) keyval(input$Sample, input[, -1])

That's all

Antonio

--
post: rha...@googlegroups.com ||
unsubscribe: rhadoop+u...@googlegroups.com ||
web: https://groups.google.com/d/forum/rhadoop?hl=en-US
---
You received this message because you are subscribed to the Google Groups "RHadoop" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rhadoop+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Liang Zhou

unread,

Apr 7, 2014, 10:34:52 PM4/7/14

to rha...@googlegroups.com, ant...@piccolboni.info

Thank you, Antonio. It turns out to work as simple as you pointed out.

Reply all

Reply to author

Forward