Hi
I am fetching data from Hive in my SparkR program and using SparkR:::map to convert the resulting DataFrame into a PipelinedRDD, i.e. an RDD whose elements are lists. How do I partition this RDD by certain elements of each list?
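For context, cidNoRegFcstInput is produced roughly like this (the table and column names here are placeholders, not my real ones):

hiveContext <- sparkRHive.init(sc)
df <- sql(hiveContext, "SELECT col1, col2, col3 FROM some_table")
# DataFrame -> RDD of named lists, one list per row
cidNoRegFcstInput <- SparkR:::toRDD(df)

I then map each row to a (key, value) pair: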
groupInput <- SparkR:::map(cidNoRegFcstInput, function(x) {
  # key1 carries the grouping columns, value1 the full payload
  list(key1 = list(col1 = x$col1, col2 = x$col2),
       value1 = list(x$col1, x$col2, x$col3))
})
Next I try to split the pairs into two partitions on the key columns:

groupPartitioned <- SparkR:::partitionBy(groupInput, 2,
  partitionFunc = function(x) {
    (as.numeric(x$key1$col1) + as.numeric(x$key1$col2)) %% 2
  })
However, this gives me the following error:
Error in res[[bucket]] : wrong arguments for subsetting an environment
Calls: source -> withVisible -> eval -> eval -> lapply -> FUN
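My guess (and I may be wrong) is that partitionBy applies partitionFunc to the key alone, i.e. to the first element of each pair, rather than to the whole pair. If so, x inside my function is already the key1 list, x$key1 is NULL, and the function returns a zero-length value, which would explain the subsetting error. Under that assumption the call would presumably need to look like this (just my sketch, not verified):

# Sketch, assuming partitionFunc is given only the key of each pair;
# the key here is the list built under key1 above.
groupPartitioned <- SparkR:::partitionBy(groupInput, 2,
  partitionFunc = function(key) {
    # no key$key1 indirection; key is list(col1 = ..., col2 = ...)
    (as.numeric(key$col1) + as.numeric(key$col2)) %% 2
  })

Is that assumption correct?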
What is the right way to define a custom partitioner in SparkR?
Regards
Radhika