How to Partition an RDD by Key in SparkR?


Radhika Parik

unread,
Nov 2, 2015, 8:07:07 AM11/2/15
to SparkR Developers
Hi,

I am fetching data from Hive in my SparkR program. I use map to convert this DataFrame to a PipelinedRDD (which consists of a list of lists). How do I partition this RDD by certain elements of the list?




groupInput <- SparkR:::map(cidNoRegFcstInput, function(x) {
  list(key1 = list(col1 = x$col1, col2 = x$col2),
       value1 = list(x$col1, x$col2, x$col3))
})

groupPartitioned <- SparkR:::partitionBy(groupInput, 2L,
  partitionFunc = function(x) {
    (as.numeric(x$key1$col1) + as.numeric(x$key1$col2)) %% 2
  })
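For clarity, here is the bucket computation I intend, sketched on a single record in plain R (no Spark needed; the column names and values are just illustrative):

    # one record shaped like the output of the map step above
    rec <- list(key1 = list(col1 = "3", col2 = "4"),
                value1 = list("3", "4", "7"))

    # sum the key columns and take mod 2 to pick one of the two partitions
    bucket <- (as.numeric(rec$key1$col1) + as.numeric(rec$key1$col2)) %% 2
    bucket   # 7 %% 2 = 1, so this record should land in partition 1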




However, this gives me the following error:

Error in res[[bucket]] : wrong arguments for subsetting an environment
Calls: source -> withVisible -> eval -> eval -> lapply -> FUN

What is the right way to define a custom partitioner in SparkR?

Regards

Radhika