Hi
I am fetching data from Hive in my SparkR program and using SparkR:::map to convert the resulting DataFrame into a PipelinedRDD, i.e. an RDD whose elements are lists. How do I partition this RDD by certain elements of each list?
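For context, cidNoRegFcstInput is produced roughly like this (the table and column names here are placeholders, not my real ones):

hiveContext <- sparkRHive.init(sc)
df <- sql(hiveContext, "SELECT col1, col2, col3 FROM some_table")
# DataFrame -> RDD of named lists, one list per row
cidNoRegFcstInput <- SparkR:::toRDD(df)

I then map each row to a (key, value) pair: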
groupInput <- SparkR:::map(cidNoRegFcstInput, function(x) {
  # key1 carries the grouping columns, value1 the full payload
  list(key1 = list(col1 = x$col1, col2 = x$col2),
       value1 = list(x$col1, x$col2, x$col3))
})
Next I try to split the pairs into two partitions on the key columns:

groupPartitioned <- SparkR:::partitionBy(groupInput, 2,
  partitionFunc = function(x) {
    (as.numeric(x$key1$col1) + as.numeric(x$key1$col2)) %% 2
  })
However, this gives me the following error:
Error in res[[bucket]] : wrong arguments for subsetting an environment
Calls: source -> withVisible -> eval -> eval -> lapply -> FUN
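My guess (and I may be wrong) is that partitionBy applies partitionFunc to the key alone, i.e. to the first element of each pair, rather than to the whole pair. If so, x inside my function is already the key1 list, x$key1 is NULL, and the function returns a zero-length value, which would explain the subsetting error. Under that assumption the call would presumably need to look like this (just my sketch, not verified):

# Sketch, assuming partitionFunc is given only the key of each pair;
# the key here is the list built under key1 above.
groupPartitioned <- SparkR:::partitionBy(groupInput, 2,
  partitionFunc = function(key) {
    # no key$key1 indirection; key is list(col1 = ..., col2 = ...)
    (as.numeric(key$col1) + as.numeric(key$col2)) %% 2
  })

Is that assumption correct?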
What is the right way to define a custom partitioner in SparkR?
Regards
Radhika