All,
I reshuffle the elements of the RDD's partitions to their neighboring partitions by writing my own Partitioner.
My Partitioner overrides Partitioner.getPartition(key: Any): Int.
Inside getPartition I modify the state of my key, and that state in turn affects the logic of later getPartition calls.
e.g. with class Foo(var currentPartition: Int) extends Partitioner
and an initial currentPartition of 0,
the first call to getPartition changes currentPartition to some new value (say currentPartition + 1) and returns the old value (0);
the next call to getPartition does the same, and so on.
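To make the concern concrete, here is a plain-Scala sketch with no Spark dependency. It assumes (my understanding of the scheduler, not something I have verified in the source) that Spark serializes the partitioner and that each map task calls getPartition on its own deserialized copy. CountingPartitioner is a hypothetical stand-in for Foo, and the Java serialization round trip only imitates shipping the partitioner to tasks.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream, ObjectStreamClass}

// Hypothetical stand-in for the stateful Foo described above. It has the same
// shape as Spark's Partitioner (numPartitions, getPartition) but is plain Scala.
class CountingPartitioner(var currentPartition: Int, val numPartitions: Int)
    extends Serializable {
  def getPartition(key: Any): Int = {
    val result = currentPartition
    // Mutate on every call, as in the description above.
    currentPartition = (currentPartition + 1) % numPartitions
    result
  }
}

object SerializedCopies {
  // Serialize and deserialize an object, imitating the copy one task receives.
  def roundTrip[T](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray)) {
      // Resolve classes with this code's class loader (helps in REPL/script runners).
      override def resolveClass(desc: ObjectStreamClass): Class[_] =
        Class.forName(desc.getName, false, getClass.getClassLoader)
    }
    try in.readObject().asInstanceOf[T]
    finally in.close()
  }

  // Two "tasks", each partitioning two records with its own copy.
  def demo(): (Seq[Int], Int) = {
    val driverCopy = new CountingPartitioner(0, 4)
    val task1 = roundTrip(driverCopy)
    val task2 = roundTrip(driverCopy)
    val fromTask1 = Seq("a", "b").map(k => task1.getPartition(k))
    val fromTask2 = Seq("c", "d").map(k => task2.getPartition(k))
    (fromTask1 ++ fromTask2, driverCopy.currentPartition)
  }

  def main(args: Array[String]): Unit = {
    val (assigned, driverState) = demo()
    println(s"partitions assigned: $assigned") // List(0, 1, 0, 1), not List(0, 1, 2, 3)
    println(s"driver's currentPartition: $driverState") // still 0
  }
}
```

If that assumption holds, the partition numbers handed out depend on how records happen to be split across tasks (and on task retries), and the driver's copy never sees the updated state.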
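I then iterate over the RDD:
var current = rdd
for (i <- 0 until 10) {
  // RDDs are immutable, so keep the RDD each transformation returns (f: my per-partition function, omitted here)
  current = current.mapPartitions(f)
  current = current.partitionBy(myPartitioner)
}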
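One stateless alternative, sketched locally in plain Scala: instead of keeping a counter in the partitioner, re-key each record with the index of the partition it currently sits in, and let getPartition return the next index. NeighborPartitioner, NeighborShuffle, and the record values are hypothetical names; a real version would extend org.apache.spark.Partitioner, and the comments note which Spark calls (mapPartitionsWithIndex, partitionBy) each step is meant to imitate.

```scala
// Hypothetical stateless partitioner: the key is the index of the partition a
// record currently sits in, and getPartition is a pure function of that key,
// so every call, in every task and on every retry, gives the same answer.
class NeighborPartitioner(val numPartitions: Int) extends Serializable {
  def getPartition(key: Any): Int = key match {
    case current: Int => (current + 1) % numPartitions
    case other        => throw new IllegalArgumentException(s"unexpected key: $other")
  }
}

object NeighborShuffle {
  // Each inner Vector plays the role of one partition of (key, value) records.
  type Partitions = Vector[Vector[(Int, String)]]

  def round(parts: Partitions, partitioner: NeighborPartitioner): Partitions = {
    // Step 1: re-key each record with its current partition index
    // (the job rdd.mapPartitionsWithIndex would do in Spark).
    val rekeyed = parts.zipWithIndex.flatMap { case (records, idx) =>
      records.map { case (_, value) => (idx, value) }
    }
    // Step 2: route every record to getPartition(key)
    // (the job rdd.partitionBy would do in Spark).
    Vector.tabulate(partitioner.numPartitions) { target =>
      rekeyed.filter { case (key, _) => partitioner.getPartition(key) == target }
    }
  }

  def run(start: Partitions, rounds: Int): Partitions = {
    val partitioner = new NeighborPartitioner(start.size)
    (0 until rounds).foldLeft(start)((parts, _) => round(parts, partitioner))
  }

  def main(args: Array[String]): Unit = {
    val start: Partitions = Vector.tabulate(4)(i => Vector((i, s"record-from-$i")))
    val end = run(start, 10)
    end.zipWithIndex.foreach { case (records, idx) => println(s"partition $idx: $records") }
  }
}
```

With 4 partitions and 10 rounds, every record ends up 10 mod 4 = 2 partitions to the right of where it started, and the result no longer depends on how many times, or in which task, getPartition is called.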
--
My question(s)
1. Is this OK?
I am worried that mutating my keys this way might break how Spark transforms RDDs.
2. Is there a way to get equal-size partitions? (My partitions become lopsided.)
How do the sortBy* methods manage to produce equal-size partitions?
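As far as I know, sortByKey (which sortBy builds on) uses a RangePartitioner: it samples the keys, picks boundary keys so that each range gets about the same share of the sample, and routes each key by comparing it against those boundaries. Below is a simplified local imitation (not Spark's actual code, which samples per input partition and weights the samples), plus a hypothetical index-based scheme for exactly equal sizes, where each record is numbered first (in Spark, rdd.zipWithIndex) and partitioned by its index. The square-number keys are only there to show how plain modulo partitioning goes lopsided.

```scala
object BalancedPartitioning {
  // Pick numPartitions - 1 boundary keys so that each range holds about the same
  // number of sampled keys (simplified: no per-partition weighting of samples).
  def rangeBounds(sample: Seq[Int], numPartitions: Int): Vector[Int] = {
    require(sample.nonEmpty && numPartitions > 0, "need a sample and at least one partition")
    val sorted = sample.sorted.toVector
    (1 until numPartitions).map { i =>
      val idx = (i.toLong * sorted.size / numPartitions).toInt - 1
      sorted(math.max(0, idx))
    }.toVector
  }

  // Binary search for the first range whose upper bound is >= key;
  // keys above every bound land in the last partition.
  def rangePartition(key: Int, bounds: Vector[Int]): Int = {
    var lo = 0
    var hi = bounds.size
    while (lo < hi) {
      val mid = (lo + hi) >>> 1
      if (bounds(mid) < key) lo = mid + 1 else hi = mid
    }
    lo
  }

  // Hypothetical index-based scheme: records numbered 0 until count
  // (e.g. with rdd.zipWithIndex) are split into contiguous, equal-size blocks.
  def indexPartition(index: Long, count: Long, numPartitions: Int): Int =
    (index * numPartitions / count).toInt

  // Number of keys assigned to each partition.
  def sizes(assignment: Seq[Int], numPartitions: Int): Vector[Int] =
    Vector.tabulate(numPartitions)(p => assignment.count(_ == p))

  def main(args: Array[String]): Unit = {
    val n = 4
    val keys = (1 to 100).map(x => x * x) // square numbers: a skewed key set
    // Plain modulo, roughly what hash partitioning does with Int keys
    // (an Int's hashCode is its value): squares only land in partitions 0 and 1.
    val byModulo = sizes(keys.map(k => k % n), n)
    val bounds = rangeBounds(keys, n)
    val byRange = sizes(keys.map(k => rangePartition(k, bounds)), n)
    val byIndex = sizes(keys.indices.map(i => indexPartition(i.toLong, keys.size.toLong, n)), n)
    println(s"modulo: $byModulo, range: $byRange, index: $byIndex")
  }
}
```

One limitation to keep in mind: range partitioning cannot split a single very frequent key across partitions, so heavy duplicates still cause skew. The index-based scheme avoids that, at the cost of an extra pass to count and number the records.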
I would appreciate any insights into this matter
cheers
Kumar