--
You received this message because you are subscribed to the Google Groups "Spark Users" group.
You can do this:

val newRDD = oldRDD.mapPartitions { iter =>
  val rand = new scala.util.Random
  iter.map(x => (x, Array.fill(10)(rand.nextDouble _)))
}

or this:
val myAppSeed = 91234
val newRDD = myRDD.mapPartitionsWithIndex { (indx, iter) =>
  val rand = new scala.util.Random(indx + myAppSeed)
  iter.map(x => (x, Array.fill(10)(rand.nextDouble)))
}
This instead seems like a decent choice when you want stable behavior between runs: seeding each partition's generator with its index plus a fixed application seed makes reruns deterministic.
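To see the determinism without a Spark cluster, here is a minimal plain-Scala sketch; partitionValues is a hypothetical helper standing in for the body passed to mapPartitionsWithIndex:

```scala
// Plain-Scala sketch of what one partition computes under the seeded scheme.
// `partitionValues` is a stand-in for the closure in mapPartitionsWithIndex.
val myAppSeed = 91234

def partitionValues(indx: Int): Array[Double] = {
  val rand = new scala.util.Random(indx + myAppSeed)
  Array.fill(10)(rand.nextDouble)
}

// The same partition index yields the same numbers on every run,
// while different partitions get different streams.
val runA = partitionValues(0)
val runB = partitionValues(0)
println(runA.sameElements(runB)) // true: reruns are stable
```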
Also, it should be noted that the original post wouldn't work (and hence the examples fixing it wouldn't either): there is a second problem, in that Array.fill(10)(rand.nextDouble _) produces a partially applied function rather than a value, so the signature of the RDD ended up being:

spark.RDD[(Int, Array[() => Double])]

Each of those functions closes over the Random instance, which then fails to serialize as well.
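A small sketch of the trap, runnable in plain Scala: with the trailing underscore you get an array of deferred functions, each capturing rand; without it you get the doubles you wanted.

```scala
val rand = new scala.util.Random(42)

// With `_`: eta-expansion gives an Array[() => Double], not doubles.
val fns: Array[() => Double] = Array.fill(10)(rand.nextDouble _)

// Without `_`: the method is actually called, giving plain doubles.
val nums: Array[Double] = Array.fill(10)(rand.nextDouble)

// Each element of `fns` captures `rand`, so serializing the array
// would drag the (non-serializable) Random instance along with it.
println(fns(0)())                               // invokes one deferred call
println(nums.forall(d => d >= 0.0 && d < 1.0))  // true
```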
Instances of java.util.Random are threadsafe. However, the concurrent use of the same java.util.Random instance across threads may encounter contention and consequent poor performance. Consider instead using ThreadLocalRandom in multithreaded designs.
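For multithreaded (non-Spark) code, that recommendation looks like this sketch; note that ThreadLocalRandom.current() cannot be seeded, so you trade reproducibility for contention-free throughput:

```scala
import java.util.concurrent.ThreadLocalRandom

// Each thread gets its own generator; no shared-state contention.
val d = ThreadLocalRandom.current().nextDouble()
println(d >= 0.0 && d < 1.0) // true: nextDouble() is uniform in [0, 1)
```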