How to return the last element of RDD[List[(Int,Double)]]?


lk

Aug 5, 2015, 3:13:42 AM
to scala-user
Hi, I have the following code:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import math.ceil
import org.apache.spark.rdd.RDD

object test {

  def main(args: Array[String]) {

    //settings
    val sc = new SparkContext("local", "SparkTest")

    val NumIter: Int = 2

    //given list
    var mlist = List[(Int,Double)]()

    mlist ++= List((1, 5.0))
    mlist ++= List((2, 2.0))
    mlist ++= List((3, 3.0))
    mlist ++= List((4, 6.0))

    //given list to rdd
    var mlistrdd = sc.parallelize(mlist)

    //storing important pairs list
    var sipairs = List[List[(Int,Double)]]()

    //storing important pairs list to rdd
    var slist = sc.parallelize(sipairs)

    //store the initial values of the given list
    slist ++= storetPairs(mlistrdd)

    for (i <- 1 to NumIter) {
      mlistrdd = mlistrdd.map({case (x,y) => mapping(x,y)})
                         .reduceByKey((a,b) => (a+b))

      slist ++= storetPairs(mlistrdd)
    }

    //println("Stored Pairs")
    slist.foreach(println)

    System.exit(0)
  }

  def storetPairs(xs: RDD[(Int,Double)]): RDD[List[(Int,Double)]] = {
    val out = xs.flatMap {
      var t = List[(Int,Double)]();
      x => {  t ++= List(x)  }

      List(t)
    }

    //how to return only the final element ??? (I would like this to be in the RDD if possible)

    out
  }

  def mapping(xx: Int, b: Double): (Int, Double) = {
    var a = ceil(xx/2.0).toInt
    (a, b)
  }

}

The output I receive is the following (it is stored in the "slist" variable):
-------------------------------------------------------
List((1,5.0))
List((1,5.0), (2,2.0))
List((1,5.0), (2,2.0), (3,3.0))
List((1,5.0), (2,2.0), (3,3.0), (4,6.0))

List((1,7.0))
List((1,7.0), (2,9.0))

List((1,16.0))
-------------------------------------------------------
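The running lists come from the flatMap inside storetPairs. In Scala, a lambda written at the end of a block extends to the end of that block, so `List(t)` is part of the function body: each element appends itself to the captured `var t` and then emits a snapshot of `t`. A minimal sketch of the same pattern on a plain Scala List (no Spark; `PrefixDemo` and `prefixes` are hypothetical names) reproduces the first group above. In Spark the closure, together with its captured `t`, is shipped to each task, so as far as I understand `t` starts empty in every partition; with the "local" master these small RDDs sit in a single partition, which is why only one running list appears per call.

```scala
object PrefixDemo {
  // Mirrors the flatMap in storetPairs, but on a plain Scala List (hypothetical helper, no Spark).
  def prefixes(xs: List[(Int, Double)]): List[List[(Int, Double)]] = {
    var t = List[(Int, Double)]()   // accumulator captured by the lambda below
    xs.flatMap { x =>
      t ++= List(x)                 // append the current pair
      List(t)                       // emit a snapshot of everything seen so far
    }
  }

  def main(args: Array[String]): Unit =
    prefixes(List((1, 5.0), (2, 2.0), (3, 3.0), (4, 6.0))).foreach(println)
}
```

Running `PrefixDemo` prints the same four growing lists as the first group of the output.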

Instead, I would like to receive:
-------------------------------------------------------
List((1,5.0), (2,2.0), (3,3.0), (4,6.0))

List((1,7.0), (2,9.0))

List((1,16.0))
-------------------------------------------------------
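As a sanity check on the desired output, a plain-Scala model of the same loop (no Spark; `groupBy` plus a sum stands in for `reduceByKey`, and `DesiredOutputModel`, `step` and `history` are hypothetical names) produces exactly one complete list per step. Sorting by key is only there to give a stable order; `reduceByKey` itself does not promise sorted output.

```scala
import scala.math.ceil

object DesiredOutputModel {
  // Same key transformation as the post's `mapping`: key k becomes ceil(k / 2).
  def mapping(xx: Int, b: Double): (Int, Double) = (ceil(xx / 2.0).toInt, b)

  // One iteration: remap the keys, then sum the values per key (the role reduceByKey plays in the post).
  def step(pairs: List[(Int, Double)]): List[(Int, Double)] =
    pairs.map { case (x, y) => mapping(x, y) }
      .groupBy(_._1)
      .map { case (k, kvs) => (k, kvs.map(_._2).sum) }
      .toList
      .sortBy(_._1)                      // sorted only to make the order stable

  // The initial list followed by the result of each iteration: one complete list per step.
  def history(start: List[(Int, Double)], numIter: Int): List[List[(Int, Double)]] =
    (1 to numIter).scanLeft(start)((acc, _) => step(acc)).toList

  def main(args: Array[String]): Unit =
    history(List((1, 5.0), (2, 2.0), (3, 3.0), (4, 6.0)), 2).foreach(println)
}
```

With the post's input and two iterations this prints the three lists of the desired output (5.0 + 2.0 = 7.0, 3.0 + 6.0 = 9.0, then 7.0 + 9.0 = 16.0).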

How can I make the "storetPairs" function return only the last element of the RDD[List[(Int,Double)]] value "out"?
(Returning only the last element would give me the desired output.)
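Two possible approaches, sketched with standard RDD methods (`count`, `zipWithIndex`, `coalesce`, `glom`); `LastElementSketch`, `lastElement` and `storeAllPairs` are hypothetical names. The first keeps storetPairs unchanged and filters `out` down to its element with the highest index. That matches "the last element" only while all pairs sit in one partition, since each partition builds its own running list. The second skips the running lists and gathers all pairs of the RDD into a single list, which stays inside an RDD as requested.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object LastElementSketch {
  // Approach 1: keep only the element with the highest index.
  // zipWithIndex numbers elements by partition order, then by position inside each partition.
  def lastElement[T: ClassTag](rdd: RDD[T]): RDD[T] = {
    val n = rdd.count()                        // runs a job to count the elements
    rdd.zipWithIndex()
      .filter { case (_, i) => i == n - 1 }    // an empty RDD stays empty
      .map { case (v, _) => v }
  }

  // Approach 2: gather all pairs into one list instead of emitting running prefixes.
  // coalesce(1) merges the partitions without a shuffle; glom turns that single partition into one Array.
  def storeAllPairs(xs: RDD[(Int, Double)]): RDD[List[(Int, Double)]] =
    xs.coalesce(1).glom().map(_.toList)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("LastElementSketch"))
    val pairs = sc.parallelize(List((1, 5.0), (2, 2.0), (3, 3.0), (4, 6.0)))
    storeAllPairs(pairs).collect().foreach(println)
    sc.stop()
  }
}
```

Inside the post's loop, either `slist ++= lastElement(storetPairs(mlistrdd))` or `slist ++= storeAllPairs(mlistrdd)` should then store one list per iteration. The second avoids the mutable closure variable entirely, so it does not depend on how many partitions the RDD has.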

Thanks.
