Gaurav Dasgupta
Oct 14, 2012, 4:31:07 PM
to spark...@googlegroups.com
Hi,
I am trying to write MapReduce-style code in Spark where my map function takes a String (one row of the RDD) as input and returns a set of (K,V) pairs. For instance, here is an example input to my map function:
1 2:3,3:2,4:5|0|1
Now from this input I want to form the (k,v) pairs like the following:
(1,2:3,3:2,4:5|0|2)
(2,|3|1)
(3,|2|1)
(4,|5|1)
What I am doing at the moment is building a String like this from my input row:
2 |3|1@3 |2|1@4 |5|1@1 2:3,3:2,4:5|0|1
Then, in the main function, rdd.map(x => myMapFunction(x)) gives me the above string for each row.
Next, rdd.map(x => myMapFunction(x)).flatMap(x => x.split("@")) transforms it into the following:
2 |3|1
3 |2|1
4 |5|1
1 2:3,3:2,4:5|0|2
And finally, rdd.map(x => myMapFunction(x)).flatMap(x => x.split("@")).map(x => (x.split(" ")(0), x.split(" ")(1))) forms this:
(2,|3|1)
(3,|2|1)
(4,|5|1)
(1,2:3,3:2,4:5|0|2)
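For reference, the string-based pipeline above can be tried on a plain Scala collection, without a cluster. This is only a sketch: `encoded` is a hand-written stand-in for the output of myMapFunction (whose body is not shown here), and a local List stands in for the RDD, since List has map and flatMap methods that are used the same way.

```scala
// Stand-in for the output of myMapFunction on one row (assumed format:
// records joined by "@", key and value separated by a single space).
val encoded: List[String] = List("2 |3|1@3 |2|1@4 |5|1@1 2:3,3:2,4:5|0|2")

// Split each encoded string into records, then each record into (key, value).
val pairs: List[(String, String)] =
  encoded
    .flatMap(_.split("@"))
    .map { record =>
      val parts = record.split(" ", 2) // split once, at the first space only
      (parts(0), parts(1))
    }
```

Splitting each record once and reusing the result avoids calling split twice per record, which the one-line version does.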
I am new to both Scala and Spark.
I would like to know what the possible ways are to return multiple (k,v) pairs from my map function and get the desired output directly by calling map() on the RDD.
I can return the multiple (k,v) pairs as an Array or List (please suggest any other way of doing this). But how can I then use map() on the RDD to get the desired output?
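One possible direction, sketched under assumptions: let the map function return a List of (key, value) tuples and call flatMap instead of map, so that each returned list is flattened into individual pairs. Here `toPairs` is a hypothetical name, the parsing rule is inferred from the single example row above, and a local List is used in place of the RDD so that the snippet runs without Spark. The example shows the last field of the original row changing from 1 to 2, but the rule for that is not stated, so this sketch passes the row value through unchanged.

```scala
// Hypothetical helper: turn one input row into several (key, value) pairs.
// Assumed row format: "<node> <n1>:<w1>,<n2>:<w2>,...|<x>|<y>".
def toPairs(row: String): List[(String, String)] = {
  val parts = row.split(" ", 2)                  // Array("1", "2:3,3:2,4:5|0|1")
  val node = parts(0)
  val rest = parts(1)
  val neighbours = rest.split("\\|")(0)          // "2:3,3:2,4:5"
  val emitted = neighbours.split(",").toList.map { entry =>
    val fields = entry.split(":")                // Array("2", "3")
    (fields(0), "|" + fields(1) + "|1")          // e.g. ("2", "|3|1")
  }
  emitted :+ (node -> rest)                      // original row, passed through as-is
}

// A local List standing in for the RDD of input rows.
val rows: List[String] = List("1 2:3,3:2,4:5|0|1")

// flatMap flattens each returned List into individual pairs.
val result: List[(String, String)] = rows.flatMap(toPairs)
```

With Spark, the same function could be passed to the RDD's own flatMap, as in rdd.flatMap(toPairs), since RDD.flatMap accepts a function that returns a collection. That would remove the need for the intermediate "@"-joined String.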
Thanks,
Gaurav