spark - formatting data in a key-value pair


Sam

Jul 22, 2016, 10:51:54 PM
to scala-user
I have data as key-value pairs and need to write it to a text file in the format shown below. I thought of using groupByKey and then foreach to print each key followed by its values, but I couldn't get it to work. Kindly help me.


res6: Array[(String, (String, String))] = Array((85000,(Harvey,Allen)), (85000,(Daniel,Prinz)), (85000,(Robert,Pascale)), (85000,(Donna,Brookes)), (85000,(James,Mackenzie)), (85000,(Robert,Chamberlain)), (85000,(Richard,Cunningham)), (85000,(Bailey,Sewell)), (85000,(Daniel,Marin)), (85001,(Frances,Mendelsohn)))


Output Format:
--- 85000
Harvey,Allen
Daniel,Prinz
Robert,Pascale
Donna,Brookes
.
.
.
.
--- 85001
Frances,Mendelsohn


Vlad Patryshev

Jul 22, 2016, 10:59:38 PM
to Sam, scala-user
groupBy is your friend

Thanks,
-Vlad
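
A minimal sketch of what that suggestion might look like on a plain Scala collection, using three of the pairs from the question. The object name GroupBySketch and the value names are made up for illustration, and this runs on a local Seq rather than a Spark RDD:

```scala
object GroupBySketch {
  // Sample pairs copied from the question: (key, (first name, last name)).
  val pairs: Seq[(String, (String, String))] = Seq(
    ("85000", ("Harvey", "Allen")),
    ("85000", ("Daniel", "Prinz")),
    ("85001", ("Frances", "Mendelsohn"))
  )

  // groupBy keeps the whole pair in each group, so the key is
  // stripped from the grouped rows with a second map.
  val grouped: Map[String, Seq[(String, String)]] =
    pairs.groupBy(_._1).map { case (key, rows) => key -> rows.map(_._2) }
}
```

Each key then maps to its names in their original order, which is the shape needed to print one header per key followed by its values.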


Sam

Jul 23, 2016, 10:15:19 AM
to scala-user, sam...@gmail.com
Thanks, Vlad. I tried groupByKey, which gave me Array[(String, Iterable[(String, String)])], and then iterated over it as shown below, but I couldn't get the output in the expected format :(

accountsByPCode.mapValues(fields => (fields(3), fields(4))).groupByKey().sortByKey()
  .foreach(t => {
    t._1.foreach(print)
    t._2.foreach(println)
  })

Result:
97234(Rebecca,Wright)
(Michael,Herron)
(Thomas,Telles)
(Rene,McCue)
(Arthur,Bruce)
(Herman,Willett)
(Jim,Stills)
(Terra,Delacruz)
(Jeffrey,Schulz)
(Ned,Goff)
97235(Barbara,Bledsoe)
(Marjorie,Libby)
(John,Stewart)
(Larry,Ramos)
(Herbert,Bacote)
(James,Moore)
(Gloria,Richmond)
(Anthony,Fuhr)
(Cornelia,Sadowski)
(Brad,Major)
.....
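
The result above follows from the two foreach calls: t._1.foreach(print) prints the key one character at a time with no header marker or newline, and println on a tuple prints it in its parenthesised form. One possible way to get the requested layout is to build each group's text explicitly. The sketch below is an assumption-laden illustration, not a tested Spark job: the names GroupFormat, renderGroup, and render are hypothetical, and it works on local collections so it can run without a cluster.

```scala
object GroupFormat {
  // Render one group as a "--- key" header followed by one
  // "first,last" line per value.
  def renderGroup(key: String, names: Iterable[(String, String)]): String = {
    val header = s"--- $key"
    val rows = names.map { case (first, last) => s"$first,$last" }
    (Seq(header) ++ rows).mkString("\n")
  }

  // Sort the groups by key and join their rendered text.
  def render(groups: Seq[(String, Iterable[(String, String)])]): String =
    groups.sortBy(_._1).map { case (key, names) => renderGroup(key, names) }.mkString("\n")
}
```

On the RDD from the attempt above, the same renderGroup could presumably be applied with map after groupByKey().sortByKey(), and the resulting strings written out with saveAsTextFile instead of printed inside foreach.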