[Scalding] groupBy and iterate over group values


Serega Sheypak

Aug 31, 2015, 5:21:09 AM
to cascading-user
Hi, I need to:
group by some key,
generate a "uniqueId" for each group,
iterate over each group's values one by one and attach the generated id to each.

My code returns nothing for some reason, but it compiles and runs without errors. Here is the code:

Here is gist link:

And the code copied from the gist:
def groupAndGenerateNewSurrogateKey: Pipe = {
  pipe.groupBy('naturalKey) { group =>
    group.mapStream[Long, (Long, Long)]('someValueField -> ('someValueField, 'newSurrogateKey)) { items: Iterator[Long] =>
      val newSurrogateKey = KeyGen.generate()
      println(s"new group key:[$newSurrogateKey]") // outputs generated key
      println(s"items: ${items.toList}") // correctly outputs grouped items
      items.map((_, newSurrogateKey)).toList
    }
  }.project('someValueField, 'newSurrogateKey)
}
// returns NOTHING... Why?

Serega Sheypak

Aug 31, 2015, 8:09:56 AM
to cascading-user
OK, I solved it. The line
println(s"items: ${items.toList}") // correctly outputs grouped items
empties the items iterator, so I can't read from it a second time. Pretty clear.

What still confuses me: does Scalding read the entire list of values for a key into memory? I could hit an OOM. I don't need all the values at once; I can process them one by one.



On Monday, August 31, 2015 at 11:21:09 UTC+2, Serega Sheypak wrote:

Oscar Boykin

Aug 31, 2015, 2:33:24 PM
to cascadi...@googlegroups.com
Scalding does not force the items into memory unless you ask it to. You can operate on the Iterator without materializing it (see the filter, map, foldLeft, flatMap, etc. methods on Iterator).

An Iterator can only be used once, so when you called toList, that exhausted the Iterator.
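
To make the point concrete, here is a minimal plain-Scala sketch (no Scalding involved) of the same behaviour. It assumes a hypothetical generateKey() that stands in for the KeyGen.generate() call from the question and returns a fixed value so the result is predictable. The object and method names are made up for illustration only.

```scala
object IteratorOnce {
  // Hypothetical stand-in for KeyGen.generate() from the question;
  // a fixed value keeps the sketch deterministic.
  def generateKey(): Long = 42L

  // Pairs every value with a single key for the group.
  // Iterator.map is lazy, so values are handled one at a time
  // and the group is never buffered as a whole.
  def tagWithKey(items: Iterator[Long]): Iterator[(Long, Long)] = {
    val key = generateKey()
    items.map(v => (v, key))
  }

  def main(args: Array[String]): Unit = {
    // The problem from the question: toList drains the iterator,
    // so a later map over the same iterator sees no elements.
    val drained = Iterator(1L, 2L, 3L)
    val printed = drained.toList
    val afterDrain = drained.map(v => (v, generateKey())).toList
    println(printed)    // List(1, 2, 3)
    println(afterDrain) // List()

    // The fix: transform the iterator exactly once, lazily.
    val tagged = tagWithKey(Iterator(1L, 2L, 3L))
    println(tagged.toList) // List((1,42), (2,42), (3,42))

    // Aggregations such as foldLeft also run in a single pass.
    val total = Iterator(1L, 2L, 3L).foldLeft(0L)(_ + _)
    println(total) // 6
  }
}
```

Applied to the code in the question, this would presumably mean dropping the debugging println that calls items.toList and returning items.map((_, newSurrogateKey)) directly, so that nothing forces the whole group into memory.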

--
Oscar Boykin :: @posco :: http://twitter.com/posco
