How to split a DList into multiple locations?

Jeff Zhang

Feb 27, 2013, 9:26:19 PM
to scoobi...@googlegroups.com
Hi,

I am trying to split a DList by key and store each group in a different location. Is there an API for this? I know that using filter is a workaround, but it is not very elegant. Thanks for any help.


Eric Torreborre

Feb 28, 2013, 6:19:22 AM
to scoobi...@googlegroups.com
Do you mean something like:

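// Sketch: assumes `import com.nicta.scoobi.Scoobi._` and an implicit ScoobiConfiguration in scope.
def splitByKey[K : WireFormat, V](list: DList[(K, Iterable[V])])
                                 (implicit sc: ScoobiConfiguration): Seq[(K, DList[(K, Iterable[V])])] = {
  // Collect the keys on the client (this runs a job).
  val keys = list.map(_._1).materialise.run.toSeq
  // Build one filtered DList per key; each element keeps its (key, values) shape.
  keys map { k => (k, list.filter(_._1 == k)) }
}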

That doesn't seem optimal to me, though: it needs one pass to collect the keys and then one filter per key. I'll think about whether we can do better.

E.

Eric Torreborre

Mar 3, 2013, 7:45:51 PM
to scoobi...@googlegroups.com
Hi Jeff,

I thought about it, and I think you could try writing a new DataSink with a specific OutputFormat.

If you look at the existing TextOutputFormat, you will see that a single "LineWriter" is created once and then used for all the key-value pairs.

In your case, you could instantiate a new LineWriter, using a different DataOutputStream, each time you encounter a new key.
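
To make the idea concrete, here is a minimal sketch of the "one writer per key" logic in plain Scala. It is not Scoobi or Hadoop code: PerKeyLineWriter, its methods, and the one-file-per-key layout are hypothetical names chosen for illustration, and it assumes that every key is a valid file name. A real DataSink would have to wrap something like this inside an OutputFormat and its record writer.

```scala
import java.io.{BufferedOutputStream, DataOutputStream, File, FileOutputStream}
import java.nio.charset.StandardCharsets.UTF_8
import scala.collection.mutable

// Hypothetical helper (not part of Scoobi or Hadoop): writes each value
// as a line in a file named after its key, under baseDir.
class PerKeyLineWriter(baseDir: File) {
  // One open stream per key seen so far.
  private val streams = mutable.Map.empty[String, DataOutputStream]

  // Open a new DataOutputStream the first time a key is encountered,
  // and reuse it for later values of the same key.
  private def streamFor(key: String): DataOutputStream =
    streams.getOrElseUpdate(key, {
      baseDir.mkdirs()
      // Assumption: the key can be used directly as a file name.
      new DataOutputStream(
        new BufferedOutputStream(new FileOutputStream(new File(baseDir, key))))
    })

  // Append one line to the file that belongs to this key.
  def write(key: String, value: String): Unit =
    streamFor(key).write((value + "\n").getBytes(UTF_8))

  // Flush and close every stream that was opened.
  def close(): Unit = {
    streams.values.foreach(_.close())
    streams.clear()
  }
}
```

Note that this keeps one stream open per distinct key until close() is called, so it only suits a moderate number of keys. If the values arrive grouped by key, closing the previous stream whenever the key changes would avoid that.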

If you don't have time to do it, please open an issue on GitHub and we'll try to do something. OTOH, if you do have time, please consider making it a pull request :-).

Cheers,

Eric.