Using stream-lib in Spark Streaming

julianke...@gmail.com

Nov 26, 2015, 8:22:49 AM
to stream-lib-user
I'm new to stream-lib and to streaming algorithms in general, and I have been playing around with the HyperLogLog and HyperLogLog+ implementations. I really like the simple API, but I want to use stream-lib in a Spark Streaming context. Here is a code snippet that shows how I use the library:

Function2<List<String>, Optional<Long>, Optional<Long>> hllCountFunction = new Function2<List<String>, Optional<Long>, Optional<Long>>() {
    @Override
    public Optional<Long> call(List<String> values, Optional<Long> state) throws Exception {
        values.stream().forEach(value -> hll.offer(value));
        long newState = state.isPresent() ? hll.cardinality() : 0;
        return Optional.of(newState);
    }
};

I get a '1' every time the function is called (once for every RDD in the stream). In the debugger I saw that the HLL object seems to be a new one every time. What am I doing wrong? I suspect there is a flaw in my reasoning.

Thanks in advance for your help.

julianke...@gmail.com

Dec 10, 2015, 7:10:41 AM
to stream-lib-user
I solved the problem by passing the HyperLogLog object to the function and returning it after calling offer for each element in the RDD.
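
A minimal sketch of that fix, under stated assumptions: the counter is carried as the function's state instead of being captured from the enclosing scope. A likely reason the captured object looked new on every call is that Spark serializes the function and each invocation works on its own copy, though the thread does not confirm this. To keep the example runnable without Spark or stream-lib on the classpath, `DistinctCounter` is a hypothetical exact-counting stand-in that only mirrors the `offer`/`cardinality` method names of stream-lib's HyperLogLog, `java.util.Optional` stands in for the Optional type used by the Spark API, and `HllStateSketch.update` is an illustrative name for the body of the `Function2.call` method.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-in for stream-lib's HyperLogLog so the sketch runs without extra jars.
// It counts exactly; only the offer()/cardinality() method names mirror the real class.
class DistinctCounter {
    private final Set<Object> seen = new HashSet<>();

    boolean offer(Object value) {
        return seen.add(value);
    }

    long cardinality() {
        return seen.size();
    }
}

class HllStateSketch {
    // Same shape as the update function in the question: (new values, previous state) -> new state.
    // The counter itself is the state, so nothing is captured from the enclosing scope.
    static Optional<DistinctCounter> update(List<String> values, Optional<DistinctCounter> state) {
        DistinctCounter counter = state.isPresent() ? state.get() : new DistinctCounter();
        for (String value : values) {
            counter.offer(value);
        }
        return Optional.of(counter);
    }

    public static void main(String[] args) {
        // Simulate two micro-batches; the state returned by one call feeds the next.
        Optional<DistinctCounter> state = Optional.empty();
        state = update(Arrays.asList("a", "b", "a"), state);
        System.out.println(state.get().cardinality()); // prints 2
        state = update(Arrays.asList("b", "c"), state);
        System.out.println(state.get().cardinality()); // prints 3
    }
}
```

With the real libraries, the state type would be the HyperLogLog class itself rather than Long, and the count would be read by calling cardinality() on the returned state.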

Matt Abrams

Dec 10, 2015, 7:11:20 AM
to stream-...@googlegroups.com
Thanks, sorry for not following up on this. Glad you found a solution.

Matt