Akka Streams PoC


BigAl

Jun 23, 2015, 8:33:42 AM
to akka...@googlegroups.com
Hi all,

In my company we've been running some tests to evaluate if Akka Streams would be useful for us to process both the Wikidata and Wikipedia dumps.

I have to say that both PoCs were very satisfactory and we found Akka Streams a really nice framework to play with :-).

After that, we wanted to share our findings with the rest of the world :) so we've created a small repo that contains part of the PoC. We have structured it in a way that can be explained in a single (and simple) blog post:
Blog post: http://engineering.intenthq.com/2015/06/wikidata-akka-streams/

We would really appreciate any comments about it.

Thanks,

  BigAl

Endre Varga

Jun 23, 2015, 9:08:25 AM
to akka...@googlegroups.com
Hi,

On Tue, Jun 23, 2015 at 12:29 PM, BigAl <albert....@intenthq.com> wrote:
> Hi all,
>
> In my company we've been running some tests to evaluate if Akka Streams would be useful for us to process both the Wikidata and Wikipedia dumps.
>
> I have to say that both PoCs were very satisfactory and we found Akka Streams a really nice framework to play with :-).

I think what you are doing is actually a sweet spot for Akka Streams; this kind of use case matches the underlying assumptions well.

> After that, we wanted to share our findings with the rest of the world :) so we've created a small repo that contains part of the PoC. We have structured it in a way that can be explained in a single (and simple) blog post:
> Blog post: http://engineering.intenthq.com/2015/06/wikidata-akka-streams/
>
> We would really appreciate any comments about it.

There is an InputStreamSource available, so you can use it directly where you read the file: http://doc.akka.io/api/akka-stream-and-http-experimental/1.0-RC3/#akka.stream.io.InputStreamSource$

The above does not do line parsing, but the upcoming RC4 contains a simple line parsing (or other delimiter based parsing) stage: https://github.com/akka/akka/blob/release-2.3-dev/akka-stream/src/main/scala/akka/stream/io/Framing.scala#L53

You can try that combination after RC4 comes out.
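A minimal sketch of the combination described above: an InputStream-backed source followed by delimiter-based framing. It assumes a later Akka release (2.6.x), where the equivalent of `InputStreamSource` is `StreamConverters.fromInputStream` and `Framing` lives in `akka.stream.scaladsl`; the object name, sample text, chunk size, and frame-length limit are illustrative and not taken from the blog post.

```scala
import java.io.ByteArrayInputStream
import scala.concurrent.Await
import scala.concurrent.duration._
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Framing, Sink, StreamConverters}
import akka.util.ByteString

// Hypothetical example object; assumes Akka 2.6.x on the classpath.
object LineFramingSketch {
  // Reads the bytes through an InputStream source and re-frames them into lines.
  def readLines(bytes: Array[Byte])(implicit system: ActorSystem): Seq[String] = {
    val lines = StreamConverters
      // A tiny chunk size so that lines deliberately span several chunks.
      .fromInputStream(() => new ByteArrayInputStream(bytes), chunkSize = 4)
      // allowTruncation = true accepts a final line without a trailing newline.
      .via(Framing.delimiter(ByteString("\n"), maximumFrameLength = 1024, allowTruncation = true))
      .map(_.utf8String)
      .runWith(Sink.seq)
    Await.result(lines, 5.seconds)
  }

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("framing-sketch")
    try readLines("first line\nsecond line\nlast".getBytes("UTF-8")).foreach(println)
    finally system.terminate()
  }
}
```

The source only emits raw `ByteString` chunks; it is the framing stage that turns arbitrary chunk boundaries back into whole lines.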

-Endre
 


Konrad Malawski

Jun 23, 2015, 9:28:18 AM
to Akka User List
Hi there, 
nice blog and example app!

Please use SynchronousFileSource instead of getLines from scala.io.Source; it will be much faster (much).
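A rough sketch of reading an uncompressed file through a streaming file source rather than `getLines`. It assumes a later Akka release (2.6.x), where the counterpart of `SynchronousFileSource` is `FileIO.fromPath`; the object name and the temporary file are made up for illustration.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}
import scala.concurrent.Await
import scala.concurrent.duration._
import akka.actor.ActorSystem
import akka.stream.scaladsl.FileIO

// Hypothetical example object; assumes Akka 2.6.x on the classpath.
object FileSourceSketch {
  // Streams the file as ByteString chunks and sums their sizes.
  def totalBytes(path: Path)(implicit system: ActorSystem): Long = {
    val total = FileIO.fromPath(path).runFold(0L)((sum, chunk) => sum + chunk.length)
    Await.result(total, 10.seconds)
  }

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("file-source-sketch")
    val path = Files.createTempFile("sketch", ".txt")
    Files.write(path, "hello\nworld\n".getBytes(UTF_8))
    try println(totalBytes(path)) // prints 12
    finally {
      Files.delete(path)
      system.terminate()
    }
  }
}
```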
--
Cheers,
Konrad 'ktoso' Malawski

BigAl

Jun 23, 2015, 9:31:22 AM
to akka...@googlegroups.com
Thanks a lot!
Will use the InputStreamSource for sure, looking forward to RC4 :-)

Konrad Malawski

Jun 23, 2015, 9:32:14 AM
to akka...@googlegroups.com, BigAl
SynchronousFileSource will be even better than InputStreamSource :-)

-- 
Cheers,
Konrad 'ktoso' Malawski
Akka @ Typesafe

Endre Varga

Jun 23, 2015, 11:11:38 AM
to akka...@googlegroups.com
Btw, in the blog you mention "you can use mapConcat instead (note that it works only on immutable.Seq[T])". This is no longer true after RC4, since you will be able to emit an immutable.Iterable. In fact you can emit an infinite sequence if that is what you want :)
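To illustrate the point about infinite sequences, here is a small sketch in which `mapConcat` expands each element into an unbounded `immutable.Iterable` and a downstream `take` bounds the result. It assumes a later Akka release (2.6.x); `countingFrom` and the object name are hypothetical helpers invented for this example.

```scala
import scala.collection.immutable
import scala.concurrent.Await
import scala.concurrent.duration._
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}

// Hypothetical example object; assumes Akka 2.6.x on the classpath.
object MapConcatSketch {
  // An immutable.Iterable that is not a Seq and never ends.
  def countingFrom(start: Int): immutable.Iterable[Int] =
    new immutable.Iterable[Int] {
      def iterator: Iterator[Int] = Iterator.from(start)
    }

  // The stage pulls from the iterator on demand, so take(5) ends the stream.
  def firstFive(starts: List[Int])(implicit system: ActorSystem): Seq[Int] = {
    val result = Source(starts)
      .mapConcat(n => countingFrom(n))
      .take(5)
      .runWith(Sink.seq)
    Await.result(result, 5.seconds)
  }

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("mapconcat-sketch")
    try println(firstFive(List(10, 20)).mkString(", ")) // prints 10, 11, 12, 13, 14
    finally system.terminate()
  }
}
```

Because the first expansion never finishes, the second input element (20) is never reached; that is the expected consequence of emitting an infinite sequence.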

-Endre

Endre Varga

Jun 23, 2015, 11:14:41 AM
to akka...@googlegroups.com
And one more thing... (Columbo style ;))

There are ways to exploit parallelisation other than mapAsyncUnordered. This section in the docs explains the most important patterns in detail: http://doc.akka.io/docs/akka-stream-and-http-experimental/1.0-RC3/scala/stream-parallelism.html
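As one possible illustration of such a pattern, the sketch below fans elements out to a fixed number of identical workers with `Balance` and collects the results with `Merge`. It uses the `GraphDSL` API of a later Akka release (2.6.x is assumed), not the 1.0-RC3 graph API; the `balanced` helper and the doubling worker are hypothetical.

```scala
import scala.concurrent.Await
import scala.concurrent.duration._
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.FlowShape
import akka.stream.scaladsl.{Balance, Flow, GraphDSL, Merge, Sink, Source}

// Hypothetical example object; assumes Akka 2.6.x on the classpath.
object ParallelWorkersSketch {
  // Runs `workerCount` copies of `worker` side by side; output order is not preserved.
  def balanced[In, Out](worker: Flow[In, Out, Any], workerCount: Int): Flow[In, Out, NotUsed] =
    Flow.fromGraph(GraphDSL.create() { implicit b =>
      import GraphDSL.Implicits._
      val balance = b.add(Balance[In](workerCount))
      val merge = b.add(Merge[Out](workerCount))
      // .async places each worker behind its own asynchronous boundary.
      for (_ <- 1 to workerCount) balance ~> worker.async ~> merge
      FlowShape(balance.in, merge.out)
    })

  def doubled(input: Range)(implicit system: ActorSystem): Seq[Int] = {
    val result = Source(input).via(balanced(Flow[Int].map(_ * 2), workerCount = 4)).runWith(Sink.seq)
    Await.result(result, 5.seconds)
  }

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("parallel-sketch")
    try println(doubled(1 to 10).sorted.mkString(", ")) // sorted, since arrival order varies
    finally system.terminate()
  }
}
```

Unlike mapAsyncUnordered, this keeps the worker logic as an ordinary `Flow`, at the cost of a more involved graph definition.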

-Endre

BigAl

Jun 23, 2015, 11:30:14 AM
to akka...@googlegroups.com
@Konrad, not sure if we understood SynchronousFileSource properly, but we can't find a way to make it work with a gzipped input file.

@drewhk, thanks a lot, I will update the blog post with the information and the link you've provided!

Konrad Malawski

Jun 23, 2015, 11:32:29 AM
to akka...@googlegroups.com, BigAl
Oh, my mistake then, sorry :-)
I missed the fact that you're using a GZIPInputStream and not "just the file",
all good then - InputStreamSource should be good for that use case :-)
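For the gzipped case, a minimal sketch of wrapping the raw stream in a `GZIPInputStream` before handing it to an InputStream-backed source. It assumes a later Akka release (2.6.x), where that source is `StreamConverters.fromInputStream`; the in-memory data and the helper names are invented so the example runs without a real dump file.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}
import scala.concurrent.Await
import scala.concurrent.duration._
import akka.actor.ActorSystem
import akka.stream.scaladsl.StreamConverters
import akka.util.ByteString

// Hypothetical example object; assumes Akka 2.6.x on the classpath.
object GzipSourceSketch {
  // Helper that produces gzipped bytes to stand in for a compressed dump.
  def gzip(text: String): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new GZIPOutputStream(buffer)
    out.write(text.getBytes("UTF-8"))
    out.close()
    buffer.toByteArray
  }

  // The source sees only decompressed bytes; GZIPInputStream does the inflating.
  def readGzipped(compressed: Array[Byte])(implicit system: ActorSystem): String = {
    val bytes = StreamConverters
      .fromInputStream(() => new GZIPInputStream(new ByteArrayInputStream(compressed)))
      .runFold(ByteString.empty)(_ ++ _)
    Await.result(bytes, 5.seconds).utf8String
  }

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("gzip-sketch")
    try println(readGzipped(gzip("compressed dump line"))) // prints compressed dump line
    finally system.terminate()
  }
}
```

For a real file, the factory would wrap a `FileInputStream` instead of the in-memory stream.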

-- 
Cheers,
Konrad 'ktoso' Malawski
Akka @ Typesafe
