Library for consuming PubSub at scale from Scala/akka-streams


Josh F

Jan 16, 2018, 1:46:08 PM
to Google Cloud Pub/Sub Discussions
Hi all,

Does anyone have recommendations for a library for consuming high-volume (thousands of messages/sec) PubSub subscriptions?

We are using Scala + akka-streams, so would need either a Java or Scala library.

We've looked at using Alpakka - https://github.com/akka/alpakka/tree/master/google-cloud-pub-sub/src - but the consumer seems primitive, and uses the REST/HTTP API instead of the gRPC API.

We've also looked at using this library - https://github.com/QubitProducts/akka-cloudpubsub - but have had issues with (a) consuming low volume streams, e.g. in a test environment and (b) out of memory exceptions in production.

It would be great to hear any recommendations on this, especially from people consuming PubSub in Production in Java or Scala stream processing applications.

Thanks,
Josh

Kir Titievsky

Jan 16, 2018, 2:27:15 PM
to jof...@gmail.com, Google Cloud Pub/Sub Discussions
Josh, I think you will find our official Java client library to be excellent, especially at high throughput (>> MB/s).  Do take a look at individual guides on publishing and subscribing in the documentation: you may need to adjust batching (buffering), retry and other settings to get the most out of the library for the highest throughput.  

You can test its performance with a framework we publish.

Might I ask what made you choose Akka streams over other stream processing frameworks, such as Apache Beam, Spark, Flink, or Kafka Streams?

Kir
Product Manager
Google Cloud Pub/Sub


--
You received this message because you are subscribed to the Google Groups "Google Cloud Pub/Sub Discussions" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cloud-pubsub-dis...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cloud-pubsub-discuss/2eae69d0-84f8-4d63-b9b3-8880055e6495%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



Josh F

Jan 16, 2018, 5:25:20 PM
to Google Cloud Pub/Sub Discussions
Hi Kir,

Thanks for the reply.

Part of the reason we've been looking at using akka-streams is that we have (and want to build a lot more) jobs which follow this flow:

PubSub => akka-streams => simple transformation of each message => POST to a 3rd party REST API

It's a relatively simple stream processing use case, so we felt we didn't need the capabilities of something like Beam for this (we're not doing any kind of windowing, group by or joins with other streams). Having said that, we could probably write our own Sink for doing the REST calls and use Beam for this. Would you recommend using Beam (or one of the other frameworks) for a use case like this?

The other reason for using akka-streams was just that a lot of the team like working in Scala, so we wanted to try something that has a nice Scala API :) One option might be for us to write our own akka-streams connector using the official Java library. But it would be good to explore other options, if akka-streams doesn't seem like the best tool for the job!

Josh
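
As a rough illustration of the flow described above (PubSub => simple transformation => POST), here is a minimal Java sketch that uses only the standard library. Everything in it is hypothetical: `BoundedPipelineSketch`, `run`, and the `transform` and `ack` callbacks are stand-ins, not part of the Pub/Sub client or Akka. The point it tries to show is that a bounded buffer between the receiver and the worker keeps memory use capped, and that a message is acknowledged only after its downstream call has completed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch: none of these names come from the Pub/Sub client or Akka.
final class BoundedPipelineSketch {

    // Unique sentinel instance that tells the worker no more messages will arrive.
    private static final String END = new String("END-OF-STREAM");

    static List<String> run(List<String> incoming, int capacity,
                            Function<String, String> transform,
                            Consumer<String> ack) {
        // Bounded buffer: at most `capacity` messages are held in memory at once.
        BlockingQueue<String> buffer = new ArrayBlockingQueue<>(capacity);
        List<String> posted = new ArrayList<>();

        Thread worker = new Thread(() -> {
            try {
                // Compare by reference so only the sentinel instance stops the loop.
                for (String msg = buffer.take(); msg != END; msg = buffer.take()) {
                    posted.add(transform.apply(msg)); // stand-in for the REST POST
                    ack.accept(msg);                  // ack only after the "POST" returned
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();

        try {
            for (String msg : incoming) {
                buffer.put(msg); // blocks while the buffer is full (backpressure)
            }
            buffer.put(END);
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while feeding the pipeline", e);
        }
        return posted;
    }

    public static void main(String[] args) {
        List<String> acked = new ArrayList<>();
        List<String> posted =
                run(Arrays.asList("a", "b", "c"), 1, String::toUpperCase, acked::add);
        System.out.println(posted + " acked=" + acked);
    }
}
```

Error handling and retries are left out to keep the sketch short. In a real job the producer side would be the subscriber callback, and a failed POST would presumably skip the ack so the message is redelivered.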

Kir Titievsky

Jan 18, 2018, 4:12:32 PM
to Josh F, Google Cloud Pub/Sub Discussions
You can't beat familiarity: if the team is happy with Scala, that's important. What might be interesting is https://github.com/spotify/scio, a Scala API for Beam. That said, it might be overkill.

Some interesting alternatives for single-message, non-windowing transformations are Cloud Functions (Node.js) and App Engine standard with Java runtimes. They don't take advantage of your existing expertise, but they possibly get you much nicer infrastructure and ops. Both are perfect for this kind of request-response logic. You don't get Akka, but you get a very nice admin & scaling experience.

k
