Does Dataproc support stream processing?


Paresh Sahare

Apr 4, 2024, 7:42:31 AM
to Google Cloud Dataproc Discussions
Does Dataproc support stream processing?

Richard Holowczak

Apr 4, 2024, 2:54:01 PM
to Google Cloud Dataproc Discussions
Short answer: yes, through Spark. I have run examples with Spark Structured Streaming on Dataproc.
Some configuration is required, for example, if you want to consume a Kafka stream.
If you are interested in an example written in PySpark, please let me know.
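
For anyone reading along, here is a rough sketch of what a Structured Streaming job reading from Kafka might look like in PySpark. It is only an illustration under assumptions, not the example offered above: the broker address, topic name, and checkpoint path are hypothetical, and the helper names `kafka_source_options` and `run_stream` are made up for this sketch. It also assumes the Spark Kafka connector (spark-sql-kafka-0-10) matching the cluster's Spark version is available to the job, which is likely part of the configuration mentioned above.

```python
from typing import Dict


def kafka_source_options(bootstrap_servers: str, topic: str) -> Dict[str, str]:
    """Build options for Spark's Kafka source (hypothetical helper)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        # Only read records that arrive after the query starts.
        "startingOffsets": "latest",
    }


def run_stream(bootstrap_servers: str, topic: str, checkpoint_dir: str) -> None:
    """Read a Kafka topic as a stream and print each message to the console."""
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

    raw = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(bootstrap_servers, topic))
        .load()
    )

    # The Kafka source exposes key and value as binary columns;
    # cast the value to a string before doing anything else with it.
    messages = raw.select(col("value").cast("string").alias("message"))

    query = (
        messages.writeStream.format("console")
        .outputMode("append")
        .option("checkpointLocation", checkpoint_dir)
        .start()
    )
    query.awaitTermination()


# Example call with hypothetical values:
# run_stream("broker-1:9092", "events", "gs://my-bucket/checkpoints/events")
```

The console sink is only there to make the output visible while testing; a real job would more likely write to files, a table, or another topic.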

Rich H.

Alexander Goida

Apr 5, 2024, 5:24:17 AM
to Google Cloud Dataproc Discussions
Hi,
Dataproc by itself is an infrastructure component, so streaming can be incorporated in different ways. As mentioned before, one option is Spark Streaming. Another is to poll messages from a messaging system and then use Spark to process the received data. For example, in our scenario we read Pub/Sub messages and then process them with Spark (https://medium.com/p/0e33b046372d).
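
To give a rough idea of the polling approach, here is a minimal sketch. It is not the code from the linked article, only an illustration under assumptions: messages are assumed to be UTF-8 JSON objects with a hypothetical `type` field, the function names are made up, and the `google-cloud-pubsub` client library is assumed to be installed on the cluster.

```python
import json
from typing import Dict, List


def decode_payloads(payloads: List[bytes]) -> List[Dict]:
    """Turn raw message bodies (assumed to be UTF-8 JSON objects) into dicts."""
    return [json.loads(p.decode("utf-8")) for p in payloads]


def pull_batch(project_id: str, subscription_id: str, max_messages: int = 100) -> List[bytes]:
    """Synchronously pull one batch of messages from a Pub/Sub subscription."""
    # Imported here so decode_payloads can be used without the client library.
    from google.cloud import pubsub_v1

    subscriber = pubsub_v1.SubscriberClient()
    path = subscriber.subscription_path(project_id, subscription_id)
    with subscriber:
        response = subscriber.pull(
            request={"subscription": path, "max_messages": max_messages}
        )
        payloads = [m.message.data for m in response.received_messages]
        ack_ids = [m.ack_id for m in response.received_messages]
        # Acknowledging before processing keeps the sketch short, but a
        # failure later on would lose the batch; ack after processing if
        # that matters for your case.
        if ack_ids:
            subscriber.acknowledge(request={"subscription": path, "ack_ids": ack_ids})
    return payloads


def count_by_type(spark, rows: List[Dict]):
    """Process one pulled batch with Spark: count messages per 'type' field."""
    if not rows:
        return None  # nothing to infer a schema from
    return spark.createDataFrame(rows).groupBy("type").count()
```

A driver loop would call `pull_batch`, pass the result through `decode_payloads`, and hand the rows to `count_by_type` with an existing `SparkSession`, repeating on whatever interval suits the workload.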

@Richard Holowczak
It would be very interesting to see your approach, if it won't take too much of your time. Thank you.

Regards,