Pubsub idempotent task processing


mar...@bluecore.com

May 22, 2018, 11:33:10 AM
to Google Cloud Pub/Sub Discussions
Kir Titievsky:

There have been quite a few discussions here and elsewhere on how to use Pub/Sub to handle tasks with at-most-once execution.
In our case, we want to avoid calling external APIs twice with the same payload at the end of a task processing chain.
Right now the main option seems to be Datastore: create a transaction ID for each Pub/Sub message and use it as a lock.
One concern there is the increase in cost (Datastore reads/writes per message, plus throughput).
Instead of Datastore, Redis may be another option, especially given that Google recently came out with a managed version (although we'd be taking the risk of building production code on top of a product that's in beta).
Another approach may be a product like Apache ZooKeeper, but that seems like a large undertaking to set up and manage relative to the problem we're trying to solve.
Google Cloud Tasks still appears to be in alpha, so for my timeframe (~6 months) it doesn't seem like an option for us.
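
For illustration, the "message ID as a lock" idea could look roughly like the sketch below. It is a minimal sketch, not production code: `InMemoryClaimStore`, `claim`, and `handle_message` are hypothetical names, and the in-memory dictionary stands in for a shared store with an atomic set-if-absent operation (with Redis, that would be the `SET` command with the `NX` and `EX` options). The claim is taken before the external call, so a crash between the two means the call is skipped rather than repeated, which is the at-most-once trade-off.

```python
import threading
import time
from typing import Callable, Dict


class InMemoryClaimStore:
    """Hypothetical stand-in for a shared store with atomic set-if-absent."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._expiry: Dict[str, float] = {}
        self._clock = clock
        self._lock = threading.Lock()

    def claim(self, key: str, ttl_seconds: float) -> bool:
        """Return True only for the first caller to claim an unexpired key."""
        with self._lock:  # makes check-and-set atomic within this process
            now = self._clock()
            expires_at = self._expiry.get(key)
            if expires_at is not None and expires_at > now:
                return False  # already claimed by an earlier delivery
            self._expiry[key] = now + ttl_seconds
            return True


def handle_message(
    store: InMemoryClaimStore,
    message_id: str,
    payload: str,
    call_external_api: Callable[[str], None],
    ttl_seconds: float = 86400.0,
) -> bool:
    """Claim the message ID first, then call the API at most once."""
    if not store.claim(f"pubsub-claim:{message_id}", ttl_seconds):
        return False  # duplicate delivery: acknowledge without calling
    call_external_api(payload)
    return True
```

The TTL is an assumption about how long redeliveries of the same message can keep arriving; it bounds the size of the store at the cost of letting very late duplicates through.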

As the Pub/Sub product manager, have you seen a consensus on the approach so far?

Thanks much,
Marcel.

Kir Titievsky

May 22, 2018, 12:25:06 PM
to mar...@bluecore.com, Google Cloud Pub/Sub Discussions
Marcel,

"Exactly-once" generally requires a stateful processing step somewhere. Using a database as the source of truth and Pub/Sub as a notification and triggering mechanism is the simple, tried-and-true approach. Datastore is a great choice, but by far not the only GA database service with transactional mechanics. Cloud SQL is another great choice.
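
A minimal sketch of the database-as-source-of-truth pattern, under stated assumptions: Python's built-in `sqlite3` stands in for a transactional service such as Cloud SQL, and the `processed` and `balances` tables and the `apply_credit` function are hypothetical. The message ID and the state change are committed in one transaction, so a redelivered message hits the primary-key constraint and changes nothing.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the two hypothetical tables used by this sketch."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed (message_id TEXT PRIMARY KEY)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS balances ("
        "account TEXT PRIMARY KEY, amount INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


def apply_credit(
    conn: sqlite3.Connection, message_id: str, account: str, amount: int
) -> bool:
    """Apply a credit once per message ID; return False for a duplicate."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Fails with IntegrityError if this message was already applied.
            conn.execute(
                "INSERT INTO processed (message_id) VALUES (?)", (message_id,)
            )
            conn.execute(
                "INSERT OR IGNORE INTO balances (account, amount) VALUES (?, 0)",
                (account,),
            )
            conn.execute(
                "UPDATE balances SET amount = amount + ? WHERE account = ?",
                (amount, account),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate delivery: nothing was changed
```

This gives an exactly-once effect only for state that lives in the same database. A call to an external API cannot join the transaction, so it still needs its own idempotency key or a claim-before-call step.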

An interesting alternative might be Dataflow, which allows you to deduplicate messages and offers exactly-once semantics. That's a great article to read on the subject.
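
To make the deduplication idea concrete, here is a plain-Python sketch of dropping repeated message IDs within a bounded time window. It is not Dataflow or Apache Beam code and does not describe how Dataflow is implemented; `dedupe_within_window` and the `(timestamp, message_id, payload)` tuple layout are assumptions made for illustration.

```python
from collections import OrderedDict
from typing import Iterable, Iterator, Tuple

Message = Tuple[float, str, str]  # (timestamp, message_id, payload)


def dedupe_within_window(
    messages: Iterable[Message], window_seconds: float
) -> Iterator[Message]:
    """Yield each message ID at most once per window.

    Assumes messages arrive in non-decreasing timestamp order.
    """
    seen: "OrderedDict[str, float]" = OrderedDict()  # id -> first-seen time
    for timestamp, message_id, payload in messages:
        # Evict IDs first seen more than window_seconds ago.
        while seen:
            _, first_seen = next(iter(seen.items()))
            if timestamp - first_seen <= window_seconds:
                break
            seen.popitem(last=False)
        if message_id in seen:
            continue  # duplicate inside the window: drop it
        seen[message_id] = timestamp
        yield (timestamp, message_id, payload)
```

A duplicate that arrives after its ID has been evicted passes through, so the window has to be longer than the longest redelivery delay you expect.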



--
You received this message because you are subscribed to the Google Groups "Google Cloud Pub/Sub Discussions" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cloud-pubsub-dis...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cloud-pubsub-discuss/65c97042-ecb9-42a5-9e1a-1435e4dc2f7d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
Kir Titievsky | Product Manager | Google Cloud Pub/Sub 

mar...@bluecore.com

May 22, 2018, 12:49:08 PM
to Google Cloud Pub/Sub Discussions
Thanks for the guidance, Kir.

I'll check out the article mentioned.

