Archiving messages

Suman Biswas

Aug 20, 2017, 8:51:30 PM
to Google Cloud Pub/Sub Discussions
Hi,

We want to archive our messages. Each message we receive includes a unique ID supplied by the publisher. Our plan is to add or update the archived data keyed on this publisher-supplied ID. What approach would you suggest?

Thanks,
Suman

Kir Titievsky

Aug 21, 2017, 10:58:23 AM
to Suman Biswas, Google Cloud Pub/Sub Discussions
Suman, 

Could you tell us how you will use the archive? The general answer is to add your publisher-specified message ID either as a field in your message payload or as a message attribute. Then use Cloud Dataflow to write the Pub/Sub messages, including the publisher-specified ID, into BigQuery. You can then filter, aggregate, etc. by that ID in BQ. Does that sound like a way forward?
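
For example, with the google-cloud-pubsub Python client it could look something like this (the project, topic, and "record_id" attribute names are made up for illustration):

    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    # Hypothetical project and topic names.
    topic_path = publisher.topic_path("my-project", "events")

    # Extra keyword arguments to publish() become message attributes, so the
    # publisher-specified ID travels as the (hypothetical) "record_id" attribute.
    future = publisher.publish(
        topic_path,
        data=b'{"payload": "..."}',
        record_id="order-12345",
    )
    # result() blocks until the server-assigned Pub/Sub message ID is returned.
    print(future.result())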

--
Kir Titievsky | Product Manager | Google Cloud Pub/Sub 

Suman Biswas

Aug 21, 2017, 8:43:34 PM
to Google Cloud Pub/Sub Discussions, suman....@gmail.com
Hi Kir,

We have an upstream system that publishes data to a topic, and multiple downstream systems that process the data on a daily basis, each through its own pull subscription. When a new downstream system is added, we also need to plan for its initial load. To handle that, we don't want the upstream to publish the full data set to the topic again; instead, we plan to archive all messages and expose the archive to the newly added downstream system for its initial load. What would be the best approach to achieve this?
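
For context, the setup we have in mind is roughly this sketch (project, topic, and subscription names made up):

    from google.cloud import pubsub_v1

    subscriber = pubsub_v1.SubscriberClient()
    topic_path = subscriber.topic_path("my-project", "events")

    # One pull subscription per downstream system, so each system
    # independently receives every message published to the shared topic.
    for system in ["billing", "analytics", "reporting"]:  # hypothetical names
        sub_path = subscriber.subscription_path("my-project", f"{system}-sub")
        subscriber.create_subscription(name=sub_path, topic=topic_path)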


Thanks,
Suman

Yannick (Cloud Platform Support)

Sep 1, 2017, 2:41:26 PM
to Google Cloud Pub/Sub Discussions, suman....@gmail.com
Hello Suman, as Kir pointed out, you could use a streaming Dataflow pipeline to archive all messages published to your topic to BigQuery, Datastore, or another storage medium of your choice.
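
Roughly, with the Apache Beam Python SDK, such a pipeline might look like the sketch below; the subscription, table, schema, and "record_id" attribute are just example names:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(streaming=True)

    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
                  subscription="projects/my-project/subscriptions/archive-sub",
                  with_attributes=True)
            | "ToBigQueryRow" >> beam.Map(lambda msg: {
                  # Keep the (hypothetical) publisher-specified ID as its own
                  # column so the archive can be filtered and joined on it.
                  "record_id": msg.attributes.get("record_id"),
                  "data": msg.data.decode("utf-8"),
              })
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                  "my-project:archive.messages",
                  schema="record_id:STRING,data:STRING",
                  write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
        )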

As for the initial load of a new subscriber, you could start by creating a subscription to the main topic for the new subscriber so it begins receiving new messages as they are published. You can then use any method you like to get the archived data into the new subscriber. The simplest way would probably be to have the new subscriber read your storage medium directly and import the archive of old messages, ignoring any duplicates. Alternatively, you could create a topic for the exclusive purpose of bootstrapping the new subscriber, use another Dataflow pipeline to read from your archive and publish the messages to this topic, and use that topic for the initial load of Pub/Sub messages.
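
As a sketch of that last option, assuming the archive is the BigQuery table above, a simple one-off script using the client libraries (in place of a full Dataflow pipeline) could replay the archive into a dedicated bootstrap topic:

    from google.cloud import bigquery, pubsub_v1

    bq = bigquery.Client(project="my-project")
    publisher = pubsub_v1.PublisherClient()
    # A topic created solely to feed the new subscriber's initial load.
    topic_path = publisher.topic_path("my-project", "initial-load")

    rows = bq.query(
        "SELECT record_id, data FROM `my-project.archive.messages`").result()
    futures = []
    for row in rows:
        # Re-attach the publisher-specified ID so the new subscriber can
        # deduplicate replayed messages against live traffic.
        futures.append(publisher.publish(
            topic_path,
            data=row["data"].encode("utf-8"),
            record_id=row["record_id"]))
    for f in futures:
        f.result()  # wait until every replayed message is accepted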

I hope this helps.