Lots of duplicate messages using Cloud Pub/Sub


Alexey Duzhiy

Jan 26, 2016, 3:44:39 AM
to Google Cloud Pub/Sub Discussions
Hi,

I have the following application configuration:

Cloud Pub/Sub -----> 1 B4 instance (message processor)

I'm using a push subscription. I'm testing the throughput of this pair by publishing 3,000 messages to the topic, which are then pushed to the message processor. The message processor performs some computations and saves the results to Datastore. I use Google Memcached and the Pub/Sub message_id to deduplicate messages received from Cloud Pub/Sub. I've read in a lot of places that duplicates should be rare, but in my case I'm still receiving duplicates after more than 16 hours. I left the test running overnight, and here is what the monitoring console shows me:


The ack deadline for this subscription is set to 120 seconds. I see duplicates in the logs more than 30,000 times. The oldest entry in Memcached is also more than 16 hours old, so the problem is not in the deduplication. Any ideas why I observe such weird behavior?
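
For illustration, a minimal sketch of such a dedup check on App Engine (webapp2, python27 runtime; the handler, helper, and key prefix here are hypothetical, not my exact code):

    import json
    import webapp2
    from google.appengine.api import memcache

    class PushHandler(webapp2.RequestHandler):
        def post(self):
            envelope = json.loads(self.request.body)
            # The push envelope carries the Pub/Sub message id
            # (spelled message_id or messageId depending on API version).
            msg_id = envelope['message']['message_id']
            # memcache.add() is atomic and fails if the key already
            # exists, so a redelivery of the same id is caught here.
            if memcache.add('seen:' + msg_id, True, time=24 * 3600):
                process_message(envelope['message'])  # hypothetical helper
            self.response.set_status(200)  # ack either way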

Kamal Aboul-Hosn

Jan 26, 2016, 12:49:12 PM
to Google Cloud Pub/Sub Discussions
Hi, Alexey,

Receiving duplicates should generally be rare assuming the acknowledgements are being sent back to the server correctly and in time. Can you please confirm that you are responding to the received push messages with an appropriate HTTP status code (200-204, 102) in less time than the ack deadline? More information is available in the subscriber documentation under "Receiving push messages."
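
For concreteness, a hedged sketch of that contract (the handle() function is an assumption for illustration, not from the App Engine docs):

    import json
    import webapp2

    class PushHandler(webapp2.RequestHandler):
        def post(self):
            try:
                handle(json.loads(self.request.body))  # hypothetical
            except Exception:
                # Any status outside 200-204/102, or no reply before
                # the ack deadline, makes Pub/Sub redeliver the message.
                self.response.set_status(500)
                return
            self.response.set_status(204)  # acks the message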

Thanks!

Kamal

Alexey Duzhiy

Jan 27, 2016, 3:23:19 AM
to Google Cloud Pub/Sub Discussions
Hi, 

Yes, I made sure all the requests return a 200 response in less than 15 seconds (according to Google's statistics: 99th percentile - 8307 ms, 98th - 2709 ms), but it doesn't seem to help. We wrote a simple test - we created an endpoint that emulates processing by simply sleeping for some time - and here is what we've seen:
1. Sent 11 messages to Pub/Sub:
  a. If the processing time is 11 sec - no duplicates
  b. If the processing time is 12 sec - duplicates start to arrive
  c. If we send 12 messages with 11 sec processing time - duplicates start to arrive

2. Sent 20 messages to Pub/Sub:
  a. If the processing time is 9 sec - no duplicates
  b. If the processing time is 10 sec - duplicates start to arrive
  c. If we send 21 messages with 9 sec processing time - duplicates start to arrive

With a processing time of 1 s we are able to handle around 70 messages - beyond that we start to receive duplicates. Pub/Sub's behavior seems kind of unpredictable to me. I've read about the slow-start algorithm, but I don't observe it in the logs. In all the cases described above we sent a batch of messages to Pub/Sub, and then Pub/Sub sent all of them to our endpoint. Even though we return a 200 response for each message, for some reason Pub/Sub thinks that some of the messages were undelivered and resends the whole batch again. Could you please clarify what could be the reason for such behavior, and maybe provide a little insight into how Pub/Sub works?
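
A minimal sketch of such a sleep endpoint (the route and parameter name here are made up for illustration, not our exact code):

    # Emulates N seconds of processing, then acks with 200. If requests
    # are handled serially, the last of M messages is acked at roughly
    # M * N seconds after delivery starts.
    import time
    import webapp2

    class SleepHandler(webapp2.RequestHandler):
        def post(self):
            time.sleep(float(self.request.get('seconds', '11')))
            self.response.set_status(200)  # ack after the simulated work

    app = webapp2.WSGIApplication(
        [('/_ah/push-handlers/sleep', SleepHandler)])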

P.S. The ack deadline is 120 seconds.

Regards,
Alexey

Dmitriy Avseiytsev

Feb 7, 2016, 5:34:29 AM
to Google Cloud Pub/Sub Discussions
I have the same issue. I set up 1 B4 instance to process messages from Pub/Sub, then I sent 255 messages.
As you can see from the screenshot below, these messages were not processed for hours! But when I deployed 4 B4 instances, it took 5 seconds to process them.

The average time to process a single message is about 50 ms in the common case. But as far as I can see, Pub/Sub sends me all 255 messages at once.
A single instance can handle at most 10 messages at once because of GAE limitations, so while the instance processes those 10 requests, all the
other requests wait in a queue. The last requests take about 10 seconds because they wait until all the previous requests have been processed.

My app responds with a 200 code to all the requests, but Pub/Sub sends me all the messages over and over.
As I understood from the documentation, if an application processes messages slowly, Pub/Sub should decrease the number of parallel requests,
but I cannot see this in the logs.

Can I manually set the maximum number of parallel requests in push mode?
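
For reference, a hypothetical app.yaml for a worker like this (the module name, max_instances, and idle timeout are assumptions, not taken from this thread):

    # threadsafe: true is what lets a single instance accept concurrent
    # requests at all; per the GAE limitation mentioned above, one
    # python27 instance handles at most about 10 requests at once.
    module: pubsub-worker
    runtime: python27
    api_version: 1
    threadsafe: true
    instance_class: B4
    basic_scaling:
      max_instances: 4
      idle_timeout: 10m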



On Wednesday, January 27, 2016, at 10:23:19 UTC+2, Alexey Duzhiy wrote:

Takashi Matsuo

Feb 16, 2016, 1:45:38 PM
to Alexey Duzhiy, Google Cloud Pub/Sub Discussions
Hi Alexey,

You're using a 120-second ack deadline and you see:

On Wed, Jan 27, 2016 at 12:23 AM, Alexey Duzhiy <ale...@wix.com> wrote:
Hi, 

Yes, I made sure all the requests return a 200 response in less than 15 seconds (according to Google's statistics: 99th percentile - 8307 ms, 98th - 2709 ms), but it doesn't seem to help. We wrote a simple test - we created an endpoint that emulates processing by simply sleeping for some time - and here is what we've seen:
1. Sent 11 messages to Pub/Sub:
  a. If the processing time is 11 sec - no duplicates
  b. If the processing time is 12 sec - duplicates start to arrive
  c. If we send 12 messages with 11 sec processing time - duplicates start to arrive

It looks like your backend processes the requests only one at a time?

11 messages * 11 s = 121 s (I assume this is considered within the ack deadline)
11 * 12 = 132 s
12 * 10 = 120 s
12 * 11 = 132 s

What kind of server are you using to process the messages?


2. Sent 20 messages to Pub/Sub:
  a. If the processing time is 9 sec - no duplicates
  b. If the processing time is 10 sec - duplicates start to arrive
  c. If we send 21 messages with 9 sec processing time - duplicates start to arrive


The numbers here don't match the hypothesis above (20 * 9 = 180 s would already exceed the deadline, yet you see no duplicates). How do you send the messages?
 
With a processing time of 1 s we are able to handle around 70 messages - beyond that we start to receive duplicates. Pub/Sub's behavior seems kind of unpredictable to me. I've read about the slow-start algorithm, but I don't observe it in the logs. In all the cases described above we sent a batch of messages to Pub/Sub, and then Pub/Sub sent all of them to our endpoint. Even though we return a 200 response for each message, for some reason Pub/Sub thinks that some of the messages were undelivered and resends the whole batch again. Could you please clarify what could be the reason for such behavior, and maybe provide a little insight into how Pub/Sub works?

I usually don't see such behavior. I'm interested in reproducing it, so it would be great if you could send us the code that reproduces it.

Thanks,
 




--
Takashi Matsuo | Developers Programs Engineer | tma...@google.com