Question for users of Pub/Sub triggered cloud functions


Daniel Washko

Oct 12, 2021, 5:12:04 AM
to Google Cloud Developers
We created three Pub/Sub-triggered Cloud Functions two months ago. All three functions show the same problem: a few times a day, metrics report an unacked message that is 10 minutes old. 10 minutes is the timeout after which a push subscription redelivers a message it has not received an ack for. We are wondering whether other people are experiencing this problem.

To identify the issue, open Metrics Explorer and select Cloud Pub/Sub Subscription as the resource and Oldest Unacked Message Age as the metric.
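
For anyone who prefers to check this outside the console, the same resource and metric selection can be written as a Cloud Monitoring filter string. This is only a sketch: the helper name `oldest_unacked_filter` and the subscription ID are made up for illustration, and the filter still has to be passed to whatever Monitoring client or tool you use.

```python
# Monitoring identifiers for the "Oldest Unacked Message Age" metric on a
# Pub/Sub subscription resource.
METRIC_TYPE = "pubsub.googleapis.com/subscription/oldest_unacked_message_age"
RESOURCE_TYPE = "pubsub_subscription"


def oldest_unacked_filter(subscription_id: str) -> str:
    """Build a Monitoring filter for one subscription (hypothetical helper)."""
    return (
        f'metric.type = "{METRIC_TYPE}" '
        f'AND resource.type = "{RESOURCE_TYPE}" '
        f'AND resource.labels.subscription_id = "{subscription_id}"'
    )


# "my-function-subscription" is a placeholder, not a name from this thread.
print(oldest_unacked_filter("my-function-subscription"))
```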

We implemented logging in the functions that, upon start, reports the messageID and publish time. The affected messages do not appear in the logs at all before that start entry, and every one of them reports a publish time 10 minutes earlier, matching the metrics.
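
A minimal sketch of this kind of start-of-function logging, assuming a Python background function with the `(event, context)` signature, where `context.event_id` carries the Pub/Sub message ID and `context.timestamp` the publish time as an RFC 3339 UTC string ending in `Z`. The names `handle_message`, `parse_rfc3339`, and `delivery_delay_seconds` are illustrative, not the code from our functions.

```python
from datetime import datetime, timezone


def parse_rfc3339(ts: str) -> datetime:
    """Parse a UTC timestamp such as 2021-10-12T09:02:04.123Z (assumed format)."""
    base, _, frac = ts.rstrip("Z").partition(".")
    micros = (frac + "000000")[:6]  # pad or truncate the fraction to microseconds
    parsed = datetime.strptime(f"{base}.{micros}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def delivery_delay_seconds(context, now=None) -> float:
    """Seconds between the publish time and the moment the function started."""
    now = now or datetime.now(timezone.utc)
    return (now - parse_rfc3339(context.timestamp)).total_seconds()


def handle_message(event, context):
    """Hypothetical entry point: log message ID and publish time first."""
    delay = delivery_delay_seconds(context)
    print(
        f"start messageId={context.event_id} "
        f"publishTime={context.timestamp} delay={delay:.1f}s"
    )
    # ... actual message handling would follow here ...
```

With a log line like this, a message that was held for the full redelivery window shows up with a delay of roughly 600 seconds on its first logged start.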

We have looked at the number of acked messages, non-200 response codes, number of active function processes, etc. before, during and after these 10 minute messages. Nothing correlates consistently.

What we suspect is happening is that the subscription is pushing the message to the function processor and the function processor is, for some reason, just dropping the message. If the function processor were not able to pass the message to a function instance, it should respond with a 500 and log an error. We do not see this error. Furthermore, the 500 should trigger the subscription to push the message again, immediately. We do not see indication of this happening because the message does not appear in the logs for 10 more minutes and there is report of multiple 500 errors at this time.

Daniel Washko

Oct 12, 2021, 10:02:24 AM
to Google Cloud Developers
That last sentence should read "the message does not appear in the logs for 10 more minutes and there is NO report of multiple 500 errors at this time."

Jaime Martin Agui

Oct 14, 2021, 8:59:50 AM
to Google Cloud Developers
If this issue is happening in a small percentage of the Cloud Function invocations, you can modify the "Acknowledgement deadline" option of the subscription to expect an ack in less than 10 minutes (between 10 and 600 seconds), so that Cloud Pub/Sub re-delivers the message sooner. Have a look at the "Retry policy" option to determine whether you would like to use exponential backoff delay or immediate retries when this situation occurs.
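
As a rough illustration of those two settings, the sketch below builds the JSON body for a subscription patch request, using the REST field names `ackDeadlineSeconds` and `retryPolicy`. The helper `subscription_patch` is hypothetical, the example values are arbitrary, and sending the request (authentication, URL, HTTP client) is left out.

```python
def subscription_patch(ack_deadline_seconds, backoff_seconds=None):
    """Build a patch body for a subscription (hypothetical helper).

    backoff_seconds is an optional (minimum, maximum) pair; when omitted,
    the retry policy is left out of the update and stays unchanged.
    """
    if not 10 <= ack_deadline_seconds <= 600:
        raise ValueError("ack deadline must be between 10 and 600 seconds")

    subscription = {"ackDeadlineSeconds": ack_deadline_seconds}
    mask = ["ackDeadlineSeconds"]

    if backoff_seconds is not None:
        minimum, maximum = backoff_seconds
        if minimum > maximum:
            raise ValueError("minimum backoff must not exceed maximum backoff")
        # Durations are written as strings with an "s" suffix.
        subscription["retryPolicy"] = {
            "minimumBackoff": f"{minimum}s",
            "maximumBackoff": f"{maximum}s",
        }
        mask.append("retryPolicy")

    return {"subscription": subscription, "updateMask": ",".join(mask)}


# Example values only: a 60 second deadline with 10s to 120s backoff.
print(subscription_patch(60, backoff_seconds=(10, 120)))
```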

Daniel Washko

Oct 14, 2021, 1:46:20 PM
to Google Cloud Developers
Thank you for the response and the tips. We are aware that we can make these adjustments, but they don't address the problem; they only mitigate it. The problem here is that the function processor is dropping messages.

Given that we see this in three different functions in our project, I am curious whether anyone else using a Pub/Sub-triggered function is seeing this behavior.

Google Cloud Developers

Oct 15, 2021, 8:35:28 AM
to Google Cloud Developers

Hi there!

The issue you described looks like a technical issue. Considering the amount of information required to analyze it, it would be best to contact GCP support [1] so that your situation can be investigated in detail.

[1]: https://cloud.google.com/support-hub

Daniel Washko

Oct 15, 2021, 10:27:23 AM
to Google Cloud Developers
We opened a case with Google support back in September. This posting is to see if other people are observing this problem. I found a similar situation from April 2020: https://github.com/googleapis/nodejs-pubsub/issues/959 and, as you can see, the case was closed without addressing the problem.

I debated which group to post in, as there is a post in the Pub/Sub group that precedes the report above: https://groups.google.com/g/cloud-pubsub-discuss/c/dwvjstrQhpg/m/TyKfWFqlBQAJ. Based upon our research, though, this does not seem to be a Pub/Sub issue but a Cloud Functions problem. The message is not being acked or nacked. Per the documentation, a nack (returning a non-200 response) would trigger an immediate retry. We don't see this happening. The only explanation I can offer is that the message is being dropped by the function processor.
