We created three Pub/Sub-triggered Cloud Functions two months ago. All three show the same problem: a few times a day, metrics report a message that has been unacked for 10 minutes. 10 minutes is the timeout after which a push subscription redelivers a message it has not received an ack for. Are other people experiencing this problem?
To see the issue, open Metrics Explorer and select Cloud Pub/Sub Subscription as the resource and Oldest Unacked Message Age as the metric.
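For anyone who would rather query the same metric programmatically, here is a minimal sketch that only builds a Cloud Monitoring filter string for it. The metric type and resource type are the ones behind the Metrics Explorer selection above. The function name and the subscription ID `my-subscription` are placeholders I made up, and the string would still need to be passed to whatever Monitoring client or API you use.

```python
# Metric type behind "Oldest Unacked Message Age" in Metrics Explorer.
OLDEST_UNACKED_AGE = "pubsub.googleapis.com/subscription/oldest_unacked_message_age"


def oldest_unacked_age_filter(subscription_id: str) -> str:
    """Build a Cloud Monitoring filter for one subscription (hypothetical helper)."""
    return (
        f'metric.type = "{OLDEST_UNACKED_AGE}" '
        'AND resource.type = "pubsub_subscription" '
        f'AND resource.labels.subscription_id = "{subscription_id}"'
    )


# "my-subscription" is a placeholder, not our real subscription name.
print(oldest_unacked_age_filter("my-subscription"))
```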
We added logging to the functions that, on start, reports the message ID and publish time. The affected messages do not appear anywhere in the logs before that first log entry, and each one reports a publish time about 10 minutes earlier, which matches the metrics.
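The start-of-function logging is roughly along these lines. This is a simplified sketch, not our exact code: it assumes a Python background function, where `context.event_id` carries the Pub/Sub message ID and `context.timestamp` carries the publish time as an RFC 3339 UTC string. The names `handle_message`, `parse_publish_time` and `delivery_age_seconds` are illustrative, and the values in the local check at the end are made up.

```python
import logging
from datetime import datetime, timezone
from types import SimpleNamespace


def parse_publish_time(timestamp: str) -> datetime:
    """Parse an RFC 3339 UTC timestamp such as '2019-07-15T22:09:03.761Z'."""
    base, _, fraction = timestamp.rstrip("Z").partition(".")
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    if fraction:
        # Keep at most microsecond precision; pad shorter fractions with zeros.
        parsed = parsed.replace(microsecond=int(fraction[:6].ljust(6, "0")))
    return parsed


def delivery_age_seconds(context, now=None) -> float:
    """Seconds between the message's publish time and now."""
    now = now or datetime.now(timezone.utc)
    return (now - parse_publish_time(context.timestamp)).total_seconds()


def handle_message(event, context):
    """Pub/Sub-triggered entry point: log identity and age before doing any work."""
    logging.info(
        "start messageId=%s publishTime=%s ageSeconds=%.1f",
        context.event_id,
        context.timestamp,
        delivery_age_seconds(context),
    )
    # ... actual message processing would go here ...


# Local check with a stand-in context object (made-up values).
fake_context = SimpleNamespace(event_id="123", timestamp="2019-07-15T22:09:03.761Z")
fake_now = datetime(2019, 7, 15, 22, 19, 3, 761000, tzinfo=timezone.utc)
print(delivery_age_seconds(fake_context, fake_now))  # 600.0
```

An age of about 600 seconds in the first log line for a message ID is what we treat as one of these delayed deliveries.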
We have looked at the number of acked messages, non-200 response codes, the number of active function processes, etc. before, during, and after these 10-minute messages. Nothing correlates consistently.
We suspect that the subscription is pushing the message to the function processor and the function processor is, for some reason, just dropping it. If the function processor were unable to pass the message to a function instance, it should respond with a 500 and log an error. We do not see this error. Furthermore, the 500 should trigger the subscription to push the message again immediately. We see no indication of this happening: the message does not appear in the logs for 10 more minutes, and there is no report of multiple 500 errors at that time.