asyncBatchAnnotate + multi-region GCS bucket: frequent HTTP 500 errors


Matthew Lenz

Jun 6, 2022, 12:01:53 PM
to cloud-vision-discuss
When submitting docs stored in a GCS bucket, I'm seeing frequent 500 errors. They started on the evening of May 31st and have continued to the present.

Has anyone else been experiencing this issue recently?

Eduardo Ortiz Caraveo

Jun 6, 2022, 4:24:37 PM
to cloud-vision-discuss

You can try the following possible solutions and see whether they help in your case.

  • For HTTP trigger-based functions, have the client implement exponential backoff and retries for requests that must not be dropped.

  • For background / event-driven functions, Cloud Functions supports at least once delivery. Even without explicitly enabling retry, the event is automatically re-delivered and the function execution will be retried. See Retrying Event-Driven Functions for more information.
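
For illustration, the first suggestion could be sketched in Python roughly as follows. This is only a minimal sketch of exponential backoff with jitter, not an official client: `TransientError`, `call_with_backoff`, and the default delays are hypothetical names and values chosen for the example, and `request_fn` stands in for whatever call your client makes.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (for example, an HTTP 500)."""


def call_with_backoff(request_fn, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call request_fn until it succeeds or max_attempts is exhausted."""
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential growth (1s, 2s, 4s, ...) capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random amount up to the ceiling so that
            # many clients do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
```

The `sleep` parameter is injectable only so the function can be exercised without real waiting; in normal use the default `time.sleep` applies.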

Matthew Lenz

Jun 6, 2022, 6:18:35 PM
to cloud-vision-discuss
We already have retries built into our application, so that isn't a concern.

This feels more like a new, unidentified latency issue with multi-region GCS buckets or an issue with GCV. Since we aren't seeing the same problem in other parts of our application that communicate with GCS buckets, my guess is GCV.

I thought I'd post here to see if others are noticing this issue as well before I report it to the issue tracker (because I've found reporting issues to be a waste of time).

Eduardo Ortiz Caraveo

Jun 7, 2022, 12:26:18 PM
to cloud-vision-discuss
As you mentioned, and after doing some research, I couldn't find any other post or person facing this issue. I would recommend that you go ahead and raise it in the issue tracker, which will help future users who might face the same problem. Alternatively, you could file a support ticket.

Matthew Lenz

Jun 7, 2022, 12:35:16 PM
to cloud-vision-discuss
Right. We don't pay for support and, unless that has changed, you can only get support for billing issues. We tried the paid service for one month and found it to be just as terrible an experience as the typical issue tracker responses (sorry).

Eduardo Ortiz Caraveo

Jun 8, 2022, 5:04:48 PM
to cloud-vision-discuss
I understand you have had difficulties with the support requests you have raised so far. However, after an in-depth internal search, I found that a similar issue was reported on May 20th and resolved that same day. This leads me to believe you are facing an issue specific to your project, which can be resolved more efficiently through a support case or the Issue Tracker, because there we can inspect your project.

On the other hand, could you please provide a more detailed explanation of the conditions for the error 500 you are facing? A potential solution you could try is to lower the batch size you are sending.

Matthew Lenz

Jun 9, 2022, 8:43:45 AM
to cloud-vision-discuss
Here is a sample of the responses we're seeing from asyncBatchAnnotate:

500 Internal Server Error
{
  "error": {
    "code": 500,
    "message": "Error opening file: gs://bucket/path/file.pdf.",
    "status": "INTERNAL"
  }
}

I've obfuscated the actual gs path. We had never seen these errors before May 31st. Now we're seeing them a dozen or more times per day, roughly 0.5% of the time. We only submit one doc per batch (so it's not a batch size issue). It appears to be some kind of communication issue between GCV and GCS.
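
For anyone who wants to count or separately retry this particular failure, here is a rough Python sketch that recognizes a response shaped like the sample above. `is_file_open_failure` is a hypothetical helper name, and matching on the message prefix is an assumption based only on this one sample; the wording of the message is not a documented contract.

```python
import json


def is_file_open_failure(status_code, body_text):
    """Return True for an HTTP 500 whose body matches the sample error above."""
    if status_code != 500:
        return False
    try:
        error = json.loads(body_text)["error"]
        status = error["status"]
        message = error["message"]
    except (ValueError, KeyError, TypeError):
        # Not JSON, or not the {"error": {...}} shape shown in the sample.
        return False
    # Assumption: the message prefix is stable enough to match on.
    return (
        status == "INTERNAL"
        and isinstance(message, str)
        and message.startswith("Error opening file:")
    )
```

Anything that does not match falls through as `False`, so other 500 responses keep going through whatever handling the application already has.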

Brendan Lundy

Jun 9, 2022, 12:00:59 PM
to cloud-vision-discuss
Are the permissions the same for the file that succeeds and the file that doesn't succeed?

Can you double-check if the file is there before you call the api?

For a file that fails, if you retry it, will it always fail with that same error or does it pass on the retry?

In the past did you ever get an error trying to create the long-running operation? If so, do you still get that error as well?
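
The existence check asked about above could look something like the following Python sketch. It assumes the `google-cloud-storage` package, whose `Client.bucket`, `Bucket.blob`, and `Blob.exists` methods are used here; `split_gcs_uri` and `object_exists` are hypothetical helper names. The client is passed in rather than created inside the function so the logic can be tried without credentials.

```python
def split_gcs_uri(uri):
    """Split 'gs://bucket/path/to/object' into (bucket_name, object_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket_name, _, object_name = uri[len(prefix):].partition("/")
    if not bucket_name or not object_name:
        raise ValueError(f"URI must include a bucket and an object: {uri!r}")
    return bucket_name, object_name


def object_exists(client, uri):
    """Ask GCS whether the object behind a gs:// URI is currently visible.

    `client` is expected to behave like google.cloud.storage.Client.
    """
    bucket_name, object_name = split_gcs_uri(uri)
    return client.bucket(bucket_name).blob(object_name).exists()
```

With the real library this would be called as `object_exists(storage.Client(), "gs://bucket/path/file.pdf")` after `from google.cloud import storage`. A passing check only shows that the object is visible to the caller at that moment; it does not by itself show what the Vision service sees.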

Matthew Lenz

Jun 9, 2022, 12:27:29 PM
to cloud-vision-discuss
Replied directly to Brendan.  Adding responses here as well.

On Thu, Jun 9, 2022 at 11:01 AM 'Brendan Lundy' via cloud-vision-discuss <cloud-visi...@googlegroups.com> wrote:
Are the permissions the same for the file that succeeds and the file that doesn't succeed?

Yes, permissions are all the same.  The file is being placed on GCS as part of a different job that runs prior to this one.  If the file wasn't uploaded successfully it wouldn't even make it this far.
 
Can you double-check if the file is there before you call the api?

I can try, but this failure only happens ~0.5% of the time; see also my answer to the first question.
 
For a file that fails, if you retry it, will it always fail with that same error or does it pass on the retry?

It works when attempting to retry it.  Default retry is 3 minutes.  Although I did spot one that failed again after 3 minutes.   Maybe it's some kind of latency issue with GCS. But if that were the case we'd see this in other ways as well.  We are constantly uploading docs into GCS and then they are downloaded shortly after by other processes without error.
 
In the past did you ever get an error trying to create the long-running operation? If so, do you still get that error as well?

If you mean, did we ever see this same issue in the past: not before May 31st, around 7 PM US/Central.
 

Brendan Lundy

Jun 21, 2022, 9:12:31 AM
to cloud-vision-discuss
Hi Matthew,

We think we found the cause and are rolling out a fix this week. Let us know if you see it happening after this week.

Thanks,

Brendan

Matthew Lenz

Jun 21, 2022, 9:35:02 AM
to cloud-vision-discuss
Thanks! Glad you were able to spot it.   If you need any more details I can get you the project, zone/region, and/or bucket info in a private email.
