cloudstorage library and URLFetch quotas - workaround?

Josh Whelchel (Loudr)

Jul 29, 2015, 2:09:04 PM
to Google App Engine
We use Cloud Storage to store large Elasticsearch results (from aggregations, so scan+scroll isn't going to work here).

To handle these large aggregations in parallel, we store them as multiline JSON dumps sourced from a managed VM.

To perform parallel processing, many App Engine instances open this file at once and, as a result, hit the URLFetch rate limit because of this documented limitation:

"…and the calls count against your URL fetch quota, as the library uses the URL Fetch service to interact with Cloud Storage."
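
For context, the read side looks roughly like this; each worker opens the same object and reads only its own slice (a sketch: the start/end byte-range parameters are illustrative, not our actual task payload):

    import cloudstorage as gcs

    def read_slice(path, start, end):
        # Sketch only: every gcs.open/read here is a URL Fetch request
        # under the hood, so N parallel workers multiply the URL Fetch
        # request rate by N.
        f = gcs.open(path, 'r')
        try:
            f.seek(start)           # jump to this worker's byte range
            return f.read(end - start)
        finally:
            f.close()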



Here's the resulting exception (not reproduced here):

Here's the code that opens the file:

    import cloudstorage as gcs

    def open_file(path, mode, **kwargs):
        # gcs.open issues its request via URL Fetch, so every call here
        # counts against the URL Fetch quota. On failure it raises (e.g.
        # cloudstorage.NotFoundError) rather than returning None.
        f = gcs.open(path, mode=mode, **kwargs)
        if not f:
            raise Exception("File could not be opened: %s" % path)

        return f
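
One partial mitigation (a sketch, not a fix: it only spreads calls out over time rather than bypassing the quota) is the library's built-in retry support plus a little jitter before each open; the RetryParams values below are illustrative:

    import random
    import time

    import cloudstorage as gcs

    # Illustrative values; tune for your own fan-out.
    retry_params = gcs.RetryParams(initial_delay=0.5,
                                   backoff_factor=2.0,
                                   max_retries=8,
                                   max_retry_period=60.0)

    def open_file_with_backoff(path, mode='r', **kwargs):
        # Random jitter so parallel instances don't all hit Cloud Storage
        # (and therefore URL Fetch) in the same instant.
        time.sleep(random.uniform(0.0, 1.0))
        return gcs.open(path, mode=mode, retry_params=retry_params, **kwargs)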

--

We need a way of communicating with Cloud Storage that bypasses the URLFetch quotas and rate limits; otherwise it's impossible for us to execute parallel processing effectively.

Is there a method of reading GCS files from App Engine that does not route through URLFetch, much like the Datastore API does not incur URL Fetch rate limits?

I've detailed this question on Stack Overflow as well:

Nick (Cloud Platform Support)

Jul 31, 2015, 4:41:55 PM
to Google App Engine, jo...@loudr.fm
Hey Josh,

It seems as though you got some pretty good answers in the Stack Overflow thread. I'll add my thoughts:
  • You can make a feature request in the public issue tracker, with an explanation of your use case, if you'd like to see something implemented.
  • You can also look into using Datastore to store the temporary results of your process, since it has better rate-limit quotas than Cloud Storage, which isn't really meant for rapid access patterns like this. You could also look into Bigtable, or other distributed storage systems such as memcached, for temporary result storage; a rough sketch of the Datastore idea follows below.
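
For what it's worth, here's the kind of thing I mean by temporary results in Datastore (a rough sketch; the model and property names are illustrative, and each entity must stay under Datastore's ~1 MB limit, so large results still need chunking):

    from google.appengine.ext import ndb

    class ResultChunk(ndb.Model):
        # One chunk of a job's temporary results; illustrative schema.
        job_id = ndb.StringProperty(required=True)
        index = ndb.IntegerProperty(required=True)   # chunk position
        data = ndb.BlobProperty(compressed=True)     # raw JSON bytes

    def write_chunk(job_id, index, payload):
        ResultChunk(job_id=job_id, index=index, data=payload).put()

    def read_chunks(job_id):
        q = ResultChunk.query(ResultChunk.job_id == job_id)
        return [c.data for c in q.order(ResultChunk.index)]
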
I hope this has helped you. Feel free to ask any questions you may have, or to go ahead and create a feature request / quota increase request in the public issue tracker.

Best wishes,

Nick