Working with unicode filenames on cloudstorage?

Attila-Mihaly Balazs

Feb 13, 2017, 9:04:38 AM
to Google App Engine
Hello,

I'm trying to create and delete files on Google Cloud Storage (GCS) with Unicode characters in the name. GCS itself seems to support this, but the client APIs seem to have problems.

- Environment: Google App Engine Standard with Python 2.7. Tested locally with the development server (SDK 1.9.50). The library used (both locally and in production) is the latest GoogleAppEngineCloudStorageClient [1] (1.9.22.1).

- Some code snippets:

        fn = u'/%s/á' % GS_BUCKET
        with gcs.open(fn, 'w') as f_out: f_out.write('old')

This gets me the traceback at [2]

- Ok, how about manually encoding it?

        fn = u'/%s/á' % GS_BUCKET
        with gcs.open(fn.encode('utf-8'), 'w') as f_out: f_out.write('old')

This just gets me several retries until the request is aborted [3]

- There is also the option of URL-quoting the encoded name, but I don't believe this is correct (it just double quotes the filename):

        fn = u'/%s/á' % GS_BUCKET
        with gcs.open(urllib.quote(fn.encode('utf-8')), 'w') as f_out: f_out.write('old')

- I believe gcs.delete is also affected, since I see the following traceback in production [4].
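
To make the double-quoting concern concrete, here is a minimal sketch. It assumes, as the traceback in [2] suggests, that the library passes the filename through urllib.quote itself; '/bucket/' is a placeholder name, and the try/except import is only there so the snippet also runs on Python 3. It does not call the cloudstorage library at all.

```python
try:
    from urllib import quote          # Python 2 (the runtime used in this post)
except ImportError:
    from urllib.parse import quote    # Python 3

# u'\xe1' is the character 'á'; '/bucket/' is a placeholder bucket path.
utf8_name = u'/bucket/\xe1'.encode('utf-8')

quoted_once = quote(utf8_name)     # what the caller passes in when pre-quoting
quoted_twice = quote(quoted_once)  # what a second quote() pass would produce

print(quoted_once)   # /bucket/%C3%A1
print(quoted_twice)  # /bucket/%25C3%25A1
```

If the library does quote the name a second time, the request path would carry the '%25...' form, which is why pre-quoting looks like double quoting.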

How are Unicode filenames supposed to be used with the cloudstorage library? Also, is the cloudstorage library supposed to be used at all? Looking around, I found the google-cloud-storage library on PyPI (https://pypi.python.org/pypi/google-cloud-storage), which also seems to be an official Google project and perhaps has better support for Unicode.

Attila

[1] https://pypi.python.org/pypi/GoogleAppEngineCloudStorageClient
[2] Traceback with u'...'
/usr/local/google_appengine_1.9.50/google/appengine/dist27/urllib.py:1277: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  return ''.join(map(quoter, s))
ERROR    2017-02-13 13:52:30,401 webapp2.py:1552] u'\xe1'
Traceback (most recent call last):
  File "/usr/local/google_appengine_1.9.50/lib/webapp2-2.5.2/webapp2.py", line 1535, in __call__
    rv = self.handle_exception(request, response, e)
  File "/usr/local/google_appengine_1.9.50/lib/webapp2-2.5.2/webapp2.py", line 1529, in __call__
    rv = self.router.dispatch(request, response)
  File "/usr/local/google_appengine_1.9.50/lib/webapp2-2.5.2/webapp2.py", line 1278, in default_dispatcher
    return route.handler_adapter(request, response)
  File "/usr/local/google_appengine_1.9.50/lib/webapp2-2.5.2/webapp2.py", line 1102, in __call__
    return handler.dispatch()
  File "/usr/local/google_appengine_1.9.50/lib/webapp2-2.5.2/webapp2.py", line 572, in dispatch
    return self.handle_exception(e, self.app.debug)
  File "/usr/local/google_appengine_1.9.50/lib/webapp2-2.5.2/webapp2.py", line 570, in dispatch
    return method(*args, **kwargs)
  File "/tmp/gcs_test_unicode_names/main.py", line 14, in get
    with gcs.open(fn, 'w') as f_out: f_out.write('old')
  File "/tmp/gcs_test_unicode_names/lib/cloudstorage/cloudstorage_api.py", line 91, in open
    filename = api_utils._quote_filename(filename)
  File "/tmp/gcs_test_unicode_names/lib/cloudstorage/api_utils.py", line 94, in _quote_filename
    return urllib.quote(filename)
  File "/usr/local/google_appengine_1.9.50/google/appengine/dist27/urllib.py", line 1277, in quote
    return ''.join(map(quoter, s))
KeyError: u'\xe1'
----
[3] Traceback with manually encoded filename
---
ERROR    2017-02-13 13:53:44,679 module.py:892] Request to '/_ah/gcs/app_default_bucket/\xc3\xa1' failed
INFO     2017-02-13 13:53:44,680 module.py:806] default: "POST /_ah/gcs/app_default_bucket/%C3%A1 HTTP/1.1" 500 -
ERROR    2017-02-13 13:53:44,790 module.py:892] Request to '/_ah/gcs/app_default_bucket/\xc3\xa1' failed
INFO     2017-02-13 13:53:44,791 module.py:806] default: "POST /_ah/gcs/app_default_bucket/%C3%A1 HTTP/1.1" 500 -
ERROR    2017-02-13 13:53:45,003 module.py:892] Request to '/_ah/gcs/app_default_bucket/\xc3\xa1' failed
INFO     2017-02-13 13:53:45,004 module.py:806] default: "POST /_ah/gcs/app_default_bucket/%C3%A1 HTTP/1.1" 500 -
ERROR    2017-02-13 13:53:45,416 module.py:892] Request to '/_ah/gcs/app_default_bucket/\xc3\xa1' failed
INFO     2017-02-13 13:53:45,416 module.py:806] default: "POST /_ah/gcs/app_default_bucket/%C3%A1 HTTP/1.1" 500 -
ERROR    2017-02-13 13:53:46,255 module.py:892] Request to '/_ah/gcs/app_default_bucket/\xc3\xa1' failed
INFO     2017-02-13 13:53:46,255 module.py:806] default: "POST /_ah/gcs/app_default_bucket/%C3%A1 HTTP/1.1" 500 -
ERROR    2017-02-13 13:53:47,868 module.py:892] Request to '/_ah/gcs/app_default_bucket/\xc3\xa1' failed
INFO     2017-02-13 13:53:47,868 module.py:806] default: "POST /_ah/gcs/app_default_bucket/%C3%A1 HTTP/1.1" 500 -
ERROR    2017-02-13 13:53:51,082 module.py:892] Request to '/_ah/gcs/app_default_bucket/\xc3\xa1' failed
INFO     2017-02-13 13:53:51,082 module.py:806] default: "POST /_ah/gcs/app_default_bucket/%C3%A1 HTTP/1.1" 500 -
ERROR    2017-02-13 13:53:51,084 webapp2.py:1552] Expect status [201] from Google Storage. But got status 500.
Path: '/app_default_bucket/%C3%A1'.
Request headers: {'x-goog-api-version': '2', 'x-goog-resumable': 'start', 'accept-encoding': 'gzip, *'}.
Response headers: {'server': 'Development/2.0', 'date': 'Mon, 13 Feb 2017 13:53:51 GMT', 'transfer-encoding': 'chunked'}.
Body: ''.
Extra info: None.
Traceback (most recent call last):
  File "/usr/local/google_appengine_1.9.50/lib/webapp2-2.5.2/webapp2.py", line 1535, in __call__
    rv = self.handle_exception(request, response, e)
  File "/usr/local/google_appengine_1.9.50/lib/webapp2-2.5.2/webapp2.py", line 1529, in __call__
    rv = self.router.dispatch(request, response)
  File "/usr/local/google_appengine_1.9.50/lib/webapp2-2.5.2/webapp2.py", line 1278, in default_dispatcher
    return route.handler_adapter(request, response)
  File "/usr/local/google_appengine_1.9.50/lib/webapp2-2.5.2/webapp2.py", line 1102, in __call__
    return handler.dispatch()
  File "/usr/local/google_appengine_1.9.50/lib/webapp2-2.5.2/webapp2.py", line 572, in dispatch
    return self.handle_exception(e, self.app.debug)
  File "/usr/local/google_appengine_1.9.50/lib/webapp2-2.5.2/webapp2.py", line 570, in dispatch
    return method(*args, **kwargs)
  File "/tmp/gcs_test_unicode_names/main.py", line 14, in get
    with gcs.open(fn.encode('utf-8'), 'w') as f_out: f_out.write('old')
  File "/tmp/gcs_test_unicode_names/lib/cloudstorage/cloudstorage_api.py", line 95, in open
    return storage_api.StreamingBuffer(api, filename, content_type, options)
  File "/tmp/gcs_test_unicode_names/lib/cloudstorage/storage_api.py", line 699, in __init__
    body=content)
  File "/tmp/gcs_test_unicode_names/lib/cloudstorage/errors.py", line 141, in check_status
    raise ServerError(msg)
ServerError: Expect status [201] from Google Storage. But got status 500.
Path: '/app_default_bucket/%C3%A1'.
Request headers: {'x-goog-api-version': '2', 'x-goog-resumable': 'start', 'accept-encoding': 'gzip, *'}.
Response headers: {'server': 'Development/2.0', 'date': 'Mon, 13 Feb 2017 13:53:51 GMT', 'transfer-encoding': 'chunked'}.
Body: ''.
Extra info: None.
---
[4] Traceback from production for .delete
---
Traceback (most recent call last):
  File "/base/data/home/runtimes/python27_experiment/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1535, in __call__
    rv = self.handle_exception(request, response, e)
...
    gcs.delete(entry.filename)
...
    filename = api_utils._quote_filename(filename)
...
    return urllib.quote(filename)
  File "/base/data/home/runtimes/python27_experiment/python27_dist/lib/python2.7/urllib.py", line 1277, in quote
    return ''.join(map(quoter, s))
KeyError: u'\u0411'
---

Nick (Cloud Platform Support)

Feb 13, 2017, 4:37:30 PM
to Google App Engine
Hey Attila,

This would be an excellent post to make on Stack Overflow, as this forum isn't really meant for specific-issue technical support. We monitor our tags on Stack Overflow quite actively and there is a much larger user base there, so be sure to cross-post at least this question, and in future consider simply posting there. Since we're there too, there's no downside, only the upside of reaching more users and keeping this forum on-topic. This forum is meant for general, high-level, free-form discussion about the platform and services, architecture advice, etc. (basically things that don't fit on Stack Overflow).

Now, that said, I can help a bit. It seems that other users have hit this exact issue when using the App Engine GCS client library against the Development Server. From this I can ask two questions right away:

1. Have you tried running this code in production, and does it also fail there (for any of the variants you attempted)?

2. Have you tried using the other client library, which has been updated far more recently? You had linked it in your original post. Of course this would only be dodging the question of what's going on with the original (and official) library.

I'll be attempting to reproduce this issue while you get back to me on this. Feel free to add in any extra information or questions you have. 

Cheers,

Nick
Cloud Platform Community Support

Nick (Cloud Platform Support)

Feb 24, 2017, 12:48:47 PM
to Google App Engine
Hey Attila,

Through testing, I've determined that the pattern urllib.quote(unicodestr.encode('utf-8')) does in fact work. Have you seen this on your end? I've attached to this post an example project which illustrates the use. Send it a request like so to see it at work:

curl -v localhost:8080/?method=enc-url

I hope this is helpful in at least comparing what I've done with your case. Let me know if the URL-encoding method doesn't work. This is the proper way to send UTF-8 data in a URL: encode it and then URL-quote it, which yields the URL encoding of the bytes. In my case, the string u'☸' is encoded to '%E2%98%B8'.
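
As a rough sketch of that pattern (not the code from the attached project): the helper name quote_gcs_filename is hypothetical, and the try/except import is only there so the snippet also runs outside Python 2.7. The result would then be passed to gcs.open or gcs.delete. Whether the library quotes the name again internally is not settled in this thread.

```python
try:
    from urllib import quote          # Python 2.7, as on App Engine Standard
except ImportError:
    from urllib.parse import quote    # Python 3

def quote_gcs_filename(filename):
    """Encode a '/bucket/object' path as UTF-8, then percent-quote the bytes.

    Hypothetical helper for illustration; not part of the cloudstorage library.
    """
    if not isinstance(filename, bytes):
        filename = filename.encode('utf-8')
    # quote() leaves '/' unquoted by default, so the path separators survive.
    return quote(filename)

print(quote_gcs_filename(u'\u2638'))                    # %E2%98%B8
print(quote_gcs_filename(u'/app_default_bucket/\xe1'))  # /app_default_bucket/%C3%A1
```

Here u'\u2638' is the '☸' character from the example above, and u'\xe1' is the 'á' from the original post.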


Cheers,

Nick
Cloud Platform Community Support

pyunicodegcs.tgz