read cloud storage content with "gzip" encoding for "application/octet-stream" type content


Tom Fishman via StackOverflow

Jan 2, 2014, 12:13:05 AM
to google-appengin...@googlegroups.com

We're using the Google Cloud Storage Client Library for App Engine. We simply set GcsFileOptions.Builder.contentEncoding("gzip") at file creation time, and we get the following problem when reading the file:

com.google.appengine.tools.cloudstorage.NonRetriableException: java.lang.RuntimeException: com.google.appengine.tools.cloudstorage.SimpleGcsInputChannelImpl$1@1c07d21: Unexpected cause of ExecutionException
    at com.google.appengine.tools.cloudstorage.RetryHelper.doRetry(RetryHelper.java:87)
    at com.google.appengine.tools.cloudstorage.RetryHelper.runWithRetries(RetryHelper.java:129)
    at com.google.appengine.tools.cloudstorage.RetryHelper.runWithRetries(RetryHelper.java:123)
    at com.google.appengine.tools.cloudstorage.SimpleGcsInputChannelImpl.read(SimpleGcsInputChannelImpl.java:81)
...


Caused by: java.lang.RuntimeException: com.google.appengine.tools.cloudstorage.SimpleGcsInputChannelImpl$1@1c07d21: Unexpected cause of ExecutionException
    at com.google.appengine.tools.cloudstorage.SimpleGcsInputChannelImpl$1.call(SimpleGcsInputChannelImpl.java:101)
    at com.google.appengine.tools.cloudstorage.SimpleGcsInputChannelImpl$1.call(SimpleGcsInputChannelImpl.java:81)
    at com.google.appengine.tools.cloudstorage.RetryHelper.doRetry(RetryHelper.java:75)
    ... 56 more
Caused by: java.lang.IllegalStateException: com.google.appengine.tools.cloudstorage.oauth.OauthRawGcsService$2@1d8c25d: got 46483 > wanted 19823
    at com.google.common.base.Preconditions.checkState(Preconditions.java:177)
    at com.google.appengine.tools.cloudstorage.oauth.OauthRawGcsService$2.wrap(OauthRawGcsService.java:418)
    at com.google.appengine.tools.cloudstorage.oauth.OauthRawGcsService$2.wrap(OauthRawGcsService.java:398)
    at com.google.appengine.api.utils.FutureWrapper.wrapAndCache(FutureWrapper.java:53)
    at com.google.appengine.api.utils.FutureWrapper.get(FutureWrapper.java:90)
    at com.google.appengine.tools.cloudstorage.SimpleGcsInputChannelImpl$1.call(SimpleGcsInputChannelImpl.java:86)
    ... 58 more

What else needs to be done to read and write files with gzip compression?

This is the code that works for uncompressed objects:

byte[] blobContent = new byte[0];

try
{
    GcsFileMetadata metaData = gcsService.getMetadata(fileName);
    int fileSize = (int) metaData.getLength();
    final int chunkSize = BlobstoreService.MAX_BLOB_FETCH_SIZE;

    LOG.info("content encoding: " + metaData.getOptions().getContentEncoding());
    LOG.info("input size " + fileSize);

    for (long offset = 0; offset < fileSize;)
    {
        if (offset != 0)
        {
            LOG.info("Handling extra size for " + filePath + " at " + offset); //$NON-NLS-1$ //$NON-NLS-2$
        }

        final int size = Math.min(chunkSize, fileSize);

        ByteBuffer result = ByteBuffer.allocate(size);
        GcsInputChannel readChannel = gcsService.openReadChannel(fileName, offset);
        try
        {
            readChannel.read(result); // <<<< here the exception was thrown
        }
        finally
        {
            ......

It was compressed by:

GcsFilename filename = new GcsFilename(bucketName, filePath);
GcsFileOptions.Builder builder = new GcsFileOptions.Builder().mimeType(image_type);

if (compress)
{
    builder = builder.contentEncoding("gzip");
}
...


Please DO NOT REPLY directly to this email but go to StackOverflow:
http://stackoverflow.com/questions/20875301/read-cloud-storage-content-with-gzip-encoding-for-application-octet-stream-t

Tom Fishman via StackOverflow

Jan 2, 2014, 12:33:07 AM
to google-appengin...@googlegroups.com

Same code as in the question, with two notes: metaData.getOptions().getContentEncoding() logs "gzip" here, and the logged input size is obviously the compressed size. The exception is thrown at the same readChannel.read(result) call.

markovuksanovic via StackOverflow

Jan 2, 2014, 3:38:10 AM
to google-appengin...@googlegroups.com

Looking at your code, it seems the compression did not happen at creation time. The documentation also specifies that (https://developers.google.com/storage/docs/reference-headers?csw=1#contentencoding).

Also, if you look at the implementation of the class that throws the exception (https://code.google.com/p/appengine-gcs-client/source/browse/trunk/java/src/main/java/com/google/appengine/tools/cloudstorage/oauth/OauthRawGcsService.java?r=81&spec=svn134), you will notice that you get the original contents back, but you're actually expecting less. Check the method readObjectAsync in the above-mentioned class.

Hope this puts you on the right track. To gzip your contents you can use Java's GZIPOutputStream (GZIPInputStream is the counterpart for decompressing on the read side).
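
For illustration, a minimal sketch of that round trip with the standard java.util.zip classes (the class and method names are placeholders, not from the original post):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip
{
    // Compress a byte array with gzip.
    static byte[] gzip(byte[] plain) throws IOException
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        GZIPOutputStream zip = new GZIPOutputStream(bytes);
        try
        {
            zip.write(plain);
        }
        finally
        {
            zip.close(); // close() flushes the deflater and writes the gzip trailer
        }
        return bytes.toByteArray();
    }

    // Decompress gzip bytes back to the original content.
    static byte[] gunzip(byte[] compressed) throws IOException
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        GZIPInputStream zip = new GZIPInputStream(new ByteArrayInputStream(compressed));
        try
        {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = zip.read(buffer)) > 0)
            {
                bytes.write(buffer, 0, n);
            }
        }
        finally
        {
            zip.close();
        }
        return bytes.toByteArray();
    }
}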




markovuksanovic via StackOverflow

Jan 2, 2014, 5:13:12 AM
to google-appengin...@googlegroups.com

Looking at your code, it seems there is a mismatch between what is stored and what is read. The documentation specifies that compression is not done for you (https://developers.google.com/storage/docs/reference-headers?csw=1#contentencoding); you will need to do the actual compression manually.

Also, if you look at the implementation of the class that throws the exception (https://code.google.com/p/appengine-gcs-client/source/browse/trunk/java/src/main/java/com/google/appengine/tools/cloudstorage/oauth/OauthRawGcsService.java?r=81&spec=svn134), you will notice that you get the original contents back, but you're actually expecting compressed content. Check the method readObjectAsync in the above-mentioned class.

I'm actually not sure how the size of the compressed file was calculated, but if you didn't do it, it must have been done somewhere in the library. I personally find this odd, as there are nine levels of gzip compression to choose from, and the library cannot know which level you will use.
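
To make the level point concrete: java.util.zip.GZIPOutputStream always compresses at the deflater's default level, but a subclass can pick any of levels 1-9 through the protected Deflater it inherits from DeflaterOutputStream. A sketch (the class name is made up for illustration):

import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

class LeveledGzipOutputStream extends GZIPOutputStream
{
    LeveledGzipOutputStream(OutputStream out, int level) throws IOException
    {
        super(out);
        // "def" is the protected Deflater inherited from DeflaterOutputStream;
        // level runs from 1 (Deflater.BEST_SPEED) to 9 (Deflater.BEST_COMPRESSION).
        def.setLevel(level);
    }
}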

Tom Fishman via StackOverflow

Jan 2, 2014, 8:58:17 AM
to google-appengin...@googlegroups.com

It is now compressed by:

GcsFilename filename = new GcsFilename(bucketName, filePath);
GcsFileOptions.Builder builder = new GcsFileOptions.Builder().mimeType(image_type);

builder = builder.contentEncoding("gzip");

GcsOutputChannel writeChannel = gcsService.createOrReplace(filename, builder.build());

ByteArrayOutputStream byteStream = new ByteArrayOutputStream(blob_content.length);
try
{
    GZIPOutputStream zipStream = new GZIPOutputStream(byteStream);
    try
    {
        zipStream.write(blob_content);
    }
    finally
    {
        zipStream.close();
    }
}
finally
{
    byteStream.close();
}

byte[] compressedData = byteStream.toByteArray();
writeChannel.write(ByteBuffer.wrap(compressedData));

Tom Fishman via StackOverflow

Jan 2, 2014, 9:13:17 AM
to google-appengin...@googlegroups.com

We're using the Google Cloud Storage Client Library for App Engine with GcsFileOptions.Builder.contentEncoding("gzip") set at file creation time, and reading the file still fails with the exception quoted at the top of this thread (IllegalStateException: got 46483 > wanted 19823).

What else needs to be done to read gzip-compressed files in App Engine? (Fetching the Cloud Storage URL with curl from the client side works fine for both compressed and uncompressed files.)

The blob_content is compressed from 46483 bytes to 19823 bytes.

I think this is a bug in the Google code (https://code.google.com/p/appengine-gcs-client/source/browse/trunk/java/src/main/java/com/google/appengine/tools/cloudstorage/oauth/OauthRawGcsService.java), line 418:

    Preconditions.checkState(content.length <= want, "%s: got %s > wanted %s", this, content.length, want);

The HTTPResponse has already decoded (decompressed) the blob, so the precondition is wrong here.
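
If that diagnosis is right, one possible workaround (a sketch, not something suggested in the thread) is to store the compressed bytes without the contentEncoding option, so nothing on the read path transparently decodes them, and to gunzip manually in the app after reading. The trade-off is that clients fetching the object URL directly no longer get automatic decompression. Reusing filename, image_type and compressedData from the write code above:

// Hedged sketch: write the gzipped bytes as plain content, with no
// Content-Encoding header set on the object.
GcsFileOptions options = new GcsFileOptions.Builder()
        .mimeType(image_type)
        .build(); // note: no contentEncoding("gzip")
GcsOutputChannel writeChannel = gcsService.createOrReplace(filename, options);
try
{
    writeChannel.write(ByteBuffer.wrap(compressedData));
}
finally
{
    writeChannel.close();
}
// On the read side, read the raw bytes back with openReadChannel as in the
// question, then decompress them yourself with GZIPInputStream.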


markovuksanovic via StackOverflow

Jan 2, 2014, 5:53:32 PM
to google-appengin...@googlegroups.com

It looks like the content persisted might not be gzipped, or the content length is not set properly. Verify the length of the compressed stream just before writing it into the channel, and verify that the content length is set correctly on the HTTP request. It would be useful to see the actual HTTP request headers and make sure the Content-Length header matches the actual content length in the HTTP response.
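
A minimal way to do that first check, reusing the variables from the write code above (a sketch):

// Log the compressed length immediately before writing, so it can be compared
// with the Content-Length on the wire and with the size getMetadata() reports.
byte[] compressedData = byteStream.toByteArray();
LOG.info("original length: " + blob_content.length
        + ", compressed length: " + compressedData.length);
writeChannel.write(ByteBuffer.wrap(compressedData));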





markovuksanovic via StackOverflow

Jan 2, 2014, 7:53:35 PM
to google-appengin...@googlegroups.com

Also, it looks like contentEncoding is not set correctly. Try using .contentEncoding("Content-Encoding: gzip") as used in this TCK test.

markovuksanovic via StackOverflow

Jan 2, 2014, 7:58:35 PM
to google-appengin...@googlegroups.com

Also, you need to make sure that the GcsOutputChannel is closed, as that's when the file is finalized.
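
Applied to the write code above, that would look something like this (a sketch):

GcsOutputChannel writeChannel = gcsService.createOrReplace(filename, builder.build());
try
{
    writeChannel.write(ByteBuffer.wrap(compressedData));
}
finally
{
    // The object is only finalized and becomes readable in GCS
    // once the channel is closed.
    writeChannel.close();
}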

markovuksanovic via StackOverflow

Jan 2, 2014, 8:03:35 PM
to google-appengin...@googlegroups.com

As said before, contentEncoding could also be set incorrectly; but still, the best thing to do is to inspect the actual HTTP request and response. You can use Wireshark to do that easily.
