Attention Java MapReduce users

Tom Kaitchuck

May 1, 2013, 4:32:51 PM
to app-engine-...@googlegroups.com, google-a...@googlegroups.com

If you are using the experimental Java MapReduce library for App Engine, you are strongly encouraged to update to the latest version of the library in the public SVN repository: https://code.google.com/p/appengine-mapreduce/source/checkout



Background:

We are rolling out a fix to a long-standing interaction bug between the experimental MapReduce library and the experimental Files API that, in certain circumstances, results in dropped data. Specifically, this bug can cause some records emitted by Map to be excluded from the input to Reduce.


The bugfix involves patches to both the Files API and Java MapReduce. Unfortunately, older versions of the Java MapReduce library running against the patched Files API will drop Map output under more common circumstances. The Files API fix will roll out on its own (no action required on your part), but to avoid dropped data you must update to the latest version of the Java MapReduce library.


We apologize for the trouble. Rest assured we are working aggressively to move MapReduce into a fully supported state.



Tom Kaitchuck on behalf of the Google App Engine Team

Ales Justin

May 1, 2013, 4:50:14 PM
to app-engine-...@googlegroups.com, google-a...@googlegroups.com
Any chance of providing this library in Maven Central?

-Ales

--
You received this message because you are subscribed to the Google Groups "Google App Engine Pipeline API" group.
To unsubscribe from this group and stop receiving emails from it, send an email to app-engine-pipeli...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Tom Kaitchuck

May 1, 2013, 5:42:53 PM
to google-a...@googlegroups.com, app-engine-...@googlegroups.com
This is something we are aware of and are working on for future releases.

For this update we encourage you to download and deploy the new code right away.

Tom Kaitchuck

May 7, 2013, 3:13:45 PM
to google-a...@googlegroups.com, app-engine-...@googlegroups.com
If you have the writable file name, you can use the Datastore Viewer to find the finalized file name.
Go into the Datastore Viewer and enter the GQL query: SELECT * FROM __BlobFileIndex__
This will show you the mapping. You can then narrow it down by specifying the writable file name as the ID/Name.


On Sun, May 5, 2013 at 10:00 PM, Eric Jahn <er...@ejahn.net> wrote:
Tom,
This is great news. I have one lingering problem as a result of the Files API bug. Before the Files API fix, I had persisted the file service URLs while I was writing to the files, and then finalized them successfully. But because of this bug I couldn't retrieve a blobstore key by passing these URLs to BlobKey.getKeyString(). (By the way, I'm not using Java MapReduce, just the App Engine Files API and Blobstore.) Is there a way I can retrieve my finalized blobstore files that aren't appearing in my App Engine dashboard Blobstore viewer? If I start with a new file, I see it appear, but that is now after the Files API bug fix, I presume. Thanks for any thoughts. -Eric



--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-appengi...@googlegroups.com.
To post to this group, send email to google-a...@googlegroups.com.
Visit this group at http://groups.google.com/group/google-appengine?hl=en.

Ronoaldo José de Lana Pereira

May 8, 2013, 3:30:08 PM
to app-engine-...@googlegroups.com, google-a...@googlegroups.com
Tom,

Thanks for the update. We have some old code running with the first version of the Mapper API (the one that has no reducer); specifically, we are pinned to r218. Since that version isn't API compatible with the new Map/Reduce implementation and we use only the mappers for our operations, we have seen no problems so far.

Could this old Mapper API also be affected by the new version of the Files API?

Kind regards,

Tom Kaitchuck

May 8, 2013, 4:08:58 PM
to app-engine-...@googlegroups.com, google-a...@googlegroups.com
Ronoaldo: Correct, the old API version should not be affected.



Eric Jahn

May 10, 2013, 11:35:32 AM
to google-a...@googlegroups.com, app-engine-...@googlegroups.com
Tom,
The files I'm searching for in datastore viewer were finalized, and I was able to search for and find more recent finalized blobs using SELECT * FROM __BlobFileIndex__ WHERE __key__=KEY('__BlobFileIndex__', 'SoMeKeyHere');
and for still writable this works:
SELECT * FROM __BlobFileIndex__ WHERE __key__=KEY('__BlobFileIndex__', 'writable:SoMeKeyHere');
However, for the keys of the missing files, the query in the datastore viewer returns "name must be under 500 bytes." Those >500 byte keys worked for accessing and writing to the original files from my App Engine code before they were finalized, so why is this length a problem now? Or was exceedingly long key generation part of the Files API bug? I still need to access those files, since they took a lot of expensive backend time to generate. Thanks! -Eric

Tom Kaitchuck

May 10, 2013, 4:28:14 PM
to google-a...@googlegroups.com, app-engine-...@googlegroups.com
Ah, yes. If the writable name is > 500 bytes, it is hashed before being stored to avoid the length limitation. This is done as follows:
Hashing.sha512().hashString(creationHandle, Charsets.US_ASCII).toString()

So the easiest way to access the file would probably be to use the Files API: just pass it the writable name and let it do this for you. You'll then have a file handle with the finalized name, which you can copy and access via the blobstore API if you like. To do this, construct an AppEngineFile with the namepart set to the writable file name, including the "writable:" prefix, and then invoke FileServiceImpl.getBlobKey()
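
For anyone who wants to reproduce the hashing step by hand, here is a minimal sketch using only the JDK instead of Guava. It assumes that Guava's HashCode.toString() yields lowercase hex (which it does) and that the value being hashed is the full writable name; whether that string includes the "writable:" prefix, and whether the cutoff is exactly "more than 500 bytes", are assumptions not confirmed in this thread. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative only: these names are hypothetical and not part of the App Engine SDK. */
final class WritableNameHashSketch {

    /** Assumed cutoff, taken from the "name must be under 500 bytes" error. */
    static final int MAX_NAME_BYTES = 500;

    /** JDK-only equivalent of Hashing.sha512().hashString(s, US_ASCII).toString(). */
    static String sha512Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-512");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.US_ASCII));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to an unsigned value so each byte prints as two lowercase hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-512 is not available in this JRE", e);
        }
    }

    /** Returns the name unchanged if it fits, otherwise its SHA-512 hex digest (assumed rule). */
    static String storedName(String creationHandle) {
        int length = creationHandle.getBytes(StandardCharsets.US_ASCII).length;
        return length > MAX_NAME_BYTES ? sha512Hex(creationHandle) : creationHandle;
    }

    public static void main(String[] args) {
        // Build a placeholder writable name longer than 500 bytes.
        StringBuilder longName = new StringBuilder("writable:");
        while (longName.length() <= MAX_NAME_BYTES) {
            longName.append('x');
        }
        System.out.println(storedName("writable:SoMeKeyHere"));
        System.out.println(storedName(longName.toString()));
    }
}
```

If the assumptions hold, the digest printed for a long name is the value to try as the ID/Name in the datastore viewer; letting the Files API resolve the name, as described above, avoids relying on them.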

Tom Kaitchuck

May 20, 2013, 9:32:09 PM
to app-engine-...@googlegroups.com, google-a...@googlegroups.com
This is a bug I am working on. It occurs when the last record to be written encounters a keyOrderingException, meaning it was already written but no ACK was received, so the write was retried. It should therefore be rare, and the shuffle retry that was added should make it harmless.
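
To picture the condition, here is a toy sketch of a write whose ACK is lost and which is then retried. It is not the library's code: every class and method name below is hypothetical, and treating the "already written" error as success is just one plausible way to handle such a retry.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: none of these names come from the MapReduce library. */
final class RetriedWriteSketch {

    /** Hypothetical stand-in for the ordering error raised when a record was already stored. */
    static final class AlreadyWrittenException extends RuntimeException {}

    /** Toy sink that accepts each sequence number once and can lose the first ACK. */
    static final class FlakySink {
        final List<String> stored = new ArrayList<>();
        private long nextSequence = 0;
        private boolean dropNextAck;

        FlakySink(boolean dropFirstAck) {
            this.dropNextAck = dropFirstAck;
        }

        void append(long sequence, String record) {
            if (sequence < nextSequence) {
                // The record at this position is already stored.
                throw new AlreadyWrittenException();
            }
            stored.add(record);
            nextSequence = sequence + 1;
            if (dropNextAck) {
                dropNextAck = false;
                // The write succeeded, but the caller never learns about it.
                throw new IllegalStateException("ACK lost");
            }
        }
    }

    /** Returns true once the record is known to be stored. */
    static boolean appendWithRetry(FlakySink sink, long sequence, String record, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                sink.append(sequence, record);
                return true;
            } catch (AlreadyWrittenException e) {
                // An earlier attempt landed and only its ACK was lost: treat as success.
                return true;
            } catch (IllegalStateException e) {
                // Outcome unknown, so try again.
            }
        }
        return false;
    }

    public static void main(String[] args) {
        FlakySink sink = new FlakySink(true);
        boolean ok = appendWithRetry(sink, 0, "last-record", 3);
        System.out.println(ok + " " + sink.stored);
    }
}
```

The point of the sketch is only that the retry sees an "already written" error for a record that is in fact safely stored, which is why the condition should not lose data.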

If you are seeing any broader problems, send me a message off-list with your appId.

I'll post an update here when a patch has been pushed out.



On Mon, May 20, 2013 at 4:28 PM, Tim Jones <pala...@gmail.com> wrote:
After upgrading, I'm getting the following NullPointerException in the InMemoryShuffler.  Is this a known issue?

Caused by: java.lang.NullPointerException
at com.google.appengine.tools.mapreduce.impl.InMemoryShuffleJob$1.run(InMemoryShuffleJob.java:234)
at com.google.appengine.tools.mapreduce.impl.InMemoryShuffleJob$1.run(InMemoryShuffleJob.java:231)
at com.google.appengine.tools.mapreduce.impl.util.RetryHelper.doRetry(RetryHelper.java:62)
at com.google.appengine.tools.mapreduce.impl.util.RetryHelper.runWithRetries(RetryHelper.java:101)
at com.google.appengine.tools.mapreduce.impl.InMemoryShuffleJob.closeFinally(InMemoryShuffleJob.java:231)
at com.google.appengine.tools.mapreduce.impl.InMemoryShuffleJob.writeOutput(InMemoryShuffleJob.java:227)
at com.google.appengine.tools.mapreduce.impl.InMemoryShuffleJob.writeOutputs(InMemoryShuffleJob.java:243)
at com.google.appengine.tools.mapreduce.impl.InMemoryShuffleJob.run(InMemoryShuffleJob.java:253)
at com.google.appengine.tools.mapreduce.impl.InMemoryShuffleJob.run(InMemoryShuffleJob.java:42)
... 47 more


Tom Kaitchuck

May 22, 2013, 7:46:56 PM
to app-engine-...@googlegroups.com, google-a...@googlegroups.com
The issue mentioned above (the NPE when the last item was already written) has been fixed in revision 464 in the public SVN.

Carter Maslan

May 29, 2013, 4:32:44 PM
to app-engine-...@googlegroups.com, google-a...@googlegroups.com
We get compile errors when we build the latest version of the Java MapReduce library with ant against Java SDK 1.8.0. The error is "Builder() has private access".
Is there an existing fix for that?
   
    [javac] /mr2/appengine-mapreduce-read-only/java/src/com/google/appengine/tools/mapreduce/outputs/GoogleCloudStorageFileOutputWriter.java:39: Builder() has private access in com.google.appengine.tools.cloudstorage.GcsFileOptions.Builder
    [javac]         GCS_SERVICE.createOrReplace(file, new GcsFileOptions.Builder().mimeType(mimeType).build());
    [javac]                                           ^
    [javac] /mr2/appengine-mapreduce-read-only/java/src/com/google/appengine/tools/mapreduce/outputs/GoogleCloudStorageFileOutputWriter.java:39: cannot find symbol
    [javac] symbol  : method mimeType(java.lang.String)
    [javac] location: class com.google.appengine.tools.cloudstorage.GcsFileOptions.Builder
    [javac]         GCS_SERVICE.createOrReplace(file, new GcsFileOptions.Builder().mimeType(mimeType).build());
    [javac]                                                                       ^





On Wed, May 22, 2013 at 4:48 PM, Tim Jones <pala...@gmail.com> wrote:
Awesome, I downloaded a few minutes ago and have been running against it with no problems.  Thanks!



Tom Kaitchuck

May 29, 2013, 6:52:20 PM
to app-engine-...@googlegroups.com, google-a...@googlegroups.com
My fault. I forgot to update the revision of the GCS client that the ant build.xml loads. If you want the fix right away, just change the line in build.xml that defines the property "gcsversion" so that its value is "r54". (You will also need to delete the Jar if it has previously been downloaded into the same workspace.)

I'll push out an update later today.

Tom Kaitchuck

May 29, 2013, 7:10:48 PM
to app-engine-...@googlegroups.com, google-a...@googlegroups.com
You should be able to sync to SVN and get this update (r466).

Carter Maslan

May 29, 2013, 7:17:42 PM
to app-engine-...@googlegroups.com, google-a...@googlegroups.com
Thank you. r466 compiles successfully.