Yes, it's currently impossible to feed a query into the Mapper, mostly
because it's hard (maybe impossible) to shard query results. If you
are satisfied with having only one shard, I believe you can implement
that yourself.
You'll have to copy and modify three classes:
- Modify the query in createIterator() in:
  https://code.google.com/p/appengine-mapreduce/source/browse/trunk/java/src/com/google/appengine/tools/mapreduce/DatastoreRecordReader.java
- Return one fixed split from:
  https://code.google.com/p/appengine-mapreduce/source/browse/trunk/java/src/com/google/appengine/tools/mapreduce/DatastoreInputFormat.java
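To make the single-shard idea concrete, here is a small self-contained
Java model. None of these types come from appengine-mapreduce: the
Entity and QuerySplit records, the List standing in for the datastore,
and the Predicate standing in for your custom query are all
hypothetical. It only shows the shape of the change: the input format
hands back exactly one split, and the record reader's iterator walks the
whole filtered query result.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

/**
 * Single-shard model. All types here are hypothetical stand-ins, NOT the
 * appengine-mapreduce API: a List<Entity> plays the datastore and a
 * Predicate plays the custom query.
 */
public class SingleShardSketch {

    /** Hypothetical minimal entity: a key and one property. */
    record Entity(long key, String status) {}

    /** Hypothetical split: the single split means "the whole query result". */
    record QuerySplit(Predicate<Entity> query) {}

    /** Analogue of the modified input format: always returns exactly one split. */
    static List<QuerySplit> getSplits(Predicate<Entity> query) {
        return Collections.singletonList(new QuerySplit(query));
    }

    /** Analogue of the modified createIterator(): runs the custom query. */
    static Iterator<Entity> createIterator(List<Entity> datastore, QuerySplit split) {
        List<Entity> results = new ArrayList<>();
        for (Entity e : datastore) {
            if (split.query().test(e)) {
                results.add(e);
            }
        }
        return results.iterator();
    }

    public static void main(String[] args) {
        List<Entity> datastore = List.of(
                new Entity(1, "active"), new Entity(2, "deleted"), new Entity(3, "active"));
        List<QuerySplit> splits = getSplits(e -> e.status().equals("active"));
        Iterator<Entity> it = createIterator(datastore, splits.get(0));
        while (it.hasNext()) {
            // In the real library this is where each entity is passed to your mapper.
            System.out.println("map(" + it.next().key() + ")");
        }
    }
}
```

Because there is only one split, the query result never has to be
partitioned, which sidesteps the sharding problem at the cost of
parallelism.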
It should be fairly straightforward. If you upload your code to a gist
or pastebin, I can take a look at it.
Let me know if you figure this out.
--
Regards,
Mike
The main caveat is that you need a special composite index that
includes both __scatter__ and the other properties your query needs.
In fact, I think you are going to need two special indexes for it to
run:
- necessary properties + __scatter__ (to generate splits)
- __key__ + necessary properties (to iterate over those splits)
But I might be wrong here; I've never actually tried it.
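For context on the first index: a query ordered by __scatter__ returns a
pseudo-random sample of keys, which can be sorted and cut into key
ranges, one per shard. The sketch below models only that last step in
plain Java. The sample is a hypothetical List<Long> rather than real
datastore keys, and this is a simplified illustration of the idea, not
DatastoreInputFormat's actual code. Each resulting range would then be
iterated with a key-range query, which is presumably where the second
(__key__ + properties) index comes in.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Simplified model of scatter-based split generation. Assumption: the keys
 * a __scatter__-ordered query would return are modeled as a List<Long>.
 */
public class ScatterSplitSketch {

    /**
     * Picks up to shardCount - 1 boundary keys from a random sample of keys.
     * Shard i then covers [boundary i-1, boundary i).
     */
    static List<Long> splitPoints(List<Long> sampledKeys, int shardCount) {
        if (shardCount < 1) {
            throw new IllegalArgumentException("shardCount must be >= 1");
        }
        List<Long> sorted = new ArrayList<>(sampledKeys);
        Collections.sort(sorted); // the sample arrives in random order, so sort by key
        List<Long> points = new ArrayList<>();
        if (sorted.isEmpty()) {
            return points; // no sample: everything ends up in a single shard
        }
        for (int i = 1; i < shardCount; i++) {
            // evenly spaced positions in the sorted sample
            int index = (int) ((long) i * sorted.size() / shardCount);
            Long candidate = sorted.get(index);
            // don't emit the same boundary twice, which would create an empty shard
            if (points.isEmpty() || !points.get(points.size() - 1).equals(candidate)) {
                points.add(candidate);
            }
        }
        return points;
    }

    public static void main(String[] args) {
        List<Long> sample = List.of(90L, 10L, 50L, 70L, 30L, 20L, 80L, 60L, 40L, 100L);
        System.out.println(splitPoints(sample, 4));
    }
}
```

With the sample in main(), splitPoints(sample, 4) prints [30, 60, 80],
i.e. four ranges: below 30, [30, 60), [60, 80) and 80 and up.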