Is it possible to Map() using the Key() object as property source?

jose moreira

Mar 21, 2010, 1:28:56 PM
to httpmr-discuss
Is it possible to Map() using the Key() object as property source?

"stupid example":

    self.QuickInit("construct_token_index",
                   mapper=TokenMapper(),
                   reducer=TokenReducer(),
                   source=appengine.AppEngineSource(Document.all(), "__key__"),
                   sink=appengine.AppEngineSink(),
                   intermediate_values_set_job_name=False,
                   intermediate_values_set_nonsense_value=False)

Peter Dolan

Mar 22, 2010, 11:49:24 AM
to httpmr-...@googlegroups.com
Hey Jose,

I don't think that specific example would work. If you take a look at http://code.google.com/p/httpmr/source/browse/trunk/src/httpmr/appengine.py, you'll see that the AppEngineSource uses the model parameter name in an appengine db query, so it needs to be a queryable parameter. Unfortunately, I don't believe you can execute queries on an AppEngine model's key attribute; you can only use keys to do direct, single-entity lookups.

If, however, you know how the keys are distributed, and can generate them without knowing that they're in the database (for instance, if they were numerically increasing integer key names), then you can define a new type of data source that would perform the appropriate queries to retrieve them.  Simply subclass a Source, following the example of the AppEngineSource in the file I linked to above.
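
A minimal sketch of the key-generation idea, assuming the key names are consecutive integers rendered as strings. The helper name `key_name_batches` and the batch size are hypothetical and not part of httpmr; a custom Source would still have to turn each batch into direct lookups by key.

```python
def key_name_batches(first_id, last_id, batch_size):
    """Yield lists of candidate key names covering first_id..last_id inclusive.

    Hypothetical helper: assumes entities were stored under key names that
    are consecutive integers, so the names can be generated without a query.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(first_id, last_id + 1, batch_size):
        end = min(start + batch_size, last_id + 1)
        yield [str(n) for n in range(start, end)]


# Ten key names split into batches of at most four.
batches = list(key_name_batches(1, 10, 4))
```

Each batch could then be handed to one shard, which looks the entities up by key instead of querying on a property.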

- Peter




José Moreira

Mar 22, 2010, 12:14:53 PM
to httpmr-...@googlegroups.com
Hello,

Yes, I tried to hack together a new data source that used .fetch() pagination instead of filtering by a property, but unfortunately my core Python and Map Reduce skills are lacking at the moment and I haven't quite understood how httpmr operates exactly.

The goal is to apply a single-object calculation and its re-"put()" operation to a datastore that has 6 million records (and no unique property apart from the "key"), and I don't even know if the Map Reduce approach is the best in this case.

GAE has limited __key__ query support:

SELECT * FROM User WHERE __key__ < 'well formed hash key'
SELECT * FROM User WHERE __key__ >= 'well formed hash key'

For example, I tried to replace the sharding start/end points with numeric offsets between zero and the record count, and to break that "delta" into smaller blocks in the Map operation until each block becomes a single User... I don't even know if it was the right path. I started all this over the weekend and I'm very confused :)


An approach could be to map data by fetching X objects at a time:

SELECT * FROM User LIMIT X

and the next data set would be fetched using:

SELECT * FROM User WHERE __key__ > 'last user key from last fetch' LIMIT X
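
A rough sketch of the batch-by-batch fetch described above, using an in-memory sorted list in place of the datastore. The names `fetch_after` and `iter_batches` are hypothetical. The sketch assumes each fetch returns at most a fixed number of keys in ascending key order and that the next fetch resumes strictly after the last key seen.

```python
import bisect


def fetch_after(sorted_keys, last_key, limit):
    """Return up to `limit` keys strictly greater than `last_key`.

    `sorted_keys` is an ascending in-memory list standing in for the
    datastore; `last_key` is None for the very first fetch.
    """
    start = 0 if last_key is None else bisect.bisect_right(sorted_keys, last_key)
    return sorted_keys[start:start + limit]


def iter_batches(sorted_keys, batch_size):
    """Walk the whole key space one batch at a time."""
    last_key = None
    while True:
        batch = fetch_after(sorted_keys, last_key, batch_size)
        if not batch:
            break
        yield batch
        last_key = batch[-1]  # resume after the last key of this batch


# Hypothetical key names, already in ascending order.
keys = ["user:%03d" % n for n in range(7)]
pages = list(iter_batches(keys, 3))
```

Resuming from the last key seen keeps every fetch the same size regardless of how far into the data set the job has progressed.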

In each batch the Map() function would return a unique key for the batch and a list of users to be processed by Reduce() (right :-s ?)

The Reduce() would then receive (batchGroupKey, users), apply the operation to the list of users, and return something I don't know at the moment :)
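
A small sketch of that Map()/Reduce() split in plain Python, without httpmr. The names `map_batch`, `reduce_batch` and `double` are hypothetical, users are stand-in dicts, and returning a processed count from the reduce step is only an assumption, since the return value is left open above.

```python
from collections import defaultdict


def map_batch(batch_index, users):
    """Map step: emit one (batch key, users) pair for a fetched batch."""
    yield ("batch-%d" % batch_index, list(users))


def reduce_batch(batch_key, user_lists, operation):
    """Reduce step: apply the per-user operation, report how many were handled."""
    processed = 0
    for users in user_lists:
        for user in users:
            operation(user)  # in a real job this is where the re-put() would go
            processed += 1
    return batch_key, processed


def double(user):
    """Hypothetical single-object calculation, updating the user in place."""
    user["n"] *= 2


# Hypothetical data: two fetched batches of stand-in user records.
batches = [[{"n": 1}, {"n": 2}], [{"n": 3}]]

# Group the mapped values by batch key, as the shuffle phase would.
grouped = defaultdict(list)
for index, users in enumerate(batches):
    for key, value in map_batch(index, users):
        grouped[key].append(value)

results = [reduce_batch(key, values, double)
           for key, values in sorted(grouped.items())]
```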

At the moment, and if my theory is correct, I don't know how to implement this in httpmr, especially the fetching part.

I'm trying a homebrew solution....



Best




José Moreira

Mar 22, 2010, 12:38:50 PM
to httpmr-...@googlegroups.com
PS.:

The datastore can also be queried like this:

SELECT __key__ FROM User WHERE __key__ > Key('agZiZW1tdTNyHAsSBFVzZXIiEnVzZXI6MTI2OTE5OTU1MC45MQw')

SELECT __key__ FROM User WHERE __key__ <= Key('agZiZW1tdTNyHAsSBFVzZXIiEnVzZXI6MTI2OTE5OTU1MC45MQw')

SELECT __key__ FROM User WHERE __key__ = Key('agZiZW1tdTNyHAsSBFVzZXIiEnVzZXI6MTI2OTE5OTU1MC45MQw')

Peter Dolan

Apr 3, 2010, 3:41:30 PM
to httpmr-...@googlegroups.com
Sorry for the delay. That might take some additional hacking of the initial start / limit keys generated (right now they're just on alphanumeric boundaries), but if your homebrew solution works, then solving this case may not be necessary.


José Moreira

Apr 3, 2010, 4:26:52 PM
to httpmr-...@googlegroups.com
Yeah, I explored that possibility, but since I wasn't understanding all the gimmicks I homebrewed this:

http://github.com/matrixownsyou/MultiTaskBob

It still needs a lot of work, but it's in a workable state.

best

