Kaan Soral
Jan 1, 2012, 5:26:16 PM
to google-a...@googlegroups.com
1) First of all, I make a keys-only query with order("__scatter__") and get the scattered keys
2) Then I get the first element of the query, start, and add it to the beginning of the array from (1)
3) Then I sort that array of keys (I use the regular .sort() function on arrays)
So let's say I have keys k1 k2 k3 k4 k5 k6 in my array, k1 being the start_key I got above
Now I can deploy workers (k1,k2) (k2,k3) ...
(k1,k2): starts from key k1, ends when it sees key k2
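The steps above can be sketched roughly as follows. This is a minimal illustration and not the actual datastore code: plain integers stand in for datastore keys, and the names make_key_ranges and run_worker are hypothetical. Two things here are assumptions that the description above does not state: duplicates are dropped before sorting, and a final open-ended range (last key, None) is added so that entities after the last split key are still covered.

```python
def make_key_ranges(scatter_keys, start_key):
    """Build (start, end) ranges from sampled split keys.

    scatter_keys: keys sampled via the __scatter__ ordering (step 1).
    start_key: the first key of the full query (step 2).
    Plain comparable values stand in for datastore keys here.
    """
    # Steps 2 and 3: add the start key, drop duplicates (assumption), sort.
    split_points = sorted(set(scatter_keys) | {start_key})
    # Pair each split point with the next one: (k1, k2), (k2, k3), ...
    ranges = list(zip(split_points, split_points[1:]))
    # Assumption: a final open-ended range so nothing after the last key is skipped.
    ranges.append((split_points[-1], None))
    return ranges


def run_worker(all_keys, start, end):
    """Simulate one worker: handle keys in the half-open range [start, end)."""
    return [k for k in sorted(all_keys)
            if k >= start and (end is None or k < end)]


if __name__ == "__main__":
    all_keys = list(range(1, 21))      # stand-in for every key in the datastore
    sampled = [15, 5, 10]              # stand-in for the __scatter__ sample
    for start, end in make_key_ranges(sampled, start_key=1):
        print((start, end), run_worker(all_keys, start, end))
```

If the ranges are half-open and the last one is open-ended, every key should be handled by exactly one worker. A missing final range, or a sort order that differs from the datastore's order, would be two ways for results to go missing.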
This way I achieve parallelism. Of course this is not my own creation; I got the idea from GAE's mapreduce.
But strange things happen. I have been using this for nearly a year now, and I have started to think that I may have been doing something wrong all this time.
I don't see all the results that I should see after the mapreduce gets completed.
An initial concern: can datastore keys be sorted? To clarify: if a key k1 comes before a key k2 in the datastore, is k1 < k2 in Python?
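One way to test this concern would be to fetch a batch of keys in the datastore's own key order and check whether Python's sort leaves them in the same order. A minimal sketch of that check, with a hypothetical function name and plain values standing in for keys (the fetching step is not shown and would depend on the datastore API in use):

```python
def sort_matches_db_order(keys_in_db_order):
    """Return True if Python's sort keeps the keys in the order they were given.

    keys_in_db_order: keys as returned by a query ordered by key
    (plain comparable values stand in for datastore keys here).
    """
    keys = list(keys_in_db_order)
    return sorted(keys) == keys


if __name__ == "__main__":
    print(sort_matches_db_order(["a", "b", "c"]))  # order agrees with sorted()
    print(sort_matches_db_order(["b", "a", "c"]))  # order disagrees with sorted()
```

If the check fails on real keys, the ranges built from the sorted array would not line up with the order the workers iterate in.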