permutations vs Cartesian product in exploding indexes documentation

967 views
Skip to first unread message

kostmo

unread,
Mar 22, 2010, 12:50:57 PM3/22/10
to Google App Engine
It is still unclear to me exactly how many index entries will be
produced by including multiple ListProperty's in an entity. In one
post [1], between the inquiring user and the Google rep, the
possibilities for the number of index entries included the power set,
Cartesian product, or the "number of unique combinations" (this was
the rep's answer, without mention of "permutations").

I'd like to verify that the correct combinatorics terminology is being
used in the documentation on exploding indexes [2].

Quote:
"the index table must include a row for every permutation of the
values of every property for the index"
This is followed by the example:
e2 = MyModel()
e2.x = ['red', 'blue']
e2.y = [1, 2]
"the index must store 12 property values: 2 each for the built-in
indexes on x and y, and 2 for each of the 4 permutations of x and y in
the custom index".

The example does not make me completely confident about what type of
combinatoric expression is truly being used, since the sum of the
number of permutations for each property happens to equal the
cardinality of the Cartesian product of the two properties. The
enumeration given in this post [3] seems to suggest that it is
actually the Cartesian product at work.

To clarify this issue, we could use Python 2.6 to illustrate an
example:
>>> from itertools import product, permutations
>>> x = ['red', 'blue', 'green']
>>> y = [1, 2, 3]
>>> list(permutations(x)) + list(permutations(y))
[('red', 'blue', 'green'), ('red', 'green', 'blue'), ('blue', 'red',
'green'), ('blue', 'green', 'red'), ('green', 'red', 'blue'),
('green', 'blue', 'red'), (1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1),
(3, 1, 2), (3, 2, 1)]
>>> list(product(x, y))
[('red', 1), ('red', 2), ('red', 3), ('blue', 1), ('blue', 2),
('blue', 3), ('green', 1), ('green', 2), ('green', 3)]

Is the number of custom index entries that will be generated by an
entity with the two 3-valued properties given above proportional to
the sum of permutations (there are 12) or the Cartesian product (there
are 9)?

Thanks,
Karl

[1] "Efficient way to structure my data model"
http://groups.google.com/group/google-appengine/browse_thread/thread/326e90024ed53ced/d958fa6df98ba6df
[2] http://code.google.com/appengine/docs/python/datastore/queriesandindexes.html#Big_Entities_and_Exploding_Indexes
[3] "Size of index in case of exploding index"
http://groups.google.com/group/google-appengine/browse_thread/thread/3db5a0338d77d81b/c5f1ea4fa3107b25

kostmo

unread,
Mar 22, 2010, 10:22:45 PM3/22/10
to Google App Engine
OK, I just watched Brett Slatkin's I/O talk [1] and he mentions "cross
product" a couple of times, so it seems that the use of the word
"permutation" in the docs is incorrect; the number of index entries is
indeed proportional to the Cartesian product, rather than
"permutations" which would lead to a factorial (n!) number of index
entries.

I've filed a doco issue here [2].

Karl

[1] http://code.google.com/events/io/2009/sessions/BuildingScalableComplexApps.html
[2] http://code.google.com/p/googleappengine/issues/detail?id=3003

> [1] "Efficient way to structure my data model"http://groups.google.com/group/google-appengine/browse_thread/thread/...
> [2]http://code.google.com/appengine/docs/python/datastore/queriesandinde...
> [3] "Size of index in case of exploding index"http://groups.google.com/group/google-appengine/browse_thread/thread/...

Nick Johnson (Google)

unread,
Mar 23, 2010, 6:58:24 AM3/23/10
to google-a...@googlegroups.com
Hi Karl,

You're correct that it is indeed the cartesian product in this case - it produces one index entry for every unique tuple of values from the indexed columns.

This gets slightly more complicated in the situation where the same column is being indexed multiple times. In that situation, the naive cartesian product is not used - instead, the additional constraints that the values are sorted in non-descending order and that no value occurs more than once in a tuple are added, so as to eliminate unnecessary duplicates. Eg, an index on (foo, foo), with a value of [1,2,3] for foo will result in the following entries:

(1,2),(1,3),(2,3)

Notably omitting the redundant entries:

(2,1),(3,1),(3,2)

And the invalid entries:

(1,1),(2,2),(3,3)

-Nick Johnson


--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.




--
Nick Johnson, Developer Programs Engineer, App Engine
Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number: 368047

kostmo

unread,
Mar 23, 2010, 12:14:49 PM3/23/10
to Google App Engine
Reply all
Reply to author
Forward
0 new messages